DocumentVersionMetadata

Schema for the document_version.system_metadata JSONB field. It tracks S3 URLs for generated artifacts, pipeline execution state, and document statistics. Convention-based paths (images, page screenshots) are not stored here; they are derived from document_id and document_version_id via the s3_paths helpers, using a flat S3 layout: documents/{document_id}/{document_version_id}/... Internal conversion artifact paths (standard_pipeline_json_s3, high_accuracy_*_s3) are excluded from API responses via Field(exclude=True) so that underlying technology names are not exposed to external consumers.
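The flat layout described above can be sketched as a small key builder. This is only an illustration: `version_prefix`, `artifact_key`, and the file name `example.bin` are hypothetical and are not the actual s3_paths helpers, whose names and artifact suffixes are not documented here.

```python
def version_prefix(document_id: str, document_version_id: str) -> str:
    """Return the flat S3 key prefix shared by all artifacts of one version."""
    return f"documents/{document_id}/{document_version_id}/"


def artifact_key(document_id: str, document_version_id: str, relative_name: str) -> str:
    """Join the version prefix with a caller-supplied relative name.

    The relative name is an assumption of this sketch; the real helpers
    decide the suffix for images and page screenshots.
    """
    return version_prefix(document_id, document_version_id) + relative_name.lstrip("/")


print(artifact_key("doc-1", "ver-2", "example.bin"))
```

Because the prefix depends only on the two IDs, these paths can be recomputed on demand, which is presumably why they do not need to be stored in the metadata.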

Properties

| Name | Type | Description | Notes |
| --- | --- | --- | --- |
| source_s3 | str | S3 URL to the source document (set by API on upload) | [optional] |
| cleaned_source_s3 | str | S3 URL to watermark-removed source document | [optional] |
| fast_plaintext_s3 | str | S3 URL to the fast plaintext export of the document | [optional] |
| hash | str | Base64-encoded SHA256 hash of the uploaded source file | [optional] |
| pipeline_state | PipelineState | Current state of the ingestion pipeline workflow | [optional] |
| total_pages | int | Total number of pages in the document | [optional] |
| total_sections | int | Total number of sections created | [optional] |
| total_chunks | int | Total number of chunks created | [optional] |
| total_formulas | int | Total formula cells in the workbook (XLSX only) | [optional] |
| xlsx_parse_result_s3 | str | S3 URI to the full XLSX parse result JSON containing dependency graph, named ranges, and KPI catalog | [optional] |
| xlsx_named_ranges | List[Dict[str, object]] | Named ranges defined in the workbook (name, ref_string, scope) | [optional] |
| xlsx_kpi_catalog | List[Dict[str, object]] | KPI (Key Performance Indicator) cells detected by the XLSX parser. Each entry contains a label, computed value, cell address, and driver cell references. Applicable to financial models and operational spreadsheets; not populated for template spreadsheets that lack computed KPI cells. | [optional] |
| citation_anchors | List[XlsxCellAnchorOutputOrDocxParagraphAnchorOutput] | In-file citation anchors for agent-generated .xlsx/.docx deliverables. Each anchor binds an in-file location (cell or paragraph) to the chunk IDs cited there. Populated by save_document during upload; null for versions ingested before this field shipped or for files re-uploaded outside the agent flow. The frontend enriches chunks via /v1/chunks/bulk. | [optional] |
| information_statistics | InformationStatistics | Aggregate statistics for the document version (tokens, chunk counts, depth) | [optional] |
| quota_charged | bool | True once the conversion activity successfully consumed PAGE quota | [optional] [default to False] |
| quota_page_count | int | Page quantity charged at conversion start; 0 if not yet charged | [optional] [default to 0] |
| quota_idempotency_key | str | Stable consume key (matches workflow_id); 'UNSET' for pre-Phase-2 docs so refund logic short-circuits | [optional] [default to 'UNSET'] |
| file_md5 | str | MD5 of source bytes; 'UNSET' for pre-Phase-2 docs, real hex digest after first prep run | [optional] [default to 'UNSET'] |
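The two digest fields use different encodings: hash is a Base64-encoded SHA256, while file_md5 is a hex digest. A minimal sketch of how such values could be computed with the standard library follows. `source_digests` is a hypothetical helper, and the use of the standard (not URL-safe) Base64 alphabet is an assumption, since the table does not say which variant the service uses.

```python
import base64
import hashlib


def source_digests(data: bytes) -> dict:
    """Compute values shaped like the `hash` and `file_md5` fields."""
    # SHA256 as raw bytes, then Base64 (standard alphabet assumed).
    sha256_b64 = base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")
    # MD5 as a lowercase hex string.
    md5_hex = hashlib.md5(data).hexdigest()
    return {"hash": sha256_b64, "file_md5": md5_hex}


digests = source_digests(b"example source bytes")
print(digests["hash"])
print(digests["file_md5"])
```

A Base64-encoded SHA256 is always 44 characters long including padding, and an MD5 hex digest is always 32 characters, which makes the literal 'UNSET' sentinel easy to tell apart from a real file_md5 value.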

Example

from ksapi.models.document_version_metadata import DocumentVersionMetadata

# TODO update the JSON string below
json_str = "{}"
# create an instance of DocumentVersionMetadata from a JSON string
document_version_metadata_instance = DocumentVersionMetadata.from_json(json_str)
# print the JSON string representation of the object
print(document_version_metadata_instance.to_json())

# convert the object into a dict
document_version_metadata_dict = document_version_metadata_instance.to_dict()
# create an instance of DocumentVersionMetadata from a dict
document_version_metadata_from_dict = DocumentVersionMetadata.from_dict(document_version_metadata_dict)
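Every property is optional, so a consumer that reads the raw system_metadata JSON without the generated model may want to apply the documented defaults itself. The sketch below does this for the quota fields using only the standard library. `read_quota_fields` and `is_refundable` are hypothetical names, and the refund rule shown is an assumption built from the field descriptions (an 'UNSET' key short-circuits), not the service's actual logic.

```python
import json

# Defaults as documented in the Properties table.
QUOTA_DEFAULTS = {
    "quota_charged": False,
    "quota_page_count": 0,
    "quota_idempotency_key": "UNSET",
}


def read_quota_fields(raw_json: str) -> dict:
    """Extract quota fields, falling back to defaults for missing or null keys."""
    data = json.loads(raw_json)
    result = {}
    for name, default in QUOTA_DEFAULTS.items():
        value = data.get(name)
        result[name] = default if value is None else value
    return result


def is_refundable(quota: dict) -> bool:
    """Assumed rule: skip pre-Phase-2 docs, refund only if pages were charged."""
    if quota["quota_idempotency_key"] == "UNSET":
        return False  # short-circuit, nothing was ever consumed under a real key
    return bool(quota["quota_charged"]) and quota["quota_page_count"] > 0


legacy = read_quota_fields("{}")
charged = read_quota_fields(
    '{"quota_charged": true, "quota_page_count": 12, "quota_idempotency_key": "wf-123"}'
)
print(is_refundable(legacy), is_refundable(charged))
```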
