Context
FOCUS 1.4 has a requirement to include ALL native columns from the source data. For multi-cloud ingestion into FinOps hubs (Azure, AWS, GCP, OCI, Alibaba, etc.), this will significantly bloat `Costs_raw` and other `_raw` tables. We can't predefine every possible column from every provider.
Surfaced in: PR #2126 review
Goal
Research options for ingesting arbitrary extra columns from FOCUS-shaped exports without manually predefining every column on the `_raw` table.
Candidate approaches
- Auto-accept new columns on ingestion: let Kusto's update policy / ingestion mapping create columns dynamically as they appear in incoming parquet. Pros: native columns survive verbatim and are directly queryable. Cons: schema drift over time, harder to govern, may break Fabric ingestion.
- Single `x_NativeColumns` JSON column: collapse all unmapped fields into one `dynamic` column. Pros: bounded schema, easy to add new providers. Cons: less efficient to query than typed columns; transforms must unpack the JSON.
- Hybrid: keep known FOCUS columns typed, plus an `x_NativeColumns` blob for the rest. Pros: combines the benefits of both. Cons: the most complex transform logic of the three.
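A minimal Python sketch of the hybrid split, assuming rows arrive as plain dicts (for example, after reading an export's parquet file). `KNOWN_FOCUS_COLUMNS`, `split_row`, and `unknown_columns` are hypothetical names, and the column list is an illustrative subset rather than the full FOCUS 1.4 schema. This is not how FinOps hubs currently ingests data. Passing an empty `known` set degrades to the single `x_NativeColumns` approach.

```python
from __future__ import annotations

import json
from typing import Any, Iterable

# Hypothetical, illustrative subset of typed FOCUS columns; NOT the full FOCUS 1.4 list.
KNOWN_FOCUS_COLUMNS: frozenset[str] = frozenset({
    "BilledCost",
    "BillingCurrency",
    "ChargePeriodEnd",
    "ChargePeriodStart",
    "ServiceName",
})


def split_row(row: dict[str, Any], known: frozenset[str] = KNOWN_FOCUS_COLUMNS) -> dict[str, Any]:
    """Keep known FOCUS columns as typed fields and fold the rest into x_NativeColumns."""
    # Fill absent known columns with None so every output row has the same typed shape.
    out: dict[str, Any] = {col: row.get(col) for col in known}
    # Everything the typed schema does not recognize is treated as a provider-native column.
    native = {k: v for k, v in row.items() if k not in known}
    # Serialize as a JSON string (sorted keys for stable output); default=str covers
    # values such as datetimes that json cannot encode natively.
    out["x_NativeColumns"] = json.dumps(native, sort_keys=True, default=str)
    return out


def unknown_columns(rows: Iterable[dict[str, Any]], known: frozenset[str] = KNOWN_FOCUS_COLUMNS) -> set[str]:
    """Report native column names seen in a batch, e.g. to monitor schema drift per provider."""
    return {k for row in rows for k in row if k not in known}


if __name__ == "__main__":
    # Hypothetical AWS-style row: three FOCUS columns plus one native CUR column.
    sample = {
        "BilledCost": 1.25,
        "BillingCurrency": "USD",
        "ServiceName": "Amazon EC2",
        "lineItem/UsageType": "BoxUsage:t3.micro",
    }
    print(split_row(sample)["x_NativeColumns"])  # prints {"lineItem/UsageType": "BoxUsage:t3.micro"}
```

The typed part of each row stays fixed regardless of provider, so the `_raw` table schema stays bounded. The trade-off noted above still applies: reading a native value back would need unpacking in KQL (for example with `parse_json` or the `bag_unpack` plugin). `unknown_columns` hints at how the auto-accept option's schema drift could at least be monitored.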