Research: Handle FOCUS 1.4 native-columns requirement to avoid raw table bloat #2159

@flanakin

Context

FOCUS 1.4 requires including ALL native columns from the source data. For multi-cloud ingestion into FinOps hubs (Azure, AWS, GCP, OCI, Alibaba, etc.), this will significantly bloat `Costs_raw` and other `_raw` tables, and we can't predefine every possible column from every provider.

Surfaced in: PR #2126 review

Goal

Research options for ingesting arbitrary extra columns from FOCUS-shaped exports without manually predefining every column on the `_raw` table.

Candidate approaches

  1. Auto-accept new columns on ingestion — let Kusto's update policy / ingestion mapping create columns dynamically as they appear in incoming parquet. Pros: native columns survive verbatim, queryable directly. Cons: schema drift over time, harder to govern, may break Fabric ingestion.

  2. Single `x_NativeColumns` JSON column — collapse all unmapped fields into one `dynamic` column. Pros: bounded schema, easy to add new providers. Cons: less efficient to query than typed columns, requires JSON unpacking in transforms.

  3. Hybrid — keep known FOCUS columns typed + an `x_NativeColumns` blob for the rest. Pros: best of both. Cons: most complex transform logic.
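
To make the hybrid option (3) concrete, here is a minimal Python sketch of the split it implies: known columns stay as-is and everything else is collapsed into a single `x_NativeColumns` JSON value. This is only an illustration of the idea, not the hub's actual transform. The `KNOWN_FOCUS_COLUMNS` subset, the `split_native_columns` helper, and the provider-style column names in the sample row are all hypothetical.

```python
import json

# Hypothetical subset of typed FOCUS columns; a real _raw table would list many more.
KNOWN_FOCUS_COLUMNS = {"BilledCost", "BillingCurrency", "ChargePeriodStart", "ServiceName"}


def split_native_columns(row, known=KNOWN_FOCUS_COLUMNS, bag_name="x_NativeColumns"):
    """Keep known columns as-is and collapse every other field into one JSON bag."""
    typed = {k: v for k, v in row.items() if k in known}
    # Anything not in the known set is treated as a native column; nulls are dropped.
    extras = {k: v for k, v in row.items() if k not in known and v is not None}
    # Sorted keys give a stable string for identical bags; None when nothing is left over.
    typed[bag_name] = json.dumps(extras, sort_keys=True) if extras else None
    return typed


# Illustrative row: three known columns plus two made-up provider-native columns.
row = {
    "BilledCost": 12.5,
    "BillingCurrency": "USD",
    "ServiceName": "Virtual Machines",
    "lineItem/UsageType": "BoxUsage:t3.micro",
    "product/region": "us-east-1",
}
result = split_native_columns(row)
```

With this shape the table schema stays bounded no matter how many providers are added, at the cost of unpacking the bag whenever a native column is needed in a query. Passing an empty `known` set gives the behavior of option 2, where every field lands in the bag.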
