Contribute dbt-fabric to the Fabric Toolbox #552

Open
sdebruyn wants to merge 2 commits into microsoft:main from sdebruyn:add-dbt-fabric

@sdebruyn sdebruyn commented May 19, 2026

This PR adds tools/dbt-fabric/: one dbt adapter for both Microsoft Fabric compute engines (Data Warehouse and Lakehouse) in a single Python package.

I wrote most of the code that's now in microsoft/dbt-fabric. When Microsoft adopted the repository I kept maintaining a fork because customers were asking for things the official repo wasn't shipping. That fork (dbt-fabric-samdebruyn on PyPI) is what multiple organizations are running in production today.

I'm bringing it to the toolbox because the toolbox's multi-contributor model — the Fabric product team, the CAT team, and the community sharing maintenance — fits a dbt adapter better than a single-maintainer setup. The dbt ecosystem moves quickly (new dbt-core minors, community-package releases, dbt-tests-adapter Base* classes every cycle), and a shared codebase under the toolbox is better positioned to keep up.


What this gives users today

One pip install dbt-fabric and both Fabric engines work. No separate dbt-fabricspark package, no system ODBC driver to install: the bundled mssql-python driver handles the Data Warehouse side and ships ODBC Driver 18 + unixODBC inside the wheel.

On top of that, a long list of features the official adapters don't ship:

Microsoft Purview integration via API. A {{ purview_sync() }} macro that pushes model and column documentation, plus ref() and source() lineage, directly into Purview through the REST API. It is persist_docs-aware: models marked persist_docs: false are skipped, and a granular setting such as relation: true, columns: false syncs only what you asked for. No Purview scan configuration needed.
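
The persist_docs gating described above can be sketched in a few lines of Python. This is an illustration only, not the adapter's code: `SyncPlan` and `plan_sync` are hypothetical names, and the handling of any value other than `false` or a relation/columns mapping is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPlan:
    """What a documentation sync would push for one model (hypothetical type)."""
    relation_docs: bool
    column_docs: bool

    @property
    def skipped(self) -> bool:
        # Nothing to push means the model is skipped entirely.
        return not (self.relation_docs or self.column_docs)


def plan_sync(persist_docs) -> SyncPlan:
    """Translate a model's persist_docs setting into a sync plan.

    Assumption: the granular form arrives as a dict with optional
    'relation' and 'columns' keys; any other value is treated as an
    all-or-nothing switch (False or unset means skip).
    """
    if isinstance(persist_docs, dict):
        return SyncPlan(
            relation_docs=bool(persist_docs.get("relation", False)),
            column_docs=bool(persist_docs.get("columns", False)),
        )
    enabled = bool(persist_docs)
    return SyncPlan(relation_docs=enabled, column_docs=enabled)
```

With this sketch, `plan_sync(False)` yields a skipped plan, while `plan_sync({"relation": True, "columns": False})` syncs the model description only.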

Python models on both engines. Standard model(dbt, spark) API with PySpark on both Data Warehouse and Lakehouse. microsoft/dbt-fabric doesn't support Python models at all; microsoft/dbt-fabricspark only supports them on the Lakehouse.
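
For readers who haven't written one, a dbt Python model in the model(dbt, spark) shape looks roughly like this. It is a minimal sketch: `stg_orders` and the `status` column are hypothetical names, and nothing here is specific to this adapter beyond the function signature named above.

```python
def model(dbt, spark):
    # Configure the model the same way a SQL model's config block would.
    dbt.config(materialized="table")

    # dbt.ref() returns the upstream model as a DataFrame.
    # "stg_orders" is a hypothetical upstream model.
    orders = dbt.ref("stg_orders")

    # Plain PySpark from here on; the returned DataFrame is what dbt persists.
    # "status" is a hypothetical column on that model.
    return orders.filter(orders.status == "completed")
```

dbt calls the function itself, so the file contains nothing but the definition.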

Compatibility with nine community packages, continuously tested. dbt-utils, dbt-date, dbt-codegen, dbt-expectations, dbt-audit-helper, dbt-external-tables, dbt-profiler, dbt-artifacts, and dbt-project-evaluator ship Postgres- or Snowflake-flavoured macros that fail on Fabric. This contribution writes the adapter-specific overrides through dbt's dispatch system and runs an integration test for each package on every PR. Neither official adapter ships overrides or tests for any of these.
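
The reason dispatch overrides work is the lookup order: dbt walks the configured package search order and, within each package, prefers the adapter-prefixed macro over the default one. The Python below is a simplified model of that resolution, not dbt's implementation: it ignores parent-adapter prefixes, and `fabric_overrides` is a hypothetical package name.

```python
def resolve_macro(macro_name, adapter_type, search_order, packages):
    """Return (package, macro) for the first implementation found.

    `packages` maps a package name to the set of macro names it defines.
    Simplified: real dispatch also tries parent-adapter prefixes.
    """
    prefixes = [adapter_type, "default"]
    for package in search_order:
        defined = packages.get(package, set())
        for prefix in prefixes:
            candidate = f"{prefix}__{macro_name}"
            if candidate in defined:
                return package, candidate
    raise LookupError(f"no implementation of {macro_name!r} for {adapter_type!r}")


# Hypothetical project: an override package listed ahead of dbt_utils.
packages = {
    "fabric_overrides": {"fabric__dateadd"},
    "dbt_utils": {"default__dateadd", "postgres__dateadd"},
}
with_override = resolve_macro(
    "dateadd", "fabric", ["fabric_overrides", "dbt_utils"], packages
)
without_override = resolve_macro("dateadd", "fabric", ["dbt_utils"], packages)
```

Without the override package in the search order, the lookup falls through to the default implementation, which is where the Postgres- or Snowflake-flavoured SQL comes from.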

And the rest:

Every PR runs against real Fabric, and every release ships after the full integration suite has gone green.


Issues filed upstream

I tried contributing some of these fixes back to microsoft/dbt-fabric first, but the review-to-merge turnaround on PRs was long enough that I couldn't keep momentum that way. The fork picked up what the upstream couldn't absorb at that pace, and the gap has compounded since.

The issues and PRs listed below were filed in a single recent pass: a fresh attempt at backporting the bugfixes and smaller refactors now that the gap is documented. Backporting the larger feature work (Python models, the unified profile schema and shared auth stack, the community-package dispatch overrides, the dbt-spark inheritance for the Lakehouse adapter) isn't viable yet, because it depends on dbt-core API surface and a consolidated BaseFabricAdapter that the official repos don't have. Those backports only become tractable once the upstream baseline is closer to current.

microsoft/dbt-fabric

microsoft/dbt-fabricspark

The structural cause is that the adapter is staffed as a sideline of a product role, with PyPI ownership on a personal account rather than a Microsoft organisational identity. A few of the issues read as AI-assisted PRs merged without dbt-domain review — a missing-review-step problem, not an AI problem. The recurring pattern across the rest is the same shape: features built next to dbt's conventional mechanisms instead of through them — dispatch sidestepped, profile keys camelCased, hooks reimplemented as atexit — the kind of mismatch you notice when you use dbt daily. The fix is shared maintenance under the toolbox with organisational package ownership and reviewers who use the adapter.


Why this stays maintainable long-term

The maintenance cost of a dbt adapter scales with two things: how much you reimplement that dbt's ecosystem already gives you, and how many private mechanisms you have to keep in sync with dbt-core across releases. The architecture here keeps both close to zero.

The Lakehouse adapter in this contribution inherits from dbt-spark. dbt-spark ships the Spark materializations, incremental strategies, Spark-aware column type handling, constraint handling, the Spark Python-model API, and a SparkAdapter base class. Spark-based adapters inherit from it and write only the parts specific to their engine — dbt-databricks does this.

microsoft/dbt-fabricspark doesn't. It's a standalone SQLAdapter, so every macro, materialization, type rule, incremental strategy, and Python-model path has to be implemented and maintained by hand. In this contribution, Python's multiple inheritance lets the FabricSpark adapter extend SparkAdapter and a shared BaseFabricAdapter at the same time, getting the dbt-spark machinery and the cross-adapter Fabric code in one class.
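
The inheritance shape can be shown with stand-in classes. This is a sketch, not the adapter source: the two base classes below are empty placeholders for dbt-spark's SparkAdapter and the shared BaseFabricAdapter, their methods are invented for illustration, and the order of the bases is an assumption.

```python
class SparkAdapter:
    """Placeholder for dbt-spark's SparkAdapter; the method is illustrative."""

    def spark_machinery(self):
        return "materializations, incremental strategies, type handling"


class BaseFabricAdapter:
    """Placeholder for the shared Fabric base; the method is illustrative."""

    def fabric_machinery(self):
        return "auth, API client, workspace resolution, Livy sessions"


class FabricSparkAdapter(BaseFabricAdapter, SparkAdapter):
    """Lakehouse adapter: only engine-specific code is written here.

    Assumption: BaseFabricAdapter is listed first so shared Fabric
    behaviour wins when both bases define the same method.
    """

    def engine(self):
        return "lakehouse"


adapter = FabricSparkAdapter()
mro_names = [cls.__name__ for cls in FabricSparkAdapter.__mro__]
```

Both sets of inherited methods are available on `adapter`, and `mro_names` shows the order Python uses to resolve a method that exists on more than one base.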

One auth stack, one Fabric API client, and one Livy session layer across both adapters. Auth, Fabric REST API access, workspace resolution, profile validation, and Livy session handling are the same problem on both engines — the Lakehouse runs everything through Livy, and the Data Warehouse needs the same Livy machinery for Python models.

microsoft/dbt-fabric and microsoft/dbt-fabricspark each maintain their own token logic, API client, workspace resolution, and profile validation, and they're already out of sync. The DW adapter uses workspace_id / workspace_name / access_token (snake_case). The Lakehouse adapter uses workspaceid / lakehouseid / accessToken (camelCase). The default auth method differs (ActiveDirectoryDefault vs CLI). A user running both can't share a profile structure. Every auth-related change has to be implemented twice.

This contribution has one FabricTokenProvider covering all 11 auth methods for both adapter types. One FabricApiClient for workspaces, warehouses, lakehouses, Livy, and snapshots. One Python-model submission path. One profile schema. A bug fix is a one-place change.
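
A rough sketch of why a single provider makes a fix a one-place change: both clients hold the same provider object, so caching and refresh logic exist once. Everything here is hypothetical (the class names, the `fetch` callback, the scope string, the 60-second refresh margin); it shows the sharing pattern, not the contribution's FabricTokenProvider.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

SCOPE = "example-scope"  # hypothetical scope string


@dataclass
class TokenProvider:
    # fetch(scope) -> (token, expiry as epoch seconds); injected so the
    # sketch stays independent of any real auth library.
    fetch: Callable[[str], Tuple[str, float]]
    clock: Callable[[], float] = time.time
    _cache: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def token(self, scope: str) -> str:
        cached = self._cache.get(scope)
        # Reuse the cached token unless it expires within 60 seconds.
        if cached and cached[1] - 60 > self.clock():
            return cached[0]
        fresh = self.fetch(scope)
        self._cache[scope] = fresh
        return fresh[0]


class WarehouseClient:
    """Hypothetical Data Warehouse side consumer."""

    def __init__(self, tokens: TokenProvider):
        self.tokens = tokens

    def auth_header(self) -> Dict[str, str]:
        return {"Authorization": f"Bearer {self.tokens.token(SCOPE)}"}


class LakehouseClient(WarehouseClient):
    """Hypothetical Lakehouse side consumer; same auth path by construction."""
```

Handing one `TokenProvider` instance to both clients means a token fetched for the Data Warehouse side is reused on the Lakehouse side until it nears expiry.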

Test suite built on dbt-tests-adapter. Every dbt-core minor ships new Base* test classes that codify what an adapter needs to do to be compatible. Bumping dbt-tests-adapter is how this adapter picks up coverage for new dbt-core features automatically. The suite has about 430 adapter test classes plus roughly 110 community-package tests, all running against real Fabric. PRs run on Python 3.13; the full Python 3.11/3.12/3.13 matrix runs weekly.

Single dbt adapter package covering both Microsoft Fabric compute
engines: Fabric Data Warehouse (T-SQL via the bundled mssql-python
driver) and Fabric Lakehouse (Spark SQL via Livy sessions). Ships
adapter source, integration test suite, documentation source,
packaging metadata, and contributor guide.

See the pull request description for the full feature list, the
upstream-issues backlog this addresses, and the architectural
rationale for inheriting from dbt-spark on the Lakehouse side and
sharing one auth stack across both adapter types.
@sdebruyn sdebruyn changed the title Add dbt-fabric adapter under tools/dbt-fabric/ Contribute dbt-fabric to the Fabric Toolbox May 19, 2026
@sdebruyn sdebruyn marked this pull request as ready for review May 19, 2026 09:12
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>