Contribute dbt-fabric to the Fabric Toolbox #552
Open
sdebruyn wants to merge 2 commits into
Single dbt adapter package covering both Microsoft Fabric compute engines: Fabric Data Warehouse (T-SQL via the bundled mssql-python driver) and Fabric Lakehouse (Spark SQL via Livy sessions). Ships adapter source, integration test suite, documentation source, packaging metadata, and contributor guide. See the pull request description for the full feature list, the upstream-issues backlog this addresses, and the architectural rationale for inheriting from dbt-spark on the Lakehouse side and sharing one auth stack across both adapter types.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This PR adds tools/dbt-fabric/: one dbt adapter for both Microsoft Fabric compute engines (Data Warehouse and Lakehouse) in a single Python package.

I wrote most of the code that's now in microsoft/dbt-fabric. When Microsoft adopted the repository I kept maintaining a fork because customers were asking for things the official repo wasn't shipping. That fork (dbt-fabric-samdebruyn on PyPI) is what multiple organizations are running in production today.

I'm bringing it to the toolbox because the toolbox's multi-contributor model, with the Fabric product team, the CAT team, and the community sharing maintenance, fits a dbt adapter better than a single-maintainer setup. The dbt ecosystem moves quickly (new dbt-core minors, community-package releases, dbt-tests-adapter Base* classes every cycle), and a shared codebase under the toolbox is better positioned to keep up.

What this gives users today

One pip install dbt-fabric and both Fabric engines work. No separate dbt-fabricspark package, no system ODBC driver to install: the bundled mssql-python driver handles the Data Warehouse side and ships ODBC Driver 18 + unixODBC inside the wheel.

On top of that, a long list of features the official adapters don't ship:

Microsoft Purview integration via API. A {{ purview_sync() }} macro that pushes model and column documentation, plus ref() and source() lineage, directly into Purview through the REST API. It is persist_docs-aware: models marked persist_docs: false are skipped, and a granular setting such as relation: true, columns: false syncs only what you asked for. No Purview scan configuration needed.

Python models on both engines. Standard model(dbt, spark) API with PySpark on both Data Warehouse and Lakehouse. microsoft/dbt-fabric doesn't support Python models at all; microsoft/dbt-fabricspark only supports them on the Lakehouse.

Compatibility with nine community packages, continuously tested. dbt-utils, dbt-date, dbt-codegen, dbt-expectations, dbt-audit-helper, dbt-external-tables, dbt-profiler, dbt-artifacts, and dbt-project-evaluator ship Postgres- or Snowflake-flavoured macros that fail on Fabric. This contribution writes the adapter-specific overrides through dbt's dispatch system and runs an integration test for each package on every PR. Neither official adapter ships overrides or tests for any of these.
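For readers who have not written a dbt Python model, the standard entry point mentioned above looks roughly like the sketch below. It relies only on the general dbt Python-model contract (call dbt.config, read upstream models with dbt.ref, return a DataFrame) and says nothing about this adapter's internals. The upstream model name stg_orders and the amount column are hypothetical.

```python
def model(dbt, spark):
    """Entry point of a dbt Python model: dbt calls this and materializes the result."""
    # Ask dbt to materialize the returned DataFrame as a table.
    dbt.config(materialized="table")

    # dbt.ref() hands back the upstream model as a DataFrame and records
    # the dependency in the lineage graph.
    orders = dbt.ref("stg_orders")  # hypothetical upstream model

    # PySpark's DataFrame.filter accepts a SQL expression string.
    return orders.filter("amount > 0")  # hypothetical column
```

The second argument is the Spark session; this model does not need it directly because the only input comes from dbt.ref().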

And the rest:

- […] {{ create_or_update_fabric_warehouse_snapshot(...) }}) usable from on-run-start/on-run-end, any post-hook, or dbt run-operation.
- dbt-external-tables compatibility via dispatch, so OPENROWSET-backed files are regular source() references in the lineage graph.
- cluster_by as a standard model config.
- […] dbt docs generate output.
- […] TokenCredential classes.
- […] FabricTokenProvider covering both adapter types, so the same profile structure works for DW and Lakehouse.

Every PR runs against real Fabric, and every release ships after the full integration suite has gone green.

Issues filed upstream

I tried contributing some of these fixes back to microsoft/dbt-fabric first, but the review-to-merge turnaround on PRs was long enough that I couldn't keep momentum that way. The fork picked up what the upstream couldn't absorb at that pace, and the gap has compounded since.

The issues and PRs listed below were filed in a single recent pass, as a fresh attempt at backporting the bugfixes and smaller refactors now that the gap is documented. Backporting the larger feature work (Python models, the unified profile schema and shared auth stack, the community-package dispatch overrides, the dbt-spark inheritance for the Lakehouse adapter) isn't viable yet: it sits on top of dbt-core surface and a consolidated BaseFabricAdapter that the official repos don't have. Those backports only become tractable once the upstream baseline is closer to current.

microsoft/dbt-fabric

- varchar(8000) silently truncates long-text string columns (PR: microsoft/dbt-fabric#385)
- […] _make_match_kwargs override) (PR: microsoft/dbt-fabric#375)
- apply_grants re-issues GRANTs on every run; query misses Entra-principal grants (PR: microsoft/dbt-fabric#376)
- run_hooks emits commit; (PR: microsoft/dbt-fabric#392)
- fabric__create_table_as wraps CTAS in EXEC('...') and manually escapes the query label (PR: microsoft/dbt-fabric#378)
- get_response returns hardcoded "OK", discards cursor messages + statement IDs (PR: microsoft/dbt-fabric#390)
- FabricAdapter.quote() doesn't escape ]: T-SQL injection vector (PR: microsoft/dbt-fabric#400)
- pyodbc pooling silently disabled; the right fix is landing the mssql-python PR (PR: microsoft/dbt-fabric#350)
- fabric__get_use_database_sql emits invalid USE [None]; from drop_schema_named (PR: microsoft/dbt-fabric#396)
- --full-refresh drop-then-recreate risks data loss on creation failure (PR: microsoft/dbt-fabric#398)
- fabric__get_incremental_microbatch_sql ignores unique_key (PR: microsoft/dbt-fabric#394)
- fabric__snapshot_merge_sql UPDATE+INSERT instead of native MERGE
- delete_warehouse_snapshot is a return True stub
- apply_label macro emits debug log() on every call; should use query_header
- check_for_nested_cte macro parses SQL in Jinja (categorically wrong)
- login_timeout=getattr(...) is a no-op on every call site
- […] atexit + open() should be a Jinja macro
- […] delete_condition / delete_not_matched_by_source on incremental
- […] add_query's retryable_exceptions)
- _TOKEN global: thread-safety and lifecycle issues
- […] uv, ruff, drop EOL Python)
- dbt docs generate should include catalog rowcounts (already delivered by this contribution)

microsoft/dbt-fabricspark

- __exit__ methods return True (silent exception swallowing) (PR: microsoft/dbt-fabricspark#193)
- expires_on = 1845972874 (year 2028) in int_tests auth path bypasses all token refresh
- _getLivySQL: re.DOTALL passed as positional count caps comment-stripping at 16 (PR: microsoft/dbt-fabricspark#195)
- […] close() lifecycle and uses atexit
- […] dbt-spark instead of being a standalone SQLAdapter
- botocore / boto3 DEBUG logging at module import time (PR: microsoft/dbt-fabricspark#198)
- _parse_retry_after duplicated 4× with deprecated datetime.utcnow() (PR: microsoft/dbt-fabricspark#200)

The structural cause is that the adapter is staffed as a sideline of a product role, with PyPI ownership on a personal account rather than a Microsoft organisational identity. A few of the issues read as AI-assisted PRs merged without dbt-domain review: a missing-review-step problem, not an AI problem. The recurring pattern across the rest is the same shape: features built next to dbt's conventional mechanisms instead of through them (dispatch sidestepped, profile keys camelCased, hooks reimplemented as atexit), the kind of mismatch you notice when you use dbt daily. The fix is shared maintenance under the toolbox with organisational package ownership and reviewers who use the adapter.

Why this stays maintainable long-term

The maintenance cost of a dbt adapter scales with two things: how much you reimplement that dbt's ecosystem already gives you, and how many private mechanisms you have to keep in sync with dbt-core across releases. The architecture here keeps both close to zero.

The Lakehouse adapter in this contribution inherits from dbt-spark. dbt-spark ships the Spark materializations, incremental strategies, Spark-aware column type handling, constraint handling, the Spark Python-model API, and a SparkAdapter base class. Spark-based adapters inherit from it and write only the parts specific to their engine; dbt-databricks does this. microsoft/dbt-fabricspark doesn't. It's a standalone SQLAdapter, so every macro, materialization, type rule, incremental strategy, and Python-model path has to be implemented and maintained by hand. Python's multiple inheritance lets the FabricSpark adapter extend SparkAdapter and a shared BaseFabricAdapter at the same time, getting the dbt-spark machinery and the cross-adapter Fabric code in one class.

One auth stack, one Fabric API client, and one Livy session layer across both adapters. Auth, Fabric REST API access, workspace resolution, profile validation, and Livy session handling are the same problem on both engines: the Lakehouse runs everything through Livy, and the Data Warehouse needs the same Livy machinery for Python models.
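
The class layout described above can be illustrated with stand-in classes. This is a minimal sketch under stated assumptions, not the adapter's source: SparkAdapter here is a stub for the real dbt-spark base class, the base-class order is a guess, and every method name (materialization_kinds, get_token, auth_header), the FabricSparkAdapter and FabricWarehouseAdapter class names, and the constructor signature are hypothetical. Only the names SparkAdapter, BaseFabricAdapter, and FabricTokenProvider come from the description.

```python
class FabricTokenProvider:
    """Stand-in for the shared auth stack (hypothetical interface)."""

    def __init__(self, method: str) -> None:
        self.method = method

    def get_token(self) -> str:
        # A real provider would call the identity platform; this is a placeholder.
        return f"token-via-{self.method}"


class SparkAdapter:
    """Stub for the dbt-spark base class; the real one ships with dbt-spark."""

    def materialization_kinds(self) -> list[str]:
        return ["table", "view", "incremental"]


class BaseFabricAdapter:
    """Cross-adapter Fabric code shared by both engines (hypothetical shape)."""

    # One provider instance, so an auth fix lands in exactly one place.
    token_provider = FabricTokenProvider("cli")

    def auth_header(self) -> dict[str, str]:
        return {"Authorization": f"Bearer {self.token_provider.get_token()}"}


class FabricSparkAdapter(SparkAdapter, BaseFabricAdapter):
    """Lakehouse adapter: inherited Spark behaviour plus shared Fabric plumbing."""


class FabricWarehouseAdapter(BaseFabricAdapter):
    """Data Warehouse adapter: reuses the same Fabric plumbing."""


lakehouse = FabricSparkAdapter()
warehouse = FabricWarehouseAdapter()

# Spark machinery is inherited rather than reimplemented.
print(lakehouse.materialization_kinds())
# Both adapters authenticate through the same code path.
print(lakehouse.auth_header() == warehouse.auth_header())
# The method resolution order lists both bases on the one class.
print([cls.__name__ for cls in FabricSparkAdapter.__mro__])
```

The point of the layout is that neither adapter class carries its own auth or Spark logic; each only names the bases it needs.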
microsoft/dbt-fabric and microsoft/dbt-fabricspark each maintain their own token logic, API client, workspace resolution, and profile validation, and they're already out of sync. The DW adapter uses workspace_id / workspace_name / access_token (snake_case). The Lakehouse adapter uses workspaceid / lakehouseid / accessToken (camelCase). The default auth method differs (ActiveDirectoryDefault vs CLI). A user running both can't share a profile structure. Every auth-related change has to be implemented twice.

This contribution has one FabricTokenProvider covering all 11 auth methods for both adapter types. One FabricApiClient for workspaces, warehouses, lakehouses, Livy, and snapshots. One Python-model submission path. One profile schema. A bug fix is a one-place change.

Test suite built on dbt-tests-adapter. Every dbt-core minor ships new Base* test classes that codify what an adapter needs to do to be compatible. Bumping dbt-tests-adapter is how this adapter picks up coverage for new dbt-core features automatically. About 430 adapter test classes here plus an extra ~110 community-package tests, all running against real Fabric. PRs run on Python 3.13; the full Python 3.11/3.12/3.13 matrix runs weekly.