feat: add MSSQL/Fabric support to data-parity skill #705

Open
suryaiyer95 wants to merge 8 commits into feat/data-parity-skill-improvements from feat/mssql-fabric-data-parity

Conversation

@suryaiyer95 (Contributor) commented Apr 13, 2026

Summary

  • SQL Server driver: full connect/execute/listSchemas/listTables/describeTable/close with T-SQL TOP injection, sys.* catalog queries, and row flattening for unnamed columns
  • Azure AD authentication: 7 flows (default, password, access-token, service-principal-secret, msi-vm, msi-app-service, token-credential) with shorthand aliases (cli, default, password, service-principal, msi)
  • Dialect mapping: sqlserver/mssql → tsql, fabric → fabric, with DATETRUNC() and CONVERT(DATE, ..., 23) for date partitioning
  • mssql v12 upgrade: ConnectionPool isolation for concurrent connections, unnamed-column array flattening, synthetic column name fallback
  • SKILL.md documentation: Fabric connection config, Azure AD auth types, algorithm behavior, schema inspection queries
  • 28 driver unit tests + 9 dialect mapping tests
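
As a reading aid, the dialect mapping in the summary can be sketched roughly as below. The function name `warehouseTypeToDialect` and the five pairs come from the commit message in this PR; the `Map` storage, the lowercasing, and the pass-through for unknown types are assumptions, not the actual implementation.

```typescript
// Sketch only: pairs taken from the PR's commit message, everything else assumed.
const DIALECT_BY_WAREHOUSE_TYPE = new Map<string, string>([
  ["sqlserver", "tsql"],
  ["mssql", "tsql"],
  ["fabric", "fabric"],
  ["postgresql", "postgres"],
  ["mariadb", "mysql"],
])

function warehouseTypeToDialect(warehouseType: string): string {
  const key = warehouseType.toLowerCase() // assumption: case-insensitive lookup
  // Assumption: types without an explicit entry are passed through unchanged.
  return DIALECT_BY_WAREHOUSE_TYPE.get(key) ?? key
}
```

With this shape, `warehouseTypeToDialect("mssql")` yields `"tsql"`, which is the name the Rust engine is said to expect.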

Key files

| Area | Files |
|---|---|
| Driver | packages/drivers/src/sqlserver.ts, packages/drivers/test/sqlserver-unit.test.ts |
| Dialect mapping | packages/opencode/src/altimate/native/connections/data-diff.ts |
| Tool registration | packages/opencode/src/altimate/tools/data-diff.ts |
| Skill docs | .opencode/skills/data-parity/SKILL.md |

Test plan

  • bun test packages/drivers/test/sqlserver-unit.test.ts — 28 tests pass
  • bun test packages/opencode/test/altimate/data-diff-dialect.test.ts — 9 tests pass
  • Integration test against real MSSQL/Fabric instance
  • Verify Azure AD auth flows with actual Azure credentials

🤖 Generated with Claude Code


Summary by cubic

Adds end-to-end MSSQL and Microsoft Fabric support to the data-parity skill, including Azure AD auth and dialect mapping so the Rust engine runs diffs correctly without silent truncation. Also upgrades mssql to v12 with isolated pools and correct result shaping.

  • New Features

    • MSSQL/Fabric support in data-parity with data_diff (auto, joindiff, hashdiff, profile, cascade) and partitioning; map warehouse types to Rust dialects (sqlserver/mssql → tsql, fabric → fabric); use DATETRUNC() and CONVERT(DATE, ..., 23) for date partitions.
    • SQL Server driver with full connect/execute/introspect/close, Azure AD auth (password, default, access-token, service-principal, msi-vm/app-service; shorthands: default, password, service-principal, msi), ConnectionPool isolation, T‑SQL TOP injection with ExecuteOptions.noLimit, and fabric registration + normalize aliases.
  • Bug Fixes

    • Prevent silent diff truncation by allowing noLimit to bypass injected TOP.
    • Correctly detect SQL vs table names (tables named "select"/"with" no longer misclassified).
    • Preserve underscore-prefixed columns and flatten unnamed column arrays to restore positional values.
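
To illustrate the truncation fix, here is a minimal sketch of TOP injection with a `noLimit` bypass. Only `ExecuteOptions.noLimit` is named in this PR; the helper name `injectTop`, its signature, and the regex-based detection are hypothetical and simpler than whatever the driver really does.

```typescript
// Sketch only: injectTop and its regexes are assumptions, not the driver's code.
interface ExecuteOptions {
  noLimit?: boolean
}

function injectTop(sql: string, limit: number, options: ExecuteOptions = {}): string {
  // noLimit bypasses the injected TOP so diff queries are never cut short.
  if (options.noLimit) return sql
  // Leave queries that already carry their own TOP clause untouched.
  if (/^\s*select\s+(?:distinct\s+)?top\b/i.test(sql)) return sql
  // Insert TOP right after SELECT (or SELECT DISTINCT); other statements do not match.
  return sql.replace(/^(\s*select\s+(?:distinct\s+)?)/i, `$1TOP ${limit} `)
}
```

A caller that needs every row, such as the diff engine, would pass `{ noLimit: true }` and receive the SQL unchanged.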

Written for commit 811c2be. Summary will update on new commits.

@claude claude bot left a comment

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

coderabbitai bot commented Apr 13, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 91dade00-da06-468f-beaf-4d1c7ae69897

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@cubic-dev-ai cubic-dev-ai bot left a comment

5 issues found across 30 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/opencode/src/altimate/native/connections/data-diff.ts">

<violation number="1" location="packages/opencode/src/altimate/native/connections/data-diff.ts:266">
P2: Missing single-quote escaping on `partitionValue` in the date-mode branches. The categorical mode (6 lines above) escapes with `.replace(/'/g, "''")`, but none of the date-mode `switch` cases do. Apply the same escaping for consistency and defense in depth.</violation>

<violation number="2" location="packages/opencode/src/altimate/native/connections/data-diff.ts:489">
P2: `partitionColumn` is not identifier-quoted in `buildPartitionDiscoverySQL`, but it is quoted in `buildPartitionWhereClause`. If the column name is a reserved word (e.g. `date`, `order`), the discovery query will produce a syntax error. Quote the column consistently between both functions.</violation>
</file>

<file name="packages/opencode/src/altimate/native/connections/registry.ts">

<violation number="1" location="packages/opencode/src/altimate/native/connections/registry.ts:125">
P2: `fabric` is missing from the `PASSWORD_DRIVERS` set. Since it maps to the same sqlserver driver as `sqlserver`/`mssql` (both present in the set), `fabric` should also be included to get the same friendly error when `password` is not a string.</violation>
</file>

<file name="packages/opencode/src/altimate/tools/data-diff.ts">

<violation number="1" location="packages/opencode/src/altimate/tools/data-diff.ts:206">
P2: When `d.values` is nullish, `d.values?.join(" | ")` evaluates to `undefined`, which the template literal coerces to the string `"undefined"`. The output would read e.g. `[source only] undefined`. Use a fallback to produce a sensible message.</violation>
</file>

<file name="packages/drivers/src/sqlserver.ts">

<violation number="1" location="packages/drivers/src/sqlserver.ts:148">
P1: `flattenRow` flattens all array values, but only the empty-string key (`""`) holds mssql's merged unnamed columns. A legitimate array column value (e.g. from JSON aggregation) will be incorrectly spread, corrupting the row data and misaligning columns. Restrict flattening to the `""` key only.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
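
For the P1 item on `flattenRow`, one possible shape of the suggested fix is sketched below: only the empty-string key is spread, so array values in named columns stay intact. The `Row` type and the `resolveColumns` helper are assumptions for illustration; the `col_0`, `col_1` fallback follows the commit message in this PR.

```typescript
// Sketch only: Row and resolveColumns are hypothetical, not the driver's code.
type Row = Record<string, unknown>

function flattenRow(row: Row): unknown[] {
  const values: unknown[] = []
  for (const [key, value] of Object.entries(row)) {
    if (key === "" && Array.isArray(value)) {
      // mssql merges unnamed columns into one array under ""; restore positions.
      values.push(...value)
    } else {
      // Named columns are kept as-is, even when the value itself is an array.
      values.push(value)
    }
  }
  return values
}

function resolveColumns(row: Row, flattenedLength: number): string[] {
  const namedKeys = Object.keys(row).filter((k) => k !== "")
  // Use real names only when they account for every flattened value.
  if (namedKeys.length === flattenedLength) return namedKeys
  return Array.from({ length: flattenedLength }, (_, i) => `col_${i}`)
}
```

In this sketch a row such as `{ id: 1, tags: ["a", "b"] }` keeps `tags` as a single value, while `{ "": [3, 7] }` becomes two positional values with synthetic names.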

expr = dateTruncExpr(granularity!, partitionColumn, dialect)
} else {
// categorical — raw distinct values, no transformation
expr = partitionColumn

@cubic-dev-ai cubic-dev-ai bot Apr 13, 2026

P2: partitionColumn is not identifier-quoted in buildPartitionDiscoverySQL, but it is quoted in buildPartitionWhereClause. If the column name is a reserved word (e.g. date, order), the discovery query will produce a syntax error. Quote the column consistently between both functions.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/native/connections/data-diff.ts, line 489:

<comment>`partitionColumn` is not identifier-quoted in `buildPartitionDiscoverySQL`, but it is quoted in `buildPartitionWhereClause`. If the column name is a reserved word (e.g. `date`, `order`), the discovery query will produce a syntax error. Quote the column consistently between both functions.</comment>

<file context>
@@ -0,0 +1,872 @@
+    expr = dateTruncExpr(granularity!, partitionColumn, dialect)
+  } else {
+    // categorical — raw distinct values, no transformation
+    expr = partitionColumn
+  }
+
</file context>
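
A minimal sketch of consistent identifier quoting follows, assuming a hypothetical `quoteIdent` helper shared by both functions. Bracket quoting for T-SQL and backticks for MySQL are standard; defaulting every other dialect to ANSI double quotes is an assumption, and this is not how the PR's code is known to quote.

```typescript
// Sketch only: quoteIdent is a hypothetical helper, not part of the PR.
function quoteIdent(name: string, dialect: string): string {
  switch (dialect) {
    case "tsql":
    case "fabric":
      // T-SQL bracket quoting; a literal ] is escaped by doubling it.
      return `[${name.replace(/\]/g, "]]")}]`
    case "mysql":
      return "`" + name.replace(/`/g, "``") + "`"
    default:
      // Assumption: remaining dialects accept ANSI double-quoted identifiers.
      return `"${name.replace(/"/g, '""')}"`
  }
}
```

Calling the same helper from the discovery query and the WHERE clause would keep reserved words such as `order` valid in both places.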

tableFilter = `table_name = '${esc(parts[0])}'`
}

switch (dialect) {

@cubic-dev-ai cubic-dev-ai bot Apr 13, 2026

P2: Missing single-quote escaping on partitionValue in the date-mode branches. The categorical mode (6 lines above) escapes with .replace(/'/g, "''"), but none of the date-mode switch cases do. Apply the same escaping for consistency and defense in depth.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/native/connections/data-diff.ts, line 266:

<comment>Missing single-quote escaping on `partitionValue` in the date-mode branches. The categorical mode (6 lines above) escapes with `.replace(/'/g, "''")`, but none of the date-mode `switch` cases do. Apply the same escaping for consistency and defense in depth.</comment>

<file context>
@@ -0,0 +1,872 @@
+    tableFilter = `table_name = '${esc(parts[0])}'`
+  }
+
+  switch (dialect) {
+    case "clickhouse":
+      // Returns: name, type, default_type, default_expression, ...
</file context>
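
The escaping asked for here could look roughly like the sketch below. The helper names `sqlStringLiteral` and `tsqlDateLiteral` are hypothetical; the `CONVERT(DATE, ..., 23)` form is the one this PR describes for MSSQL/Fabric, where style 23 is the `yyyy-mm-dd` format.

```typescript
// Sketch only: both helper names are assumptions made for illustration.
function sqlStringLiteral(value: string): string {
  // Double embedded single quotes, the same rule the categorical branch applies.
  return `'${value.replace(/'/g, "''")}'`
}

function tsqlDateLiteral(partitionValue: string): string {
  // Style 23 parses yyyy-mm-dd regardless of the session's language setting.
  return `CONVERT(DATE, ${sqlStringLiteral(partitionValue)}, 23)`
}
```

Routing every date-mode branch through one literal helper keeps the escaping in a single place rather than repeating it per dialect case.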

lines.push(` Sample differences (first ${Math.min(diffRows.length, 5)}):`)
for (const d of diffRows.slice(0, 5)) {
const label = d.sign === "-" ? "source only" : "target only"
lines.push(` [${label}] ${d.values?.join(" | ")}`)

@cubic-dev-ai cubic-dev-ai bot Apr 13, 2026

P2: When d.values is nullish, d.values?.join(" | ") evaluates to undefined, which the template literal coerces to the string "undefined". The output would read e.g. [source only] undefined. Use a fallback to produce a sensible message.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/tools/data-diff.ts, line 206:

<comment>When `d.values` is nullish, `d.values?.join(" | ")` evaluates to `undefined`, which the template literal coerces to the string `"undefined"`. The output would read e.g. `[source only] undefined`. Use a fallback to produce a sensible message.</comment>

<file context>
@@ -0,0 +1,257 @@
+      lines.push(`  Sample differences (first ${Math.min(diffRows.length, 5)}):`)
+      for (const d of diffRows.slice(0, 5)) {
+        const label = d.sign === "-" ? "source only" : "target only"
+        lines.push(`    [${label}] ${d.values?.join(" | ")}`)
+      }
+    }
</file context>
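
One way to add the fallback is sketched below. The `DiffRow` shape, the `"+"` sign value, and the placeholder text `(no values)` are assumptions; only `d.sign === "-"` and `d.values?.join(" | ")` appear in the quoted code.

```typescript
// Sketch only: DiffRow and formatDiffRow are hypothetical names.
interface DiffRow {
  sign: "-" | "+"
  values?: string[] | null
}

function formatDiffRow(d: DiffRow): string {
  const label = d.sign === "-" ? "source only" : "target only"
  // ?? replaces the undefined produced by optional chaining before interpolation.
  const values = d.values?.join(" | ") ?? "(no values)"
  return `    [${label}] ${values}`
}
```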

suryaiyer95 and others added 8 commits April 13, 2026 10:36
Bun's runtime never fires native addon async callbacks, so the async
`new duckdb.Database(path, opts, callback)` form would hit the 2-second
timeout fallback on every connection attempt.

Switch to the synchronous constructor form `new duckdb.Database(path)` /
`new duckdb.Database(path, opts)` which throws on error and completes
immediately in both Node and bun runtimes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The async callback form with 2s fallback was already working correctly
at e3df5a4. The timeout was caused by a missing duckdb .node binary,
not a bun incompatibility.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add `warehouseTypeToDialect()` mapping: sqlserver→tsql, mssql→tsql,
  fabric→fabric, postgresql→postgres, mariadb→mysql. Fixes critical
  serde mismatch where Rust engine rejects raw warehouse type names.
- Update both `resolveDialect()` functions to use the mapping
- Add MSSQL/Fabric cases to `dateTruncExpr()` — DATETRUNC(DAY, col)
- Add locale-safe date literal casting via CONVERT(DATE, ..., 23)
- Register `fabric` in DRIVER_MAP (reuses sqlserver TDS driver)
- Add `fabric` normalize aliases in normalize.ts
- Add 15 SQL Server driver unit tests (TOP injection, truncation,
  schema introspection, connection lifecycle, result format)
- Add 9 dialect mapping unit tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Support all 7 Azure AD / Entra ID auth types in `sqlserver.ts`:
  `azure-active-directory-password`, `access-token`, `service-principal-secret`,
  `msi-vm`, `msi-app-service`, `azure-active-directory-default`, `token-credential`
- Force TLS encryption for all Azure AD connections
- Dynamic import of `@azure/identity` for `DefaultAzureCredential`
- Add normalize aliases for Azure AD config fields (`authentication`,
  `azure_tenant_id`, `azure_client_id`, `azure_client_secret`, `access_token`)
- Add `fabric: SQLSERVER_ALIASES` to DRIVER_ALIASES
- Add 10 Azure AD unit tests covering all auth flows, encryption,
  and `DefaultAzureCredential` with managed identity

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…LL.md

- Add SQL Server / Fabric schema inspection query in Step 2
- Add "SQL Server and Microsoft Fabric" section with:
  - Supported configurations table (sqlserver, mssql, fabric)
  - Fabric connection guide with Azure AD auth types
  - Algorithm behavior notes (joindiff vs hashdiff selection)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…rscore column filter

- **Azure AD auth**: Pass `azure-active-directory-*` types directly to tedious
  instead of constructing `DefaultAzureCredential` ourselves. Tedious imports
  `@azure/identity` internally and creates credentials — avoids bun CJS/ESM
  `isTokenCredential` boundary issue that caused "not an instance of the token
  credential class" errors.
- **Auth shorthands**: Map `CLI`, `default`, `password`, `service-principal`,
  `msi`, `managed-identity` to their full tedious type names.
- **Column filter**: Remove `_.startsWith("_")` filter from `execute()` result
  columns — it stripped legitimate aliases like `_p` used by partition discovery,
  causing partitioned diffs to return empty results.
- **Tests**: Remove `@azure/identity` mock (no longer imported by driver),
  update auth assertions, add shorthand mapping tests, fix column filter test.
- **Verified**: All 97 driver tests pass. Full data-diff pipeline tested against
  real MSSQL server (profile, joindiff, auto, where_clause, partitioned).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…lattening

- Upgrade `mssql` from v11 to v12 (`tedious` 18 → 19)
- Use explicit `ConnectionPool` instead of global `mssql.connect()` to
  isolate multiple simultaneous connections
- Flatten unnamed column arrays — `mssql` merges unnamed columns (e.g.
  `SELECT COUNT(*), SUM(...)`) into a single array under the empty-string
  key; restore positional column values
- Proper column name resolution: compare `namedKeys.length` against
  flattened row length, fall back to synthetic `col_0`, `col_1`, etc.
- Update test mock to export `ConnectionPool` class and `createMockPool`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…tions

Use ternary expressions (`x ? {...} : {}`) instead of short-circuit
(`x && {...}`) to avoid spreading a boolean value.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
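
The ternary-spread pattern from this last commit can be illustrated as follows. The `AuthOptions` type and `buildOptions` function are hypothetical stand-ins for the real config-building code; the point is only that the spread operand is always an object.

```typescript
// Sketch only: AuthOptions and buildOptions are illustrative names.
interface AuthOptions {
  clientId?: string
  tenantId?: string
}

function buildOptions(clientId?: string, tenantId?: string): AuthOptions {
  return {
    // With `clientId && { clientId }` the operand could be "" or undefined;
    // the ternary always yields an object, so the spread type stays clean.
    ...(clientId ? { clientId } : {}),
    ...(tenantId ? { tenantId } : {}),
  }
}
```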
@suryaiyer95 force-pushed the feat/mssql-fabric-data-parity branch from 403477e to 811c2be on April 13, 2026 17:36
@anandgupta42 (Contributor)

@suryaiyer95 can you address code review comments?
