feat: add MSSQL/Fabric support to data-parity skill#705
feat: add MSSQL/Fabric support to data-parity skill#705suryaiyer95 wants to merge 8 commits intofeat/data-parity-skill-improvementsfrom
Conversation
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
5 issues found across 30 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/opencode/src/altimate/native/connections/data-diff.ts">
<violation number="1" location="packages/opencode/src/altimate/native/connections/data-diff.ts:266">
P2: Missing single-quote escaping on `partitionValue` in the date-mode branches. The categorical mode (6 lines above) escapes with `.replace(/'/g, "''")`, but none of the date-mode `switch` cases do. Apply the same escaping for consistency and defense in depth.</violation>
<violation number="2" location="packages/opencode/src/altimate/native/connections/data-diff.ts:489">
P2: `partitionColumn` is not identifier-quoted in `buildPartitionDiscoverySQL`, but it is quoted in `buildPartitionWhereClause`. If the column name is a reserved word (e.g. `date`, `order`), the discovery query will produce a syntax error. Quote the column consistently between both functions.</violation>
</file>
<file name="packages/opencode/src/altimate/native/connections/registry.ts">
<violation number="1" location="packages/opencode/src/altimate/native/connections/registry.ts:125">
P2: `fabric` is missing from the `PASSWORD_DRIVERS` set. Since it maps to the same sqlserver driver as `sqlserver`/`mssql` (both present in the set), `fabric` should also be included to get the same friendly error when `password` is not a string.</violation>
</file>
<file name="packages/opencode/src/altimate/tools/data-diff.ts">
<violation number="1" location="packages/opencode/src/altimate/tools/data-diff.ts:206">
P2: When `d.values` is nullish, `d.values?.join(" | ")` evaluates to `undefined`, which the template literal coerces to the string `"undefined"`. The output would read e.g. `[source only] undefined`. Use a fallback to produce a sensible message.</violation>
</file>
<file name="packages/drivers/src/sqlserver.ts">
<violation number="1" location="packages/drivers/src/sqlserver.ts:148">
P1: `flattenRow` flattens all array values, but only the empty-string key (`""`) holds mssql's merged unnamed columns. A legitimate array column value (e.g. from JSON aggregation) will be incorrectly spread, corrupting the row data and misaligning columns. Restrict flattening to the `""` key only.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
| expr = dateTruncExpr(granularity!, partitionColumn, dialect) | ||
| } else { | ||
| // categorical — raw distinct values, no transformation | ||
| expr = partitionColumn |
There was a problem hiding this comment.
P2: partitionColumn is not identifier-quoted in buildPartitionDiscoverySQL, but it is quoted in buildPartitionWhereClause. If the column name is a reserved word (e.g. date, order), the discovery query will produce a syntax error. Quote the column consistently between both functions.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/native/connections/data-diff.ts, line 489:
<comment>`partitionColumn` is not identifier-quoted in `buildPartitionDiscoverySQL`, but it is quoted in `buildPartitionWhereClause`. If the column name is a reserved word (e.g. `date`, `order`), the discovery query will produce a syntax error. Quote the column consistently between both functions.</comment>
<file context>
@@ -0,0 +1,872 @@
+ expr = dateTruncExpr(granularity!, partitionColumn, dialect)
+ } else {
+ // categorical — raw distinct values, no transformation
+ expr = partitionColumn
+ }
+
</file context>
| tableFilter = `table_name = '${esc(parts[0])}'` | ||
| } | ||
|
|
||
| switch (dialect) { |
There was a problem hiding this comment.
P2: Missing single-quote escaping on partitionValue in the date-mode branches. The categorical mode (6 lines above) escapes with .replace(/'/g, "''"), but none of the date-mode switch cases do. Apply the same escaping for consistency and defense in depth.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/native/connections/data-diff.ts, line 266:
<comment>Missing single-quote escaping on `partitionValue` in the date-mode branches. The categorical mode (6 lines above) escapes with `.replace(/'/g, "''")`, but none of the date-mode `switch` cases do. Apply the same escaping for consistency and defense in depth.</comment>
<file context>
@@ -0,0 +1,872 @@
+ tableFilter = `table_name = '${esc(parts[0])}'`
+ }
+
+ switch (dialect) {
+ case "clickhouse":
+ // Returns: name, type, default_type, default_expression, ...
</file context>
| lines.push(` Sample differences (first ${Math.min(diffRows.length, 5)}):`) | ||
| for (const d of diffRows.slice(0, 5)) { | ||
| const label = d.sign === "-" ? "source only" : "target only" | ||
| lines.push(` [${label}] ${d.values?.join(" | ")}`) |
There was a problem hiding this comment.
P2: When d.values is nullish, d.values?.join(" | ") evaluates to undefined, which the template literal coerces to the string "undefined". The output would read e.g. [source only] undefined. Use a fallback to produce a sensible message.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/tools/data-diff.ts, line 206:
<comment>When `d.values` is nullish, `d.values?.join(" | ")` evaluates to `undefined`, which the template literal coerces to the string `"undefined"`. The output would read e.g. `[source only] undefined`. Use a fallback to produce a sensible message.</comment>
<file context>
@@ -0,0 +1,257 @@
+ lines.push(` Sample differences (first ${Math.min(diffRows.length, 5)}):`)
+ for (const d of diffRows.slice(0, 5)) {
+ const label = d.sign === "-" ? "source only" : "target only"
+ lines.push(` [${label}] ${d.values?.join(" | ")}`)
+ }
+ }
</file context>
Bun's runtime never fires native addon async callbacks, so the async `new duckdb.Database(path, opts, callback)` form would hit the 2-second timeout fallback on every connection attempt. Switch to the synchronous constructor form `new duckdb.Database(path)` / `new duckdb.Database(path, opts)` which throws on error and completes immediately in both Node and bun runtimes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The async callback form with 2s fallback was already working correctly at e3df5a4. The timeout was caused by a missing duckdb .node binary, not a bun incompatibility. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add `warehouseTypeToDialect()` mapping: sqlserver→tsql, mssql→tsql, fabric→fabric, postgresql→postgres, mariadb→mysql. Fixes critical serde mismatch where Rust engine rejects raw warehouse type names. - Update both `resolveDialect()` functions to use the mapping - Add MSSQL/Fabric cases to `dateTruncExpr()` — DATETRUNC(DAY, col) - Add locale-safe date literal casting via CONVERT(DATE, ..., 23) - Register `fabric` in DRIVER_MAP (reuses sqlserver TDS driver) - Add `fabric` normalize aliases in normalize.ts - Add 15 SQL Server driver unit tests (TOP injection, truncation, schema introspection, connection lifecycle, result format) - Add 9 dialect mapping unit tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Support all 7 Azure AD / Entra ID auth types in `sqlserver.ts`: `azure-active-directory-password`, `access-token`, `service-principal-secret`, `msi-vm`, `msi-app-service`, `azure-active-directory-default`, `token-credential` - Force TLS encryption for all Azure AD connections - Dynamic import of `@azure/identity` for `DefaultAzureCredential` - Add normalize aliases for Azure AD config fields (`authentication`, `azure_tenant_id`, `azure_client_id`, `azure_client_secret`, `access_token`) - Add `fabric: SQLSERVER_ALIASES` to DRIVER_ALIASES - Add 10 Azure AD unit tests covering all auth flows, encryption, and `DefaultAzureCredential` with managed identity Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…LL.md - Add SQL Server / Fabric schema inspection query in Step 2 - Add "SQL Server and Microsoft Fabric" section with: - Supported configurations table (sqlserver, mssql, fabric) - Fabric connection guide with Azure AD auth types - Algorithm behavior notes (joindiff vs hashdiff selection) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rscore column filter
- **Azure AD auth**: Pass `azure-active-directory-*` types directly to tedious
instead of constructing `DefaultAzureCredential` ourselves. Tedious imports
`@azure/identity` internally and creates credentials — avoids bun CJS/ESM
`isTokenCredential` boundary issue that caused "not an instance of the token
credential class" errors.
- **Auth shorthands**: Map `CLI`, `default`, `password`, `service-principal`,
`msi`, `managed-identity` to their full tedious type names.
- **Column filter**: Remove `_.startsWith("_")` filter from `execute()` result
columns — it stripped legitimate aliases like `_p` used by partition discovery,
causing partitioned diffs to return empty results.
- **Tests**: Remove `@azure/identity` mock (no longer imported by driver),
update auth assertions, add shorthand mapping tests, fix column filter test.
- **Verified**: All 97 driver tests pass. Full data-diff pipeline tested against
real MSSQL server (profile, joindiff, auto, where_clause, partitioned).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lattening - Upgrade `mssql` from v11 to v12 (`tedious` 18 → 19) - Use explicit `ConnectionPool` instead of global `mssql.connect()` to isolate multiple simultaneous connections - Flatten unnamed column arrays — `mssql` merges unnamed columns (e.g. `SELECT COUNT(*), SUM(...)`) into a single array under the empty-string key; restore positional column values - Proper column name resolution: compare `namedKeys.length` against flattened row length, fall back to synthetic `col_0`, `col_1`, etc. - Update test mock to export `ConnectionPool` class and `createMockPool` Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tions
Use ternary expressions (`x ? {...} : {}`) instead of short-circuit
(`x && {...}`) to avoid spreading a boolean value.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
403477e to
811c2be
Compare
|
@suryaiyer95 can you address code review comments? |
Summary
connect/execute/listSchemas/listTables/describeTable/closewith T-SQLTOPinjection,sys.*catalog queries, and row flattening for unnamed columnsdefault,password,access-token,service-principal-secret,msi-vm,msi-app-service) with shorthand aliases (cli,default,password,service-principal,msi)sqlserver/mssql→tsql,fabric→fabricwithDATETRUNC()andCONVERT(DATE, ..., 23)for date partitioningConnectionPoolisolation for concurrent connections, unnamed-column array flattening, synthetic column name fallbackKey files
packages/drivers/src/sqlserver.ts,packages/drivers/test/sqlserver-unit.test.tspackages/opencode/src/altimate/native/connections/data-diff.tspackages/opencode/src/altimate/tools/data-diff.ts.opencode/skills/data-parity/SKILL.mdTest plan
bun test packages/drivers/test/sqlserver-unit.test.ts— 28 tests passbun test packages/opencode/test/altimate/data-diff-dialect.test.ts— 9 tests pass🤖 Generated with Claude Code
Summary by cubic
Adds end-to-end MSSQL and Microsoft Fabric support to the data-parity skill, including Azure AD auth and dialect mapping so the Rust engine runs diffs correctly without silent truncation. Also upgrades
mssqlto v12 with isolated pools and correct result shaping.New Features
data_diff(auto, joindiff, hashdiff, profile, cascade) and partitioning; map warehouse types to Rust dialects (sqlserver/mssql→tsql,fabric→fabric); useDATETRUNC()andCONVERT(DATE, ..., 23)for date partitions.default,password,service-principal,msi),ConnectionPoolisolation, T‑SQLTOPinjection withExecuteOptions.noLimit, andfabricregistration + normalize aliases.Bug Fixes
noLimitto bypass injectedTOP.Written for commit 811c2be. Summary will update on new commits.