Skip to content

feat(warehouse): Phase 2 — BigQuery connection management#395

Open
Shrotriya-lalit wants to merge 11 commits into
Openpanel-dev:mainfrom
Shrotriya-lalit:feat/bigquery-connector-phase2
Open

feat(warehouse): Phase 2 — BigQuery connection management#395
Shrotriya-lalit wants to merge 11 commits into
Openpanel-dev:mainfrom
Shrotriya-lalit:feat/bigquery-connector-phase2

Conversation

@Shrotriya-lalit

@Shrotriya-lalit Shrotriya-lalit commented Jun 15, 2026

Copy link
Copy Markdown

Stacks on PR #391 (Phase 1 — schema, types, validation). Review Phase 2-only changes at:
feat/bigquery-connector-phase1..feat/bigquery-connector-phase2
After #391 merges, this PR's diff will automatically show only Phase 2 changes.

Summary

Delivers the full connection lifecycle for the BigQuery warehouse connector. Users can connect a GCP Service Account, test connectivity, rotate credentials, and disconnect — all from a new Warehouse Sources settings tab.

  • 8 tRPC procedures: listConnections, connect, testConnection, updateConnection, disconnect, listDatasets, listTables, getTableSchema
  • packages/db/src/warehouse.ts: withTimeout, buildBigQueryClient, getWarehouseClient, and dataset/table/schema helpers (all with 10 s GCP timeouts, timer-leak-free)
  • Frontend: AddConnectionDialog with zodResolver client-side validation, RotateKeyDialog, TestStatusBadge, provider type badge via PROVIDER_LABELS, skeleton loading, error state, disconnect confirmation with live sync count
  • Security: serviceAccountJson added to logger redact patterns (private key never written to logs); rate limits on connect/updateConnection (5/min) and testConnection (10/min); listConnections bounded at take: 100; running-sync guard in disconnect blocks mid-run cascade deletes
  • Schema: cancelled added to WarehouseSyncRunStatus enum (migration 20260615000001, own transaction)
  • Tests: vitest suites for mapBigQueryError and all Zod warehouse validators

Design rules enforced

  • Credentials tested against live GCP before any DB write (fail fast, no orphaned rows)
  • Every mutation: getProjectAccessfindUniqueprojectId match (3-step ownership)
  • configEncrypted never returned in any tRPC response — only safe display fields selected
  • gcpRegion preserved on key rotation (merged from existing encrypted config if not supplied)
  • Provider type change and GCP project ID change blocked on updateConnection
  • Idempotent disconnect (catches P2025 race)

Test plan

  • Valid SA JSON → connection saved, displayEmail shown on card
  • Invalid JSON / wrong type field → Zod error before API call
  • Valid JSON, missing BigQuery permission → mapBigQueryError message shown, no DB row
  • testConnection on revoked SA → card shows red ✗ + error message
  • Rotate key → existing syncs preserved, new displayEmail shown
  • Attempt GCP project ID change via rotate form → blocked with clear error
  • Disconnect with 2 syncs → dialog warns "2 syncs will be removed"
  • Double-click Disconnect → second call returns success (idempotent)
  • connect 6× in 60 s → 6th returns 429 rate limit error
  • disconnect while sync running → blocked with "A sync is currently running" message

Summary by CodeRabbit

  • New Features
    • Added a “Warehouse Sources” tab to project settings for managing data warehouse connections
    • Enabled BigQuery connection management: add, test, rotate service account key, and disconnect
    • Added connection status tracking with clear “never tested”, success/failure badges, and relative “tested X ago” timing
    • Introduced warehouse exploration capabilities (datasets, tables, and schema lookup) for the warehouse connections experience

Phase 1 of the BigQuery warehouse connector. Adds Prisma models for
BigQueryConnection, BigQuerySync and BigQuerySyncRun with four supporting
enums, two Prisma migrations, typed JSON column via PrismaJson namespace,
Zod schemas (zBigQuerySyncConfig + column mapping variants) in
@openpanel/validation, and the @google-cloud/bigquery dependency.
- Add zBqColRef validator for column references (supports dot-notation
  for STRUCT nested fields like user.profile.email)
- Add mappingType discriminator to both mapping schemas so the union is
  discriminated and TypeScript can narrow the type cleanly
- Add superRefine cross-validation: append mode events syncs must declare
  an insertTime column (the TIMESTAMP cursor)
- Update plan with verified BigQuery Node.js client type mappings:
  INT64 needs wrapIntegers:true, TIMESTAMP/.value not .toISOString(),
  DATETIME has no timezone, BYTES→Buffer, Big for NUMERIC/BIGNUMERIC
Real-world orgs connect multiple data sources to one project (e.g.
jm-ebg and jm-ebg-cdp on the same ROAS project). The original
@unique on projectId wrongly enforced a single connection per project.

- Remove @unique from BigQueryConnection.projectId
- Add name String field (user label, e.g. "CDP Source")
- Replace with @@unique([projectId, name]) — names unique within project
- Change Project.bigQueryConnection? → bigQueryConnections[]
Schema additions:
- BigQueryConnection: gcpRegion (GDPR region compliance), lastTestedAt/lastTestStatus (connection health)
- BigQuerySync: lastSyncStatus typed as enum (was String?), errorRetryCount+isErrorPaused circuit breaker, partitionFilter for cost-safe full-refresh on partitioned tables
- BigQuerySyncRun: rowCount BigInt (INT max ~2.1B insufficient), bytesProcessed for cost tracking

Zod additions:
- zGcpProjectId: GCP project ID format regex (rejects project numbers and display names)
- zBqIdentifier: dataset/table name validator (no hyphens, per BQ naming rules)
- zServiceAccountJson: SA JSON structure check (rejects authorized_user creds before encryption)
- zBigQueryConnectionCreate: connection creation schema with name/region/SA JSON
- zBigQuerySyncConfig: dataset/tableName now use zBqIdentifier, partitionFilter field added
…d dependency upgrade

- Upgrade @google-cloud/bigquery from ^7.9.1 to ^8.3.1 (latest stable)
- Add FK bigquery_sync_runs.projectId -> projects(id) (was missing, allowing orphan runs)
- Add composite FK bigquery_syncs(projectId, connectionId) -> bigquery_connections(projectId, id)
  to prevent cross-tenant data: a sync can no longer reference a connection from a different project
- Add UNIQUE INDEX bigquery_connections(projectId, id) to back the composite FK
- Add CHECK(char_length(name) > 0) on bigquery_connections to enforce non-empty names at DB level
- Backfill any dev rows with empty name using concat('connection_', id)
…Zod validators

Adds the foundational schema for a multi-provider Data Warehouse Connector
(BigQuery first, extensible to Snowflake/Redshift/Databricks/Postgres).

Three shared Prisma tables:
- warehouse_connections: one row per named connection, any provider
- warehouse_syncs: one sync job per connection
- warehouse_sync_runs: one run record per execution

Key design decisions:
- configEncrypted (AES-256-GCM) + displayIdentifier/displayEmail for
  UI display without decryption
- Composite FK warehouse_syncs(projectId,connectionId) →
  warehouse_connections(projectId,id) blocks cross-tenant exploit at DB level
- Three sync modes: append (cursor), full (reload + stale cleanup), onetime (backfill)
- cursorOverlapMinutes (default 10) rewinds append cursor to catch late-arriving rows
- syncDelayMinutes (default 0) delays cron execution to allow BQ pipelines to land
- Performance indexes on all FK columns (PostgreSQL does not create these automatically)
- jsonProperties column mapping flattens a JSON column into event/profile properties

Validation (packages/validation/src/index.ts):
- zBigQueryWarehouseConfig + zWarehouseConfig discriminated union
- zBigQuerySyncConfig with superRefine cross-field rules:
  schedule required for append/full, insertTime required for append,
  eventName/eventNameStatic mutually exclusive

Migrations applied: 20260610115042–20260610115046
21/21 Zod validator probes pass. Zero schema drift (prisma migrate diff).
…v#391

- Rename PrismaJson type alias IWarehouseColumnMapping →
  IPrismaWarehouseColumnMapping to match the established IPrisma* prefix
  convention used by all other types in the namespace
- Update schema.prisma annotation to match (/// [IPrismaWarehouseColumnMapping])
- Add @@unique([id, projectId]) on WarehouseSync to back a composite FK
- Replace single-column syncId FK on WarehouseSyncRun with composite
  (syncId, projectId) → warehouse_syncs(id, projectId) to enforce that
  a run's projectId always matches its parent sync's projectId — closes
  the same cross-tenant gap that the existing composite FK on WarehouseSync
  already closes one level up
Before adding the (syncId, projectId) composite FK, normalize any runs
whose projectId doesn't match their parent sync's projectId. Without this
the ADD CONSTRAINT fails if mismatched rows exist. Tables are dev-only now
so this is a no-op, but makes the migration safe to apply to any environment.
Validation fixes (packages/validation/src/index.ts):
- Gap 1: reject whitespace-only connection names (Zod .refine + btrim DB check)
- Gap 2: require createdAt cursor for profiles in append mode (superRefine)
- Gap 4: block SQL injection tokens in partitionFilter (-- /* */ ;)
- Gap B: export IBigQueryWarehouseConfig named type

Schema + migrations:
- migration 20260611000001: tighten name_nonempty_check to char_length(btrim(name)) > 0
- migration 20260611000002: add lastTestError String? to warehouse_connections
  so testConnection and connect can surface a human-readable failure reason
  (permission denied, project not found, key revoked) alongside the boolean status

Regenerated Prisma client to include lastTestError field.
Adds the full connection lifecycle for the warehouse connector:
connect, test, rotate credentials, disconnect. Ships a settings tab
so users can manage connections from the dashboard.

Key implementation details:
- tRPC router (8 procedures): listConnections, connect, testConnection,
  updateConnection, disconnect, listDatasets, listTables, getTableSchema
- warehouse.ts data service: withTimeout, buildBigQueryClient,
  getWarehouseClient, list* helpers — all with 10 s BQ timeouts
- Credentials tested against live GCP before any DB write (fail fast)
- gcpRegion preserved on key rotation (merged from existing encrypted config)
- Type-change and project-ID-change blocked on updateConnection
- Idempotent disconnect (catches P2002/P2025 races)
- Running-sync guard in disconnect — blocks mid-run cascade delete
- Rate limiting: connect/updateConnection max 5/min, testConnection max 10/min
- listConnections bounded at take: 100
- serviceAccountJson added to logger redact patterns (private key never logged)
- cancelled added to WarehouseSyncRunStatus enum (migration 20260615000001)
- Frontend: AddConnectionDialog with zodResolver client-side validation,
  RotateKeyDialog, TestStatusBadge, provider type badge via PROVIDER_LABELS,
  error state, skeleton loading, disconnect confirmation with sync count
- Vitest suites for mapBigQueryError, zWarehouseConfig, and validation schemas
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a complete BigQuery warehouse connector to the platform. Introduces Zod validation schemas for service account credentials, warehouse configs, and sync settings; a Prisma schema and 14 incremental migrations building and hardening warehouse_connections, warehouse_syncs, and warehouse_sync_runs tables; a BigQuery data service layer with client initialization, dataset/table enumeration, and schema retrieval; a tRPC warehouseRouter for connection lifecycle management; and a Warehouse Sources settings UI tab with connection creation, testing, key rotation, and disconnection.

Changes

Warehouse Connector Feature (BigQuery Phase 1)

Layer / File(s) Summary
Validation schemas and types
packages/validation/src/index.ts, packages/validation/src/warehouse.test.ts, packages/db/src/types.ts
Adds Zod validators for GCP project IDs, BigQuery identifiers, service account JSON structure with required fields (private_key, client_email, project_id), zWarehouseConfig discriminated union, named connection creation with XSS/SQL-injection sanitization, event/profile column mappings using dot-notation paths, and zBigQuerySyncConfig with cross-field superRefine constraints (schedule requirements by mode, cursor columns by mapping type, partition-filter comment/injection rejection). Exports all inferred TypeScript types and registers IPrismaWarehouseColumnMapping in the global PrismaJson namespace. Full test coverage for valid acceptance and invalid rejection cases.
Prisma schema and migrations
packages/db/prisma/schema.prisma, packages/db/prisma/migrations/*, packages/db/package.json
Extends the Prisma schema with WarehouseConnection, WarehouseSync, and WarehouseSyncRun models; five enums for warehouse type, sync mapping/mode/schedule, and run status; warehouse relations on Project; and a description field on EventMeta. Fourteen SQL migrations progressively build and harden the warehouse tables: initial BigQuery tables with enums and FKs, multi-connection per-project naming, production hardening (gcpRegion, health tracking, retry fields), referential integrity (composite FKs for tenant isolation), provider-agnostic restructure (drop BigQuery tables, create generic warehouse schema), security FK enforcement, performance indexes, failure counting, onetime sync mode and nullable schedule, cursor overlap/delay columns, composite sync-run FK, whitespace-trimmed name validation, last-test error storage, and cancelled run status. Adds @google-cloud/bigquery ^8.3.1 dependency.
BigQuery data service layer
packages/db/src/warehouse.ts, packages/db/src/warehouse.test.ts, packages/db/index.ts, packages/logger/index.ts
Implements withTimeout promise wrapper with timer cleanup, mapBigQueryError to convert GCP error messages to user-facing strings (permission, not found, timeout, connectivity, etc.), buildBigQueryClient to construct authenticated BigQuery clients from validated config, getWarehouseClient to fetch/verify/decrypt/validate connections from Prisma, listWarehouseDatasets and listWarehouseTables for dataset/table enumeration with timeouts, and getWarehouseTableSchema to fetch and normalize table field metadata. Exports the module from the packages/db barrel. Adds serviceaccountjson to the logger's sensitive-key redaction list. Tests cover BigQuery error message classification and gcpRegion preservation through config round-trips during key rotation.
tRPC warehouseRouter
packages/trpc/src/routers/warehouse.ts, packages/trpc/src/root.ts
Implements assertConnectionOwnership helper; listConnections query with pagination and sync counts; connect mutation with rate-limiting, pre-persistence credential verification via timed dataset listing, service-account JSON parsing to extract displayEmail, config encryption, and duplicate-name error mapping; testConnection with success/failure last-test field updates and error persistence; updateConnection with immutability guards (provider type, GCP project ID for BigQuery), conditional gcpRegion merge from existing config, credential re-verification, and display field updates; disconnect mutation that checks for active sync runs and prevents deletion; read-only listDatasets, listTables, and getTableSchema queries with identifier validation and BigQuery error mapping. Registers the router under warehouse in appRouter.
Warehouse settings UI
apps/start/src/routes/_app.$organizationId.$projectId.settings._tabs.tsx, apps/start/src/routes/_app.$organizationId.$projectId.settings._tabs.warehouse.tsx
Adds a "Warehouse Sources" tab entry to project settings. Implements the complete warehouse settings page: AddConnectionDialog with provider selector (BigQuery enabled, others disabled), react-hook-form Zod-backed form for name and service account JSON, and "Connect & Verify" mutation; RotateKeyDialog for credential rotation with new service-account JSON input and pending state; TestStatusBadge for never-tested/success (relative time)/failure states; ConnectionCard displaying metadata and test/rotate/disconnect action buttons with loading indicators and confirmed disconnect flow; and WarehouseSettings container with loading/error/empty-state and list rendering via tRPC.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant WarehouseSettings as WarehouseSettings UI
  participant warehouseRouter as tRPC warehouseRouter
  participant warehouse_ts as warehouse.ts (DB layer)
  participant BigQuery as BigQuery API
  participant Prisma as Prisma (PostgreSQL)

  rect rgba(100, 150, 255, 0.5)
    Note over User,Prisma: Connect new warehouse
    User->>WarehouseSettings: Fill form & click "Connect & Verify"
    WarehouseSettings->>warehouseRouter: warehouse.connect(name, config, projectId)
    warehouseRouter->>warehouse_ts: buildBigQueryClient(config)
    warehouseRouter->>warehouse_ts: withTimeout(listWarehouseDatasets, 10s)
    warehouse_ts->>BigQuery: client.getDatasets()
    BigQuery-->>warehouse_ts: datasets
    warehouseRouter->>Prisma: create warehouse_connections (encrypted config)
    Prisma-->>warehouseRouter: connection record
    warehouseRouter-->>WarehouseSettings: connection record
    WarehouseSettings->>WarehouseSettings: invalidate listConnections query, close dialog
  end

  rect rgba(100, 200, 100, 0.5)
    Note over User,Prisma: Test existing connection
    User->>WarehouseSettings: Click "Test" on connection card
    WarehouseSettings->>warehouseRouter: warehouse.testConnection(connectionId)
    warehouseRouter->>Prisma: findUnique connection (ownership check)
    warehouseRouter->>warehouse_ts: listWarehouseDatasets(connectionId, projectId)
    warehouse_ts->>BigQuery: client.getDatasets()
    BigQuery-->>warehouse_ts: datasets or error
    warehouseRouter->>Prisma: update lastTestedAt / lastTestStatus / lastTestError
    warehouseRouter-->>WarehouseSettings: success or failure result
    WarehouseSettings->>WarehouseSettings: invalidate listConnections query, show toast
  end

  rect rgba(200, 100, 100, 0.5)
    Note over User,Prisma: Disconnect warehouse
    User->>WarehouseSettings: Click "Disconnect" on connection card
    WarehouseSettings->>WarehouseSettings: alert confirmation
    User->>WarehouseSettings: Confirm disconnect
    WarehouseSettings->>warehouseRouter: warehouse.disconnect(connectionId)
    warehouseRouter->>Prisma: check for running warehouse_sync_runs
    warehouseRouter->>Prisma: delete warehouse_connections
    warehouseRouter-->>WarehouseSettings: {ok: true}
    WarehouseSettings->>WarehouseSettings: invalidate listConnections query, show toast
  end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐰 A bunny hops to BigQuery's land,
With service accounts clutched in paw and hand,
Encrypted configs tucked away so neat,
Composite FKs making tenants discreet,
"Connect & Verify!" the warren cheers aloud —
The warehouse tab now shines among the cloud! ☁️

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: implementing Phase 2 of BigQuery warehouse connector with connection management functionality.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
packages/trpc/src/routers/warehouse.ts (1)

343-343: ⚡ Quick win

Use shared zBqIdentifier instead of duplicating regexes for dataset/table inputs.

These two route inputs currently re-implement identifier validation and can drift from the canonical warehouse validator (including length constraints). Reuse zBqIdentifier from @openpanel/validation to keep router and schema contracts aligned.

Proposed refactor
 import {
+  zBqIdentifier,
   zWarehouseConfig,
   zWarehouseConnectionCreate,
 } from '`@openpanel/validation`';
@@
-    .input(zConnectionOwnership.extend({ dataset: z.string().regex(/^[a-zA-Z0-9_]+$/, 'Invalid dataset name') }))
+    .input(zConnectionOwnership.extend({ dataset: zBqIdentifier }))
@@
       zConnectionOwnership.extend({
-        dataset: z.string().regex(/^[a-zA-Z0-9_]+$/, 'Invalid dataset name'),
-        tableName: z.string().regex(/^[a-zA-Z0-9_]+$/, 'Invalid table name'),
+        dataset: zBqIdentifier,
+        tableName: zBqIdentifier,
       }),

Also applies to: 367-370

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/trpc/src/routers/warehouse.ts` at line 343, Remove the inline regex
validation for dataset identifiers in the warehouse router inputs. Replace the
z.string().regex() pattern with the shared zBqIdentifier validator imported from
`@openpanel/validation`. This same refactoring needs to be applied wherever
dataset or table identifier validation is duplicated (including the additional
occurrences mentioned in the comment). This ensures all identifier validation
stays consistent with the canonical warehouse validator including length
constraints, and eliminates schema drift between the router definitions and the
shared validation contract.
packages/db/src/warehouse.test.ts (1)

127-139: 💤 Low value

Static analysis false positive: test fixtures contain intentionally fake credentials.

The Betterleaks tool flagged the private_key values as potential secrets, but these are clearly test fixtures with placeholder values like "NEWKEY" and truncated headers. The keys are intentionally fake and pose no security risk.

If desired, you could make the test nature even more explicit:

🧹 Optional: Make test credentials more obviously fake
 const NEW_SA = JSON.stringify({
   type: 'service_account',
   project_id: 'my-project',
-  private_key: '-----BEGIN RSA PRIVATE KEY-----\nNEWKEY',
+  private_key: '-----BEGIN RSA PRIVATE KEY-----\nFAKE_TEST_KEY_DO_NOT_USE',
   client_email: 'sa-new@my-project.iam.gserviceaccount.com',
 });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/db/src/warehouse.test.ts` around lines 127 - 139, The static
analysis tool is flagging the private_key values in the VALID_SA and NEW_SA test
fixture constants as potential secrets, even though these are clearly fake test
credentials. To prevent false positives, make the test credentials more
obviously fake by replacing the placeholder key values with even more explicit
fake values that clearly indicate they are test fixtures (for example, using
values like "FAKE_TEST_KEY" or "TEST_PRIVATE_KEY_NOT_REAL" instead of truncated
RSA headers), ensuring the static analysis tools recognize these as intentional
test data rather than actual credentials.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@packages/db/prisma/migrations/20260608091000_bigquery_multi_connection/migration.sql`:
- Around line 5-9: The migration adds a name column with an empty string
default, but the follow-up unique constraint (referenced on line 12) will fail
if multiple connections already exist for the same project since they would all
have the same empty name value. Add clear comments in the migration file
explaining that the follow-up migration
(20260608140000_bigquery_referential_integrity) must be applied immediately
after to backfill unique names before the constraint is enforced, and document
in the PR description whether this rollout is intended for dev-only or
single-connection deployments, or if multi-connection projects exist in any
environment, confirm the immediate application requirement to avoid constraint
violation errors.

In `@packages/db/prisma/migrations/20260608120000_bigquery_harden/migration.sql`:
- Line 9: The ALTER TABLE statement for `lastSyncStatus` attempts a direct cast
to the `BigQuerySyncRunStatus` enum type, but this will fail if there are
existing text values in the database that don't match valid enum values. Replace
the simple USING clause that casts to the enum with a CASE expression that
evaluates the current `lastSyncStatus` value and either maps invalid values to a
valid default enum value or handles them appropriately. This ensures legacy or
unexpected text values are normalized before the cast occurs, allowing the
migration to complete successfully.

In `@packages/validation/src/index.ts`:
- Around line 733-741: The code casts the parsed JSON to Record<string, unknown>
without first checking if the value is actually an object, causing property
access on null or non-object values (like sa.type) to throw a TypeError. Add a
type guard after the cast to the variable sa to verify it is a non-null object
(using typeof sa === 'object' && sa !== null) before accessing any properties.
If the guard fails, add a validation issue via ctx.addIssue with an appropriate
error message, ensuring validation errors are reported instead of uncaught
TypeErrors.

---

Nitpick comments:
In `@packages/db/src/warehouse.test.ts`:
- Around line 127-139: The static analysis tool is flagging the private_key
values in the VALID_SA and NEW_SA test fixture constants as potential secrets,
even though these are clearly fake test credentials. To prevent false positives,
make the test credentials more obviously fake by replacing the placeholder key
values with even more explicit fake values that clearly indicate they are test
fixtures (for example, using values like "FAKE_TEST_KEY" or
"TEST_PRIVATE_KEY_NOT_REAL" instead of truncated RSA headers), ensuring the
static analysis tools recognize these as intentional test data rather than
actual credentials.

In `@packages/trpc/src/routers/warehouse.ts`:
- Line 343: Remove the inline regex validation for dataset identifiers in the
warehouse router inputs. Replace the z.string().regex() pattern with the shared
zBqIdentifier validator imported from `@openpanel/validation`. This same
refactoring needs to be applied wherever dataset or table identifier validation
is duplicated (including the additional occurrences mentioned in the comment).
This ensures all identifier validation stays consistent with the canonical
warehouse validator including length constraints, and eliminates schema drift
between the router definitions and the shared validation contract.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a5616f5b-645f-473e-92c8-264641e3fee6

📥 Commits

Reviewing files that changed from the base of the PR and between 241d259 and 94fffe5.

⛔ Files ignored due to path filters (2)
  • packages/db/src/generated/empty is excluded by !**/generated/**
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (27)
  • apps/start/src/routes/_app.$organizationId.$projectId.settings._tabs.tsx
  • apps/start/src/routes/_app.$organizationId.$projectId.settings._tabs.warehouse.tsx
  • packages/db/index.ts
  • packages/db/package.json
  • packages/db/prisma/migrations/20260607120000_add_event_meta_description/migration.sql
  • packages/db/prisma/migrations/20260608090217_add_bigquery_connector/migration.sql
  • packages/db/prisma/migrations/20260608091000_bigquery_multi_connection/migration.sql
  • packages/db/prisma/migrations/20260608120000_bigquery_harden/migration.sql
  • packages/db/prisma/migrations/20260608140000_bigquery_referential_integrity/migration.sql
  • packages/db/prisma/migrations/20260610115042_warehouse_restructure/migration.sql
  • packages/db/prisma/migrations/20260610115043_warehouse_security_fks/migration.sql
  • packages/db/prisma/migrations/20260610115044_warehouse_phase1_finalize/migration.sql
  • packages/db/prisma/migrations/20260610115045_warehouse_onetime_mode/migration.sql
  • packages/db/prisma/migrations/20260610115046_warehouse_sync_overlap_delay/migration.sql
  • packages/db/prisma/migrations/20260611000000_warehouse_run_composite_fk/migration.sql
  • packages/db/prisma/migrations/20260611000001_warehouse_name_btrim_check/migration.sql
  • packages/db/prisma/migrations/20260611000002_warehouse_connection_last_test_error/migration.sql
  • packages/db/prisma/migrations/20260615000001_warehouse_cancelled_status/migration.sql
  • packages/db/prisma/schema.prisma
  • packages/db/src/types.ts
  • packages/db/src/warehouse.test.ts
  • packages/db/src/warehouse.ts
  • packages/logger/index.ts
  • packages/trpc/src/root.ts
  • packages/trpc/src/routers/warehouse.ts
  • packages/validation/src/index.ts
  • packages/validation/src/warehouse.test.ts

Comment on lines +5 to +9
-- Add name column (default '' for any pre-existing rows during dev)
ALTER TABLE "public"."bigquery_connections" ADD COLUMN "name" TEXT NOT NULL DEFAULT '';

-- Remove the default now that column exists
ALTER TABLE "public"."bigquery_connections" ALTER COLUMN "name" DROP DEFAULT;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for migration rollout documentation or pre-migration data checks

rg -n -C3 -i 'rollout|backfill|existing.*connection|pre.*migration.*check' \
  --type=md --type=sql

Repository: Openpanel-dev/openpanel

Length of output: 3441


🏁 Script executed:

cat -n packages/db/prisma/migrations/20260608091000_bigquery_multi_connection/migration.sql

Repository: Openpanel-dev/openpanel

Length of output: 813


Constraint creation would fail with multiple pre-existing connections per project.

The CREATE UNIQUE INDEX on line 12 would violate uniqueness if multiple connections already exist for a single project, since they would all have name=''. The follow-up migration (20260608140000_bigquery_referential_integrity) handles this by backfilling empty names with unique values before the constraint is enforced database-wide.

Ensure the rollout strategy is documented: if this is intended for dev-only or single-connection deployments, clarify it in PR description or migration notes. If multi-connection projects exist in any environment where this runs, confirm the follow-up migration is applied immediately after to avoid constraint violation errors.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/db/prisma/migrations/20260608091000_bigquery_multi_connection/migration.sql`
around lines 5 - 9, The migration adds a name column with an empty string
default, but the follow-up unique constraint (referenced on line 12) will fail
if multiple connections already exist for the same project since they would all
have the same empty name value. Add clear comments in the migration file
explaining that the follow-up migration
(20260608140000_bigquery_referential_integrity) must be applied immediately
after to backfill unique names before the constraint is enforced, and document
in the PR description whether this rollout is intended for dev-only or
single-connection deployments, or if multi-connection projects exist in any
environment, confirm the immediate application requirement to avoid constraint
violation errors.

ALTER TABLE "public"."bigquery_connections" ADD COLUMN "lastTestStatus" BOOLEAN;

-- BigQuerySync: typed status enum, circuit-breaker fields, partition filter
ALTER TABLE "public"."bigquery_syncs" ALTER COLUMN "lastSyncStatus" TYPE "public"."BigQuerySyncRunStatus" USING "lastSyncStatus"::"public"."BigQuerySyncRunStatus";

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard the enum cast against legacy text values.

lastSyncStatus was previously unconstrained TEXT, so any non-null value outside the enum aborts this migration. Normalize invalid values before the cast or use a guarded USING CASE.

Suggested migration hardening
+UPDATE "public"."bigquery_syncs"
+SET "lastSyncStatus" = NULL
+WHERE "lastSyncStatus" IS NOT NULL
+  AND "lastSyncStatus" NOT IN ('pending', 'running', 'completed', 'failed');
+
 ALTER TABLE "public"."bigquery_syncs" ALTER COLUMN "lastSyncStatus" TYPE "public"."BigQuerySyncRunStatus" USING "lastSyncStatus"::"public"."BigQuerySyncRunStatus";
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ALTER TABLE "public"."bigquery_syncs" ALTER COLUMN "lastSyncStatus" TYPE "public"."BigQuerySyncRunStatus" USING "lastSyncStatus"::"public"."BigQuerySyncRunStatus";
UPDATE "public"."bigquery_syncs"
SET "lastSyncStatus" = NULL
WHERE "lastSyncStatus" IS NOT NULL
AND "lastSyncStatus" NOT IN ('pending', 'running', 'completed', 'failed');
ALTER TABLE "public"."bigquery_syncs" ALTER COLUMN "lastSyncStatus" TYPE "public"."BigQuerySyncRunStatus" USING "lastSyncStatus"::"public"."BigQuerySyncRunStatus";
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/db/prisma/migrations/20260608120000_bigquery_harden/migration.sql`
at line 9, The ALTER TABLE statement for `lastSyncStatus` attempts a direct cast
to the `BigQuerySyncRunStatus` enum type, but this will fail if there are
existing text values in the database that don't match valid enum values. Replace
the simple USING clause that casts to the enum with a CASE expression that
evaluates the current `lastSyncStatus` value and either maps invalid values to a
valid default enum value or handles them appropriately. This ensures legacy or
unexpected text values are normalized before the cast occurs, allowing the
migration to complete successfully.

Comment on lines +733 to +741
const sa = parsed as Record<string, unknown>;
if (sa.type !== 'service_account') {
ctx.addIssue({
code: z.ZodIssueCode.custom,
message:
'Must be a service account key (type: "service_account"). Application Default Credentials are not supported.',
});
}
if (!sa.private_key || typeof sa.private_key !== 'string') {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd packages/validation/src && wc -l index.ts

Repository: Openpanel-dev/openpanel

Length of output: 80


🏁 Script executed:

sed -n '690,750p' packages/validation/src/index.ts

Repository: Openpanel-dev/openpanel

Length of output: 2150


🏁 Script executed:

sed -n '1,750p' packages/validation/src/index.ts | grep -n "zServiceAccountJson" | head -20

Repository: Openpanel-dev/openpanel

Length of output: 108


🏁 Script executed:

sed -n '718,780p' packages/validation/src/index.ts

Repository: Openpanel-dev/openpanel

Length of output: 2070


Guard against null/non-object values before accessing properties in zServiceAccountJson.

After JSON.parse, the code casts directly to Record<string, unknown> without checking if the parsed value is actually an object. When the JSON input is the string "null", JSON.parse returns the literal null value, causing property access like sa.type to throw a TypeError instead of reporting a validation error.

Add a type guard before accessing object properties:

Proposed fix
     const sa = parsed as Record<string, unknown>;
+    if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
+      ctx.addIssue({
+        code: z.ZodIssueCode.custom,
+        message: 'Service account JSON must be an object',
+      });
+      return;
+    }
+    const sa = parsed as Record<string, unknown>;
     if (sa.type !== 'service_account') {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const sa = parsed as Record<string, unknown>;
if (sa.type !== 'service_account') {
ctx.addIssue({
code: z.ZodIssueCode.custom,
message:
'Must be a service account key (type: "service_account"). Application Default Credentials are not supported.',
});
}
if (!sa.private_key || typeof sa.private_key !== 'string') {
if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
ctx.addIssue({
code: z.ZodIssueCode.custom,
message: 'Service account JSON must be an object',
});
return;
}
const sa = parsed as Record<string, unknown>;
if (sa.type !== 'service_account') {
ctx.addIssue({
code: z.ZodIssueCode.custom,
message:
'Must be a service account key (type: "service_account"). Application Default Credentials are not supported.',
});
}
if (!sa.private_key || typeof sa.private_key !== 'string') {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/validation/src/index.ts` around lines 733 - 741, The code casts the
parsed JSON to Record<string, unknown> without first checking if the value is
actually an object, causing property access on null or non-object values (like
sa.type) to throw a TypeError. Add a type guard after the cast to the variable
sa to verify it is a non-null object (using typeof sa === 'object' && sa !==
null) before accessing any properties. If the guard fails, add a validation
issue via ctx.addIssue with an appropriate error message, ensuring validation
errors are reported instead of uncaught TypeErrors.

Replace duplicate inline regexes in listTables/getTableSchema with the
shared zBqIdentifier validator (adds min/max length bounds the regex lacked).
Replace real-looking private_key fragments in test fixtures with an
obviously fake string to silence static secret scanners.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
packages/db/src/warehouse.test.ts (1)

130-137: ⚡ Quick win

Use non-PEM fixture values for private_key to avoid secret-scanner noise.

Line 130 and Line 137 currently use PEM-shaped values, which are flagged as potential leaked keys by scanners even in tests. Replace them with clearly synthetic non-PEM placeholders so CI/security tooling stays signal-rich.

Suggested diff
-  private_key: '-----BEGIN RSA PRIVATE KEY-----\nFAKE_TEST_KEY_DO_NOT_USE',
+  private_key: 'fake_test_private_key_not_a_real_pem',
...
-  private_key: '-----BEGIN RSA PRIVATE KEY-----\nFAKE_TEST_KEY_DO_NOT_USE',
+  private_key: 'fake_test_private_key_not_a_real_pem',
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/db/src/warehouse.test.ts` around lines 130 - 137, Replace the
PEM-formatted private_key values in both test fixtures to use non-PEM
placeholder strings instead. In the first fixture definition (around line 130)
and in the NEW_SA constant definition (around line 137), change the private_key
from '-----BEGIN RSA PRIVATE KEY-----\nFAKE_TEST_KEY_DO_NOT_USE' to a clearly
synthetic non-PEM value like 'fake-test-key-do-not-use' or similar placeholder
that won't match PEM patterns and trigger secret-scanner alerts during CI.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@packages/db/src/warehouse.test.ts`:
- Around line 130-137: Replace the PEM-formatted private_key values in both test
fixtures to use non-PEM placeholder strings instead. In the first fixture
definition (around line 130) and in the NEW_SA constant definition (around line
137), change the private_key from '-----BEGIN RSA PRIVATE
KEY-----\nFAKE_TEST_KEY_DO_NOT_USE' to a clearly synthetic non-PEM value like
'fake-test-key-do-not-use' or similar placeholder that won't match PEM patterns
and trigger secret-scanner alerts during CI.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6c2f0564-d734-46ef-a434-69ad62993a1e

📥 Commits

Reviewing files that changed from the base of the PR and between 94fffe5 and 96d6d13.

📒 Files selected for processing (2)
  • packages/db/src/warehouse.test.ts
  • packages/trpc/src/routers/warehouse.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/trpc/src/routers/warehouse.ts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant