Skip to content

TaskRun optimizations: dropping FKs and some indexes#3309

Merged
matt-aitken merged 5 commits intomainfrom
taskrun-optimizations
Apr 1, 2026
Merged

TaskRun optimizations: dropping FKs and some indexes#3309
matt-aitken merged 5 commits intomainfrom
taskrun-optimizations

Conversation

@matt-aitken
Copy link
Copy Markdown
Member

Summary

  • Drop all 8 foreign key constraints on TaskRun. The run listing path is now fully ClickHouse-backed so we no longer need Postgres to enforce referential integrity on this table. The FK constraints add write
    overhead on every insert/update with no remaining benefit. Prisma queries are unaffected.
  • Remove PostgresRunsRepository and its associated feature flag (runsListRepository), which was the last remaining code path querying TaskRun directly for list/count operations.
  • Drop three indexes that were only useful for the Postgres run list path and have no remaining query consumers:
    • TaskRun_runtimeEnvironmentId_id_idx — was the cursor pagination index for PostgresRunsRepository; superseded by the (runtimeEnvironmentId, createdAt DESC) composite index
    • TaskRun_scheduleId_idx — redundant with the (scheduleId, createdAt DESC) composite index; no direct Postgres queries filter by scheduleId alone
    • TaskRun_rootTaskRunId_idx — no queries filter TaskRun by rootTaskRunId as a WHERE clause anywhere in the codebase

All index drops use CONCURRENTLY IF EXISTS to avoid table locks in production.

Test plan

  • pnpm run db:migrate:deploy applies all migrations cleanly
  • pnpm run typecheck --filter webapp passes
  • Run list pages load correctly in the dashboard (ClickHouse path)
  • Scheduled task runs still trigger and appear correctly

@changeset-bot
Copy link
Copy Markdown

changeset-bot bot commented Apr 1, 2026

⚠️ No Changeset found

Latest commit: a511151

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6770b6d7-270a-40aa-933c-700bf4e90fd4

📥 Commits

Reviewing files that changed from the base of the PR and between 3a5bc57 and a511151.

📒 Files selected for processing (1)
  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
  • GitHub Check: sdk-compat / Deno Runtime
  • GitHub Check: sdk-compat / Cloudflare Workers
  • GitHub Check: sdk-compat / Bun Runtime
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: typecheck / typecheck
🧰 Additional context used
📓 Path-based instructions (10)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
apps/webapp/app/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

Access all environment variables through the env export of env.server.ts instead of directly accessing process.env in the Trigger.dev webapp

Files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: When importing from @trigger.dev/core in the webapp, use subpath exports from the package.json instead of importing from the root path
Follow the Remix 2.1.0 and Express server conventions when updating the main trigger.dev webapp

Files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

**/*.ts: Use typecheck to verify changes in apps and internal packages (apps/*, internal-packages/*), not build - building proves almost nothing about correctness
When writing Trigger.dev tasks, always import from @trigger.dev/sdk. Never use @trigger.dev/sdk/v3 or deprecated client.defineJob
Add crumbs as you write code - mark lines with // @Crumbs or wrap blocks in `// `#region` `@crumbs for agentcrumbs debug tracing, then strip before merge

Files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier before committing

Files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
apps/**/*

📄 CodeRabbit inference engine (CLAUDE.md)

When modifying only server components (apps/webapp/, apps/supervisor/, etc.) with no package changes, add a .server-changes/ file instead of a changeset

Files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
apps/webapp/**/*.server.{ts,tsx}

📄 CodeRabbit inference engine (apps/webapp/CLAUDE.md)

apps/webapp/**/*.server.{ts,tsx}: Environment variables must be accessed via the env export from app/env.server.ts and never use process.env directly
Always use findFirst instead of findUnique in Prisma queries due to implicit DataLoader batching issues and performance concerns

Files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
apps/webapp/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (apps/webapp/CLAUDE.md)

Use named constants for sentinel/placeholder values instead of raw string literals scattered across comparisons

Files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: internal-packages/database/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:43:17.177Z
Learning: Applies to internal-packages/database/**/prisma/migrations/*/*.sql : Clean up generated Prisma migrations by removing extraneous lines for junction tables (`_BackgroundWorkerToBackgroundWorkerFile`, `_BackgroundWorkerToTaskQueue`, `_TaskRunToTaskRunTag`, `_WaitpointRunConnections`, `_completedWaitpoints`) and indexes (`SecretStore_key_idx`, various `TaskRun` indexes) unless explicitly added
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2264
File: apps/webapp/app/services/runsRepository.server.ts:172-174
Timestamp: 2025-07-12T18:06:04.133Z
Learning: In apps/webapp/app/services/runsRepository.server.ts, the in-memory status filtering after fetching runs from Prisma is intentionally used as a workaround for ClickHouse data delays. This approach is acceptable because the result set is limited to a maximum of 100 runs due to pagination, making the performance impact negligible.
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: internal-packages/database/prisma/migrations/20260318114244_add_prompt_friendly_id/migration.sql:5-5
Timestamp: 2026-03-22T13:49:23.474Z
Learning: In `internal-packages/database/prisma/migrations/**/*.sql`: When a column and its index are added in a follow-up migration file but the parent table itself was introduced in the same PR (i.e., no production rows exist yet), a plain `CREATE INDEX` / `CREATE UNIQUE INDEX` (without CONCURRENTLY) is safe and does not require splitting into a separate migration. The CONCURRENTLY requirement only applies when the table already has existing data in production.
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: internal-packages/database/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:43:17.177Z
Learning: New code should always target Prisma RunEngineVersion V2 (run-engine + redis-worker), not V1 (legacy MarQS + Graphile)
📚 Learning: 2026-03-22T13:51:25.797Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/presenters/v3/PromptPresenter.server.ts:100-141
Timestamp: 2026-03-22T13:51:25.797Z
Learning: In the triggerdotdev/trigger.dev codebase, the ClickHouse server is configured with UTC as its timezone. Therefore, `toStartOfHour(start_time)` (without an explicit timezone argument) in ClickHouse queries correctly returns UTC-formatted strings that align with JavaScript `toISOString()`-derived UTC bucket keys (e.g., in `apps/webapp/app/presenters/v3/PromptPresenter.server.ts`). Do not flag this as a timezone mismatch bug.

Applied to files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
📚 Learning: 2026-03-22T13:51:25.797Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/presenters/v3/PromptPresenter.server.ts:100-141
Timestamp: 2026-03-22T13:51:25.797Z
Learning: In the triggerdotdev/trigger.dev codebase, the ClickHouse server is configured with UTC timezone. Therefore, `toStartOfHour(start_time)` (without an explicit timezone argument) in ClickHouse queries returns UTC-formatted strings, which correctly align with JavaScript `toISOString()`-derived UTC bucket keys. Do not flag this pattern as a timezone mismatch bug.

Applied to files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
📚 Learning: 2025-07-12T18:06:04.133Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2264
File: apps/webapp/app/services/runsRepository.server.ts:172-174
Timestamp: 2025-07-12T18:06:04.133Z
Learning: In apps/webapp/app/services/runsRepository.server.ts, the in-memory status filtering after fetching runs from Prisma is intentionally used as a workaround for ClickHouse data delays. This approach is acceptable because the result set is limited to a maximum of 100 runs due to pagination, making the performance impact negligible.

Applied to files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
📚 Learning: 2025-06-14T08:07:46.625Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 2175
File: apps/webapp/app/services/environmentMetricsRepository.server.ts:202-207
Timestamp: 2025-06-14T08:07:46.625Z
Learning: In apps/webapp/app/services/environmentMetricsRepository.server.ts, the ClickHouse methods (getTaskActivity, getCurrentRunningStats, getAverageDurations) intentionally do not filter by the `tasks` parameter at the ClickHouse level, even though the tasks parameter is accepted by the public methods. This is done on purpose as there is not much benefit from adding that filtering at the ClickHouse layer.

Applied to files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).

Applied to files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.

Applied to files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
📚 Learning: 2026-03-29T19:16:28.864Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 3291
File: apps/webapp/app/v3/featureFlags.ts:53-65
Timestamp: 2026-03-29T19:16:28.864Z
Learning: When reviewing TypeScript code that uses Zod v3, treat `z.coerce.*()` schemas as their direct Zod type (e.g., `z.coerce.boolean()` returns a `ZodBoolean` with `_def.typeName === "ZodBoolean"`) rather than a `ZodEffects`. Only `.preprocess()`, `.refine()`/`.superRefine()`, and `.transform()` are expected to wrap schemas in `ZodEffects`. Therefore, in reviewers’ logic like `getFlagControlType`, do not flag/unblock failures that require unwrapping `ZodEffects` when the input schema is a `z.coerce.*` schema.

Applied to files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
🔇 Additional comments (1)
apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts (1)

1232-1232: Good fix for scheduled-run span retrieval.

Line 1232’s 60s buffer is a solid correction and should prevent early pre-queued scheduled spans from being excluded.


Walkthrough

This PR removes the feature-flag-driven runs repository selection and the runsListRepository flag, deletes the Postgres runs repository implementation, and updates the runs service to call the ClickHouse repository exclusively (removing fallback and resiliency logic). It adds several Prisma SQL migrations to drop TaskRun foreign keys and indexes and removes TaskRun index directives from the Prisma schema. Additionally, it tightens the admin feature-flags defaults and increases ClickHouse span-query buffer in the event repository from 1s to 60s.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Description check ❓ Inconclusive The PR description is comprehensive, detailing the rationale, changes made, and test plan, though it does not follow the provided template structure with checklist and sections. Consider following the repository template: add the checklist items, structure with Testing/Changelog/Screenshots sections, and link to the related issue number.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely summarizes the main changes: dropping foreign keys and indexes on TaskRun for optimization purposes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch taskrun-optimizations

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

devin-ai-integration[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

The schedule engine pre-queues runs ~25s before exactScheduleTime so they're ready to execute on the dot, but overrideCreatedAt stamps the TaskRun's createdAt with the future scheduled time. The getSpan query used createdAt as the window start with only a 1s buffer, causing spans to fall outside the query window for scheduled runs.

The same issue was fixed for getTraceSummary in df4ab97 but getSpan was missed. Applying the same 60s buffer.
@matt-aitken matt-aitken merged commit 0e14b6d into main Apr 1, 2026
43 of 45 checks passed
@matt-aitken matt-aitken deleted the taskrun-optimizations branch April 1, 2026 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants