
fix: make InvalidLinkReference fix atomic on PostgreSQL #2709

Open
dkindlund wants to merge 1 commit into teableio:develop from dkindlund:fix/atomic-link-integrity-fix

@dkindlund
Contributor

Problem

The /link-fix endpoint's InvalidLinkReference repair has a race condition under concurrent write load. The current two-step approach:

  1. checkLinks() - SELECT to detect desynced record IDs
  2. fixLinks(recordIds) - UPDATE only those specific records

Between steps 1 and 2, concurrent writes can:

  • Create new desyncs not in the recordIds list (missed until next run)
  • Worsen existing desyncs (JSONB changes between detection and fix)
  • Overwrite the fix immediately after it is applied
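To make the gap concrete, here is a toy TypeScript model of the two-step flow. The `Store` type, the `detect`/`fix` helpers and the record ids are hypothetical stand-ins for `checkLinks()`/`fixLinks(recordIds)`, not Teable code: the `junction` map plays the source of truth and the `jsonb` map plays the denormalized link column.

```typescript
// Hypothetical in-memory model, not Teable's real data model.
type RecordId = string;

interface Store {
  junction: Map<RecordId, RecordId[]>; // source of truth
  jsonb: Map<RecordId, RecordId[]>; // denormalized link column
}

const same = (a: RecordId[] = [], b: RecordId[] = []): boolean =>
  a.length === b.length && a.every((v, i) => v === b[i]);

// Step 1: collect desynced record ids (plays the role of checkLinks()).
function detect(store: Store): RecordId[] {
  return Array.from(store.junction.keys()).filter(
    (id) => !same(store.junction.get(id), store.jsonb.get(id)),
  );
}

// Step 2: repair only the ids captured in step 1 (plays the role of fixLinks()).
function fix(store: Store, ids: RecordId[]): void {
  for (const id of ids) {
    store.jsonb.set(id, [...(store.junction.get(id) ?? [])]);
  }
}

const store: Store = {
  junction: new Map<RecordId, RecordId[]>([["recA", ["x1", "x2"]], ["recB", ["y1"]]]),
  jsonb: new Map<RecordId, RecordId[]>([["recA", []], ["recB", ["y1"]]]),
};

const ids = detect(store); // only recA is desynced at this point
// A concurrent writer lands between detection and fix and desyncs recB.
store.jsonb.set("recB", []);
fix(store, ids);

const stillBroken = detect(store); // recB was never in `ids`, so it is still broken
console.log(ids, stillBroken);
```

Because `fix` only touches the ids captured in step 1, the desync on `recB` introduced by the interleaved write survives until the next run, which is the first failure mode in the list above.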

In production under sustained concurrent API load (200-800 writes/hr), we have observed full JSONB array wipes (e.g., 9,781 entries reduced to 0) caused by the read-modify-write race in the link update path. The integrity fix endpoint, meant to repair these desyncs, is vulnerable to the same concurrent writes because of this detection-fix gap.

Solution

Add atomicFixLinks() to IntegrityQueryPostgres that combines detection and fix into a single UPDATE ... WHERE __id IN (SELECT ...) statement. Detection and fix execute under the same MVCC snapshot, eliminating the application-level gap entirely.
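A rough sketch of what a single-statement fix could look like for the ManyMany/OneMany case, written as a TypeScript SQL builder. The table and column names, the `{"id": ...}` element shape, the ordering of rebuilt elements and the order-insensitive comparison are all assumptions for illustration; the actual `atomicFixLinks()` in integrity-query.postgres.ts may compare and rebuild differently (for example, it may also need to carry link titles).

```typescript
// Hypothetical metadata; the real implementation derives this from field options.
interface ManyManyLinkMeta {
  hostTable: string; // table that owns the JSONB link column
  linkColumn: string; // JSONB array column, assumed shape [{"id": "rec..."}, ...]
  junctionTable: string; // source of truth for the links
  selfKey: string; // junction column holding the host record id
  foreignKey: string; // junction column holding the linked record id
}

// Quote a PostgreSQL identifier, doubling any embedded double quotes.
function ident(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

function buildAtomicManyManyFix(m: ManyManyLinkMeta): string {
  const host = ident(m.hostTable);
  const col = ident(m.linkColumn);
  const jt = ident(m.junctionTable);
  const self = ident(m.selfKey);
  const fk = ident(m.foreignKey);

  // Detection (subquery) and repair (UPDATE) are one statement, so no
  // application code runs between them.
  return `UPDATE ${host} AS h
SET ${col} = (
  -- Rebuild from the junction table; jsonb_agg over zero rows yields NULL.
  SELECT jsonb_agg(jsonb_build_object('id', j.${fk}) ORDER BY j.${fk})
  FROM ${jt} AS j
  WHERE j.${self} = h."__id"
)
WHERE h."__id" IN (
  -- Detection: records whose stored id set differs from the junction id set.
  SELECT d."__id" FROM ${host} AS d
  WHERE ARRAY(
    SELECT e->>'id' FROM jsonb_array_elements(d.${col}) AS e ORDER BY 1
  ) IS DISTINCT FROM ARRAY(
    SELECT j.${fk}::text FROM ${jt} AS j WHERE j.${self} = d."__id" ORDER BY 1
  )
);`;
}

const example = buildAtomicManyManyFix({
  hostTable: "host_table",
  linkColumn: "link_col",
  junctionTable: "junction_table",
  selfKey: "self_id",
  foreignKey: "foreign_id",
});
console.log(example);
```

Sorting both sides inside `ARRAY(...)` makes the check compare id sets (including duplicates) rather than element order, and a NULL link column compares equal to an empty junction set because both produce an empty array.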

The service (LinkFieldIntegrityService.checkAndFix) tries the atomic method first. If the database engine does not support it (e.g., SQLite), it falls back gracefully to the existing two-step approach. SQLite behavior is completely unchanged.
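The try-atomic-then-fall-back flow can be sketched as follows. `IntegrityQueryBase`, the `fieldId` parameter, the return types and the two fake query classes are hypothetical; only the method names `atomicFixLinks`, `checkLinks`, `fixLinks` and `checkAndFix` come from this PR, and the real signatures may differ.

```typescript
// Hypothetical base class: the default atomicFixLinks() returns null,
// meaning "this engine has no atomic path, use the two-step fix".
abstract class IntegrityQueryBase {
  async atomicFixLinks(_fieldId: string): Promise<number | null> {
    return null;
  }
  abstract checkLinks(fieldId: string): Promise<string[]>;
  abstract fixLinks(fieldId: string, recordIds: string[]): Promise<number>;
}

async function checkAndFix(query: IntegrityQueryBase, fieldId: string): Promise<number> {
  // Preferred path: detection and repair in a single SQL statement.
  const fixed = await query.atomicFixLinks(fieldId);
  if (fixed !== null) return fixed;

  // Fallback (e.g. SQLite): the original detect-then-fix sequence.
  const recordIds = await query.checkLinks(fieldId);
  if (recordIds.length === 0) return 0;
  return query.fixLinks(fieldId, recordIds);
}

// Fake engines for illustration only.
class PostgresLikeQuery extends IntegrityQueryBase {
  override async atomicFixLinks(_fieldId: string): Promise<number | null> {
    return 3; // pretend the atomic UPDATE touched 3 rows
  }
  async checkLinks(): Promise<string[]> {
    throw new Error("two-step path should not run");
  }
  async fixLinks(): Promise<number> {
    throw new Error("two-step path should not run");
  }
}

class SqliteLikeQuery extends IntegrityQueryBase {
  async checkLinks(_fieldId: string): Promise<string[]> {
    return ["recA", "recB"];
  }
  async fixLinks(_fieldId: string, recordIds: string[]): Promise<number> {
    return recordIds.length;
  }
}

Promise.all([
  checkAndFix(new PostgresLikeQuery(), "fldExample"),
  checkAndFix(new SqliteLikeQuery(), "fldExample"),
]).then(([pg, sqlite]) => console.log({ pg, sqlite }));
```

Using `null` as the "unsupported" signal keeps the base class change additive: engines that never override `atomicFixLinks()` take exactly the old code path.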

What the /link-fix endpoint covers

For context, the endpoint handles 10 integrity issue types. This PR improves only InvalidLinkReference:

| Issue Type | What it detects | Fix applied |
|---|---|---|
| InvalidLinkReference | JSONB link column diverged from junction/FK source of truth | Rebuilds JSONB from junction/FK data (THIS PR) |
| MissingRecordReference | Junction table has rows pointing to deleted records | Deletes orphaned junction rows |
| ForeignTableNotFound | Link field references a deleted table | No auto-fix (requires manual intervention) |
| ForeignKeyHostTableNotFound | Junction table is missing | No auto-fix |
| ForeignKeyNotFound | Missing FK columns in junction table | Recreates columns, backfills from JSONB |
| SelfKeyNotFound | Missing self-reference key in junction | No auto-fix |
| SymmetricFieldNotFound | Bidirectional link missing its counterpart | Converts to one-way link |
| ReferenceFieldNotFound | Referenced record was deleted | Deletes orphaned reference |
| UniqueIndexNotFound | Missing unique constraint for OneOne links | Creates the index |
| EmptyString | Text fields with empty strings instead of NULL | Converts to NULL |

Relationship types handled

The atomic fix handles all four relationship types:

  • ManyMany (isMultiValue=true): Rebuilds JSONB array from junction table
  • OneMany (isMultiValue=true): Same as ManyMany
  • ManyOne (isMultiValue=false, FK in same table): Rebuilds single JSONB object from FK column
  • OneOne (isMultiValue=false, FK in host table): Rebuilds via cross-table join
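The mapping from relationship type to rebuild source can be restated as a small dispatch function. The `Relationship` union and the `RebuildPlan` shape are local, hypothetical types that mirror the list above; they are not Teable's own definitions.

```typescript
// Hypothetical local types restating the relationship list.
type Relationship = "manyMany" | "oneMany" | "manyOne" | "oneOne";

interface RebuildPlan {
  isMultiValue: boolean;
  source: "junction table" | "fk column in same table" | "fk column in host table";
  shape: "jsonb array" | "jsonb object";
}

function rebuildPlan(rel: Relationship): RebuildPlan {
  switch (rel) {
    case "manyMany":
    case "oneMany":
      // Multi-value links are rebuilt as an array from junction rows.
      return { isMultiValue: true, source: "junction table", shape: "jsonb array" };
    case "manyOne":
      // Single value read from an FK column on the same table.
      return { isMultiValue: false, source: "fk column in same table", shape: "jsonb object" };
    case "oneOne":
      // Single value read via a join to the table that hosts the FK.
      return { isMultiValue: false, source: "fk column in host table", shape: "jsonb object" };
    default: {
      // Compile-time exhaustiveness check.
      const unreachable: never = rel;
      throw new Error(`unknown relationship: ${unreachable}`);
    }
  }
}

for (const rel of ["manyMany", "oneMany", "manyOne", "oneOne"] as const) {
  console.log(rel, rebuildPlan(rel));
}
```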

Files changed (3 files, +174 lines)

  • abstract.ts: Added atomicFixLinks() with default null return
  • integrity-query.postgres.ts: PostgreSQL implementation of atomicFixLinks()
  • link-field.service.ts: checkAndFix() tries atomic first, falls back to two-step

Related issues / PRs

- teableio#2680 (DataLoader cache invalidation under concurrency)
- teableio#2676 (Sort record IDs in lockForeignRecords)
- teableio#2677 (Wrap simpleUpdateRecords with transaction/timeout/retry)
- teableio#2679 (Add foreign record locking to ManyMany, OneMany, OneOne)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
