feat(dao): add operator_port_cache table#5967

Open
Xiao-zhen-Liu wants to merge 4 commits into apache:main from Xiao-zhen-Liu:cache-table

@Xiao-zhen-Liu Xiao-zhen-Liu commented Jun 28, 2026

Contributor

What changes were proposed in this PR?

Adds the operator_port_cache table that records a materialized output port
result so it can be reused across executions. It is keyed by
(workflow_id, global_port_id, cache_key) and stores the JSON the cache key was
computed from, the result location, an optional tuple count and source execution
id, and a database-managed updated_at. The foreign key to workflow(wid) is
ON DELETE CASCADE. The stored JSON (cache_key_json) lets a lookup confirm a
hash match by comparing the full JSON, so a hash collision never reuses the wrong
result.
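
The collision check described above can be sketched as follows. This is a minimal illustration, not the cache service's code: the in-memory dictionary stands in for the table, `CacheRow`, `lookup`, and the field `result_location` are hypothetical names, and comparing the parsed JSON (rather than the raw strings) is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class CacheRow:
    cache_key_json: str   # the JSON the cache key was computed from
    result_location: str  # hypothetical name for the stored result location


# In-memory stand-in for the table, keyed like its primary key:
# (workflow_id, global_port_id, cache_key). Key types are assumptions.
Table = Dict[Tuple[int, str, str], CacheRow]


def lookup(table: Table, workflow_id: int, global_port_id: str,
           cache_key: str, key_json: str) -> Optional[str]:
    """Return the cached result location, or None on a miss or a collision."""
    row = table.get((workflow_id, global_port_id, cache_key))
    if row is None:
        return None  # no row for this key: plain cache miss
    # A hash match alone is not trusted: confirm it against the stored JSON.
    # Parsing first makes the comparison insensitive to key order (assumption).
    if json.loads(row.cache_key_json) != json.loads(key_json):
        return None  # same hash, different JSON: treat as a miss
    return row.result_location
```

With this shape, a row whose hash matches but whose JSON differs is treated as a miss, so the caller recomputes instead of reusing the wrong result.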

The change is additive: a new table in sql/texera_ddl.sql (fresh installs) plus
a Liquibase migration sql/updates/26.sql registered in sql/changelog.xml
(existing deployments). No code reads or writes the table yet; the cache read/write
logic and its tests land with the cache service that uses it, following the
convention of testing a table through its consumer (as feedback is tested via
FeedbackResourceSpec).

Any related issues, documentation, discussions?

Closes #5969. Part of the storage foundation #5882 (umbrella #5881). Design discussion: #5880.

How was this PR tested?

Verified the schema directly against Postgres: the migration applies cleanly, the
columns and primary key (workflow_id, global_port_id, cache_key) are correct,
the foreign key's delete rule is CASCADE, the schema file and the migration
define identical columns/keys, and changelog.xml is well-formed and registers
26.sql. The generated jOOQ classes build from the table. The table's runtime
behavior is exercised by the cache service tests in the follow-up PR.
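
For readers who want to reproduce the primary-key and cascade checks without a Postgres instance, here is a rough sketch using SQLite from the Python standard library. It is a stand-in, not the PR's migration: the column types are assumptions, only a subset of the columns is included, and SQLite needs foreign keys switched on explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Simplified stand-in schema; column types are assumed, not taken from 26.sql.
conn.executescript("""
CREATE TABLE workflow (wid INTEGER PRIMARY KEY);
CREATE TABLE operator_port_cache (
    workflow_id    INTEGER NOT NULL,
    global_port_id TEXT    NOT NULL,
    cache_key      TEXT    NOT NULL,
    cache_key_json TEXT    NOT NULL,
    PRIMARY KEY (workflow_id, global_port_id, cache_key),
    FOREIGN KEY (workflow_id) REFERENCES workflow (wid) ON DELETE CASCADE
);
""")

conn.execute("INSERT INTO workflow (wid) VALUES (1)")
conn.execute("INSERT INTO operator_port_cache VALUES (1, 'port-0', 'k1', '{}')")

# A second row with the same composite key must be rejected.
try:
    conn.execute("INSERT INTO operator_port_cache VALUES (1, 'port-0', 'k1', '{}')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Deleting the workflow should remove its cache rows through the cascade.
conn.execute("DELETE FROM workflow WHERE wid = 1")
remaining = conn.execute("SELECT COUNT(*) FROM operator_port_cache").fetchone()[0]
```

After the script runs, `duplicate_rejected` is `True` and `remaining` is `0`.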

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8 (Claude Code)

Adds the operator_port_cache table (texera_ddl.sql + Liquibase migration sql/updates/26.sql), keyed by (workflow_id, global_port_id, cache_key) with ON DELETE CASCADE to workflow. The cache read/write logic and its tests land with the cache service that uses it. Part of apache#5882.
@github-actions github-actions Bot added the ddl-change Changes to the TexeraDB DDL label Jun 28, 2026
@github-actions

Contributor

Automated Reviewer Suggestions

Based on the git blame history of the changed files, we recommend the following reviewers:

  • Contributors with relevant context: @aicam
    You can notify them by mentioning @aicam in a comment.

@codecov-commenter

codecov-commenter commented Jun 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.65%. Comparing base (a24d1d1) to head (8e71ebb).
⚠️ Report is 32 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5967      +/-   ##
============================================
+ Coverage     56.28%   56.65%   +0.37%     
- Complexity     2992     3023      +31     
============================================
  Files          1120     1121       +1     
  Lines         43217    43294      +77     
  Branches       4662     4667       +5     
============================================
+ Hits          24326    24530     +204     
+ Misses        17472    17325     -147     
- Partials       1419     1439      +20     
| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| access-control-service | 70.00% <ø> (ø) | |
| agent-service | 44.95% <ø> (ø) | Carriedforward from a89dbd4 |
| amber | 58.64% <ø> (+0.84%) ⬆️ | |
| computing-unit-managing-service | 0.00% <ø> (ø) | |
| config-service | 52.30% <ø> (+0.74%) ⬆️ | |
| file-service | 62.81% <ø> (+3.79%) ⬆️ | |
| frontend | 49.33% <ø> (ø) | Carriedforward from a89dbd4 |
| notebook-migration-service | 78.57% <ø> (ø) | |
| pyamber | 90.20% <ø> (ø) | Carriedforward from a89dbd4 |
| python | 90.76% <ø> (ø) | Carriedforward from a89dbd4 |
| workflow-compiling-service | 55.14% <ø> (ø) | |

*This pull request uses carry forward flags.

@Xiao-zhen-Liu

Contributor Author

@carloea2 could you help review this one when you get a chance? Thanks!

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

⚠️ Benchmark changes need a look

🟢 0 better · 🔴 3 worse · ⚪ 12 noise (<±5%) · 0 without baseline

Compared against main a24d1d1 benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.

| config | throughput (tuples/sec) | MB/s | latency (p50/p95/p99) | max Δ latest / 7d |
|---|---|---|---|---|
| 🔴 bs=10 sw=10 sl=64 | 376 | 0.229 | 24,770/37,608/37,608 us | 🔴 +5.1% / 🔴 +149.5% |
| 🔴 bs=100 sw=10 sl=64 | 779 | 0.475 | 127,712/149,927/149,927 us | 🔴 +5.5% / 🔴 +39.3% |
| bs=1000 sw=10 sl=64 | 915 | 0.558 | 1,089,736/1,141,300/1,141,300 us | ⚪ within ±5% / 🔴 +11.0% |

Baseline details

Latest main a24d1d1 from same runner

| config | metric | PR | latest main | 7d avg | Δ latest | Δ 7d |
|---|---|---|---|---|---|---|
| bs=10 sw=10 sl=64 | throughput | 376 tuples/sec | 393 tuples/sec | 777.62 tuples/sec | -4.3% | -51.6% |
| bs=10 sw=10 sl=64 | MB/s | 0.229 MB/s | 0.24 MB/s | 0.475 MB/s | -4.6% | -51.8% |
| bs=10 sw=10 sl=64 | p50 | 24,770 us | 24,108 us | 12,612 us | +2.7% | +96.4% |
| bs=10 sw=10 sl=64 | p95 | 37,608 us | 35,777 us | 15,070 us | +5.1% | +149.5% |
| bs=10 sw=10 sl=64 | p99 | 37,608 us | 35,777 us | 18,360 us | +5.1% | +104.8% |
| bs=100 sw=10 sl=64 | throughput | 779 tuples/sec | 818 tuples/sec | 988.31 tuples/sec | -4.8% | -21.2% |
| bs=100 sw=10 sl=64 | MB/s | 0.475 MB/s | 0.499 MB/s | 0.603 MB/s | -4.8% | -21.3% |
| bs=100 sw=10 sl=64 | p50 | 127,712 us | 121,033 us | 101,066 us | +5.5% | +26.4% |
| bs=100 sw=10 sl=64 | p95 | 149,927 us | 144,950 us | 107,594 us | +3.4% | +39.3% |
| bs=100 sw=10 sl=64 | p99 | 149,927 us | 144,950 us | 115,830 us | +3.4% | +29.4% |
| bs=1000 sw=10 sl=64 | throughput | 915 tuples/sec | 908 tuples/sec | 1,019 tuples/sec | +0.8% | -10.2% |
| bs=1000 sw=10 sl=64 | MB/s | 0.558 MB/s | 0.554 MB/s | 0.622 MB/s | +0.7% | -10.3% |
| bs=1000 sw=10 sl=64 | p50 | 1,089,736 us | 1,097,356 us | 986,982 us | -0.7% | +10.4% |
| bs=1000 sw=10 sl=64 | p95 | 1,141,300 us | 1,152,889 us | 1,028,491 us | -1.0% | +11.0% |
| bs=1000 sw=10 sl=64 | p99 | 1,141,300 us | 1,152,889 us | 1,058,493 us | -1.0% | +7.8% |

Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,532.61,200,128000,376,0.229,24769.58,37607.52,37607.52
1,100,10,64,20,2568.28,2000,1280000,779,0.475,127711.79,149927.22,149927.22
2,1000,10,64,20,21862.33,20000,12800000,915,0.558,1089736.19,1141300.33,1141300.33
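
As a sanity check on the reported figures, the throughput and MB/s values can be recomputed from the raw CSV. The sketch below is not the benchmark's own code; `summarize` and `pct_delta` are hypothetical helper names, and the 1024 × 1024 divisor is inferred from the fact that it reproduces the reported MB/s values.

```python
import csv
import io

RAW_CSV = """config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,532.61,200,128000,376,0.229,24769.58,37607.52,37607.52
1,100,10,64,20,2568.28,2000,1280000,779,0.475,127711.79,149927.22,149927.22
2,1000,10,64,20,21862.33,20000,12800000,915,0.558,1089736.19,1141300.33,1141300.33
"""


def summarize(raw: str):
    """Recompute (batch_size, tuples/sec, MB/s) from totals in each CSV row."""
    rows = []
    for r in csv.DictReader(io.StringIO(raw)):
        seconds = float(r["total_ms"]) / 1000.0
        tuples_per_sec = int(r["total_tuples"]) / seconds
        # Assumption: one MB here is 1024 * 1024 bytes.
        mb_per_sec = int(r["total_bytes"]) / seconds / (1024 * 1024)
        rows.append((int(r["batch_size"]), round(tuples_per_sec), round(mb_per_sec, 3)))
    return rows


def pct_delta(pr: float, base: float) -> float:
    """Relative change of the PR value against a baseline, in percent."""
    return round((pr - base) / base * 100, 1)


print(summarize(RAW_CSV))
print(pct_delta(37608, 35777))  # p95 latency at bs=10, PR vs. latest main
```

The recomputed values match the `tuples_per_sec` and `mb_per_sec` columns, and the p95 delta for bs=10 comes out at the reported +5.1%.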

Comment thread sql/updates/26.sql Outdated
Comment thread sql/updates/26.sql Outdated
Comment thread sql/updates/26.sql
Comment thread sql/texera_ddl.sql Outdated
@Yicong-Huang

Contributor

@Xiao-zhen-Liu please link the issue properly

Address review: rename result_uri to storage_uri, since "result" implies a direction and storage_uri is clearer. tuple_count is kept (immutable per row, populated at materialization, and read by the coordinator alongside the cache lookup, so cached-region stats need no extra storage round-trip).
@Xiao-zhen-Liu

Contributor Author

Thanks @Yicong-Huang, replies are inline. Renamed result_uri to storage_uri. I kept two things as they were, with reasoning inline: tuple_count (a cache row is immutable, so the count can't drift, and the coordinator reads it alongside the cache lookup, so cached-region stats need no extra storage round-trip) and the primary key without execution_id (the cache is reused across executions, keyed by cache_key). Re-requesting your review.

Address review: spell out that cache_key is the hash/lookup key and cache_key_json is the JSON it was computed from (collision check); that a changed upstream computation (e.g. operator version) yields a new cache_key and a new row rather than overwriting; and why tuple_count is kept.

@Yicong-Huang Yicong-Huang left a comment

Contributor

LGTM! thanks

Comment thread sql/updates/26.sql Outdated
Comment thread sql/updates/26.sql Outdated
…ey_hash

Address review (Carlos, Yicong): make the hash explicit. cache_key_hash is the SHA-256 hash / lookup key; cache_key_json stays as the JSON it was computed from.
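
To make the relationship between the two columns concrete, here is a small sketch of deriving a SHA-256 lookup key from the JSON it is computed from. The function name `cache_key_hash`, the key fields `operator` and `version`, and the canonical serialization (sorted keys, compact separators) are all assumptions for illustration; the PR does not specify how the JSON is serialized before hashing.

```python
import hashlib
import json
from typing import Tuple


def cache_key_hash(key: dict) -> Tuple[str, str]:
    """Return (hex SHA-256 digest, canonical JSON) for a cache key."""
    # Assumption: serialize canonically so equal keys always hash the same.
    key_json = json.dumps(key, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(key_json.encode("utf-8")).hexdigest()
    return digest, key_json


# Hypothetical key contents: a changed operator version changes the hash,
# so it would produce a new row rather than overwrite the old one.
hash_v1, json_v1 = cache_key_hash({"operator": "filter", "version": 1})
hash_v2, json_v2 = cache_key_hash({"operator": "filter", "version": 2})
print(hash_v1 != hash_v2)
```

Storing both values keeps lookups cheap (a fixed-length hash in the key) while still allowing the full JSON comparison on a match.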
@Xiao-zhen-Liu Xiao-zhen-Liu enabled auto-merge June 30, 2026 16:23
@Xiao-zhen-Liu Xiao-zhen-Liu added this pull request to the merge queue Jun 30, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 30, 2026

Labels

ddl-change Changes to the TexeraDB DDL

Development

Successfully merging this pull request may close these issues.

Add the operator_port_cache table

4 participants