feat(dao): add operator_port_cache table by Xiao-zhen-Liu · Pull Request #5967 · apache/texera

Xiao-zhen-Liu · 2026-06-28T17:29:59Z

What changes were proposed in this PR?

Adds the operator_port_cache table that records a materialized output port
result so it can be reused across executions. It is keyed by
(workflow_id, global_port_id, cache_key) and stores the JSON the cache key was
computed from, the result location, an optional tuple count and source execution
id, and a database-managed updated_at. The foreign key to workflow(wid) is
ON DELETE CASCADE. The stored JSON (cache_key_json) lets a lookup confirm a
hash match by comparing the full JSON, so a hash collision never reuses the wrong
result.
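The sketch below illustrates the collision check described above. It is not the cache service's code. It uses an in-memory SQLite table as a stand-in for the Postgres table, and the column names (cache_key_hash, cache_key_json, storage_uri) are assumptions based on the final names in this thread. A lookup trusts a row only when both the hash and the full stored JSON match.

```python
import json
import sqlite3
from typing import Optional

# Hypothetical SQLite stand-in for operator_port_cache. The real table is
# Postgres DDL; these column names are assumptions based on this thread.
SCHEMA = """
CREATE TABLE operator_port_cache (
    workflow_id    INTEGER NOT NULL,
    global_port_id TEXT    NOT NULL,
    cache_key_hash TEXT    NOT NULL,  -- hash used as the lookup key
    cache_key_json TEXT    NOT NULL,  -- full JSON the hash was computed from
    storage_uri    TEXT    NOT NULL,  -- where the materialized result lives
    PRIMARY KEY (workflow_id, global_port_id, cache_key_hash)
)
"""


def lookup(conn: sqlite3.Connection, workflow_id: int, port_id: str,
           key_hash: str, key_json: str) -> Optional[str]:
    """Return storage_uri only if both the hash and the full JSON match."""
    row = conn.execute(
        "SELECT cache_key_json, storage_uri FROM operator_port_cache "
        "WHERE workflow_id = ? AND global_port_id = ? AND cache_key_hash = ?",
        (workflow_id, port_id, key_hash),
    ).fetchone()
    if row is None:
        return None  # no row for this key: cache miss
    stored_json, storage_uri = row
    # A matching hash is not trusted on its own: a collision shows up here
    # as different JSON and is treated as a miss instead of a wrong reuse.
    return storage_uri if stored_json == key_json else None


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
json_a = json.dumps({"operator": "Filter", "version": 1}, sort_keys=True)
json_b = json.dumps({"operator": "Sort", "version": 1}, sort_keys=True)
# "h123" is a made-up hash; reusing it for json_b simulates a collision.
conn.execute(
    "INSERT INTO operator_port_cache VALUES (?, ?, ?, ?, ?)",
    (1, "op1:out0", "h123", json_a, "file:///results/a"),
)
hit = lookup(conn, 1, "op1:out0", "h123", json_a)
collision = lookup(conn, 1, "op1:out0", "h123", json_b)
miss = lookup(conn, 1, "op1:out0", "h999", json_a)
print(hit, collision, miss)
```

Comparing the stored JSON as text only works if the JSON is serialized canonically (here via sort_keys=True), so that equal inputs always produce identical strings.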

The change is additive: a new table in sql/texera_ddl.sql (fresh installs) plus
a Liquibase migration sql/updates/26.sql registered in sql/changelog.xml
(existing deployments). No code reads or writes the table yet; the cache read/write
logic and its tests land with the cache service that uses it, following the
convention of testing a table through its consumer (as feedback is tested via
FeedbackResourceSpec).

Any related issues, documentation, discussions?

Closes #5969. Part of the storage foundation #5882 (umbrella #5881). Design discussion: #5880.

How was this PR tested?

Verified the schema directly against Postgres: the migration applies cleanly, the
columns and primary key (workflow_id, global_port_id, cache_key) are correct,
the foreign key's delete rule is CASCADE, the schema file and the migration
define identical columns/keys, and changelog.xml is well-formed and registers
26.sql. The generated jOOQ classes build from the table. The table's runtime
behavior is exercised by the cache service tests in the follow-up PR.
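The PR checked the delete rule against Postgres (there, for example, information_schema.referential_constraints exposes a delete_rule column). As a rough, self-contained illustration of the same check, the sketch below uses SQLite as a stand-in: it reads the declared delete rule and then confirms the behavior, namely that deleting a workflow removes its cache rows. The trimmed-down schema is an assumption, not the DDL from this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on (Postgres always does).
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE workflow (wid INTEGER PRIMARY KEY)")
conn.execute("""
CREATE TABLE operator_port_cache (
    workflow_id    INTEGER NOT NULL REFERENCES workflow(wid) ON DELETE CASCADE,
    global_port_id TEXT    NOT NULL,
    cache_key_hash TEXT    NOT NULL,
    storage_uri    TEXT    NOT NULL,
    PRIMARY KEY (workflow_id, global_port_id, cache_key_hash)
)
""")

# 1. Inspect the declared delete rule (on_delete is column index 6 of the pragma).
fk_rows = conn.execute("PRAGMA foreign_key_list(operator_port_cache)").fetchall()
delete_rules = [row[6] for row in fk_rows]

# 2. Check the behavior: deleting workflow 1 must remove only its cache rows.
conn.execute("INSERT INTO workflow VALUES (1), (2)")
conn.executemany(
    "INSERT INTO operator_port_cache VALUES (?, ?, ?, ?)",
    [(1, "p0", "h1", "uri-1"), (2, "p0", "h2", "uri-2")],
)
conn.execute("DELETE FROM workflow WHERE wid = 1")
remaining = conn.execute(
    "SELECT workflow_id FROM operator_port_cache ORDER BY workflow_id"
).fetchall()
print(delete_rules, remaining)
```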

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8 (Claude Code)

Adds the operator_port_cache table (texera_ddl.sql + Liquibase migration sql/updates/26.sql), keyed by (workflow_id, global_port_id, cache_key) with ON DELETE CASCADE to workflow. The cache read/write logic and its tests land with the cache service that uses it. Part of apache#5882.

github-actions · 2026-06-28T17:30:16Z

Automated Reviewer Suggestions

Based on the git blame history of the changed files, we recommend the following reviewers:

Contributors with relevant context: @aicam
You can notify them by mentioning @aicam in a comment.

codecov-commenter · 2026-06-28T17:32:29Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.65%. Comparing base (a24d1d1) to head (8e71ebb).
⚠️ Report is 32 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #5967      +/-   ##
============================================
+ Coverage     56.28%   56.65%   +0.37%     
- Complexity     2992     3023      +31     
============================================
  Files          1120     1121       +1     
  Lines         43217    43294      +77     
  Branches       4662     4667       +5     
============================================
+ Hits          24326    24530     +204     
+ Misses        17472    17325     -147     
- Partials       1419     1439      +20

Flag	Coverage Δ	*Carryforward flag
access-control-service	70.00% <ø> (ø)
agent-service	44.95% <ø> (ø)	Carriedforward from a89dbd4
amber	58.64% <ø> (+0.84%) ⬆️
computing-unit-managing-service	0.00% <ø> (ø)
config-service	52.30% <ø> (+0.74%) ⬆️
file-service	62.81% <ø> (+3.79%) ⬆️
frontend	49.33% <ø> (ø)	Carriedforward from a89dbd4
notebook-migration-service	78.57% <ø> (ø)
pyamber	90.20% <ø> (ø)	Carriedforward from a89dbd4
python	90.76% <ø> (ø)	Carriedforward from a89dbd4
workflow-compiling-service	55.14% <ø> (ø)

*This pull request uses carry forward flags.


Xiao-zhen-Liu · 2026-06-28T17:35:10Z

@carloea2 could you help review this one when you get a chance? Thanks!

github-actions · 2026-06-28T17:35:43Z

⚠️ Benchmark changes need a look

🟢 0 better · 🔴 3 worse · ⚪ 12 noise (<±5%) · 0 without baseline

Compared against main a24d1d1 benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.

status	config	throughput (tuples/sec)	MB/s	latency p50/p95/p99	max Δ latest / 7d
🔴	bs=10 sw=10 sl=64	376	0.229	24,770/37,608/37,608 us	🔴 +5.1% / 🔴 +149.5%
🔴	bs=100 sw=10 sl=64	779	0.475	127,712/149,927/149,927 us	🔴 +5.5% / 🔴 +39.3%
⚪	bs=1000 sw=10 sl=64	915	0.558	1,089,736/1,141,300/1,141,300 us	⚪ within ±5% / 🔴 +11.0%

Baseline details

Latest main a24d1d1 from same runner

config	metric	PR	latest main	7d avg	Δ latest	Δ 7d
bs=10 sw=10 sl=64	throughput	376 tuples/sec	393 tuples/sec	777.62 tuples/sec	-4.3%	-51.6%
bs=10 sw=10 sl=64	MB/s	0.229 MB/s	0.24 MB/s	0.475 MB/s	-4.6%	-51.8%
bs=10 sw=10 sl=64	p50	24,770 us	24,108 us	12,612 us	+2.7%	+96.4%
bs=10 sw=10 sl=64	p95	37,608 us	35,777 us	15,070 us	+5.1%	+149.5%
bs=10 sw=10 sl=64	p99	37,608 us	35,777 us	18,360 us	+5.1%	+104.8%
bs=100 sw=10 sl=64	throughput	779 tuples/sec	818 tuples/sec	988.31 tuples/sec	-4.8%	-21.2%
bs=100 sw=10 sl=64	MB/s	0.475 MB/s	0.499 MB/s	0.603 MB/s	-4.8%	-21.3%
bs=100 sw=10 sl=64	p50	127,712 us	121,033 us	101,066 us	+5.5%	+26.4%
bs=100 sw=10 sl=64	p95	149,927 us	144,950 us	107,594 us	+3.4%	+39.3%
bs=100 sw=10 sl=64	p99	149,927 us	144,950 us	115,830 us	+3.4%	+29.4%
bs=1000 sw=10 sl=64	throughput	915 tuples/sec	908 tuples/sec	1,019 tuples/sec	+0.8%	-10.2%
bs=1000 sw=10 sl=64	MB/s	0.558 MB/s	0.554 MB/s	0.622 MB/s	+0.7%	-10.3%
bs=1000 sw=10 sl=64	p50	1,089,736 us	1,097,356 us	986,982 us	-0.7%	+10.4%
bs=1000 sw=10 sl=64	p95	1,141,300 us	1,152,889 us	1,028,491 us	-1.0%	+11.0%
bs=1000 sw=10 sl=64	p99	1,141,300 us	1,152,889 us	1,058,493 us	-1.0%	+7.8%
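The "Δ latest" column and the summary counts can be reproduced from the table with simple arithmetic. The sketch below assumes Δ = (PR − latest main) / latest main, treats |Δ| < 5% as noise, and counts higher throughput or lower latency as better; the bot's actual implementation is not shown in this thread, so this is an illustration only.

```python
# (metric, PR value, latest-main value, higher_is_better), copied from the table.
ROWS = [
    ("bs=10 throughput", 376, 393, True),
    ("bs=10 MB/s", 0.229, 0.24, True),
    ("bs=10 p50", 24770, 24108, False),
    ("bs=10 p95", 37608, 35777, False),
    ("bs=10 p99", 37608, 35777, False),
    ("bs=100 throughput", 779, 818, True),
    ("bs=100 MB/s", 0.475, 0.499, True),
    ("bs=100 p50", 127712, 121033, False),
    ("bs=100 p95", 149927, 144950, False),
    ("bs=100 p99", 149927, 144950, False),
    ("bs=1000 throughput", 915, 908, True),
    ("bs=1000 MB/s", 0.558, 0.554, True),
    ("bs=1000 p50", 1089736, 1097356, False),
    ("bs=1000 p95", 1141300, 1152889, False),
    ("bs=1000 p99", 1141300, 1152889, False),
]
NOISE_PCT = 5.0  # assumed threshold, from "Treat <±5% as noise"


def delta_pct(pr: float, base: float) -> float:
    """Relative change of the PR value against latest main, in percent."""
    return (pr - base) / base * 100.0


def classify(pr: float, base: float, higher_is_better: bool) -> str:
    d = delta_pct(pr, base)
    if abs(d) < NOISE_PCT:
        return "noise"
    improved = d > 0 if higher_is_better else d < 0
    return "better" if improved else "worse"


counts = {"better": 0, "worse": 0, "noise": 0}
for name, pr, base, higher_is_better in ROWS:
    verdict = classify(pr, base, higher_is_better)
    counts[verdict] += 1
    print(f"{name:20s} {delta_pct(pr, base):+.1f}% {verdict}")
print(counts)
```

Under these assumptions the counts match the summary line (0 better, 3 worse, 12 noise).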

Raw CSV

config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,532.61,200,128000,376,0.229,24769.58,37607.52,37607.52
1,100,10,64,20,2568.28,2000,1280000,779,0.475,127711.79,149927.22,149927.22
2,1000,10,64,20,21862.33,20000,12800000,915,0.558,1089736.19,1141300.33,1141300.33

Yicong-Huang · 2026-06-29T00:31:26Z

@Xiao-zhen-Liu please link issue properly

Address review: rename result_uri to storage_uri, since "result" implies a direction and storage_uri is clearer. tuple_count is kept: it is immutable per row, populated at materialization, and read by the coordinator alongside the cache lookup, so cached-region stats need no extra storage round-trip.

Xiao-zhen-Liu · 2026-06-29T17:05:59Z

Thanks @Yicong-Huang — replies inline. Renamed result_uri -> storage_uri. Two I kept, with reasoning inline: tuple_count (a cache row is immutable so it can't drift, and the coordinator reads it alongside the cache lookup so cached-region stats need no extra storage round-trip) and the PK without execution_id (the cache is reused across executions, keyed by cache_key). Re-requesting your review.
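To illustrate the two points kept above, here is a sketch (SQLite stand-in; the schema and the lookup_with_stats helper are hypothetical): the primary key does not include an execution id, so a row written by one execution is found by a later one using only (workflow_id, global_port_id, cache_key_hash), and the same query returns tuple_count, so no second round-trip is needed for stats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE operator_port_cache (
    workflow_id         INTEGER NOT NULL,
    global_port_id      TEXT    NOT NULL,
    cache_key_hash      TEXT    NOT NULL,
    storage_uri         TEXT    NOT NULL,
    tuple_count         INTEGER,  -- optional, filled in at materialization
    source_execution_id INTEGER,  -- provenance only, not part of the key
    PRIMARY KEY (workflow_id, global_port_id, cache_key_hash)
)
""")

# Execution 7 materializes the port and records the result once.
conn.execute(
    "INSERT INTO operator_port_cache VALUES (?, ?, ?, ?, ?, ?)",
    (1, "op1:out0", "h-abc", "file:///results/op1-out0", 1000, 7),
)


def lookup_with_stats(conn, workflow_id, port_id, key_hash):
    """Hypothetical helper: one query returns the location and the tuple count."""
    return conn.execute(
        "SELECT storage_uri, tuple_count, source_execution_id "
        "FROM operator_port_cache "
        "WHERE workflow_id = ? AND global_port_id = ? AND cache_key_hash = ?",
        (workflow_id, port_id, key_hash),
    ).fetchone()


# A later execution finds the row without knowing execution 7's id.
row = lookup_with_stats(conn, 1, "op1:out0", "h-abc")
print(row)
```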

Address review: spell out that cache_key is the hash/lookup key and cache_key_json is the JSON it was computed from (collision check); that a changed upstream computation (e.g. operator version) yields a new cache_key and a new row rather than overwriting; and why tuple_count is kept.
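The "new row rather than overwriting" behavior can be sketched as write-once inserts (SQLite stand-in; the record helper and the placeholder hash strings are hypothetical, and treating rows as write-once is an assumption drawn from "a cache row is immutable"). A changed operator version produces different JSON and therefore a different hash, so it lands in a second row, while the existing row is never updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE operator_port_cache (
    workflow_id    INTEGER NOT NULL,
    global_port_id TEXT    NOT NULL,
    cache_key_hash TEXT    NOT NULL,
    cache_key_json TEXT    NOT NULL,
    storage_uri    TEXT    NOT NULL,
    PRIMARY KEY (workflow_id, global_port_id, cache_key_hash)
)
""")


def record(conn, workflow_id, port_id, key_hash, key_json, uri):
    # Write-once: an existing key is left untouched, never updated.
    # INSERT OR IGNORE is SQLite syntax; Postgres would use ON CONFLICT DO NOTHING.
    conn.execute(
        "INSERT OR IGNORE INTO operator_port_cache VALUES (?, ?, ?, ?, ?)",
        (workflow_id, port_id, key_hash, key_json, uri),
    )


# Placeholder strings stand in for the hash of each JSON.
record(conn, 1, "op1:out0", "hash-v1", '{"version":1}', "file:///results/v1")
# Operator upgraded: new JSON, new hash, therefore a second row.
record(conn, 1, "op1:out0", "hash-v2", '{"version":2}', "file:///results/v2")
# Re-recording the v1 key does not overwrite the existing row.
record(conn, 1, "op1:out0", "hash-v1", '{"version":1}', "file:///results/other")

rows = conn.execute(
    "SELECT cache_key_hash, storage_uri FROM operator_port_cache "
    "ORDER BY cache_key_hash"
).fetchall()
print(rows)
```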

Yicong-Huang

LGTM! thanks

github-actions Bot assigned Xiao-zhen-Liu Jun 28, 2026

github-actions Bot added the ddl-change Changes to the TexeraDB DDL label Jun 28, 2026

Xiao-zhen-Liu mentioned this pull request Jun 28, 2026

Add operator output port cache storage (table and cache key) #5882 (open, 6 tasks)

Xiao-zhen-Liu requested a review from Yicong-Huang June 28, 2026 17:35

Xiao-zhen-Liu mentioned this pull request Jun 28, 2026

Add the operator_port_cache table #5969 (open, 6 tasks)

Yicong-Huang requested changes Jun 28, 2026

Comment thread on sql/updates/26.sql (outdated)

Comment thread on sql/updates/26.sql (outdated)

Comment thread on sql/updates/26.sql

Comment thread on sql/texera_ddl.sql (outdated)

Xiao-zhen-Liu requested a review from Yicong-Huang June 29, 2026 17:06

Yicong-Huang approved these changes Jun 30, 2026

carloea2 reviewed Jun 30, 2026

Comment thread on sql/updates/26.sql (outdated)

Comment thread on sql/updates/26.sql (outdated)

refactor(dao): rename operator_port_cache cache_key column to cache_key_hash (8e71ebb)

Address review (Carlos, Yicong): make the hash explicit. cache_key_hash is the SHA-256 hash / lookup key; cache_key_json stays as the JSON it was computed from.
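A minimal sketch of how a (cache_key_json, cache_key_hash) pair could be derived, assuming the JSON is serialized canonically (sorted keys, no extra whitespace) before hashing with SHA-256. The payload fields (operatorId, operatorVersion, port) and the cache_key helper are hypothetical; the real key contents are defined by the cache service in the follow-up PR.

```python
import hashlib
import json
from typing import Tuple


def cache_key(payload: dict) -> Tuple[str, str]:
    """Return (cache_key_json, cache_key_hash) for a hypothetical key payload."""
    # Canonical form: sorted keys and compact separators, so equal payloads
    # always serialize to identical text and therefore identical hashes.
    key_json = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    key_hash = hashlib.sha256(key_json.encode("utf-8")).hexdigest()
    return key_json, key_hash


a_json, a_hash = cache_key({"operatorId": "op1", "operatorVersion": 1, "port": 0})
b_json, b_hash = cache_key({"port": 0, "operatorVersion": 1, "operatorId": "op1"})
c_json, c_hash = cache_key({"operatorId": "op1", "operatorVersion": 2, "port": 0})

print(a_hash == b_hash)  # key order does not change the hash
print(a_hash == c_hash)  # a new operator version yields a new key
print(len(a_hash))       # hex digest length of SHA-256
```

Storing a_json next to a_hash is what makes the full-JSON comparison on lookup possible.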

Xiao-zhen-Liu enabled auto-merge June 30, 2026 16:23

Xiao-zhen-Liu added this pull request to the merge queue Jun 30, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 30, 2026

Xiao-zhen-Liu wants to merge 4 commits into apache:main from Xiao-zhen-Liu:cache-table
