test(amber): move live-catalog iceberg specs to the integration suite#6045
test(amber): move live-catalog iceberg specs to the integration suite#6045mengw15 wants to merge 1 commit into
Conversation
Automated Reviewer SuggestionsBased on the
|
There was a problem hiding this comment.
Pull request overview
Moves Iceberg specs that require a live Iceberg catalog out of common/workflow-core unit tests and into Amber’s integration test suite, so the amber unit CI job no longer implicitly depends on a catalog being provisioned (preparing for the catalog default flip in #6034).
Changes:
- Add
@IntegrationTesttagging toIcebergDocumentSpec,IcebergDocumentConsoleMessagesSpec, andIcebergTableStatsSpecso they run only under the integration filter. - Relocate the shared
VirtualDocumentSpecbase intoamber/src/test/integration/...alongside the Iceberg integration specs. - Keep package names unchanged so this is effectively a move/retag for CI routing rather than a behavior change.
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| amber/src/test/integration/org/apache/texera/amber/storage/result/iceberg/IcebergTableStatsSpec.scala | Tags the spec as @IntegrationTest so it’s routed to the integration job. |
| amber/src/test/integration/org/apache/texera/amber/storage/result/iceberg/IcebergDocumentSpec.scala | Tags the spec as @IntegrationTest so it’s routed to the integration job. |
| amber/src/test/integration/org/apache/texera/amber/storage/result/iceberg/IcebergDocumentConsoleMessagesSpec.scala | Tags the spec as @IntegrationTest so it’s routed to the integration job. |
| amber/src/test/integration/org/apache/texera/amber/core/storage/model/VirtualDocumentSpec.scala | Provides the shared VirtualDocument test behavior used by the moved Iceberg specs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
| config | throughput | MB/s | latency | max Δ latest / 7d | |
|---|---|---|---|---|---|
| 🔴 | bs=10 sw=10 sl=64 | 424 | 0.259 | 22,528/31,994/31,994 us | 🔴 +9.7% / 🔴 +112.3% |
| 🔴 | bs=100 sw=10 sl=64 | 952 | 0.581 | 102,397/131,553/131,553 us | 🔴 +7.7% / 🔴 +22.3% |
| ⚪ | bs=1000 sw=10 sl=64 | 1,091 | 0.666 | 924,186/962,377/962,377 us | ⚪ within ±5% / 🟢 -9.1% |
Baseline details
Latest main 104bcc4 from same runner
| config | metric | PR | latest main | 7d avg | Δ latest | Δ 7d |
|---|---|---|---|---|---|---|
| bs=10 sw=10 sl=64 | throughput | 424 tuples/sec | 456 tuples/sec | 777.62 tuples/sec | -7.0% | -45.5% |
| bs=10 sw=10 sl=64 | MB/s | 0.259 MB/s | 0.278 MB/s | 0.475 MB/s | -6.8% | -45.4% |
| bs=10 sw=10 sl=64 | p50 | 22,528 us | 20,532 us | 12,612 us | +9.7% | +78.6% |
| bs=10 sw=10 sl=64 | p95 | 31,994 us | 31,645 us | 15,070 us | +1.1% | +112.3% |
| bs=10 sw=10 sl=64 | p99 | 31,994 us | 31,645 us | 18,360 us | +1.1% | +74.3% |
| bs=100 sw=10 sl=64 | throughput | 952 tuples/sec | 946 tuples/sec | 988.31 tuples/sec | +0.6% | -3.7% |
| bs=100 sw=10 sl=64 | MB/s | 0.581 MB/s | 0.578 MB/s | 0.603 MB/s | +0.5% | -3.7% |
| bs=100 sw=10 sl=64 | p50 | 102,397 us | 104,522 us | 101,066 us | -2.0% | +1.3% |
| bs=100 sw=10 sl=64 | p95 | 131,553 us | 122,193 us | 107,594 us | +7.7% | +22.3% |
| bs=100 sw=10 sl=64 | p99 | 131,553 us | 122,193 us | 115,830 us | +7.7% | +13.6% |
| bs=1000 sw=10 sl=64 | throughput | 1,091 tuples/sec | 1,087 tuples/sec | 1,019 tuples/sec | +0.4% | +7.0% |
| bs=1000 sw=10 sl=64 | MB/s | 0.666 MB/s | 0.664 MB/s | 0.622 MB/s | +0.3% | +7.0% |
| bs=1000 sw=10 sl=64 | p50 | 924,186 us | 915,216 us | 986,982 us | +1.0% | -6.4% |
| bs=1000 sw=10 sl=64 | p95 | 962,377 us | 979,515 us | 1,028,491 us | -1.7% | -6.4% |
| bs=1000 sw=10 sl=64 | p99 | 962,377 us | 979,515 us | 1,058,493 us | -1.7% | -9.1% |
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,472.07,200,128000,424,0.259,22528.48,31994.08,31994.08
1,100,10,64,20,2099.79,2000,1280000,952,0.581,102396.51,131553.33,131553.33
2,1000,10,64,20,18333.05,20000,12800000,1091,0.666,924186.28,962377.48,962377.48There was a problem hiding this comment.
does VirtualDocument live in common?
There was a problem hiding this comment.
Yes, VirtualDocument lives in common/workflow-core (core.storage.model). VirtualDocumentSpec is its shared test base and is used only by the two iceberg document specs in this move — they extend VirtualDocumentSpec[Tuple], and amber's test scope sees workflow-core's main but not its test sources, so it had to move with them or they wouldn't compile here.
If you'd rather keep it next to VirtualDocument in common, the alternative is to leave it there and add amber.dependsOn(WorkflowCore % "test->test") (the pattern amber already uses for DAO/Auth). Which do you prefer?
There was a problem hiding this comment.
Ideally, test should live in the same package of src code.
For short term, you can expose common test to amber, as currently we only setup integration test for amber. If needed, we could further introduce integration test for other packages, and generalize amber-integration to integration CI job.
There was a problem hiding this comment.
Done — kept VirtualDocumentSpec in common/workflow-core (next to VirtualDocument) and exposed common's test scope to amber via WorkflowCore % "test->test", so the integration specs still reach it. Only the three live-catalog specs move to amber/src/test/integration. Thanks!
There was a problem hiding this comment.
Update: WorkflowCore % "test->test" runs into a classpath conflict — WorkflowCore's test scope pulls Hadoop → com.sun.jersey 1.19, whose bundled JAX-RS 1.x Response (no getHeaderString) shadows amber's javax.ws.rs-api:2.1.1 and breaks the existing HuggingFaceModelResourceSpec. (Fine for the light DAO/Auth test scopes, not the heavy WorkflowCore one.)
So for now VirtualDocumentSpec stays in amber — moved to amber/src/test/scala as a generic AnyFlatSpec harness (not an integration test); Test/compile and amber-integration are green. Longer term, generalizing amber-integration into a shared integration job (as you suggested) would let it move back to common. Thanks!
2f879d4 to
285dd05
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #6045 +/- ##
============================================
- Coverage 56.90% 56.32% -0.58%
+ Complexity 3024 2983 -41
============================================
Files 1129 1129
Lines 43794 43794
Branches 4743 4743
============================================
- Hits 24920 24669 -251
- Misses 17399 17671 +272
+ Partials 1475 1454 -21
*This pull request uses carry forward flags. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
IcebergDocumentSpec, IcebergDocumentConsoleMessagesSpec and IcebergTableStatsSpec resolve a live Iceberg catalog in their per-test setup (DocumentFactory / IcebergCatalogInstance.getInstance()), so every case needs a catalog. Move them from common/workflow-core's test sources into amber/src/test/integration -- where the @IntegrationTEST filter (AMBER_TEST_FILTER) applies and IcebergRestCatalogIntegrationSpec already lives -- and tag the three concrete specs @IntegrationTEST. Tagging in place would not work: common/workflow-core/build.sbt does not read AMBER_TEST_FILTER. The shared VirtualDocumentSpec base the first two extend moves with them, into amber/src/test/scala -- it is a generic AnyFlatSpec harness (not an integration test), so it sits in the regular test sources rather than the integration set. IcebergUtilSpec stays in common/workflow-core: it has no catalog setup and its cases are pure conversions plus one REST failure check documented as passing with or without Lakekeeper reachable, so it must keep running in the amber unit job. Part of apache#6034.
285dd05 to
cc55ed0
Compare
What changes were proposed in this PR?
Three Iceberg specs —
IcebergDocumentSpec,IcebergDocumentConsoleMessagesSpec, andIcebergTableStatsSpec— resolve a live Iceberg catalog in their per-test setup (beforeEach/beforeAllcreate documents throughDocumentFactory/IcebergCatalogInstance.getInstance()), so none of their cases can run without a catalog. Today the amber unit job runs them against the postgres catalog it provisions; once that job stops provisioning one they would fail.This PR moves them from
common/workflow-core/src/test/scala/...toamber/src/test/integration/...— where the@IntegrationTestfilter (AMBER_TEST_FILTER) already applies andIcebergRestCatalogIntegrationSpecalready lives — and tags the three concrete specs@IntegrationTest.Tagging in place is not sufficient:
common/workflow-core/build.sbtdoes not readAMBER_TEST_FILTER, so@IntegrationTestwould be ignored there.The shared
VirtualDocumentSpecbase (extended by the first two specs) moves with them, intoamber/src/test/scala— it is a genericAnyFlatSpecharness rather than an integration test, so it lives in amber's regular test sources, not the integration set. (Keeping it incommonand exposing it viaWorkflowCore % "test->test"was tried but reverted: WorkflowCore's test scope pulls Hadoop →com.sun.jersey1.19, whose bundled JAX-RS 1.xResponseshadows amber'sjavax.ws.rs-api:2.1.1and breaksHuggingFaceModelResourceSpec. Longer term, generalizingamber-integrationinto a sharedintegrationCI job would let the specs live incommonnext to the code they exercise.)IcebergUtilSpecintentionally stays incommon/workflow-core. Unlike the three above itextends AnyFlatSpecwith no catalog setup and never callsgetInstance(); its cases are pure Texera↔Iceberg type/schema/tuple conversions plus one REST-endpoint failure check whose own comment documents it as passing "with or without Lakekeeper reachable." It is a unit test and must keep running in the amber unit job.Any related issues, documentation, discussions?
Part of #6034 (Task 1 of 3). Does not close it — Tasks 2 (flip the
storage.confcatalog default torest) and 3 (mirror prod in the CI integration suite) follow, and Task 1 must land first.How was this PR tested?
Test-file relocation with no production-code change, so verification is at the compile/placement level:
sbt WorkflowExecutionService/Test/compileandsbt WorkflowCore/Test/compilepass locally (the specs move cleanly and amber's test scope compiles).sbt scalafmtCheckAllpasses.build / amberandbuild / amber-integration(ubuntu + macos) are green on CI, compiling the integration source set and running the specs against a live catalog.amber-integrationjob.Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (claude-opus-4-8)