[CATALYST-262] Stage Stitch job JAR via Unity Catalog volume by punit-naik-amp · Pull Request #91 · amperity/chuck-data

punit-naik-amp · 2026-05-18T12:33:37Z

Summary

Consumer-side wire-up for the chuck-api change in CATALYST-253. When the cluster init script lives in a Unity Catalog volume, we now generate a timestamped JOB_JAR_VOL_PATH co-located with it, export it in spark_env_vars, and switch the Run_Stitch task's library JAR from the local file:///opt/amperity/job.jar to that volume path. The chuck-api init script copies the downloaded JAR into JOB_JAR_VOL_PATH, and Databricks' synchronous library load blocks Run_Stitch until the JAR materializes there — no separate preflight task needed.
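
As a rough illustration of the "timestamped path co-located with the init script" idea, here is a minimal sketch. It is not the code from this PR: the function name, the injectable `now` parameter, and the choice of UTC are assumptions made for the example.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Optional


def generate_jar_volume_path(init_script_path: str, now: Optional[datetime] = None) -> str:
    """Hypothetical stand-in for _generate_jar_volume_path.

    Returns <volume-dir>/job-YYYYMMDD-HHMMSS.jar, where <volume-dir> is the
    directory that holds the init script.
    """
    # Assumption: the timestamp is taken in UTC; the PR does not say which clock is used.
    now = now or datetime.now(timezone.utc)
    # Volume paths are POSIX-style, so PurePosixPath keeps this platform independent.
    volume_dir = PurePosixPath(init_script_path).parent
    return str(volume_dir / f"job-{now:%Y%m%d-%H%M%S}.jar")
```

Accepting `now` as a parameter is only a testing convenience in this sketch; it lets a test pin the timestamp instead of patching the clock.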

Changes

chuck_data/clients/databricks.py
- New _generate_jar_volume_path(init_script_path) helper returning <volume-dir>/job-YYYYMMDD-HHMMSS.jar.
- _build_libraries accepts an optional main_jar_path override so the Run_Stitch library jar can point at the volume path.
- submit_job_run populates JOB_JAR_VOL_PATH in spark_env_vars and passes the volume path to _build_libraries when the init script is a Volumes path. S3 init scripts (Redshift) keep the local file:/// jar — there is no Unity Catalog volume to stage to.
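
The branching described in the list above can be sketched roughly as follows. This is an assumption-laden illustration, not the code in this PR: `build_libraries` and `plan_jar_staging` are hypothetical names, the `/Volumes/` prefix check is a guess at how a Volumes path is detected, and the real `_build_libraries` may return additional library entries.

```python
from typing import Dict, List, Optional, Tuple

# Local JAR location named in the PR description.
LOCAL_JAR = "file:///opt/amperity/job.jar"


def build_libraries(main_jar_path: Optional[str] = None) -> List[Dict[str, str]]:
    """Hypothetical stand-in for _build_libraries: one JAR library entry."""
    # Fall back to the local JAR when no override is given.
    return [{"jar": main_jar_path or LOCAL_JAR}]


def plan_jar_staging(
    init_script_path: str, jar_volume_path: str
) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Return (extra spark_env_vars, libraries) for the given init script."""
    # Assumption: a Unity Catalog volume path is recognised by its /Volumes/ prefix.
    if init_script_path.startswith("/Volumes/"):
        # The env var and the library entry must carry the same path, because
        # the init script writes the JAR to wherever JOB_JAR_VOL_PATH points.
        return {"JOB_JAR_VOL_PATH": jar_volume_path}, build_libraries(jar_volume_path)
    # S3 init scripts: nothing to stage, keep the local JAR.
    return {}, build_libraries()
```

Returning the env var and the library list from one place is a design choice of the sketch: it makes it harder for the two values to drift apart.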

tests/unit/test_workspace_and_init_scripts.py
- New tests for the volumes path (timestamped jar + env var + library consistency) and the S3 path (no JOB_JAR_VOL_PATH, library jar unchanged).
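
The properties these tests cover might be checked along the following lines. This is only a sketch: the helper names are hypothetical, and it assumes the test has already pulled `spark_env_vars` and the task's `libraries` list out of the submitted payload, whose exact shape is not shown in this PR.

```python
import re
from typing import Dict, List

# Matches the job-YYYYMMDD-HHMMSS.jar naming described in the PR.
JAR_NAME = re.compile(r"job-\d{8}-\d{6}\.jar$")
LOCAL_JAR = "file:///opt/amperity/job.jar"


def check_volume_case(
    init_script_path: str, env_vars: Dict[str, str], libraries: List[Dict[str, str]]
) -> None:
    """Volumes init script: timestamped JAR, env var set, library consistent."""
    jar_path = env_vars["JOB_JAR_VOL_PATH"]
    assert JAR_NAME.search(jar_path), "JAR name is not timestamped"
    # The JAR must sit in the same directory as the init script.
    assert jar_path.rsplit("/", 1)[0] == init_script_path.rsplit("/", 1)[0]
    # The library entry must point at exactly the path the init script writes to.
    assert {"jar": jar_path} in libraries


def check_s3_case(env_vars: Dict[str, str], libraries: List[Dict[str, str]]) -> None:
    """S3 init script: no JOB_JAR_VOL_PATH, library JAR unchanged."""
    assert "JOB_JAR_VOL_PATH" not in env_vars
    assert {"jar": LOCAL_JAR} in libraries
```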

Testing

python -m pytest tests/unit — 1221 passed.

Rollout

Standard release. The change requires the matching chuck-api init script (CATALYST-253), which is already merged.

Mirror the chuck-api side (CATALYST-253): when the cluster init script lives in a Unity Catalog volume, generate a timestamped JOB_JAR_VOL_PATH co-located with it, export it in spark_env_vars, and switch the Run_Stitch task's library JAR from the local file:///opt/amperity/job.jar to that volume path. The chuck-api init script now copies the downloaded JAR into JOB_JAR_VOL_PATH, and Databricks' synchronous library load blocks Run_Stitch until the JAR materializes there, so no separate preflight task is required. S3 init scripts (Redshift) keep the local file:/// JAR, since there is no Unity Catalog volume to stage to.

punit-naik-amp added 2 commits May 18, 2026 18:00

Apply black formatting

30e9c05

punit-naik-amp requested a review from pragyan-amp May 18, 2026 12:36

punit-naik-amp self-assigned this May 18, 2026

pragyan-amp approved these changes May 18, 2026

punit-naik-amp merged commit 0c88774 into main May 18, 2026
2 checks passed

punit-naik-amp deleted the catalyst-262-stage-stitch-jar branch May 18, 2026 13:34
