[CATALYST-262] Stage Stitch job JAR via Unity Catalog volume #91

Merged: punit-naik-amp merged 2 commits into main from catalyst-262-stage-stitch-jar on May 18, 2026
punit-naik-amp (Contributor) commented May 18, 2026

Summary

Jira ticket: CATALYST-262

Consumer-side wire-up for the chuck-api change in CATALYST-253. When the cluster init script lives in a Unity Catalog volume, we now generate a timestamped JOB_JAR_VOL_PATH co-located with it, export it in spark_env_vars, and switch the Run_Stitch task's library JAR from the local file:///opt/amperity/job.jar to that volume path. The chuck-api init script copies the downloaded JAR into JOB_JAR_VOL_PATH, and Databricks' synchronous library load blocks Run_Stitch until the JAR materializes there — no separate preflight task needed.
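A minimal sketch of the branching described above, not the code in this PR. The function name `stitch_cluster_fragments` and the example paths are hypothetical; only `JOB_JAR_VOL_PATH`, `spark_env_vars`, and the local `file:///opt/amperity/job.jar` come from the description. It assumes a Volumes init script is recognized by a `/Volumes/` prefix.

```python
from typing import Optional

# Local jar location named in the PR description.
LOCAL_JAR = "file:///opt/amperity/job.jar"


def stitch_cluster_fragments(
    init_script_path: str, jar_vol_path: Optional[str] = None
) -> dict:
    """Return spark_env_vars and Run_Stitch libraries for one init script path.

    Hypothetical helper: the real logic lives in submit_job_run and
    _build_libraries, whose exact signatures are not shown in this PR page.
    """
    env_vars = {}
    main_jar = LOCAL_JAR
    # Assumption: a Unity Catalog volume path starts with "/Volumes/".
    if init_script_path.startswith("/Volumes/") and jar_vol_path:
        # The init script reads this variable and copies the JAR there.
        env_vars["JOB_JAR_VOL_PATH"] = jar_vol_path
        # The task library points at the same path, so the library load
        # waits for the file the init script writes.
        main_jar = jar_vol_path
    return {"spark_env_vars": env_vars, "libraries": [{"jar": main_jar}]}
```

For an S3 init script the function returns an empty `spark_env_vars` and the unchanged local jar, matching the Redshift case called out under Changes.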

Changes

  • chuck_data/clients/databricks.py
    • New _generate_jar_volume_path(init_script_path) helper returning <volume-dir>/job-YYYYMMDD-HHMMSS.jar.
    • _build_libraries accepts an optional main_jar_path override so the Run_Stitch library jar can point at the volume path.
    • submit_job_run populates JOB_JAR_VOL_PATH in spark_env_vars and passes the volume path to _build_libraries when the init script is a Volumes path. S3 init scripts (Redshift) keep the local file:/// jar — there is no Unity Catalog volume to stage to.
  • tests/unit/test_workspace_and_init_scripts.py
    • New tests for the volumes path (timestamped jar + env var + library consistency) and the S3 path (no JOB_JAR_VOL_PATH, library jar unchanged).
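The path helper listed above might look roughly like the following. This is a sketch under assumptions: the injectable `now` parameter and the use of UTC are illustrative choices not stated in the PR, and the example volume path is made up. Only the `<volume-dir>/job-YYYYMMDD-HHMMSS.jar` shape comes from the description.

```python
import posixpath
from datetime import datetime, timezone
from typing import Optional


def generate_jar_volume_path(
    init_script_path: str, now: Optional[datetime] = None
) -> str:
    """Build <volume-dir>/job-YYYYMMDD-HHMMSS.jar next to the init script.

    `now` is a hypothetical parameter that makes the result deterministic
    in tests; UTC is an assumption, the PR does not name a timezone.
    """
    now = now or datetime.now(timezone.utc)
    # Volume paths are POSIX-style regardless of the client OS.
    volume_dir = posixpath.dirname(init_script_path)
    return posixpath.join(volume_dir, f"job-{now:%Y%m%d-%H%M%S}.jar")


# Example with a fixed timestamp and a made-up volume path.
fixed = datetime(2026, 5, 18, 13, 34, 0, tzinfo=timezone.utc)
print(generate_jar_volume_path("/Volumes/cat/sch/vol/init.sh", fixed))
# prints /Volumes/cat/sch/vol/job-20260518-133400.jar
```

Passing the clock in keeps a unit test free of time mocking, which is one way to check that the env var and the library jar receive the same generated path.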

Testing

  • python -m pytest tests/unit — 1221 passed.

Rollout

Standard release. The change requires the matching chuck-api init script (CATALYST-253), which is already merged.

Mirror the chuck-api side (CATALYST-253): when the cluster init script
lives in a Unity Catalog volume, generate a timestamped JOB_JAR_VOL_PATH
co-located with it, export it in spark_env_vars, and switch the
Run_Stitch task's library jar from the local file:///opt/amperity/job.jar
to that volume path. The chuck-api init script now copies the downloaded
JAR into JOB_JAR_VOL_PATH, and Databricks' synchronous library load
blocks Run_Stitch until the JAR materializes there -- no separate
preflight task required.

S3 init scripts (Redshift) keep the local file:/// jar since there is
no Unity Catalog volume to stage to.
punit-naik-amp self-assigned this May 18, 2026
punit-naik-amp merged commit 0c88774 into main May 18, 2026
2 checks passed
punit-naik-amp deleted the catalyst-262-stage-stitch-jar branch May 18, 2026 13:34