[CATALYST-262] Stage Stitch job JAR via Unity Catalog volume #91
Merged
Mirror the chuck-api side (CATALYST-253): when the cluster init script lives in a Unity Catalog volume, generate a timestamped JOB_JAR_VOL_PATH co-located with it, export it in spark_env_vars, and switch the Run_Stitch task's library jar from the local file:///opt/amperity/job.jar to that volume path. The chuck-api init script now copies the downloaded JAR into JOB_JAR_VOL_PATH, and Databricks' synchronous library load blocks Run_Stitch until the JAR materializes there -- no separate preflight task required. S3 init scripts (Redshift) keep the local file:// jar since there is no Unity Catalog volume to stage to.
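The path-generation step can be sketched as follows. This is a minimal illustration, not the code in `chuck_data/clients/databricks.py`. It assumes Unity Catalog volume paths have the form `/Volumes/<catalog>/<schema>/<volume>/...` and that timestamps are taken in UTC. The PR does not say which clock `_generate_jar_volume_path` uses. `is_volumes_path` and `generate_jar_volume_path` are hypothetical stand-ins.

```python
import posixpath
from datetime import datetime, timezone
from typing import Optional


def is_volumes_path(path: str) -> bool:
    """Hypothetical check: Unity Catalog volume paths live under /Volumes/."""
    return path.startswith("/Volumes/")


def generate_jar_volume_path(init_script_path: str, now: Optional[datetime] = None) -> str:
    """Hypothetical stand-in for _generate_jar_volume_path.

    Places a timestamped job JAR in the same volume directory as the init
    script: /Volumes/c/s/v/init.sh -> /Volumes/c/s/v/job-YYYYMMDD-HHMMSS.jar.
    Assumption: the timestamp is UTC.
    """
    if not is_volumes_path(init_script_path):
        raise ValueError(f"not a Unity Catalog volume path: {init_script_path}")
    now = now or datetime.now(timezone.utc)
    volume_dir = posixpath.dirname(init_script_path)
    return posixpath.join(volume_dir, f"job-{now:%Y%m%d-%H%M%S}.jar")


print(generate_jar_volume_path("/Volumes/main/stitch/artifacts/init.sh",
                               datetime(2026, 5, 18, 9, 30, 0)))
# /Volumes/main/stitch/artifacts/job-20260518-093000.jar
```

The timestamp gives every submitted run its own JAR path. This means a later run's init script cannot overwrite a JAR that an earlier run is still loading.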
pragyan-amp approved these changes on May 18, 2026.
Summary
Jira ticket: CATALYST-262
Consumer-side wire-up for the chuck-api change in CATALYST-253. When the cluster init script lives in a Unity Catalog volume, we now generate a timestamped `JOB_JAR_VOL_PATH` co-located with it, export it in `spark_env_vars`, and switch the `Run_Stitch` task's library JAR from the local `file:///opt/amperity/job.jar` to that volume path. The chuck-api init script copies the downloaded JAR into `JOB_JAR_VOL_PATH`. Databricks' synchronous library load blocks `Run_Stitch` until the JAR materializes there, so no separate preflight task is needed.
Changes
- `chuck_data/clients/databricks.py`
  - `_generate_jar_volume_path(init_script_path)` helper returning `<volume-dir>/job-YYYYMMDD-HHMMSS.jar`.
  - `_build_libraries` accepts an optional `main_jar_path` override so the `Run_Stitch` library JAR can point at the volume path.
  - `submit_job_run` populates `JOB_JAR_VOL_PATH` in `spark_env_vars` and passes the volume path to `_build_libraries` when the init script is a Volumes path. S3 init scripts (Redshift) keep the local `file:///` JAR, since there is no Unity Catalog volume to stage to.
- `tests/unit/test_workspace_and_init_scripts.py`
  - … `JOB_JAR_VOL_PATH`, library jar unchanged).
Testing
`python -m pytest tests/unit`: 1221 passed.
Rollout
Standard release. The change requires the matching chuck-api init script (CATALYST-253), which is already merged.
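The submit-time branching described under Changes can be sketched as follows. This is an illustration, not the PR's code. `build_libraries` and `run_stitch_cluster_overrides` are hypothetical names, and the real `_build_libraries` presumably returns more than the single main JAR. The `init_scripts` entry shapes (`{"volumes": {"destination": ...}}` vs `{"s3": {"destination": ...}}`) are taken from the Databricks cluster spec, not from anything stated in this PR. As in the helper above, UTC timestamps are assumed.

```python
import posixpath
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

# Default Run_Stitch library: the JAR already present on the cluster.
LOCAL_JOB_JAR = "file:///opt/amperity/job.jar"


def build_libraries(main_jar_path: Optional[str] = None) -> List[Dict[str, str]]:
    """Hypothetical stand-in for _build_libraries with the optional override."""
    return [{"jar": main_jar_path or LOCAL_JOB_JAR}]


def run_stitch_cluster_overrides(
    init_script: Dict[str, Dict[str, str]],
    now: Optional[datetime] = None,
) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Return (spark_env_vars, libraries) for the Run_Stitch task.

    Volume init scripts get a timestamped JOB_JAR_VOL_PATH next to the script,
    and the library JAR points at that path. S3 init scripts (Redshift) keep
    the local file:// JAR and export nothing extra.
    """
    if "volumes" in init_script:
        script_path = init_script["volumes"]["destination"]
        now = now or datetime.now(timezone.utc)  # assumption: UTC timestamps
        jar_path = posixpath.join(
            posixpath.dirname(script_path), f"job-{now:%Y%m%d-%H%M%S}.jar"
        )
        return {"JOB_JAR_VOL_PATH": jar_path}, build_libraries(main_jar_path=jar_path)
    # No Unity Catalog volume to stage to: keep the local jar.
    return {}, build_libraries()
```

Both the exported env var and the library entry come from the same `jar_path` value. This keeps the init script's copy destination and the path Databricks loads the library from in lockstep.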