
use per-job tmp dirs to avoid cross-job interference #38

Open: dzautner wants to merge 1 commit into master from fix/per-job-tmp-dirs



@dzautner
Contributor

The template uses hardcoded /tmp/pip-packages and /tmp/lm-eval paths. When multiple SLURM jobs land on the same node they share /tmp, so one job can delete another's dependencies mid-run.

Fix: suffix both paths with $SLURM_JOB_ID, and at job end clean up only the job's own directories.
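
A minimal sketch of the idea, not the actual diff: the variable names `JOB_TAG`, `PIP_PACKAGES_DIR`, and `LM_EVAL_DIR`, the `cleanup` function, the `trap`, and the PID fallback for runs outside SLURM are all assumptions made for illustration. Only the two `/tmp` prefixes and the `$SLURM_JOB_ID` suffix come from the description above.

```shell
#!/bin/bash
# Illustrative sketch only; names below are hypothetical, not the template's own.

# SLURM sets SLURM_JOB_ID inside a job. The PID fallback is an assumption
# added here so the sketch can also run outside of SLURM.
JOB_TAG="${SLURM_JOB_ID:-local-$$}"

# Per-job paths instead of the shared /tmp/pip-packages and /tmp/lm-eval.
PIP_PACKAGES_DIR="/tmp/pip-packages-${JOB_TAG}"
LM_EVAL_DIR="/tmp/lm-eval-${JOB_TAG}"

mkdir -p "$PIP_PACKAGES_DIR" "$LM_EVAL_DIR"

# Remove only this job's directories; paths owned by other jobs are untouched.
cleanup() {
    rm -rf "$PIP_PACKAGES_DIR" "$LM_EVAL_DIR"
}
trap cleanup EXIT

# Dependency installation and the evaluation run would go here,
# using "$PIP_PACKAGES_DIR" and "$LM_EVAL_DIR".
```

Because each job derives its paths from its own job ID, two jobs on the same node never resolve to the same directory, and the cleanup step cannot reach another job's files.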

Tested on LUMI and works as expected.

@avirtane-amd

LGTM


2 participants