Draft: Update feat/dnase-2.7 with changes from main #58

Draft
jemma-nelson wants to merge 214 commits into feat/dnase-2.7 from main

@jemma-nelson
Contributor

This is a draft PR to see all the changes that would be pulled in. The primary motivation is commit c86a544.

jemma-nelson and others added 27 commits April 18, 2022 16:07
Logic now matches that seen in the rest of our pipeline - prefer using
the alignment's sample_name, and fall back to constructing it manually
only when necessary. This should resolve the collation issues that have
been dogging us this year.
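A minimal sketch of the "prefer sample_name, fall back to constructing it" logic described above. The function name, the dict-shaped alignment record, and the fallback format are all hypothetical; the pipeline's real naming convention may differ.

```python
def resolve_sample_name(alignment, library_number, flowcell_label, lane):
    """Prefer the alignment's own sample_name; build one only if it is missing."""
    name = alignment.get("sample_name")
    if name:
        return name
    # Hypothetical fallback format, for illustration only.
    return f"LN{library_number}_{flowcell_label}_L{lane:03d}"
```

Routing every caller through one helper like this is what keeps names consistent across steps, which is the property collation depends on.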
CopyComplete.txt is a better signal that a flowcell is ready for
processing than RTAComplete.txt.
Older sequencers did not create CopyComplete.txt, I believe.
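As a rough illustration of the readiness check, the sketch below picks the marker file based on whether the sequencer is expected to write CopyComplete.txt. The function name and the `has_copycomplete` flag are assumptions made for this example, not the pipeline's actual interface.

```python
from pathlib import Path


def flowcell_is_ready(run_dir, has_copycomplete=True):
    """Return True once the sequencer's completion marker exists.

    has_copycomplete is a hypothetical flag: set it to False for older
    sequencers that only ever write RTAComplete.txt.
    """
    marker = "CopyComplete.txt" if has_copycomplete else "RTAComplete.txt"
    return (Path(run_dir) / marker).is_file()
```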
hpcz-2 was decommissioned, so switch the default queue for this.
We will re-enable this once we hit the fastq deadline.
Accidentally duplicated the input specification during a git merge
Now that we're regularly copying data over rather than symlinking it, it
makes sense to remove the work directory in these cases.
Actually tested this time.
Make it more obvious when and where an alignment cannot be set up
With this fix, we should emit `unset LIBRARY_KIT_METHOD` correctly in our
bash scripts.
Fix: alignprocess.py: library kits are optional
@jemma-nelson
Contributor Author

There may not be a path forward on merging this, and that's fine. We would not want to introduce significant changes to the DNase pipeline, as it is frozen here for a reprocessing effort.

jemma-nelson and others added 30 commits October 21, 2024 08:56
This was the cause of those pesky "Project_Lab/Sample_LP.../"
directories that were causing us to duplicate work.
Alignprocess.py skips library pools
If this is missing, use the default analysis dir.
We don't use the output from this anymore, preferring to run the megamap
pipeline or other analyses as appropriate.
- Rename scripts/flowcells/link_nextseq.py -> rename_fastq_files.py to
  reflect that it now moves files rather than creating symlinks. Update
  all references in setup.sh and fix internal verbiage (create_links ->
  rename_files, 'symlinks' -> 'renames' in help/docstrings).

- Add a bcl_output/ subdirectory under analysis_dir as the landing zone
  for all bcl-convert raw output. Create it with chmod 700 *before*
  submitting the bcl-convert job, so permissions are set before any files
  exist. Remove the post-hoc chmod in __COPY__.

- Update all fastq_dir assignments and bcl-convert --output-dir flags to
  write into analysis_dir/bcl_output/ instead of analysis_dir/ directly.

- Update novaseq_link_command glob from 'fastq-withmask-*' to
  'bcl_output/fastq-withmask-*' and non-NovaSeq link_command -i arg from
  'fastq' to 'bcl_output/fastq'.

- Remove the rm -rf of intermediate fastq dirs from __COPY__: those
  directories retain valuable stats, reports, and logs from bcl-convert.
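The ordering in the bcl_output/ item above (create the directory and restrict it before any job writes into it) can be sketched as follows. The change itself lives in setup.sh; this Python version and the helper name `prepare_bcl_output` are illustrative assumptions only.

```python
import os
from pathlib import Path


def prepare_bcl_output(analysis_dir):
    """Create analysis_dir/bcl_output with mode 700 before any job is submitted."""
    out = Path(analysis_dir) / "bcl_output"
    out.mkdir(parents=True, exist_ok=True)
    # mkdir's mode argument is filtered by the umask, so set the mode explicitly.
    os.chmod(out, 0o700)
    return out
```

Setting the mode on the empty directory means no file is ever visible with looser permissions, which a post-hoc chmod cannot guarantee.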
feat: write bcl2fastq output directly to /flowcells/, eliminating 3TB cross-filesystem rsync
Replaces the self-chaining wait_for_copycomplete.sh approach with a
scrontab job that scans for recently-completed flowcells hourly and
auto-launches setup/processing when CopyComplete.txt appears.

Fixes the silent chain-death failure mode that caused flowcell 23K3FVLT3
to miss auto-processing (SE-5098).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add scrontab-based flowcell watchdog script
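A sketch of the scanning approach, assuming each run directory sits directly under a sequencer root and that a sentinel file records that setup was already launched. The `.setup_launched` name, the 48-hour window, and the function itself are hypothetical and not taken from the actual watchdog script.

```python
import time
from pathlib import Path


def find_new_flowcells(seq_root, max_age_hours=48, now=None):
    """List run dirs whose CopyComplete.txt is recent and not yet handled."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    ready = []
    for marker in sorted(Path(seq_root).glob("*/CopyComplete.txt")):
        run_dir = marker.parent
        if marker.stat().st_mtime < cutoff:
            continue  # completed too long ago to count as "recent"
        if (run_dir / ".setup_launched").exists():
            continue  # hypothetical sentinel: setup was already started
        ready.append(run_dir)
    return ready
```

Because each hourly run rescans from scratch, a failed run is simply retried on the next tick, whereas a self-chaining job that dies takes the whole chain down with it.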
BCL logs & reports should be world-readable
The c-FLOWCELL job (`c-<flowcell>` in the SLURM chain) rsyncs the
sequencer's `InterOp/` directory into the staging area. On NovaSeq X
runs this is ~20 GB containing several individual `.bin` files of
3-4 GB each. `rsync -avP` peak memory grew larger than the 1000 MiB
ReqMem, causing OOM kills.

Observed 2026-05-24 with flowcell 22YCT7LT4:
- c-22YCT7LT4 (jobid 15334889): OUT_OF_MEMORY, MaxRSS=1021536K
- sister flowcell c-22YCGCLT4 (jobid 15334891): COMPLETED, MaxRSS=1021504K

Both flowcells operated at the very edge of the ~1 GiB limit; the
sister flowcell's peak was only 32K below that of the killed job.

4 GiB matches the existing collate job (line 825 of setup.sh) and
gives ~4x headroom for future InterOp size growth on bigger
flowcells.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bump c-FLOWCELL copy job memory from 1000M to 4000M
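The numbers in the commit message above can be checked with a few lines of arithmetic. This sketch assumes the "K" and "M" suffixes are binary units (KiB and MiB); the variable names are illustrative.

```python
OLD_LIMIT_KIB = 1000 * 1024  # ReqMem of 1000M, assumed to mean MiB
NEW_LIMIT_KIB = 4000 * 1024  # proposed 4000M

# MaxRSS values quoted in the commit message, in KiB.
max_rss_kib = {"c-22YCT7LT4": 1021536, "c-22YCGCLT4": 1021504}

# Gap between each reported peak and the old limit: 2464 and 2496 KiB.
headroom_kib = {job: OLD_LIMIT_KIB - rss for job, rss in max_rss_kib.items()}

# The killed job and the completed job differ by only 32 KiB.
peak_gap_kib = max_rss_kib["c-22YCT7LT4"] - max_rss_kib["c-22YCGCLT4"]

# The new limit is roughly 4x the highest reported peak.
growth_factor = NEW_LIMIT_KIB / max(max_rss_kib.values())
```

Both reported peaks sit within about 0.25% of the old limit. The killed job's MaxRSS is itself slightly under the limit, which is plausible if memory accounting is sampled periodically and can miss the true peak.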
5 participants