Draft: Update feat/dnase-2.7 with changes from main by jemma-nelson · Pull Request #58 · StamLab/stampipes

jemma-nelson · 2022-11-28T20:42:37Z

This is a draft PR to see all the changes that would be pulled in. Primary motivation is for commit c86a544.

jemma-nelson · 2022-11-28T20:53:05Z

There may not be a path forward on merging this, and that's fine. We would not want to introduce significant changes to the DNase pipeline, as it is frozen here for a reprocessing effort.

- Rename scripts/flowcells/link_nextseq.py -> rename_fastq_files.py to reflect that it now moves files rather than creating symlinks. Update all references in setup.sh and fix internal verbiage (create_links -> rename_files, 'symlinks' -> 'renames' in help/docstrings).
- Add a bcl_output/ subdirectory under analysis_dir as the landing zone for all bcl-convert raw output. Create it with chmod 700 *before* submitting the bcl-convert job, so permissions are set before any files exist. Remove the post-hoc chmod in __COPY__.
- Update all fastq_dir assignments and bcl-convert --output-dir flags to write into analysis_dir/bcl_output/ instead of analysis_dir/ directly.
- Update novaseq_link_command glob from 'fastq-withmask-*' to 'bcl_output/fastq-withmask-*' and non-NovaSeq link_command -i arg from 'fastq' to 'bcl_output/fastq'.
- Remove the rm -rf of intermediate fastq dirs from __COPY__: those directories retain valuable stats, reports, and logs from bcl-convert.
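The "create bcl_output/ with chmod 700 before submitting the job" step could look roughly like the sketch below. This is an illustration in Python, not the code in setup.sh; the function name `prepare_bcl_output` is hypothetical. The explicit `os.chmod` is there because the mode passed to `mkdir` is filtered by the process umask and is not applied to a directory that already exists.

```python
import os
from pathlib import Path


def prepare_bcl_output(analysis_dir):
    """Create analysis_dir/bcl_output with owner-only permissions.

    Hypothetical helper: call this before the conversion job is submitted,
    so the directory is already locked down when the first file lands.
    """
    out_dir = Path(analysis_dir) / "bcl_output"
    out_dir.mkdir(parents=True, exist_ok=True)
    # Explicit chmod: mkdir's mode argument is subject to umask and is
    # ignored when the directory already exists.
    os.chmod(out_dir, 0o700)
    return out_dir
```

Usage would be a single call such as `prepare_bcl_output("/path/to/analysis_dir")` ahead of job submission; the path here is a placeholder.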

feat: write bcl2fastq output directly to /flowcells/, eliminating 3TB cross-filesystem rsync

Feat/bcl convert

Replaces the self-chaining wait_for_copycomplete.sh approach with a scrontab job that scans for recently-completed flowcells hourly and auto-launches setup/processing when CopyComplete.txt appears.

Fixes the silent chain-death failure mode that caused flowcell 23K3FVLT3 to miss auto-processing (SE-5098).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
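As a rough illustration of why a periodic scan is more robust than a self-resubmitting chain, here is a minimal Python sketch of one scan pass. It is not the watchdog script from this PR: the marker file name `.stampipes_launched`, the 48-hour window, and both function names are assumptions made for the example. Each pass is stateless apart from the marker file, so a failed pass is simply retried on the next hourly run.

```python
import time
from pathlib import Path

# Hypothetical marker recording that a flowcell was already launched.
LAUNCH_MARKER = ".stampipes_launched"


def find_ready_flowcells(seq_root, max_age_hours=48, now=None):
    """Return run directories whose CopyComplete.txt is recent and unhandled."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    ready = []
    for marker in sorted(Path(seq_root).glob("*/CopyComplete.txt")):
        run_dir = marker.parent
        if marker.stat().st_mtime < cutoff:
            continue  # completed too long ago to count as "recent"
        if (run_dir / LAUNCH_MARKER).exists():
            continue  # a previous scan already launched this flowcell
        ready.append(run_dir)
    return ready


def mark_launched(run_dir):
    """Record that setup/processing was started for this run directory."""
    (Path(run_dir) / LAUNCH_MARKER).touch()
```

A scheduled job would call `find_ready_flowcells`, start setup for each result, and then call `mark_launched` so the next pass skips it.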

Add scrontab-based flowcell watchdog script

BCL logs&reports should be world-readable

The c-FLOWCELL job (`c-<flowcell>` in the SLURM chain) rsyncs the sequencer's `InterOp/` directory into the staging area. On NovaSeq X runs this is ~20 GB containing several individual `.bin` files of 3-4 GB each. `rsync -avP` peak memory grew larger than the 1000 MiB ReqMem, causing OOM kills.

Observed 2026-05-24 with flowcell 22YCT7LT4:

- c-22YCT7LT4 (jobid 15334889): OUT_OF_MEMORY, MaxRSS=1021536K
- sister flowcell c-22YCGCLT4 (jobid 15334891): COMPLETED, MaxRSS=1021504K

Both flowcells operated at the very edge of the 1 GiB limit; the sister flowcell completed by 32K of headroom.

4 GiB matches the existing collate job (line 825 of setup.sh) and gives ~4x headroom for future InterOp size growth on bigger flowcells.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
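The headroom figures above can be checked with a few lines of arithmetic. This sketch assumes SLURM's `M` and `K` suffixes are binary units (MiB and KiB); the helper name is made up for the example. Under that assumption both jobs recorded a MaxRSS within about 2.5 MiB of the 1000M request, and the 32K figure matches the gap between the two recorded values. MaxRSS is likely a sampled value, so the killed job's true peak may have been higher than what was recorded.

```python
KIB_PER_MIB = 1024


def headroom_kib(req_mem_mib, max_rss_kib):
    """KiB left between the requested memory and the recorded MaxRSS."""
    return req_mem_mib * KIB_PER_MIB - max_rss_kib


OLD_REQ_MIB = 1000       # previous request for the copy job
NEW_REQ_MIB = 4000       # request after this change
OOM_RSS_KIB = 1021536    # c-22YCT7LT4, OUT_OF_MEMORY
OK_RSS_KIB = 1021504     # c-22YCGCLT4, COMPLETED

print(headroom_kib(OLD_REQ_MIB, OOM_RSS_KIB))  # 2464
print(headroom_kib(OLD_REQ_MIB, OK_RSS_KIB))   # 2496
print(OOM_RSS_KIB - OK_RSS_KIB)                # 32
print(round(NEW_REQ_MIB * KIB_PER_MIB / OOM_RSS_KIB, 2))  # 4.01
```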

Bump c-FLOWCELL copy job memory from 1000M to 4000M

jemma-nelson and others added 27 commits April 18, 2022 16:07

laneprocess.py uses correct SAMPLE_NAME

4052204

Logic now matches that seen in the rest of our pipeline - prefer using the alignment's sample_name, and fall back to constructing it manually only when necessary. This should resolve the collation issues that have been dogging us this year.
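A minimal sketch of the "prefer sample_name, fall back to constructing it" rule described in this commit. It is not the laneprocess.py implementation: the dictionary keys in the fallback branch and the fallback format are hypothetical, chosen only to make the example runnable. It uses `str.format` rather than f-strings, in line with the Python version constraint mentioned elsewhere in this PR.

```python
def resolve_sample_name(alignment, lane):
    """Return the alignment's sample_name, building one only if it is missing."""
    name = alignment.get("sample_name")
    if name:
        return name
    # Fallback: the field names and layout here are assumptions for illustration.
    return "{}_{}_L{:03d}".format(
        lane["samplesheet_name"], lane["barcode_index"], lane["lane"]
    )
```

Keeping the fallback in one function means every script derives the same name for the same lane, which is what collation depends on.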

apply fix to right file, mark old file deprecated

621b345

Use CopyComplete.txt to start processing

58603ec

CopyComplete.txt is a better signal that a flowcell is ready for processing than RTAComplete.txt. Older sequencers did not create CopyComplete.txt, I believe.
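One way to express this readiness check, sketched in Python. The `allow_rta_fallback` switch is an assumption added for the example, to cover older sequencers that may never write CopyComplete.txt; the commit itself does not say whether such a fallback exists.

```python
from pathlib import Path


def flowcell_ready(run_dir, allow_rta_fallback=False):
    """Return True once the run directory signals that copying has finished."""
    run_dir = Path(run_dir)
    if (run_dir / "CopyComplete.txt").exists():
        return True
    # Hypothetical opt-in for sequencers that only write RTAComplete.txt.
    return allow_rta_fallback and (run_dir / "RTAComplete.txt").exists()
```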

Switch initial flowcell processing to hpcz-1

536ed63

hpcz-2 was decommissioned, switching default queue for this.

chore: update default queue names for Altius

3b99fd3

Add module for bcl2fastq - contains samplesheet generation

4c82c7c

Add test of alt-seq pipeline

ca3a64d

Refine altseq.nf and add process_altseq.bash

3858ef6

Connect altseq with LIMS

b636f55

altseq script optimizations - better caching

e1c3ae7

Altseq - version 1.0.0

27d61d6

Don't use scratch space for bcl2fastq & merge_fq

2a3b1ec

Use production LIMS instead of staging

24c9b4a

Altseq - skip running alignment for now

212a21b

We will re-enable this once we hit the fastq deadline.

Altseq - handle pools with same pool barcodes

bf322cb

setup.sh uses processing_information endpoint again

1341024

Merge pull request #56 from StamLab/alt-seq

ec1df14

Alt-seq

fix: encode_cram_no_ref now works again

cd3c06a

Accidentally duplicated the input specification during a git merge

fix for altseq setup.sh processing

d20c2e1

nextflow_clean script proceeds w/o output symlinks

c273b87

Now that we're regularly copying data over rather than symlinking it, it makes sense to remove the work directory in these cases.

!fixup c273b87 - missed a simple bug.

69ebe3c

Actually tested this time.

Improve alignprocess.py error logging

fb8bd40

Make it more obvious when and where an alignment cannot be set up

Fix alignprocess.py when library_kit_method=null

85b565f

With this fix, we should set `unset LIBRARY_KIT_METHOD` correctly in our bash scripts
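For illustration, the behaviour described here (a null kit method becomes an `unset` line in the generated bash) could be sketched as below. The helper name `shell_assignment` is hypothetical and this is not the code in alignprocess.py; it uses `str.format` because f-strings are unavailable in the Python version this branch targets.

```python
import shlex


def shell_assignment(name, value):
    """Render one line of bash that sets or unsets an environment variable."""
    if value is None:
        # A null value from the LIMS maps to an explicit unset.
        return "unset {}".format(name)
    # Quote the value so spaces or shell metacharacters survive intact.
    return "export {}={}".format(name, shlex.quote(str(value)))
```

With this shape, `shell_assignment("LIBRARY_KIT_METHOD", None)` yields `unset LIBRARY_KIT_METHOD`, while a real value yields a quoted `export` line.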

fixup: Can't use f-strings in current python ver

b2cdab7

Merge pull request #57 from StamLab/fix/align_process_library_kits

c86a544

Fix: alignprocess.py: library kits are optional

Config: Add 137 to retry-with-more-mem exit codes

0248da8

fix/rna-agg: two typos in anaquin processing

4760b6d

jemma-nelson added 2 commits December 4, 2022 16:44

altseq - use better publishing strategy

4cbfafc

Add basic analysis

d70fdc3

jemma-nelson and others added 30 commits October 21, 2024 08:56

Alignprocess.py skips library pools

2638c0d

This was the cause of those pesky "Project_Lab/Sample_LP.../" directories that were causing us to duplicate work.

Merge pull request #80 from StamLab/fix/skip_lp_alignments

2d1dd24

Alignprocess.py skips library pools

aggregateprocess.py: fix typo-induced bug

857cdc8

fix: handle missing project_share_directory

30e533b

If this is missing, use the default analysis dir.

Fix setup.sh for miniseq on new cluster

8fec6d1

link_nextseq.py supports R3 & R4 fastq files

f44ddc0

Fix collate/fastq/upload for up to 4 reads

3245526

Fixup for fastqc.bash

d28784a

Disable pool processing

d6a371e

We don't use the output from this anymore, preferring to run the megamap pipeline or other analyses as appropriate.

WIP: use bcl-convert

047dc98

fix: remove unused rel_path variable in link_nextseq.py

d42d1a2

fix: replace bcl-convert with bcl2fastq in comments (branch uses bcl2…

fb3c5a7

…fastq)

fix: update GUIDEseq index-swap paths to bcl_output/fastq/

5651e75

address PR comments

1930c14

make rename_fastq_files --dry-run clearer

daa93bf

Merge pull request #82 from StamLab/feat/direct-flowcell-output

28cd283

feat: write bcl2fastq output directly to /flowcells/, eliminating 3TB cross-filesystem rsync

fix for lanes with only barcode2

a08c47e

Merge remote-tracking branch 'origin/main' into feat/bcl-convert

791bfdd

use --reverse-barcode2 for novaseq

ebf6323

use compression level 4

4596fae

Merge pull request #83 from StamLab/feat/bcl-convert

b16b927

Feat/bcl convert

Fix for upload with updated LIMS LP endpoint

9b9132f

fix sbatch headers; remove redirection

1ae439a

Merge pull request #87 from StamLab/feat/slurmcron-scheduling

dc9993f

Add scrontab-based flowcell watchdog script

BCL logs&reports should be world-readable

8f6d083

Merge pull request #90 from StamLab/fix/bcl-log-permissions

65a2754

BCL logs&reports should be world-readable

Merge pull request #91 from StamLab/fix/c-flowcell-mem-bump

897026a

Bump c-FLOWCELL copy job memory from 1000M to 4000M

jemma-nelson wants to merge 214 commits into feat/dnase-2.7 from main
