TAFFISH app for SOAPnuke, BGI's integrated quality control and preprocessing tool for high-throughput sequencing data.
This package builds upstream SOAPnuke 2.1.9 from source and exposes the
upstream SOAPnuke executable through the versioned taf-soapnuke command.
The TAFFISH package name is lowercase soapnuke; the upstream project and
primary binary keep the original SOAPnuke capitalization.
Install from the public TAFFISH Hub index:
taf update
taf install soapnuke
Install the exact release:
taf install soapnuke 2.1.9-r1
For local testing before the app is published to the public index:
taf install --from .
Show TAFFISH app help:
taf-soapnuke --help
Show the TAFFISH package version:
taf-soapnuke --version
Show the upstream SOAPnuke version:
taf-soapnuke SOAPnuke -v
taf-soapnuke soapnuke -v
taf-soapnuke -- -v
Show upstream module help:
taf-soapnuke SOAPnuke filter -h
taf-soapnuke SOAPnuke filterHts -h
taf-soapnuke SOAPnuke filterStLFR -h
taf-soapnuke SOAPnuke filtersRNA -h
taf-soapnuke SOAPnuke filterMeta -h
Run paired-end FASTQ filtering:
taf-soapnuke SOAPnuke filter \
-1 reads_1.fq.gz \
-2 reads_2.fq.gz \
-C clean_1.fq.gz \
-D clean_2.fq.gz \
-o soapnuke_out \
-T 8 \
-l 5 \
-q 0.5 \
-n 0.05 \
-4 30
Run single-end FASTQ filtering:
taf-soapnuke SOAPnuke filter \
-1 reads.fq.gz \
-C clean.fq.gz \
-o soapnuke_out \
-T 4 \
-l 5 \
-q 0.5 \
-n 0.05 \
-4 30
Run BAM/CRAM preprocessing through the htslib-enabled module:
taf-soapnuke SOAPnuke filterHts \
-1 input.bam \
-2 clean.bam \
-o soapnuke_hts_out \
-T 4 \
-l 5 \
-q 0.5 \
-n 0.05 \
-4 30
For CRAM input or CRAM output, pass the reference with -E as described by
upstream help:
taf-soapnuke SOAPnuke filterHts \
-E reference.fa \
-1 input.cram \
-2 clean.cram \
-o soapnuke_hts_out
Run the upstream plotting scripts after SOAPnuke writes statistics files:
taf-soapnuke Rscript /opt/soapnuke/Rscripts/Q20Q30.R \
soapnuke_out/Distribution_of_Q20_Q30_bases_by_read_position_1.txt \
soapnuke_out/Distribution_of_Q20_Q30_bases_by_read_position_2.txt \
q20_q30.png
This is a normal TAFFISH tool app with command_mode = true.
SOAPnuke itself uses module names such as filter, filterHts,
filterStLFR, filtersRNA, and filterMeta. Prefer the explicit executable
form:
taf-soapnuke SOAPnuke filter ...
taf-soapnuke SOAPnuke filterHts ...
Do not rely on:
taf-soapnuke filter ...
In command mode, a non-option first argument can be interpreted as a container
executable name rather than as a SOAPnuke module. The explicit
taf-soapnuke SOAPnuke ... form is unambiguous.
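One way to make the explicit form hard to forget in scripts is a small wrapper that always inserts the SOAPnuke executable name before the module. This is a minimal sketch, not part of the app: the function name run_soapnuke_module and the TAF_SOAPNUKE override variable are hypothetical, and the module list is copied from this README.

```shell
# Sketch: always call a SOAPnuke module through the explicit executable form.
# TAF_SOAPNUKE is a hypothetical override (handy for dry runs); by default the
# installed taf-soapnuke command is used.
run_soapnuke_module() {
    module=$1
    shift
    case "$module" in
        filter|filterHts|filterStLFR|filtersRNA|filterMeta) ;;
        *)
            echo "unknown SOAPnuke module: $module" >&2
            return 2
            ;;
    esac
    # Expands to: taf-soapnuke SOAPnuke <module> <remaining arguments>
    "${TAF_SOAPNUKE:-taf-soapnuke}" SOAPnuke "$module" "$@"
}

# Example (requires the app to be installed):
#   run_soapnuke_module filter -h
```

Setting TAF_SOAPNUKE=echo prints the command line that would be run instead of executing it, which is a cheap way to review a script before launching real jobs.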
For option-leading arguments to the default upstream command, this also works:
taf-soapnuke -- -v
taf-soapnuke -- filter -h
The container also provides a lowercase soapnuke symlink to the same upstream
binary. It is a convenience alias, not a separate implementation.
name: soapnuke
command: taf-soapnuke
version: 2.1.9-r1
kind: tool
image: ghcr.io/taffish/soapnuke:2.1.9-r1
upstream: BGI-flexlab/SOAPnuke
upstream tag: SOAPnuke2.1.9
The container image is built from docker/Dockerfile. It starts from
debian:12-slim, downloads the official upstream SOAPnuke2.1.9 source
archive from GitHub, verifies the SHA256 checksum, and builds the binary with
USEHTS=true.
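The checksum step can be pictured with a small shell helper. This is only a sketch of the general technique, assuming sha256sum and awk are available in the build stage; the helper name verify_sha256 is hypothetical, and the actual commands in docker/Dockerfile are not reproduced here.

```shell
# Sketch: fail unless the file's SHA256 digest equals the expected hex string.
verify_sha256() {
    file=$1
    expected=$2
    # sha256sum prints "<digest>  <path>"; keep only the digest.
    actual=$(sha256sum "$file" | awk '{print $1}')
    if [ "$actual" != "$expected" ]; then
        echo "checksum mismatch for $file: got $actual" >&2
        return 1
    fi
}

# Example (archive name and digest are placeholders, not the real values):
#   verify_sha256 source.tar.gz "<expected sha256 hex digest>"
```

Failing the build on a mismatch, rather than only printing a warning, is what makes the pinned digest useful as a supply-chain check.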
The image includes these user-facing commands:
SOAPnuke
soapnuke
samtools
Rscript
SOAPnuke is the upstream executable. soapnuke is a lowercase symlink for
typing convenience. samtools and the htslib runtime are included for the
filterHts BAM/CRAM module and for small local BAM/CRAM checks. Rscript is
included so the upstream plotting scripts under /opt/soapnuke/Rscripts can
be used directly.
The runtime also contains the upstream README, ChangeLog, and GPLv3 license
text under /opt/soapnuke/share.
This app intentionally does not bundle reference genomes, adapter databases, contaminant databases, sequencing platform presets beyond upstream defaults, or downstream workflow orchestration. SOAPnuke's own modules and options are available as-is through the upstream command.
The image is built and validated for:
linux/amd64
linux/arm64
Packaged upstream modules:
filter       normal FASTQ preprocessing
filterHts    BAM/CRAM preprocessing, built with htslib support
filterStLFR  stLFR FASTQ preprocessing
filtersRNA   small RNA FASTQ preprocessing
filterMeta   metagenomic FASTQ preprocessing
The upstream -v option prints the runtime version but exits non-zero. Smoke
tests and examples capture the version text through grep; this is upstream
behavior, not a TAFFISH wrapper change.
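Because of the non-zero exit, a script running under set -e needs to tolerate the failure while still reading the output. The sketch below shows one way to do that; capture_version and fake_soapnuke are hypothetical names, and the stand-in's output text is invented for illustration rather than copied from upstream.

```shell
# Sketch: pull the first x.y.z token out of a command that exits non-zero.
# Usage: capture_version CMD [ARGS...]
capture_version() {
    # "|| true" keeps scripts that use "set -e" alive despite the exit status.
    out=$("$@" 2>&1 || true)
    printf '%s\n' "$out" | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Hypothetical stand-in that prints a version line and then exits non-zero.
fake_soapnuke() {
    echo "version 2.1.9"
    return 1
}

capture_version fake_soapnuke

# Real use (requires the app to be installed):
#   capture_version taf-soapnuke SOAPnuke -v
```

Matching on the version text instead of the exit status keeps the check meaningful even though the upstream command reports failure.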
FASTQ inputs may be plain text or gzip-compressed. Paired-end inputs must use
matching compression formats, and paired clean outputs must both be .gz or
both be plain text, matching upstream validation.
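A quick pre-flight check in a wrapper script can catch mismatched pairs before a job is submitted. This is a minimal sketch that looks only at the .gz file extension, which is an assumption; it does not reproduce how upstream SOAPnuke performs its own validation. The function names are hypothetical.

```shell
# Sketch: succeed when the path ends in .gz.
is_gz() {
    case "$1" in
        *.gz) return 0 ;;
        *) return 1 ;;
    esac
}

# Sketch: succeed when both paths are .gz or both are plain.
same_compression() {
    if is_gz "$1"; then
        is_gz "$2"
    else
        ! is_gz "$2"
    fi
}

# Example pre-flight for a paired-end run:
#   same_compression reads_1.fq.gz reads_2.fq.gz || exit 1
#   same_compression clean_1.fq.gz clean_2.fq.gz || exit 1
```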
The smoke tests are independent and run without network access. They check:
- command presence for SOAPnuke, lowercase soapnuke, samtools, Rscript, gawk, gzip, ldd, and sh
- upstream runtime version 2.1.9
- help banners for filter, filterHts, filterStLFR, filtersRNA, and filterMeta
- dynamic linkage to htslib and zlib
- a tiny paired-end filter run that writes clean FASTQ plus statistics
- tiny filtersRNA and filterMeta FASTQ runs, including gzip input/output
- a tiny filterHts BAM run generated with bundled samtools
- availability of all three upstream R plotting scripts
These tests verify packaging, command availability, and small real execution paths. They are not a substitute for biological validation on production read sets, platform-specific adapter schemes, or full downstream QC review.
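The command-presence check in the list above can be approximated with a short loop over command -v. This is an illustrative sketch, not the app's actual smoke test; the function name missing_commands is hypothetical, and it only reports on whichever environment it runs in, so it would need to be executed inside the container to say anything about the image.

```shell
# Sketch: print every named command that is not on PATH.
# Returns non-zero if at least one command is missing.
missing_commands() {
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            status=1
        fi
    done
    return "$status"
}

# Example with the command list from this README (run inside the container):
#   missing_commands SOAPnuke soapnuke samtools Rscript gawk gzip ldd sh
```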
- Upstream repository: https://github.com/BGI-flexlab/SOAPnuke
- Release: https://github.com/BGI-flexlab/SOAPnuke/releases/tag/SOAPnuke2.1.9
- Upstream license: GPL-3.0-only
- Citation: Chen et al. 2018, GigaScience
- DOI: 10.1093/gigascience/gix120
- PMID: 29220494
Useful checks before publishing:
taf check
docker build -t ghcr.io/taffish/soapnuke:2.1.9-r1 -f docker/Dockerfile .
taf build
TAFFISH_CONTAINER_BACKEND=docker target/taf-soapnuke-v2.1.9-r1 -- -v
TAFFISH_CONTAINER_BACKEND=docker target/taf-soapnuke-v2.1.9-r1 SOAPnuke filter -h
taf publish --dry-run