TAFFISH wrapper for samblaster, the fast C++ tool for marking duplicates in read-id grouped paired-end SAM streams and extracting structural-variant evidence reads.
This app packages upstream samblaster v.0.1.26 as samblaster 0.1.26-r1.
The default TAFFISH command runs the upstream samblaster executable directly,
and command mode remains enabled so the same image can also be inspected with
taf-samblaster samblaster ....
Package metadata:
name: samblaster
command: taf-samblaster
version: 0.1.26-r1
kind: tool
image: ghcr.io/taffish/samblaster:0.1.26-r1
upstream release: v.0.1.26
upstream runtime banner: samblaster: Version 0.1.26
Show the TAFFISH package version:
taf-samblaster --version
Show the upstream samblaster version:
taf-samblaster -- --version
taf-samblaster samblaster --version
Mark duplicates in a read-id grouped SAM file:
taf-samblaster samblaster -i input.sam -o marked.sam
Stream SAM through samblaster:
cat input.sam | taf-samblaster samblaster -o marked.sam
Write duplicate-marked SAM to stdout:
taf-samblaster samblaster -i input.sam > marked.sam
Extract discordant pairs, split reads, and unmapped/clipped reads while marking duplicates:
taf-samblaster samblaster \
  -i input.sam \
  -o marked.sam \
  -d discordant.sam \
  -s splitters.sam \
  -u unmapped.fq
samblaster works on SAM text, not BAM or CRAM directly. The input SAM must:
- contain sequence header records (@SQ lines)
- be read-id grouped, so all alignments for a QNAME are adjacent
- use ordinary SAM FLAG, CIGAR, SEQ, and QUAL fields
Aligners such as bwa mem naturally produce read-id grouped SAM. Existing BAM
or coordinate-sorted files should be converted or query-name grouped outside
this app before they are passed to samblaster.
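
The input requirements above can be checked before a run. The following is a minimal sketch, assuming a POSIX shell and awk; the helper name check_sam_grouped is hypothetical and is not part of samblaster or TAFFISH. It reports whether a SAM file has at least one @SQ header record and whether all records sharing a QNAME are adjacent.

```shell
# Hypothetical helper (not part of samblaster or TAFFISH): check that a SAM
# file has @SQ header records and that alignments sharing a QNAME are adjacent.
check_sam_grouped() {
    awk -F '\t' '
        /^@/ { if ($1 == "@SQ") has_sq = 1; next }   # header lines
        {
            if ($1 != prev) {
                # A QNAME seen earlier has reappeared after another one: not grouped.
                if ($1 in seen) { print "not grouped: " $1; bad = 1; exit }
                seen[$1] = 1
                prev = $1
            }
        }
        END {
            if (bad) exit 1
            if (!has_sq) { print "missing @SQ header"; exit 1 }
            print "ok"
        }
    ' "$1"
}
```

Calling check_sam_grouped input.sam prints ok and exits 0 for an acceptable file, and exits 1 otherwise. The sketch keeps every QNAME in memory, so it suits spot checks better than full-size alignment files.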
This image intentionally packages samblaster only. It does not include bwa,
samtools, genome indexes, or downstream SV callers. Use separate TAFFISH apps
or host tools for those pipeline stages.
Example with external tools:
bwa mem ref.fa r1.fq r2.fq \
  | taf-samblaster samblaster \
  | samtools view -Sb - > sample.bam
When bwa mem is run with -M earlier in the pipeline, also pass -M to samblaster:
bwa mem -M ref.fa r1.fq r2.fq \
  | taf-samblaster samblaster -M \
  | samtools view -Sb - > sample.bam
To extract evidence from a BAM that already has duplicate marks:
samtools view -h sample.bam \
  | taf-samblaster samblaster -a -e -d sample.disc.sam -s sample.split.sam -o /dev/null

-i, --input FILE            read SAM from FILE instead of stdin
-o, --output FILE           write SAM to FILE instead of stdout
-d, --discordantFile FILE   write discordant read pairs to SAM
-s, --splitterFile FILE     write split-read alignments to SAM
-u, --unmappedFile FILE     write unmapped/clipped reads as FASTQ or FASTA
-a, --acceptDupMarks        use duplicate flags already present in input
-e, --excludeDups           omit duplicates from evidence outputs
-r, --removeDups            remove duplicates from all outputs
--addMateTags               add MC and MQ tags to paired-end SAM records
--ignoreUnmated             allow singleton/unmated input when appropriate
-M                          compatibility mode for older bwa mem -M output
-h, --help                  print upstream help to stderr
-q, --quiet                 reduce upstream statistics
--version                   print upstream version to stderr
The TAFFISH wrapper reserves -h, --help, -v, --version, --compile, and
-- for wrapper-level behavior. To pass arguments that begin with an option
through to upstream samblaster, put -- first or use explicit command mode:
taf-samblaster -- --help
taf-samblaster -- --version
taf-samblaster samblaster --help
For ordinary analysis commands, explicit command mode is the clearest form:
taf-samblaster samblaster -i input.sam -o output.sam
With command_mode=true, taf-samblaster samblaster ... runs the executable
inside the container. The default body also invokes samblaster directly, so
taf-samblaster -- -i input.sam -o output.sam is equivalent.
By default, samblaster writes all input alignments to SAM output in the same
order, marking duplicate alignments with SAM FLAG 0x400. The --removeDups
option removes duplicates instead.
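
Because duplicates are marked with FLAG bit 0x400 (decimal 1024), the effect of a run can be summarized from the output SAM. The following is a minimal sketch, assuming POSIX awk; the helper name count_dup_flags is hypothetical. It uses integer arithmetic rather than a bitwise AND so that it does not depend on gawk extensions.

```shell
# Hypothetical helper: print "<duplicate records> <total records>" for a SAM file.
count_dup_flags() {
    awk -F '\t' '
        /^@/ { next }                                  # skip header lines
        {
            total++
            # FLAG is column 2; bit 0x400 is set when int(FLAG / 1024) is odd.
            if (int($2 / 1024) % 2 == 1) dups++
        }
        END { print dups + 0, total + 0 }
    ' "$1"
}
```

For example, count_dup_flags marked.sam prints two numbers: the count of records flagged as duplicates and the count of all alignment records.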
Optional evidence outputs:
- discordant SAM from -d/--discordantFile
- split-read SAM from -s/--splitterFile
- unmapped FASTQ/FASTA from -u/--unmappedFile
The unmapped/clipped file is FASTQ when QUAL values are available, otherwise FASTA. Clipped-read output requires soft clipping in the input SAM.
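
Since the -u output can be either format, a downstream script may need to check which one it received. The following is a minimal sketch, assuming the file begins with a record marker in the usual way ('@' for FASTQ, '>' for FASTA); the helper name unmapped_format is hypothetical.

```shell
# Hypothetical helper: guess the format of an unmapped/clipped reads file
# from its first byte.
unmapped_format() {
    case "$(head -c 1 "$1")" in
        '@') echo fastq ;;
        '>') echo fasta ;;
        '')  echo empty ;;     # no reads were written
        *)   echo unknown ;;
    esac
}
```

For example, unmapped_format unmapped.fq prints fastq when the input SAM carried QUAL values.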
The image contains:
- samblaster
- Debian glibc and libstdc++ runtime
- upstream README, supplemental PDF, and MIT license text under /opt/samblaster
samblaster is a self-contained executable. Source inspection found no runtime
calls to bwa, samtools, shells, compressors, databases, or plotting tools.
Those tools are part of user pipelines, not hidden dependencies of this app.
The Dockerfile builds from the upstream release tarball:
https://github.com/GregoryFaust/samblaster/releases/download/v.0.1.26/samblaster-v.0.1.26.tar.gz
sha256: 6b42a53d64a3ed340852028546693a24c860f236fd70e90c2b24fde9dcc4fd63
tag commit: b642639117eafedc760d8b84c0d2c4872b0da084
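
The pinned checksum can be verified by hand after downloading the tarball. The following is a minimal sketch, not the Dockerfile's actual build step; the helper name verify_sha256 is hypothetical, and it assumes either sha256sum (GNU coreutils) or shasum is installed.

```shell
# Hypothetical helper: succeed only if FILE hashes to EXPECTED (hex SHA-256).
# Usage: verify_sha256 FILE EXPECTED
verify_sha256() {
    if command -v sha256sum >/dev/null 2>&1; then
        actual=$(sha256sum "$1" | awk '{ print $1 }')
    else
        actual=$(shasum -a 256 "$1" | awk '{ print $1 }')   # common macOS fallback
    fi
    if [ "$actual" = "$2" ]; then
        echo "sha256 ok: $1"
    else
        echo "sha256 mismatch: $1" >&2
        return 1
    fi
}

# Example call after downloading the tarball named above (not run here):
# verify_sha256 samblaster-v.0.1.26.tar.gz \
#   6b42a53d64a3ed340852028546693a24c860f236fd70e90c2b24fde9dcc4fd63
```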
The binary is compiled from source inside the image and supports both declared platforms:
- linux/amd64
- linux/arm64
The package smoke checks are self-contained and offline. They verify:
- samblaster command existence
- runtime version banner: samblaster: Version 0.1.26
- upstream help text
- libstdc++ linkage through ldd
- duplicate marking on a tiny paired-end SAM
- duplicate removal with --removeDups
- discordant-pair extraction with -d
- split-read extraction with -s
- unmapped/clipped FASTQ extraction with -u
Smoke tests validate the container, command surface, and small functional paths. They do not replace full biological validation on production alignment data.
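
A duplicate-marking check of the kind listed above could look like the following. This is a sketch under stated assumptions, not the package's actual smoke test: the fixture contents, the file names, and the helper make_tiny_dup_sam are hypothetical. The fixture holds two read pairs aligned to identical positions, which a duplicate marker would normally treat as one original pair and one duplicate pair. The samblaster call is guarded so the block does nothing harmful on hosts without taf-samblaster.

```shell
# Hypothetical fixture builder: write a tiny read-id grouped paired-end SAM
# with two pairs (pairA, pairB) at identical coordinates.
make_tiny_dup_sam() {
    {
        printf '@SQ\tSN:chr1\tLN:1000\n'
        for name in pairA pairB; do
            printf '%s\t99\tchr1\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII\n' "$name"
            printf '%s\t147\tchr1\t200\t60\t10M\t=\t100\t-110\tGTACGTACGT\tIIIIIIIIII\n' "$name"
        done
    } > "$1"
}

fixture=$(mktemp)
make_tiny_dup_sam "$fixture"

# Only run samblaster when the wrapper is actually installed.
if command -v taf-samblaster >/dev/null 2>&1; then
    # Print QNAME and FLAG of each output record; duplicates carry bit 0x400.
    taf-samblaster samblaster -i "$fixture" | grep -v '^@' | cut -f 1,2
fi
```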
- Project: samblaster
- Source: https://github.com/GregoryFaust/samblaster
- Packaged release: https://github.com/GregoryFaust/samblaster/releases/tag/v.0.1.26
- Upstream license: MIT
- Citation: Faust and Hall 2014, Bioinformatics
- DOI: https://doi.org/10.1093/bioinformatics/btu314
- PMID: https://pubmed.ncbi.nlm.nih.gov/24812344/
taf check
taf build
docker build -t ghcr.io/taffish/samblaster:0.1.26-r1 -f docker/Dockerfile .
taf publish --release --dry-run
The TAFFISH packaging files are licensed under Apache-2.0. Upstream samblaster is distributed under the MIT License.