samblaster

TAFFISH wrapper for samblaster, the fast C++ tool for marking duplicates in read-id grouped paired-end SAM streams and extracting structural-variant evidence reads.

This app packages upstream samblaster v.0.1.26 as samblaster 0.1.26-r1. The default TAFFISH command runs the upstream samblaster executable directly. Command mode remains enabled, so the same image can also be run or inspected with explicit commands of the form taf-samblaster samblaster ...

Package metadata:

name: samblaster
command: taf-samblaster
version: 0.1.26-r1
kind: tool
image: ghcr.io/taffish/samblaster:0.1.26-r1
upstream release: v.0.1.26
upstream runtime banner: samblaster: Version 0.1.26

Quick Start

Show the TAFFISH package version:

taf-samblaster --version

Show the upstream samblaster version:

taf-samblaster -- --version
taf-samblaster samblaster --version

Mark duplicates in a read-id grouped SAM file:

taf-samblaster samblaster -i input.sam -o marked.sam

Stream SAM through samblaster:

cat input.sam | taf-samblaster samblaster -o marked.sam

Write duplicate-marked SAM to stdout:

taf-samblaster samblaster -i input.sam > marked.sam

Extract discordant pairs, split reads, and unmapped/clipped reads while marking duplicates:

taf-samblaster samblaster \
  -i input.sam \
  -o marked.sam \
  -d discordant.sam \
  -s splitters.sam \
  -u unmapped.fq

Input Contract

samblaster works on SAM text, not BAM or CRAM directly. The input SAM must:

contain sequence header records
be read-id grouped, so all alignments for a QNAME are adjacent
use ordinary SAM FLAG, CIGAR, SEQ, and QUAL fields

Aligners such as bwa mem naturally produce read-id grouped SAM. Existing BAM files and coordinate-sorted files must be converted to SAM and grouped by query name outside this app before they are passed to samblaster.
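The first two requirements of the input contract can be checked before a long run. The following is a minimal sketch using only POSIX awk; check_sam_grouped is a hypothetical helper name, not part of samblaster or the TAFFISH wrapper, and it keeps every QNAME in memory, so it is meant for spot checks rather than very large files.

```shell
# check_sam_grouped FILE
# Hypothetical helper (not shipped with samblaster or TAFFISH).
# Exit status 0 if FILE has at least one @SQ header record and every
# QNAME occurs in a single contiguous run; exit status 1 otherwise.
check_sam_grouped() {
  awk -F '\t' '
    /^@/ { if ($1 == "@SQ") have_sq = 1; next }   # header records
    {
      if ($1 != prev) {
        # A QNAME that reappears after a different one is not grouped.
        if ($1 in seen) { regrouped = 1; exit }
        seen[$1] = 1
        prev = $1
      }
    }
    END { exit ((have_sq && !regrouped) ? 0 : 1) }
  ' "$1"
}

# Example use (input.sam is a placeholder path):
#   check_sam_grouped input.sam || echo "input.sam is not read-id grouped" >&2
```

If the check fails for a coordinate-sorted BAM, one option outside this app is to regroup it with a host tool such as samtools sort -n before streaming SAM into samblaster.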

Common Pipelines

This image intentionally packages samblaster only. It does not include bwa, samtools, genome indexes, or downstream SV callers. Use separate TAFFISH apps or host tools for those pipeline stages.

Example with external tools:

bwa mem ref.fa r1.fq r2.fq \
  | taf-samblaster samblaster \
  | samtools view -Sb - > sample.bam

When upstream bwa mem -M is used, also pass -M to samblaster:

bwa mem -M ref.fa r1.fq r2.fq \
  | taf-samblaster samblaster -M \
  | samtools view -Sb - > sample.bam

To extract evidence from a read-id grouped BAM that already has duplicate marks, discarding the main SAM output:

samtools view -h sample.bam \
  | taf-samblaster samblaster -a -e -d sample.disc.sam -s sample.split.sam -o /dev/null
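Because the image packages samblaster only, the pipelines above depend on tools installed on the host. A pipeline script may want to fail early when one of them is missing. The sketch below shows one way to do that; require_tools is a hypothetical helper name chosen for illustration.

```shell
# require_tools TOOL...
# Hypothetical preflight helper: report every named tool that is not on
# PATH and return non-zero if any is missing.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example use at the top of a pipeline script:
#   require_tools bwa samtools taf-samblaster || exit 1
```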

Key Options

-i, --input FILE            read SAM from FILE instead of stdin
-o, --output FILE           write SAM to FILE instead of stdout
-d, --discordantFile FILE   write discordant read pairs to SAM
-s, --splitterFile FILE     write split-read alignments to SAM
-u, --unmappedFile FILE     write unmapped/clipped reads as FASTQ or FASTA
-a, --acceptDupMarks        use duplicate flags already present in input
-e, --excludeDups           omit duplicates from evidence outputs
-r, --removeDups            remove duplicates from all outputs
--addMateTags               add MC and MQ tags to paired-end SAM records
--ignoreUnmated             do not abort on unmated alignments; use only on grouped, filtered input
-M                          compatibility mode for older bwa mem -M output
-h, --help                  print upstream help to stderr
-q, --quiet                 reduce upstream statistics
--version                   print upstream version to stderr

Wrapper Notes

The TAFFISH wrapper reserves -h, --help, -v, --version, --compile, and -- for wrapper-level behavior. To pass arguments that begin with an option to upstream samblaster, put -- in front of them or use explicit command mode:

taf-samblaster -- --help
taf-samblaster -- --version
taf-samblaster samblaster --help

For ordinary analysis commands, explicit command mode is the clearest form:

taf-samblaster samblaster -i input.sam -o output.sam

With command_mode=true, taf-samblaster samblaster ... runs the executable inside the container. The default body also invokes samblaster directly, so taf-samblaster -- -i input.sam -o output.sam is equivalent.

Outputs

By default, samblaster writes all input alignments to SAM output in the same order, marking duplicate alignments with SAM FLAG 0x400. The --removeDups option removes duplicates instead.
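A quick way to confirm that marking happened is to count records with the 0x400 bit set in the output. This is a small sketch, not part of the package; count_dup_flags is a hypothetical name, and the bit test uses integer division because POSIX awk has no bitwise operators.

```shell
# count_dup_flags FILE
# Hypothetical helper: print the number of alignment records in a SAM
# file whose FLAG has the duplicate bit (0x400 = 1024) set.
count_dup_flags() {
  awk -F '\t' '
    /^@/ { next }                       # skip header records
    int($2 / 1024) % 2 == 1 { n++ }     # bit 10 of FLAG is set
    END { print n + 0 }                 # print 0 when nothing matched
  ' "$1"
}

# Example use (marked.sam is a placeholder path):
#   count_dup_flags marked.sam
```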

Optional evidence outputs:

discordant SAM        from -d/--discordantFile
split-read SAM        from -s/--splitterFile
unmapped FASTQ/FASTA  from -u/--unmappedFile

The unmapped/clipped file is FASTQ when QUAL values are available, otherwise FASTA. Clipped-read output requires soft clipping in the input SAM.
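Since the format of the -u output depends on the input, a downstream step may need to detect which one it received. A minimal sketch follows, assuming the usual convention that FASTQ records start with @ and FASTA records start with >; seq_format is a hypothetical helper name.

```shell
# seq_format FILE
# Hypothetical helper: print "fastq", "fasta", or "unknown" based on the
# first byte of FILE.
seq_format() {
  case "$(head -c 1 "$1")" in
    '@') echo fastq ;;
    '>') echo fasta ;;
    *)   echo unknown ;;
  esac
}

# Example use (unmapped.fq is a placeholder path):
#   seq_format unmapped.fq
```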

Container Contents

The image contains:

samblaster
Debian glibc and libstdc++ runtime
upstream README, supplemental PDF, and MIT license text under /opt/samblaster

samblaster is a self-contained executable. Source inspection found no runtime calls to bwa, samtools, shells, compressors, databases, or plotting tools. Those tools are part of user pipelines, not hidden dependencies of this app.

Build Notes

The Dockerfile builds from the upstream release tarball:

https://github.com/GregoryFaust/samblaster/releases/download/v.0.1.26/samblaster-v.0.1.26.tar.gz
sha256: 6b42a53d64a3ed340852028546693a24c860f236fd70e90c2b24fde9dcc4fd63
tag commit: b642639117eafedc760d8b84c0d2c4872b0da084
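Anyone rebuilding the image by hand can compare a downloaded tarball against the checksum above. The sketch below assumes GNU coreutils sha256sum is available (on macOS, shasum -a 256 is the usual substitute); verify_sha256 is a hypothetical helper name and is not taken from the Dockerfile.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Hypothetical helper: succeed only if the SHA-256 digest of FILE equals
# EXPECTED_HEX (lowercase hex, as printed by sha256sum).
verify_sha256() {
  file=$1
  expected=$2
  actual=$(sha256sum "$file" | awk '{ print $1 }')
  [ "$actual" = "$expected" ]
}

# Example use after downloading the release tarball listed above:
#   verify_sha256 samblaster-v.0.1.26.tar.gz \
#     6b42a53d64a3ed340852028546693a24c860f236fd70e90c2b24fde9dcc4fd63 \
#     || { echo "checksum mismatch" >&2; exit 1; }
```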

The binary is compiled from source inside the image and supports both declared platforms:

linux/amd64
linux/arm64

Smoke Coverage

The package smoke checks are self-contained and offline. They verify:

samblaster command existence
runtime version banner: samblaster: Version 0.1.26
upstream help text
libstdc++ linkage through ldd
duplicate marking on a tiny paired-end SAM
duplicate removal with --removeDups
discordant-pair extraction with -d
split-read extraction with -s
unmapped/clipped FASTQ extraction with -u

Smoke tests validate the container, command surface, and small functional paths. They do not replace full biological validation on production alignment data.
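The version-banner check can be reproduced by hand. This is an illustrative sketch, not the package's actual smoke script; check_banner is a hypothetical helper name. It merges stderr into stdout before matching because upstream prints its version to stderr.

```shell
# check_banner EXPECTED COMMAND [ARGS...]
# Hypothetical helper: run COMMAND and succeed only if EXPECTED appears
# as a fixed string anywhere in its combined stdout and stderr.
check_banner() {
  expected=$1
  shift
  "$@" 2>&1 | grep -qF -- "$expected"
}

# Example use:
#   check_banner 'samblaster: Version 0.1.26' taf-samblaster samblaster --version
```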

Upstream

samblaster source and releases: https://github.com/GregoryFaust/samblaster

Maintainer Commands

taf check
taf build
docker build -t ghcr.io/taffish/samblaster:0.1.26-r1 -f docker/Dockerfile .
taf publish --release --dry-run

License

The TAFFISH packaging files are licensed under Apache-2.0. Upstream samblaster is distributed under the MIT License.
