TAFFISH wrapper for seqtk, Heng Li's fast and lightweight toolkit for processing FASTA and FASTQ files.
This app packages upstream seqtk v1.5 as seqtk 1.5-r1. The upstream usage
banner reports Version: 1.5-r133, matching the upstream release name
seqtk-1.5 (r133). The default TAFFISH command runs the upstream seqtk
executable directly, and command mode remains enabled for explicit
taf-seqtk seqtk ... calls.
Package metadata:
name: seqtk
command: taf-seqtk
version: 1.5-r1
kind: tool
image: ghcr.io/taffish/seqtk:1.5-r1
upstream release: v1.5
upstream runtime banner: Version: 1.5-r133
Show the TAFFISH package version:
taf-seqtk --version
Show upstream seqtk usage and runtime version:
taf-seqtk seqtk
Convert FASTQ to FASTA:
taf-seqtk seqtk seq -a reads.fq.gz > reads.fa
Reverse-complement FASTA or FASTQ:
taf-seqtk seqtk seq -r reads.fq > reads.rc.fq
Extract records by name:
taf-seqtk seqtk subseq reads.fq names.txt > selected.fq
Extract BED regions from FASTA:
taf-seqtk seqtk subseq ref.fa regions.bed > selected.fa
Sample reads with a fixed seed:
taf-seqtk seqtk sample -s100 reads.fq 10000 > sampled.fq
Trim FASTQ reads:
taf-seqtk seqtk trimfq reads.fq > trimmed.fq
seqtk itself is a subcommand-style program:
seqtk <command> <arguments>
Because this TAFFISH app keeps command_mode=true, the first non-option argument
to taf-seqtk is interpreted as a command inside the container. Therefore,
write the upstream executable name explicitly:
taf-seqtk seqtk seq -a reads.fq > reads.fa
taf-seqtk seqtk size reads.fq
taf-seqtk seqtk fqchk reads.fq
Do not use the ambiguous form:
taf-seqtk seq -a reads.fq
In that form seq may be treated as the container's seq executable rather
than the upstream seqtk seq subcommand.
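For interactive use, one way to avoid typing the ambiguous form by accident is a small shell function that always inserts the upstream executable name. This is only a convenience sketch and is not shipped with the package; the function name stk is hypothetical.

```shell
# Hypothetical helper (not part of the TAFFISH package): always pass the
# upstream executable name so command mode runs seqtk, not another binary
# that happens to share a subcommand's name.
stk() {
    taf-seqtk seqtk "$@"
}

# Usage (assumes taf-seqtk is installed and reads.fq exists):
#   stk seq -a reads.fq > reads.fa
#   stk size reads.fq
```

With this in place, stk seq -a reads.fq expands to taf-seqtk seqtk seq -a reads.fq, which is the explicit form recommended above.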
seq      common FASTA/FASTQ transformations
size     report number of sequences and bases
comp     nucleotide composition
sample   subsample sequences
subseq   extract sequences by name list or BED regions
fqchk    FASTQ base/quality summary
mergepe  interleave paired-end FASTA/FASTQ files
split    split one FASTA/FASTQ into smaller files
trimfq   trim FASTQ reads
gc       identify high- or low-GC regions
mutfa    apply point mutations to FASTA
mergefa  merge two FASTA/Q files
famask   apply X-coded FASTA mask
dropse   drop unpaired records from interleaved PE FASTA/Q
rename   rename records
cutN     cut sequence at long N tracts
gap      report gap locations
hpc      homopolymer-compressed sequence
telo     identify telomere repeats
Many subcommands print upstream help when called without enough arguments, for example:
taf-seqtk seqtk subseq
taf-seqtk seqtk sample
taf-seqtk seqtk trimfq
seqtk reads FASTA and FASTQ, including gzip-compressed input through zlib. Most commands write plain text FASTA, FASTQ, BED-like, or tabular output to stdout. Use normal shell redirection to save output files.
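Because output arrives as plain text on stdout, it can also be piped into other host tools instead of being redirected to a file. The sketch below compresses converted FASTA with the host's gzip; the function name fq_to_fa_gz and the file names are placeholders, and it assumes taf-seqtk passes stdout through to a pipe the same way it does for redirection.

```shell
# Sketch: convert FASTQ to FASTA and compress the result on the host.
# Assumes taf-seqtk and gzip are on PATH; fq_to_fa_gz is a hypothetical name.
fq_to_fa_gz() {
    in_fq=$1
    out_fa_gz=$2
    # seqtk writes uncompressed FASTA to stdout; gzip compresses it.
    taf-seqtk seqtk seq -a "$in_fq" | gzip -c > "$out_fa_gz"
}

# Usage:
#   fq_to_fa_gz reads.fq.gz reads.fa.gz
```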
Many commands accept - as input for stdin:
cat reads.fq.gz | taf-seqtk seqtk seq -a - > reads.fa
The sample two-pass mode cannot read stdin because upstream needs to read the
input twice. Use a file path for that mode.
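When the data only exists as a stream, one workaround is to spool it to a temporary file first and then hand that path to two-pass sampling (upstream flag -2). The sketch below is an illustration rather than a packaged feature: the function name is hypothetical, the seed 100 is copied from the earlier example, and it assumes the container can read files created in the current directory.

```shell
# Sketch: two-pass sampling needs a real file, so spool stdin to disk first.
# sample_2pass_from_stdin is a hypothetical helper name.
sample_2pass_from_stdin() {
    n=$1
    # Create the spool file in the current directory (assumed to be
    # visible inside the container).
    spool=$(mktemp ./sample-input.XXXXXX) || return 1
    cat > "$spool"
    taf-seqtk seqtk sample -2 -s100 "$spool" "$n"
    status=$?
    rm -f "$spool"
    return "$status"
}

# Usage:
#   cat reads.fq.gz | sample_2pass_from_stdin 10000 > sampled.fq
```

The spool file may hold gzip-compressed data unchanged, since seqtk reads compressed input through zlib.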
The image contains:
seqtk
gzip
Debian glibc, libm, and zlib runtime libraries
upstream README, NEWS, and MIT license text under /opt/seqtk
Source inspection found no runtime calls to external sequence tools, shells,
compressors, databases, models, or plotting programs. gzip input support is
implemented through zlib; the gzip command is present for convenience and
smoke fixtures but is not called by seqtk.
The Dockerfile builds from the upstream v1.5 source archive:
https://github.com/lh3/seqtk/archive/refs/tags/v1.5.tar.gz
sha256: 384aa1e3cecf4f70403839d586cbb29d469b7c6f773a64bc5af48a6e4b8220a6
tag commit: 94e707082d39b0a038f234df676e32d9802c0dc7
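To reproduce the source pin outside the image build, the archive can be downloaded and compared against the digest listed above. The sketch below is not taken from the Dockerfile; it assumes GNU sha256sum is available, and the function name verify_sha256 and the local file name are placeholders.

```shell
# Sketch: compare a file's SHA-256 digest with an expected value.
# Returns 0 on match, non-zero otherwise. verify_sha256 is a hypothetical name.
verify_sha256() {
    file=$1
    expected=$2
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}

# Usage (requires network access):
#   curl -fsSL -o seqtk-v1.5.tar.gz \
#     https://github.com/lh3/seqtk/archive/refs/tags/v1.5.tar.gz
#   verify_sha256 seqtk-v1.5.tar.gz \
#     384aa1e3cecf4f70403839d586cbb29d469b7c6f773a64bc5af48a6e4b8220a6 \
#     && echo "checksum ok"
```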
The binary is compiled from source inside the image and supports both declared platforms:
linux/amd64
linux/arm64
The package smoke checks are self-contained and offline. They verify:
seqtk command existence
runtime banner: Version: 1.5-r133
subcommand usage text
zlib linkage through ldd
gzip-compressed FASTQ parsing
FASTQ-to-FASTA conversion and seq -R
size and comp summaries
subseq by name list and BED
region masking with seq -M
trimfq, fqchk, and deterministic sample
hpc, gap, split, and telo -P paths
Smoke tests validate the container, command surface, and small functional paths. They do not replace full validation on production-scale sequencing data.
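The banner check is easy to repeat by hand after installation. The sketch below is written in the spirit of the smoke checks but is not the package's actual smoke script; the function name has_banner is hypothetical. It redirects stderr into the pipe because upstream seqtk prints its usage text there.

```shell
# Sketch: succeed if the usage text on stdin reports the expected version.
# has_banner is a hypothetical helper name.
has_banner() {
    grep -Fq "Version: $1"
}

# Usage (assumes taf-seqtk is installed):
#   taf-seqtk seqtk 2>&1 | has_banner 1.5-r133 && echo "banner ok"
```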
- Project: seqtk
- Source: https://github.com/lh3/seqtk
- Packaged release: https://github.com/lh3/seqtk/releases/tag/v1.5
- Upstream license: MIT
- Citation: Heng Li, seqtk GitHub repository
taf check
taf build
docker build -t ghcr.io/taffish/seqtk:1.5-r1 -f docker/Dockerfile .
taf publish --release --dry-run
The TAFFISH packaging files are licensed under Apache-2.0. Upstream seqtk is distributed under the MIT License.