
seqtk

TAFFISH wrapper for seqtk, Heng Li's fast and lightweight toolkit for processing FASTA and FASTQ files.

This app packages upstream seqtk v1.5 as seqtk 1.5-r1. The upstream usage banner reports Version: 1.5-r133, matching the upstream release name seqtk-1.5 (r133). The default TAFFISH command runs the upstream seqtk executable directly, and command mode remains enabled for explicit taf-seqtk seqtk ... calls.

Package metadata:

name: seqtk
command: taf-seqtk
version: 1.5-r1
kind: tool
image: ghcr.io/taffish/seqtk:1.5-r1
upstream release: v1.5
upstream runtime banner: Version: 1.5-r133

Quick Start

Show the TAFFISH package version:

taf-seqtk --version

Show upstream seqtk usage and runtime version:

taf-seqtk seqtk

Convert FASTQ to FASTA:

taf-seqtk seqtk seq -a reads.fq.gz > reads.fa

Reverse-complement FASTA or FASTQ:

taf-seqtk seqtk seq -r reads.fq > reads.rc.fq

Extract records by name:

taf-seqtk seqtk subseq reads.fq names.txt > selected.fq

Extract BED regions from FASTA:

taf-seqtk seqtk subseq ref.fa regions.bed > selected.fa
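The two subseq calls above differ only in the second file. As a sketch, the snippet below writes hypothetical versions of both inputs; every record name, sequence name, and coordinate in it is a placeholder. A name list holds one record name per line (the header text up to the first whitespace, without the leading > or @). A BED file holds tab-separated sequence name, start, and end; standard BED uses 0-based starts and exclusive ends, so check upstream behavior if exact boundaries matter.

```shell
# Hypothetical inputs for "seqtk subseq"; all names and coordinates are placeholders.

# names.txt: one record name per line, without the leading '@' or '>'.
printf '%s\n' read1 read7 read42 > names.txt

# regions.bed: tab-separated sequence name, start, end.
# Standard BED uses 0-based starts and exclusive ends, so 0-100 covers bases 1-100.
printf '%s\t%s\t%s\n' \
  chr1 0   100 \
  chr2 500 750 > regions.bed
```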

Sample reads with a fixed seed:

taf-seqtk seqtk sample -s100 reads.fq 10000 > sampled.fq
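sample accepts either a record count, as above, or a fraction between 0 and 1. For paired-end data, running it on both mate files with the same -s seed selects the same record positions in each file, which keeps mates paired as long as both files list the pairs in the same order. The sketch below assumes hypothetical file names reads_1.fq and reads_2.fq and only runs seqtk when taf-seqtk is on PATH.

```shell
# Hypothetical paired-end inputs; substitute your own mate files.
r1=reads_1.fq
r2=reads_2.fq
seed=100   # the same seed for both mates keeps the selected pairs in sync
n=10000    # pairs to keep; a fraction such as 0.1 would keep about 10% instead

if command -v taf-seqtk >/dev/null 2>&1 && [ -f "$r1" ] && [ -f "$r2" ]; then
  taf-seqtk seqtk sample -s"$seed" "$r1" "$n" > sampled_1.fq
  taf-seqtk seqtk sample -s"$seed" "$r2" "$n" > sampled_2.fq
else
  echo "skipping: needs taf-seqtk plus $r1 and $r2" >&2
fi
```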

Trim FASTQ reads:

taf-seqtk seqtk trimfq reads.fq > trimmed.fq
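By default trimfq trims low-quality bases from the read ends. Upstream also documents fixed-length trimming with -b (bases removed from the left end) and -e (bases removed from the right end); non-zero values switch off quality trimming. The sketch writes a hypothetical one-record FASTQ (tiny.fq) so the effect is easy to inspect, and calls seqtk only if taf-seqtk is installed.

```shell
# Hypothetical one-record FASTQ: 20 bases, all with quality 'I' (Phred 40).
printf '@read1\n%s\n+\n%s\n' ACGTACGTACGTACGTACGT IIIIIIIIIIIIIIIIIIII > tiny.fq

if command -v taf-seqtk >/dev/null 2>&1; then
  # Remove 5 bases from the left end and 10 from the right end of every read;
  # non-zero -b/-e replaces quality-based trimming with fixed-length trimming.
  taf-seqtk seqtk trimfq -b 5 -e 10 tiny.fq > tiny.trimmed.fq
fi
```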

Command Mode

seqtk itself is a subcommand-style program:

seqtk <command> <arguments>

Because this TAFFISH app keeps command_mode=true, the first non-option argument to taf-seqtk is interpreted as a command to run inside the container. Therefore, always write the upstream executable name explicitly:

taf-seqtk seqtk seq -a reads.fq > reads.fa
taf-seqtk seqtk size reads.fq
taf-seqtk seqtk fqchk reads.fq

Do not use the ambiguous form:

taf-seqtk seq -a reads.fq

In that form, seq is looked up as a command inside the container, so it may run the container's own seq executable (the coreutils seq on a Debian base) rather than the upstream seqtk seq subcommand.
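If repeating seqtk on every call gets tedious, a small shell function on the host can restore upstream-style calls. This is a hypothetical convenience wrapper, not part of the TAFFISH package; because it always inserts the explicit seqtk executable name, the ambiguity described above cannot arise.

```shell
# Hypothetical host-side helper (not shipped with TAFFISH): forwards every
# argument to the seqtk executable inside the container, so
# "seqtk seq -a reads.fq" runs "taf-seqtk seqtk seq -a reads.fq".
seqtk() {
  taf-seqtk seqtk "$@"
}
```

Note that the function shadows any native seqtk already installed on the host for the rest of the shell session.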

Common Subcommands

seq       common FASTA/FASTQ transformations
size      report number of sequences and bases
comp      nucleotide composition
sample    subsample sequences
subseq    extract sequences by name list or BED regions
fqchk     FASTQ base/quality summary
mergepe   interleave paired-end FASTA/FASTQ files
split     split one FASTA/FASTQ into smaller files
trimfq    trim FASTQ reads
gc        identify high- or low-GC regions
mutfa     apply point mutations to FASTA
mergefa   merge two FASTA/Q files
famask    apply X-coded FASTA mask
dropse    drop unpaired records from interleaved PE FASTA/Q
rename    rename records
cutN      cut sequence at long N tracts
gap       report gap locations
hpc       homopolymer-compressed sequence
telo      identify telomere repeats
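Several paired-end entries in this list work together. As a sketch with hypothetical mate files (reads_1.fq and reads_2.fq, assumed to list records in the same order), mergepe interleaves the mates, size reports record and base counts for the result, and dropse removes records whose mate is missing from an interleaved file.

```shell
r1=reads_1.fq   # hypothetical mate 1
r2=reads_2.fq   # hypothetical mate 2

if command -v taf-seqtk >/dev/null 2>&1 && [ -f "$r1" ] && [ -f "$r2" ]; then
  # Interleave mates: record 1 of r1, record 1 of r2, record 2 of r1, ...
  taf-seqtk seqtk mergepe "$r1" "$r2" > interleaved.fq
  # Count records and bases; an interleaved file holds two records per pair.
  taf-seqtk seqtk size interleaved.fq
  # Keep only records whose mate is also present in the interleaved stream.
  taf-seqtk seqtk dropse interleaved.fq > interleaved.paired.fq
fi
```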

Many subcommands print upstream help when called without enough arguments, for example:

taf-seqtk seqtk subseq
taf-seqtk seqtk sample
taf-seqtk seqtk trimfq

Input And Output

seqtk reads FASTA and FASTQ, including gzip-compressed input through zlib. Most commands write plain text FASTA, FASTQ, BED-like, or tabular output to stdout. Use normal shell redirection to save output files.

Many commands accept - as input for stdin:

cat reads.fq.gz | taf-seqtk seqtk seq -a - > reads.fa

The two-pass mode of sample (-2) cannot read from stdin, because it reads the input twice. Pass a file path for that mode instead.
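Two practical consequences of these I/O rules, sketched with a hypothetical input file reads.fq.gz: compressed output needs an explicit gzip in the pipeline, since seqtk writes plain text, and two-pass sampling (-2, which upstream describes as slower but using much less memory) must be given a file path rather than -.

```shell
in=reads.fq.gz   # hypothetical gzip-compressed FASTQ

if command -v taf-seqtk >/dev/null 2>&1 && [ -f "$in" ]; then
  # seqtk decompresses the input itself (via zlib) but writes uncompressed
  # text, so compress on the host side of the pipe.
  taf-seqtk seqtk seq -a "$in" | gzip > reads.fa.gz

  # Two-pass sampling reads its input twice, so pass a path, never "-".
  taf-seqtk seqtk sample -2 -s100 "$in" 10000 > sampled.fq
fi
```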

Container Contents

The image contains:

seqtk
gzip
Debian glibc, libm, and zlib runtime libraries
upstream README, NEWS, and MIT license text under /opt/seqtk

Source inspection found no runtime calls to external sequence tools, shells, compressors, databases, models, or plotting programs. gzip input support is implemented through zlib; the gzip command is present for convenience and smoke fixtures but is not called by seqtk.

Build Notes

The Dockerfile builds from the upstream v1.5 source archive:

https://github.com/lh3/seqtk/archive/refs/tags/v1.5.tar.gz
sha256: 384aa1e3cecf4f70403839d586cbb29d469b7c6f773a64bc5af48a6e4b8220a6
tag commit: 94e707082d39b0a038f234df676e32d9802c0dc7
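To check a locally downloaded copy of the same archive against the pinned checksum, the check mode of sha256sum is enough. The local file name seqtk-1.5.tar.gz is an assumption; the URL and hash are the ones listed above. The sketch only verifies when the archive is already present and otherwise prints the download command.

```shell
archive=seqtk-1.5.tar.gz   # assumed local file name
url=https://github.com/lh3/seqtk/archive/refs/tags/v1.5.tar.gz
sha=384aa1e3cecf4f70403839d586cbb29d469b7c6f773a64bc5af48a6e4b8220a6

# Record the expected hash in sha256sum's "<hash>  <file>" format.
printf '%s  %s\n' "$sha" "$archive" > "$archive.sha256"

if [ -f "$archive" ]; then
  sha256sum -c "$archive.sha256"
else
  echo "download first: curl -L -o $archive $url" >&2
fi
```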

The binary is compiled from source inside the image and supports both declared platforms:

linux/amd64
linux/arm64

Smoke Coverage

The package smoke checks are self-contained and offline. They verify:

seqtk command existence
runtime banner: Version: 1.5-r133
subcommand usage text
zlib linkage through ldd
gzip-compressed FASTQ parsing
FASTQ-to-FASTA conversion and seq -R
size and comp summaries
subseq by name list and BED
region masking with seq -M
trimfq, fqchk, and deterministic sample
hpc, gap, split, and telo -P paths

Smoke tests validate the container, command surface, and small functional paths. They do not replace full validation on production-scale sequencing data.

Upstream

Maintainer Commands

taf check
taf build
docker build -t ghcr.io/taffish/seqtk:1.5-r1 -f docker/Dockerfile .
taf publish --release --dry-run

License

The TAFFISH packaging files are licensed under Apache-2.0. Upstream seqtk is distributed under the MIT License.
