Welcome to the Metabarcoding Course repository! This course provides hands-on training in metabarcoding bioinformatics, focusing on the analysis of COI mitochondrial gene amplicon sequencing data for metazoan community characterization.
Metabarcoding is a powerful molecular technique that combines DNA barcoding with high-throughput sequencing to identify and quantify organisms in complex environmental samples. This course covers the complete bioinformatics workflow from raw sequencing reads to taxonomic identification and diversity analysis.
MetabarcodingCourse/
├── data/ # Raw FASTQ files (paired-end)
├── scripts/ # Analysis scripts
│ └── pipeline_explained.sh # Main pipeline execution script
├── tools/ # Additional tools
├── README.md # This file
└── LICENSE # GPL-3.0 License
The metabarcoding pipeline requires the following software:
- FastQC (≥ 0.11.8) - Quality control of raw sequencing data
- Cutadapt (≥ 5.2) - Adapter trimming and quality filtering
- VSEARCH (≥ 2.30.1) - Sequence analysis, merging, and clustering
- dnoise (≥ 1.4.2) - Denoising of amplicon sequences
- SWARM (≥ 3.1.6) - Clustering algorithm for OTU generation
- BLAST+ (≥ 2.17.0) - Taxonomic assignment
- R (≥ 4.4.3) - Statistical analysis and MJOLNIR3 package
- mumu - Post-clustering curation tool
- mkLTG - Local taxonomy generator
- MJOLNIR3 - Main metabarcoding pipeline framework
- Biostrings (≥ 2.74.0) - DNA sequence manipulation
- Rcpp (≥ 1.1.0) - R/C++ interface
- dplyr (≥ 1.1.4) - Data manipulation
- tidyr (≥ 1.3.1) - Data tidying
- stringr (≥ 1.6.0) - String operations
A taxonomy reference database is required for taxonomic assignment. Download from the provided Google Drive link.
Conda provides an easy way to install all required tools in an isolated environment.
Download and install Miniforge, which provides both conda and mamba:
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash "Miniforge3-$(uname)-$(uname -m).sh"
# Clone the repository
git clone https://github.com/adriantich/MetabarcodingCourse.git
cd MetabarcodingCourse
# Create environment with all required tools
# conda can also be used, but mamba is preferred
mamba create -n metabarcoding -c bioconda -c conda-forge python=3.11.14 \
fastqc=0.11.8 \
cutadapt=5.2 \
vsearch=2.30.1 \
dnoise=1.4.2 \
swarm=3.1.6 \
r-base=4.4.3 \
bioconductor-biostrings=2.74.0 \
r-rcpp=1.1.0 \
r-dplyr=1.1.4 \
r-tidyr=1.3.1 \
r-stringr=1.6.0 \
cxx-compiler=1.0.0
# Activate the environment
mamba activate metabarcoding
mkdir -p SOFT
cd SOFT
wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.17.0/ncbi-blast-2.17.0+-x64-linux.tar.gz
tar zxvpf ncbi-blast-2.17.0+-x64-linux.tar.gz
cp -r ncbi-blast-2.17.0+/bin/* $CONDA_PREFIX/bin/.
git clone https://github.com/meglecz/mkLTG.git
git clone https://github.com/frederic-mahe/mumu.git
cd mumu
make ; make check ; make install prefix=$CONDA_PREFIX
cd ..
# Warning: if g++ is not detected, a broken symlink may be the cause.
# Recreate the link with:
# cd $CONDA_PREFIX/bin && rm g++ && ln -s x86_64-conda-linux-gnu-g++ g++
git clone https://github.com/adriantich/MJOLNIR3.git
cd MJOLNIR3
git checkout obitools2vsearch
cd ..
R CMD INSTALL MJOLNIR3
cd ../tools
make ; make install PREFIX=$CONDA_PREFIX
cd ..
https://drive.google.com/drive/folders/1-LBlUKFA-r5g6GI0sTo-7t92ml3pAS0S?usp=sharing
mamba activate metabarcoding
mamba install conda-forge::gdown
gdown --folder https://drive.google.com/drive/folders/1-LBlUKFA-r5g6GI0sTo-7t92ml3pAS0S
Test that all tools are properly installed:
# Check FastQC
fastqc --version
# Expected output: FastQC v0.11.8 (or higher)
# Check Cutadapt
cutadapt --version
# Expected output: 5.2 (or higher)
# Check VSEARCH
vsearch --version
# Expected output: vsearch v2.30.1 (or higher)
# Check BLAST+
blastn -version
# Expected output: blastn: 2.17.0+ (or higher)
# Check dnoise
dnoise --version
# Expected output: dnoise 1.4.2 (or higher)
# Check swarm
swarm --version
# Expected output: Swarm 3.1.6 (or higher)
# Check R
R --version
# Expected output: R version 4.4.3 (or higher)
# Check mumu
mumu --version
# Expected output: mumu version information
# Check MJOLNIR3 in R
R -e "library(mjolnir)"
# Expected output: No errors, package loaded successfully
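The individual checks above can also be run in one pass. The following is a minimal sketch, not part of the course scripts: `check_tools` is a hypothetical helper name, and it only confirms that each command is found on the PATH, not that its version meets the minimum.

```shell
# Hypothetical helper: report which of the given commands are not on the PATH.
# Prints one "MISSING: name" line per absent command and returns non-zero
# if at least one command is missing.
check_tools() {
    ct_missing=0
    for ct_tool in "$@"; do
        if ! command -v "$ct_tool" >/dev/null 2>&1; then
            echo "MISSING: $ct_tool"
            ct_missing=1
        fi
    done
    return "$ct_missing"
}

# Example call, using the command names from the checks above:
# check_tools fastqc cutadapt vsearch blastn dnoise swarm R mumu
```

If the call prints nothing and returns 0, every command was found in the active environment.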
mamba activate metabarcoding
cd scripts
./pipeline_explained.sh
The pipeline will:
- Perform quality control on raw sequences
- Demultiplex the samples
- Merge paired-end reads
- Filter and trim reads based on quality
- Dereplicate and remove chimeras
- Denoise to obtain ESVs (exact sequence variants) and cluster them into OTUs
- Assign taxonomy
- Apply the post-clustering filter
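To give a feel for what some of the steps listed above look like on the command line, here is a hedged sketch of merging, filtering, dereplication, and chimera removal with VSEARCH for a single sample. It is not the content of pipeline_explained.sh, which may use different tools and parameters. The `run` and `process_sample` helpers, the output file names, and the `--fastq_maxee 1.0` threshold are assumptions made for illustration. By default the sketch only prints the commands (dry run).

```shell
# Hypothetical helper: print the command when DRY_RUN is 1 (the default),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Sketch of four steps for one sample; file names are placeholders.
process_sample() {
    ps_sample=$1
    # Merge paired-end reads
    run vsearch --fastq_mergepairs "${ps_sample}_R1.fastq" \
        --reverse "${ps_sample}_R2.fastq" \
        --fastqout "${ps_sample}.merged.fastq"
    # Quality filter (maximum expected errors is an assumed value)
    run vsearch --fastq_filter "${ps_sample}.merged.fastq" \
        --fastq_maxee 1.0 \
        --fastaout "${ps_sample}.filtered.fasta"
    # Dereplicate, keeping abundance annotations
    run vsearch --derep_fulllength "${ps_sample}.filtered.fasta" \
        --sizeout --output "${ps_sample}.derep.fasta"
    # Remove chimeras de novo
    run vsearch --uchime_denovo "${ps_sample}.derep.fasta" \
        --nonchimeras "${ps_sample}.nonchimeras.fasta"
}

# Example: print the commands for a sample called "samplename"
# process_sample samplename
```

Setting `DRY_RUN=0` before calling `process_sample` would execute the commands instead of printing them, provided VSEARCH is installed and the input files exist.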
The data/ directory contains small demonstration datasets:
- Sample format: Paired-end FASTQ files (Illumina)
- Target region: COI mitochondrial gene
- Sequencing platform: Illumina MiSeq (2x250 bp)
- Purpose: Educational demonstration only
For working with your own data, replace the files in data/raw_sequences/ with your samples following the same naming convention: samplename_R1.fastq and samplename_R2.fastq.
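Before running the pipeline on your own data, it can help to confirm that every sample follows the naming convention and has both mates. The sketch below is an illustration only: `check_pairs` is a hypothetical helper, and it assumes uncompressed files ending in `_R1.fastq` and `_R2.fastq` inside a single directory.

```shell
# Hypothetical helper: list FASTQ files in a directory whose mate is missing.
# Returns non-zero if any R1 lacks an R2 or any R2 lacks an R1.
check_pairs() {
    cp_dir=$1
    cp_status=0
    for cp_r1 in "$cp_dir"/*_R1.fastq; do
        [ -e "$cp_r1" ] || continue   # glob matched nothing
        cp_r2="${cp_r1%_R1.fastq}_R2.fastq"
        if [ ! -e "$cp_r2" ]; then
            echo "missing mate for $(basename "$cp_r1")"
            cp_status=1
        fi
    done
    for cp_r2 in "$cp_dir"/*_R2.fastq; do
        [ -e "$cp_r2" ] || continue
        cp_r1="${cp_r2%_R2.fastq}_R1.fastq"
        if [ ! -e "$cp_r1" ]; then
            echo "missing mate for $(basename "$cp_r2")"
            cp_status=1
        fi
    done
    return "$cp_status"
}

# Example call:
# check_pairs data/raw_sequences
```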
Issue 1: "Command not found" errors
- Solution: Ensure your conda environment is activated or tools are in your PATH
- Verify installation with
which toolname
Issue 2: Pipeline fails while running the software
- Solution: Check that input files are in valid format
- Verify sufficient disk space is available
- Review software log files for specific errors
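As a rough way to check the input format, the sketch below tests basic structural properties of an uncompressed four-line FASTQ file: headers start with `@`, separator lines start with `+`, sequence and quality strings have equal length, and the line count is a multiple of 4. `validate_fastq` is a hypothetical helper and not a full FASTQ validator; it stops at the first problem it finds.

```shell
# Hypothetical helper: basic structural check of a four-line-per-record FASTQ.
# Prints the first problem found and returns non-zero if the file looks invalid.
validate_fastq() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            print "line " NR ": header does not start with @"; bad = 1; exit
        }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            print "line " NR ": separator does not start with +"; bad = 1; exit
        }
        NR % 4 == 0 && length($0) != seqlen {
            print "line " NR ": quality length differs from sequence length"; bad = 1; exit
        }
        END {
            if (!bad && NR % 4 != 0) {
                print "line count " NR " is not a multiple of 4"; bad = 1
            }
            exit bad + 0
        }
    ' "$1"
}

# Example calls:
# validate_fastq data/samplename_R1.fastq
# df -h .    # show free disk space for the current directory
```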
Issue 3: Results do not match the expected results
- Solution: Check the number of reads at each step to detect where the problem is
- Review the metadata information
- Verify that the parameters are adapted to your amplicon
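Counting reads at each step can be done with standard tools. The sketch below is one possible approach: `count_reads` and `report_counts` are hypothetical helpers, and the intermediate file names in the example call are placeholders rather than the pipeline's actual outputs. It counts records in uncompressed files, so for a dereplicated FASTA file the number reported is the number of unique sequences, not the abundance-weighted read count.

```shell
# Hypothetical helper: count records in an uncompressed FASTQ or FASTA file.
count_reads() {
    case "$1" in
        *.fastq|*.fq) awk 'END { print int(NR / 4) }' "$1" ;;
        *.fasta|*.fa) awk '/^>/ { n++ } END { print n + 0 }' "$1" ;;
        *) echo "unsupported extension: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print "file<TAB>count" for each file given.
report_counts() {
    for rc_file in "$@"; do
        printf '%s\t%s\n' "$rc_file" "$(count_reads "$rc_file")"
    done
}

# Example call with placeholder file names:
# report_counts samplename_R1.fastq samplename.merged.fastq samplename.filtered.fasta
```

A sharp drop between two consecutive files points to the step whose parameters deserve a closer look.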
For additional help, please contact course instructors.
If you use these materials in your research or teaching, please cite:
Metabarcoding Course Materials -
https://github.com/adriantich/MetabarcodingCourse
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
Last updated: April 2026
Course Resources: 15-17 of April 2025