Metabarcoding Course

Course Description

Welcome to the Metabarcoding Course repository! This course provides hands-on training in metabarcoding bioinformatics, focusing on the analysis of COI mitochondrial gene amplicon sequencing data for metazoan community characterization.

Metabarcoding is a powerful molecular technique that combines DNA barcoding with high-throughput sequencing to identify and quantify organisms in complex environmental samples. This course covers the complete bioinformatics workflow from raw sequencing reads to taxonomic identification and diversity analysis.

Repository Structure

MetabarcodingCourse/
├── data/                          # Raw FASTQ files (paired-end)
├── scripts/                       # Analysis scripts
│   └── pipeline_explained.sh      # Main pipeline execution script
├── tools/                         # Additional tools
├── README.md                      # This file
└── LICENSE                        # GPL-3.0 License

Software Requirements

Core Tools

The metabarcoding pipeline requires the following software:

FastQC (≥ 0.11.8) - Quality control of raw sequencing data
Cutadapt (≥ 5.2) - Adapter trimming and quality filtering
VSEARCH (≥ 2.30.1) - Sequence analysis, merging, and clustering
dnoise (≥ 1.4.2) - Denoising of amplicon sequences
SWARM (≥ 3.1.6) - Clustering algorithm for OTU generation
BLAST+ (≥ 2.17.0) - Taxonomic assignment
R (≥ 4.4.3) - Statistical analysis and MJOLNIR3 package
mumu - Post-clustering curation tool
mkLTG - Local taxonomy generator

Required R Packages

MJOLNIR3 - Main metabarcoding pipeline framework
Biostrings (≥ 2.74.0) - DNA sequence manipulation
Rcpp (≥ 1.1.0) - R/C++ interface
dplyr (≥ 1.1.4) - Data manipulation
tidyr (≥ 1.3.1) - Data tidying
stringr (≥ 1.6.0) - String operations

Taxonomy Reference Database

A taxonomy reference database is required for taxonomic assignment. Download it from the Google Drive folder linked in the "Download taxonomy reference database" section below.

Installation Instructions

Option 1: Using Conda/Mamba (Recommended)

Conda provides an easy way to install all required tools in an isolated environment.

Step 1: Install Miniforge

Download and run the Miniforge installer, which provides both conda and mamba:

wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash "Miniforge3-$(uname)-$(uname -m).sh"

Step 2: Create a Conda Environment

# clone the repository
git clone https://github.com/adriantich/MetabarcodingCourse.git
cd MetabarcodingCourse

# Create an environment with all required tools
# (conda also works, but mamba is preferred)
mamba create -n metabarcoding -c bioconda -c conda-forge python=3.11.14 \
    fastqc=0.11.8 \
    cutadapt=5.2 \
    vsearch=2.30.1 \
    dnoise=1.4.2 \
    swarm=3.1.6 \
    r-base=4.4.3 \
    bioconductor-biostrings=2.74.0 \
    r-rcpp=1.1.0 \
    r-dplyr=1.1.4 \
    r-tidyr=1.3.1 \
    r-stringr=1.6.0 \
    cxx-compiler=1.0.0

# Activate the environment
mamba activate metabarcoding

# Install the remaining tools inside a SOFT/ directory
mkdir -p SOFT
cd SOFT

# BLAST+: download the precompiled Linux binaries and copy them into the environment
wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.17.0/ncbi-blast-2.17.0+-x64-linux.tar.gz
tar zxvpf ncbi-blast-2.17.0+-x64-linux.tar.gz

cp -r ncbi-blast-2.17.0+/bin/* "$CONDA_PREFIX/bin/"

# mkLTG
git clone https://github.com/meglecz/mkLTG.git

# mumu: build, test, and install into the environment
git clone https://github.com/frederic-mahe/mumu.git
cd mumu
make && make check && make install prefix="$CONDA_PREFIX"
cd ..
# Warning: if g++ is not detected, a broken symlink may be the cause.
# Recreate the link with:
# cd $CONDA_PREFIX/bin && rm g++ && ln -s x86_64-conda-linux-gnu-g++ g++

# MJOLNIR3 (obitools2vsearch branch)
git clone https://github.com/adriantich/MJOLNIR3.git
cd MJOLNIR3
git checkout obitools2vsearch
cd ..
R CMD INSTALL MJOLNIR3

# Course tools shipped in the repository's tools/ directory
cd ../tools
make && make install PREFIX="$CONDA_PREFIX"
cd ..

Download taxonomy reference database

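Manually

Open the shared Google Drive folder in a browser and download its contents:

https://drive.google.com/drive/folders/1-LBlUKFA-r5g6GI0sTo-7t92ml3pAS0S?usp=sharing

From the command line

Alternatively, download the folder with gdown:

mamba activate metabarcoding
mamba install conda-forge::gdown
gdown --folder https://drive.google.com/drive/folders/1-LBlUKFA-r5g6GI0sTo-7t92ml3pAS0S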

Verification

Test that all tools are properly installed:

# Check FastQC
fastqc --version
# Expected output: FastQC v0.11.8 (or higher)

# Check Cutadapt
cutadapt --version
# Expected output: 5.2 (or higher)

# Check VSEARCH
vsearch --version
# Expected output: vsearch v2.30.1 (or higher)

# Check BLAST+
blastn -version
# Expected output: blastn: 2.17.0+ (or higher)

# Check dnoise
dnoise --version
# Expected output: dnoise 1.4.2 (or higher)

# Check swarm
swarm --version
# Expected output: Swarm 3.1.6 (or higher)

# Check R
R --version
# Expected output: R version 4.4.3 (or higher)

# Check mumu
mumu --version
# Expected output: mumu version information

# Check MJOLNIR3 in R
R -e "library(mjolnir)"
# Expected output: No errors, package loaded successfully
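Comparing a reported version against the required minimum by eye is error-prone, so the comparison can be scripted. The following is a minimal sketch, not part of the course scripts: version_ge and check_min are hypothetical helper names, and the comparison assumes a sort that supports version ordering with -V (as GNU coreutils does). The version numbers in the example are typed by hand; parsing them from each tool's --version output is left out because the output format differs between tools.

```shell
# version_ge INSTALLED MINIMUM
# Succeeds (exit status 0) when INSTALLED >= MINIMUM.
# Both versions are sorted with "sort -V"; if MINIMUM sorts first
# (or both are equal), the installed version is new enough.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_min NAME INSTALLED MINIMUM
# Prints a one-line verdict for a single tool.
check_min() {
    if version_ge "$2" "$3"; then
        echo "OK       $1 $2 (>= $3)"
    else
        echo "TOO OLD  $1 $2 (< $3)"
    fi
}

# Example calls using minimum versions from the Core Tools list.
check_min vsearch 2.30.1 2.30.1
check_min swarm 3.1.0 3.1.6
```

Plain string comparison would rank 2.9 above 2.30, which is why the sketch relies on version sorting instead.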

Quick Start Guide

1. Activate Your Environment (if using Conda)

mamba activate metabarcoding
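A script can guard against being run outside the environment. The sketch below assumes the CONDA_DEFAULT_ENV variable, which conda and mamba set when an environment is activated; require_env is a hypothetical helper name and is not part of the course scripts.

```shell
# require_env NAME
# Succeeds when the currently active conda environment is NAME.
# Otherwise prints a hint to stderr and returns a non-zero status.
require_env() {
    if [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]; then
        return 0
    fi
    echo "Environment '$1' is not active (current: ${CONDA_DEFAULT_ENV:-none})." >&2
    echo "Run: mamba activate $1" >&2
    return 1
}

# Typical use at the top of an analysis script:
#   require_env metabarcoding || exit 1
```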

2. Run the Pipeline on a Single Sample (Example)

cd scripts
./pipeline_explained.sh

The pipeline will:

Perform quality control on raw sequences
Demultiplex the samples
Merge paired-end reads
Filter and trim reads based on quality
Dereplicate and remove chimeras
Denoise to obtain ESVs (exact sequence variants) and cluster them into OTUs
Assign taxonomy
Apply a post-clustering filter
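The reference for these steps is scripts/pipeline_explained.sh. As a rough orientation only, the merging, quality-filtering, and dereplication steps are commonly run with VSEARCH along the lines sketched below. The function only prints the commands (a dry run) and executes nothing. The function name, the output file names, and the expected-error threshold of 1 are illustrative assumptions, not the course's settings.

```shell
# print_vsearch_steps SAMPLE [DATA_DIR]
# Dry run: prints typical VSEARCH commands for one sample without running them.
print_vsearch_steps() {
    local sample=$1
    local dir=${2:-data}
    # 1. Merge paired-end reads into single amplicon sequences
    echo "vsearch --fastq_mergepairs ${dir}/${sample}_R1.fastq --reverse ${dir}/${sample}_R2.fastq --fastqout ${sample}.merged.fastq"
    # 2. Discard reads above an expected-error threshold (assumed value: 1)
    echo "vsearch --fastq_filter ${sample}.merged.fastq --fastq_maxee 1 --fastaout ${sample}.filtered.fasta"
    # 3. Collapse identical sequences and record their abundance in the headers
    echo "vsearch --derep_fulllength ${sample}.filtered.fasta --sizeout --output ${sample}.derep.fasta"
}

# Print the commands for a hypothetical sample named sampleA
print_vsearch_steps sampleA
```

Printing the commands first makes it easy to review file names and thresholds before anything is executed.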

Dataset Information

The data/ directory contains small demonstration datasets:

Sample format: Paired-end FASTQ files (Illumina)
Target region: COI mitochondrial gene (metazoan barcode)
Sequencing platform: Illumina MiSeq (2x250 bp)
Purpose: Educational demonstration only

For working with your own data, replace the files in data/raw_sequences/ with your samples following the same naming convention: samplename_R1.fastq and samplename_R2.fastq.
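Before running the pipeline on new data, it can help to confirm that every forward file has its reverse mate under this naming convention. A minimal sketch follows; list_unpaired is a hypothetical helper name, not part of the course scripts.

```shell
# list_unpaired DIR
# Prints every *_R1.fastq in DIR that lacks a matching *_R2.fastq, and
# vice versa. Prints nothing when all samples are properly paired.
list_unpaired() {
    local dir=$1 f mate
    for f in "$dir"/*_R1.fastq "$dir"/*_R2.fastq; do
        [ -e "$f" ] || continue        # skip glob patterns that matched nothing
        case $f in
            *_R1.fastq) mate=${f%_R1.fastq}_R2.fastq ;;
            *)          mate=${f%_R2.fastq}_R1.fastq ;;
        esac
        [ -e "$mate" ] || echo "missing mate for: $f"
    done
}

# Example:
#   list_unpaired data/raw_sequences
```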

Common Issues

Issue 1: "Command not found" errors

Solution: Ensure your conda environment is activated or tools are in your PATH
Verify installation with which toolname
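To check several tools at once, the lookup can be wrapped in a loop. This is a small sketch using the shell built-in command -v; missing_tools is a hypothetical helper name, and the executable names are the ones used in the Verification section.

```shell
# missing_tools TOOL...
# Prints the name of every TOOL that cannot be found on the PATH.
# Prints nothing when all of them are available.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# List whichever of the pipeline executables are not reachable right now.
missing_tools fastqc cutadapt vsearch blastn dnoise swarm R mumu
```

If a tool is listed while the environment is active, it was most likely not installed into that environment.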

Issue 2: Pipeline fails while running the software

Solution: Check that input files are in valid format
Verify sufficient disk space is available
Review software log files for specific errors
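A truncated or malformed FASTQ file is a frequent cause of such failures. The sketch below is a cheap structural check only, assuming uncompressed files with four lines per record; fastq_looks_valid is a hypothetical helper name, and the check does not validate quality encodings or read names.

```shell
# fastq_looks_valid FILE
# Succeeds when FILE is non-empty, its line count is a multiple of 4,
# every record header starts with "@", every separator line starts with "+",
# and each quality string is as long as its sequence.
fastq_looks_valid() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1; exit }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1; exit }
        NR % 4 == 0 && length($0) != seqlen    { bad = 1; exit }
        END { exit (bad || NR == 0 || NR % 4 != 0) ? 1 : 0 }
    ' "$1"
}

# Example:
#   fastq_looks_valid data/sample_R1.fastq || echo "check this file"
```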

Issue 3: Results do not match the expected results

Solution: Check the number of reads at each step to detect where the problem is
Review the metadata information
Verify correct parameters adapted to your amplicon
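Read counts per step can be collected with two small helpers. This is a sketch under the assumption that intermediate files are uncompressed, with four lines per FASTQ record and one ">" header per FASTA sequence; both function names are hypothetical.

```shell
# count_fastq_reads FILE
# Prints the number of reads in an uncompressed FASTQ file
# (total lines divided by 4).
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# count_fasta_seqs FILE
# Prints the number of sequences in a FASTA file by counting header lines.
count_fasta_seqs() {
    grep -c '^>' "$1"
}

# Example:
#   count_fastq_reads data/sample_R1.fastq
```

A sharp drop between two consecutive steps points to the step whose parameters need review.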

For additional help, please contact course instructors.

Citation

If you use these materials in your research or teaching, please cite:

Metabarcoding Course Materials - https://github.com/adriantich/MetabarcodingCourse

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

Last updated: April 2026
Course Resources: 15-17 of April 2025
