This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview

FireRedASR is an industrial-grade Automatic Speech Recognition (ASR) system specializing in Chinese (Mandarin and dialects) and English. It provides two model variants:
- FireRedASR-AED (1.1B params): Attention-based Encoder-Decoder for balanced performance
- FireRedASR-LLM (8.3B params): Encoder-Adapter-LLM framework for SOTA performance
**Important**: When working in this project, Claude should respond in Chinese by default, even when the user asks in English. Technical terms and code-related content should be given in both Chinese and English (English in parentheses).
## Environment Setup

```bash
# Create conda environment
conda create --name fireredasr python=3.10
conda activate fireredasr
pip install -r requirements.txt

# Set paths (required for CLI tools)
export PATH=$PWD/fireredasr/:$PWD/fireredasr/utils/:$PATH
export PYTHONPATH=$PWD/:$PYTHONPATH
```
## Audio Preparation

```bash
# Convert to required format (16kHz, mono, PCM WAV)
ffmpeg -i input_audio -ar 16000 -ac 1 -acodec pcm_s16le -f wav output.wav

# Batch conversion
for file in data/raw_input/*.mp3; do
  ffmpeg -i "$file" -ar 16000 -ac 1 -acodec pcm_s16le -f wav "data/formated_input/$(basename "$file" .mp3).wav"
done
```

## Running Inference

```bash
# Using example scripts
bash examples/inference_fireredasr_aed.sh
bash examples/inference_fireredasr_llm.sh

# Direct CLI usage
speech2text.py --wav_path examples/wav/BAC009S0764W0121.wav \
    --asr_type "aed" \
    --model_dir pretrained_models/FireRedASR-AED-L
```
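For programmatic use, the model factory in `fireredasr/models/fireredasr.py` can be called from Python. The sketch below is a minimal example under assumptions: the `FireRedAsr.from_pretrained(asr_type, model_dir)` constructor, the `transcribe(...)` method, and the decoding options mirror the upstream README, but verify the exact signatures in `speech2text.py` before relying on them.

```python
# Minimal sketch of programmatic inference; method names (from_pretrained,
# transcribe) and the decoding-option keys are assumptions -- check
# fireredasr/models/fireredasr.py for the real API.
from fireredasr.models.fireredasr import FireRedAsr

batch_uttid = ["BAC009S0764W0121"]
batch_wav_path = ["examples/wav/BAC009S0764W0121.wav"]

model = FireRedAsr.from_pretrained("aed", "pretrained_models/FireRedASR-AED-L")
results = model.transcribe(
    batch_uttid,
    batch_wav_path,
    {"use_gpu": 1, "beam_size": 3, "nbest": 1, "decode_max_len": 0},
)
print(results)  # expected: one entry per utterance with id and recognized text
```

Swap `"aed"` and the model directory for the LLM variant.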
## Evaluation

```bash
# Calculate Word Error Rate
wer.py --print_sentence_wer 1 --do_tn 0 --rm_special 0 \
    --ref reference.txt --hyp hypothesis.txt
```
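The reference and hypothesis files are plain text. A common layout for such scoring scripts is one utterance per line, keyed by utterance id; this is an assumption about `wer.py`, so inspect the script if scoring fails:

```
utt_id_1 transcript text for utterance one
utt_id_2 transcript text for utterance two
```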
## Architecture

- **Model Layer** (`fireredasr/models/`)
  - `fireredasr.py`: Factory pattern for model instantiation
  - `fireredasr_aed.py`: AED architecture with Conformer encoder + Transformer decoder
  - `fireredasr_llm.py`: LLM architecture integrating Qwen2-7B-Instruct
  - `module/`: Neural network building blocks (attention, convolution, transformers)
- **Data Processing** (`fireredasr/data/`)
  - Feature extraction with Kaldi's fbank (see the sketch after this list)
  - CMVN normalization
  - Batch collation with padding
- **Tokenization** (`fireredasr/tokenizer/`)
  - Character-level tokenization for Chinese
  - BPE tokenization support
  - Special token handling for both AED and LLM variants
- **CLI Interface** (`fireredasr/speech2text.py`)
  - Unified entry point for all ASR operations
  - Supports multiple input formats (single file, batch, directory, scp)
  - VAD-based splitting for long audio (>60s for AED, >30s for LLM)
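To illustrate the Data Processing layer, here is a minimal sketch of 80-dim fbank extraction with per-utterance CMVN. It uses `torchaudio`'s Kaldi-compatible frontend as a stand-in; the repository's own extractor in `fireredasr/data/` may differ in options and in how CMVN statistics are applied.

```python
# Sketch only: torchaudio's Kaldi-compatible fbank as a stand-in for the
# repository's feature extraction in fireredasr/data/.
import torch
import torchaudio
import torchaudio.compliance.kaldi as kaldi

def extract_fbank(wav_path: str) -> torch.Tensor:
    waveform, sample_rate = torchaudio.load(wav_path)  # expects 16kHz mono PCM
    feats = kaldi.fbank(
        waveform,
        num_mel_bins=80,        # 80-dim fbank, matching the models
        frame_length=25.0,      # ms
        frame_shift=10.0,       # ms
        sample_frequency=sample_rate,
    )
    # Per-utterance CMVN: zero mean, unit variance over time
    feats = (feats - feats.mean(dim=0)) / (feats.std(dim=0) + 1e-8)
    return feats  # shape: (num_frames, 80)

feats = extract_fbank("examples/wav/BAC009S0764W0121.wav")
print(feats.shape)
```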
## Model Loading Process

- Load YAML config from the pretrained model directory
- Initialize model architecture based on the config
- Load pretrained weights from the checkpoint
- Apply PEFT adapters if using the LLM variant
- Set up the tokenizer with its vocabulary
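The sequence above, sketched with standard calls. File names (`config.yaml`, `model.pth.tar`) are assumptions about the pretrained-model directory, and the construction steps are left as comments because the real logic lives in `fireredasr/models/fireredasr.py`.

```python
# Illustrative loading sequence; file names are assumptions and the
# model/tokenizer construction steps are only outlined in comments.
import os
import yaml
import torch

model_dir = "pretrained_models/FireRedASR-AED-L"

# 1. Load YAML config from the pretrained model directory (file name assumed)
with open(os.path.join(model_dir, "config.yaml")) as f:
    config = yaml.safe_load(f)

# 2. Initialize the model architecture from `config`
#    (AED or LLM classes defined under fireredasr/models/)

# 3. Load pretrained weights from the checkpoint (file name assumed)
checkpoint = torch.load(os.path.join(model_dir, "model.pth.tar"), map_location="cpu")
# model.load_state_dict(checkpoint, strict=False)

# 4. Apply PEFT adapters if using the LLM variant

# 5. Set up the tokenizer from the vocabulary files shipped in model_dir
```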
## Inference Pipeline

- Raw audio → 16kHz PCM conversion (ffmpeg)
- VAD segmentation for long audio (`vad_split.py`)
- Feature extraction (80-dim fbank features)
- Model inference with beam search
- Optional LLM refinement (`refine_asr_output/`)
## Important Notes

- **No formal test framework**: Testing is done via example scripts and WER evaluation
- **GPU required**: Models need a CUDA-enabled GPU for reasonable performance
- **Memory requirements**: AED needs ~8GB VRAM, LLM needs ~32GB VRAM
- **Audio limitations**: Max 60s (AED) or 30s (LLM) per segment
- **Dependencies**: Requires PyTorch ≥2.0.0, Transformers ≥4.46.3, Kaldi tools
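Given the VRAM figures above, a quick check before loading can avoid an out-of-memory failure. This is a generic sketch using standard PyTorch calls, not a utility shipped with the repository, and the LLM directory name is assumed.

```python
# Pick a model variant based on available GPU memory (rough heuristic).
import torch

assert torch.cuda.is_available(), "FireRedASR needs a CUDA-enabled GPU"
total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3

if total_gb >= 32:
    asr_type, model_dir = "llm", "pretrained_models/FireRedASR-LLM-L"  # dir name assumed
elif total_gb >= 8:
    asr_type, model_dir = "aed", "pretrained_models/FireRedASR-AED-L"
else:
    raise RuntimeError(f"Only {total_gb:.1f} GB VRAM; AED needs ~8GB, LLM ~32GB")
print(f"Using {asr_type} model from {model_dir}")
```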
## Adding a New Model

- Create a new model class in `fireredasr/models/`
- Register it in the factory function in `fireredasr.py`
- Add corresponding tokenizer support
- Update CLI arguments in `speech2text.py`
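A sketch of what registering a new variant in the factory might look like. The existing factory's exact shape is not reproduced here, so the class names, the `from_pretrained` constructor, and the `"my_variant"` value are hypothetical; mirror whatever `fireredasr.py` actually does for `"aed"` and `"llm"`.

```python
# Hypothetical sketch of extending the factory in fireredasr/models/fireredasr.py
from fireredasr.models.fireredasr_aed import FireRedAsrAed   # existing class, name assumed
from fireredasr.models.my_new_model import MyNewModel        # your new model class

def create_model(asr_type: str, model_dir: str):
    """Factory: map --asr_type values to model classes."""
    if asr_type == "aed":
        return FireRedAsrAed.from_pretrained(model_dir)   # constructor assumed
    if asr_type == "my_variant":                           # new --asr_type value
        return MyNewModel.from_pretrained(model_dir)
    raise ValueError(f"Unknown asr_type: {asr_type}")
```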
## Processing Long Audio

- Use VAD splitting: `vad_split.py --input long_audio.wav --output_dir segments/`
- Process segments individually
- Concatenate results (see the sketch after this list)
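A sketch of the split, decode, and concatenate loop in Python, reusing the assumed `FireRedAsr` API from the earlier example. Segment file naming, temporal sort order, and the `"text"` result field are assumptions; adjust to `vad_split.py`'s actual output and the real result structure.

```python
# Sketch: transcribe VAD segments in order and join the text.
# Assumes segments/ was produced by vad_split.py and that segment file names
# sort in temporal order; the FireRedAsr API is the same assumption as above.
import glob
from fireredasr.models.fireredasr import FireRedAsr

model = FireRedAsr.from_pretrained("aed", "pretrained_models/FireRedASR-AED-L")
segment_paths = sorted(glob.glob("segments/*.wav"))

texts = []
for path in segment_paths:
    results = model.transcribe(
        [path], [path],
        {"use_gpu": 1, "beam_size": 3, "nbest": 1, "decode_max_len": 0},
    )
    texts.append(results[0]["text"])  # field name assumed

print("".join(texts))  # concatenated transcript for the long recording
```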
## Fine-tuning

- Use PEFT/LoRA for parameter-efficient fine-tuning
- Modify adapter configurations in the model configs
- Leverage existing training scripts (if available in future updates)
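A minimal sketch of a LoRA configuration with the Hugging Face `peft` library, as it might be applied to the Qwen2 LLM component. The target modules and hyperparameters are illustrative assumptions, not the repository's shipped adapter config.

```python
# Illustrative LoRA setup with Hugging Face peft; values are assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-7B-Instruct")
lora_cfg = LoraConfig(
    r=8,                      # adapter rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)
llm = get_peft_model(llm, lora_cfg)
llm.print_trainable_parameters()  # sanity check: only adapter params are trainable
```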