This project uses Pixi to manage environments and dependencies.
Check whether Pixi is already available:
pixi --version
If the command is not found, install Pixi:
curl -fsSL https://pixi.sh/install.sh | bash
After installation, restart your terminal (or reload your shell config) so pixi is on PATH.
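If you prefer to check for Pixi from a setup script rather than by hand, the lookup can be sketched in Python as below. The helper name `tool_version` is hypothetical and not part of this project; it only wraps the same `pixi --version` call shown above.

```python
import shutil
import subprocess


def tool_version(name):
    """Return the output of `<name> --version`, or None if the tool is not on PATH."""
    exe = shutil.which(name)  # resolves the executable the same way the shell would
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


version = tool_version("pixi")
if version is None:
    print("pixi not found on PATH; install it or restart your shell")
else:
    print(f"found: {version}")
```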
From the project root (where pixi.toml is located), run:
pixi shell
Pixi creates or syncs the environment and opens a shell with all project dependencies.
Examples:
pixi run python --version
pixi run python benchmark_modalities.py
For experiments with calibrated quality scores, the audio quality scorer expects a local PAM repository at:
./PAM
From the project root, run:
git clone https://github.com/soham97/PAM.git PAM
pixi run pip install -r PAM/requirements.txt
pixi run python -c "from PAM import PAM; print('PAM import OK')"
Use benchmark_modalities.py to run the base Qwen model without Quality Aware Attention.
Example:
pixi run python benchmark_modalities.py \
--dataset meld \
--classification-task emotion \
--split test \
--modalities text,audio,video \
--batch-size 4
Useful options:
- --noisy-modalities audio,image,text,video to load noisy variants for selected modalities
- --noise-severity <S> to filter noisy variants to a specific severity
- --stratified-samples <N>, --total-samples <N>, --start-at-sample <idx> to control evaluation size
- --qwen-model-id <hf-model-id> to switch checkpoints (default: Qwen/Qwen2.5-Omni-7B)
Outputs are written to out/... (predictions and error rows), unless overridden with --out-path and --out-error-path.
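To run the same benchmark over several noise severities, the command line can be assembled programmatically. The following is a minimal sketch: the helper `build_benchmark_command` is hypothetical, the severity values 1 to 3 are placeholders (the valid range is not documented here), and only flags listed above are used.

```python
import shlex


def build_benchmark_command(dataset, modalities, split="test", batch_size=4,
                            classification_task=None, noisy_modalities=None,
                            noise_severity=None):
    """Build the argv list for one benchmark_modalities.py run."""
    cmd = [
        "pixi", "run", "python", "benchmark_modalities.py",
        "--dataset", dataset,
        "--split", split,
        "--modalities", ",".join(modalities),
        "--batch-size", str(batch_size),
    ]
    if classification_task is not None:
        cmd += ["--classification-task", classification_task]
    if noisy_modalities:
        cmd += ["--noisy-modalities", ",".join(noisy_modalities)]
    if noise_severity is not None:
        cmd += ["--noise-severity", str(noise_severity)]
    return cmd


# Placeholder severities; replace with the values your noisy data actually uses.
for severity in (1, 2, 3):
    cmd = build_benchmark_command(
        "meld", ["text", "audio", "video"],
        classification_task="emotion",
        noisy_modalities=["audio"],
        noise_severity=severity,
    )
    print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```

Building the command as a list rather than a string avoids quoting problems when it is later handed to `subprocess.run`.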
Use benchmark_scored_modalities.py to run the model with Quality Aware Attention (QAA), where modality quality scores are used to scale first-layer attention.
Example (calibrated quality scores):
pixi run python benchmark_scored_modalities.py \
--dataset imdb \
--split test \
--modalities text,audio,video \
--quality-calibration \
--batch-size 4
Quality scoring modes:
- Add --qwen-quality to estimate quality scores with Qwen (cannot be combined with --quality-calibration)
- Add --quality-calibration to use ECDF-calibrated quality scores (cannot be combined with --qwen-quality)
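For readers unfamiliar with ECDF calibration, the general idea is to replace a raw score with the fraction of reference scores that are less than or equal to it, which yields a value in [0, 1]. The sketch below illustrates that generic technique only; it is not taken from this project's code, and the reference scores are made up.

```python
from bisect import bisect_right


def make_ecdf(reference_scores):
    """Return a function mapping a raw score to its empirical CDF value in [0, 1]."""
    ordered = sorted(reference_scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("reference_scores must not be empty")

    def ecdf(raw):
        # Fraction of reference scores that are <= raw.
        return bisect_right(ordered, raw) / n

    return ecdf


# Made-up raw quality scores standing in for a calibration set.
reference = [0.12, 0.35, 0.35, 0.60, 0.91]
calibrate = make_ecdf(reference)

print(calibrate(0.35))  # 3 of 5 reference scores are <= 0.35, so 0.6
print(calibrate(0.05))  # below every reference score, so 0.0
print(calibrate(1.00))  # above every reference score, so 1.0
```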
Extra QAA controls:
- --qaa-normalization-mode global|exclude_unscaled, depending on whether quality scores are normalized across all samples or only among the scaled modalities for each sample
- --force-quality-scores-one or --force-modality-quality-scores text=0.2,audio=0.9
- --quality-placebo-random --quality-placebo-random-seed <seed> for placebo runs
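The constraints described above (two mutually exclusive scoring modes, a fixed set of normalization modes, and a `modality=score` list) could be expressed with argparse roughly as follows. This is an illustrative sketch, not the parser used by benchmark_scored_modalities.py; the function names and defaults are assumptions.

```python
import argparse


def parse_modality_scores(text):
    """Parse 'text=0.2,audio=0.9' into {'text': 0.2, 'audio': 0.9}."""
    scores = {}
    for item in text.split(","):
        name, sep, value = item.partition("=")
        if not sep or not name.strip():
            raise argparse.ArgumentTypeError(f"expected modality=score, got {item!r}")
        try:
            scores[name.strip()] = float(value)
        except ValueError:
            raise argparse.ArgumentTypeError(f"invalid score in {item!r}")
    return scores


def build_parser():
    """Hypothetical parser covering a subset of the QAA flags."""
    parser = argparse.ArgumentParser(description="QAA options (illustrative only)")
    # argparse rejects the command line if both flags in the group are given.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--qwen-quality", action="store_true")
    mode.add_argument("--quality-calibration", action="store_true")
    parser.add_argument("--qaa-normalization-mode",
                        choices=["global", "exclude_unscaled"], default=None)
    parser.add_argument("--force-modality-quality-scores",
                        type=parse_modality_scores, default=None)
    return parser


args = build_parser().parse_args(
    ["--force-modality-quality-scores", "text=0.2,audio=0.9"]
)
print(args.force_modality_quality_scores)
```

Passing both `--qwen-quality` and `--quality-calibration` to this parser makes argparse exit with a usage error, mirroring the "cannot be combined" rule.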
This script writes:
- prediction CSV (--out-path)
- error CSV (--out-error-path)
- per-sample quality score CSV (--quality-score-out-path)
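A quick way to sanity-check any of these CSV files after a run is to list its columns and count its rows. The column names are not documented here, so the sketch below makes no assumption about them; `summarize_csv` is a hypothetical helper.

```python
import csv
import sys


def summarize_csv(path):
    """Return (column names, number of data rows) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)
        columns = list(reader.fieldnames or [])  # empty list for an empty file
    return columns, row_count


if __name__ == "__main__":
    # Usage: python summarize_outputs.py <csv-path> [<csv-path> ...]
    for csv_path in sys.argv[1:]:
        columns, row_count = summarize_csv(csv_path)
        print(f"{csv_path}: {row_count} rows, columns: {', '.join(columns)}")
```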