
DeBERTa Text Emotion Classification (MELD + Synthetic Data)

This project fine-tunes microsoft/deberta-v3-large for utterance-level emotion classification on the MELD dataset and several synthetic / augmented variants.

It includes:

  • a training / evaluation pipeline built with PyTorch Lightning
  • a set of data-generation scripts using Qwen to synthesize extra MELD-style utterances
  • hyperparameter sweeps driven by Weights & Biases (W&B)
  • scripts to evaluate checkpoints independently with scikit-learn and to export prediction files

1. Project layout

From the repo root (the folder containing config.py and train.py):

.
├── config.py                     # Central config + label mapping
├── data_module.py                # LightningDataModule for text classification
├── model.py                      # LightningModule (DeBERTa-v3-large classifier)
├── train.py                      # CLI entry point for training / testing
├── eval_dev.py                   # Independent evaluation / prediction script
├── slurm_sweep.sh                # Slurm wrapper for W&B sweeps (HPC)
├── sweep_deberta.yaml            # Base sweep config (MELD train split)
├── sweep_deberta_phase2.yaml     # Sweep on synthetic-only training data
├── sweep_deberta_phase3.yaml     # Sweeps on augmented splits (250/500/1000)
├── run-sweep-dataset-explanation.txt
├── requirements.txt              # Python dependencies
├── poetry.lock                   # Frozen dependency versions (optional)
├── ensemble/
│   ├── ensemble2.py
│   └── ensemble.sh
├── datasets/
│   ├── train_sent_emo.csv
│   ├── dev_sent_emo.csv
│   ├── test_sent_emo.csv
│   ├── synth_train_sent_emo.csv
│   ├── synth_train_augmented.csv
│   ├── synth_train_hundo_p_take3_dedup.csv
│   ├── synth_train_hundo_p_take3_double_dedup.csv
│   ├── synth_train_triple_filtered.csv
│   ├── train_emotions_aug_250.csv
│   ├── train_emotions_aug_500.csv
│   └── train_emotions_aug_1000.csv
└── synthetic_gen/
    └── MELD_DATA/
        ├── synth_data.py                 # Generate synthetic utterances with Qwen
        ├── synth_data_fix.py             # Clean / patch first synthetic pass
        ├── synth_data_fix_2.py           # Further cleaning / schema fixes
        ├── dedupe_synth_tfidf.py         # Remove near-duplicate synthetic rows
        ├── dedupe_cross_class_tfidf.py   # Cross-class deduping
        ├── augment_train_with_synth.py   # Mix real + synthetic for new splits
        ├── emotionCounter.py             # Class frequency summaries
        ├── meld_synth.sh / *_big_gpu*.sh # Slurm helpers for large Qwen jobs
        └── README.md (optional, if you add one later)

2. Environment setup

You can run everything either locally or on an HPC node. The basic recipe is the same: create a venv, activate it, and install from requirements.txt.

2.1. Create and activate a venv

From the repo root:

python3 -m venv .ftmb           # or any name you like
source .ftmb/bin/activate

On some clusters you may need to load a Python module first, e.g.:

module load python/3.11.6   # example; use whatever your cluster provides

2.2. Install Python dependencies

With the venv active:

pip install --upgrade pip
pip install -r requirements.txt

The key libraries are:

  • torch / torchvision
  • pytorch-lightning
  • transformers
  • datasets
  • pandas
  • scikit-learn
  • wandb
  • matplotlib, tqdm

If you prefer Poetry, the pinned environment is in poetry.lock and you can recreate it with:

poetry install

(but the course instructions only require that requirements.txt works.)

2.3. Weights & Biases (W&B)

Sweeps require a W&B API key:

Log in once from the command line:

wandb login

This prompts for your API key (available at https://wandb.ai/authorize) and caches it. For non-interactive Slurm jobs, you can instead export the WANDB_API_KEY environment variable.


3. Data: input formats and derived splits

All training and evaluation runs use CSV files under datasets/.

3.1. Core MELD splits

The original MELD data has been split into three CSVs:

  • train_sent_emo.csv
  • dev_sent_emo.csv
  • test_sent_emo.csv

Each of these has at least the following columns:

  • Utterance: the input text (one utterance per row)
  • Emotion: the gold label, one of
    anger, disgust, fear, joy, neutral, sadness, surprise

Additional columns like Dialogue_ID, Utterance_ID, Speaker may be present but are ignored by the model.
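
For a quick sanity check that a split has the expected schema (and to eyeball the class balance), a few lines of pandas are enough. A minimal sketch; pandas is already in requirements.txt:

import pandas as pd

df = pd.read_csv("datasets/train_sent_emo.csv")
assert {"Utterance", "Emotion"} <= set(df.columns), "missing required columns"
print(df["Emotion"].value_counts())  # counts for the 7 emotion labels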

3.2. Synthetic and augmented splits

Synthetic examples are generated by Qwen and written to:

  • synth_train_sent_emo.csv (raw synthetic data)
  • various deduplicated versions:
    • synth_train_hundo_p_take3_dedup.csv
    • synth_train_hundo_p_take3_double_dedup.csv
    • synth_train_triple_filtered.csv

Final, balanced(ish) training sets used for sweeps:

  • train_emotions_aug_250.csv - +250 synthetic examples per class
  • train_emotions_aug_500.csv - +500 per class (used in the Athena sweep)
  • train_emotions_aug_1000.csv - +1000 per class (also used in Athena)

See run-sweep-dataset-explanation.txt for the full narrative of which sweep uses which file.

More information about TF-IDF vectorization and deduplication based on in-class and cross-class similarity can be found in the dataset-level README at synthetic_gen/MELD_DATA/README.md.

All of these use the same schema as the MELD splits: Utterance and Emotion are the only required columns.


4. Running the training pipeline

4.1. CLI arguments

train.py is the main entry point. It accepts paths and hyperparameters via argparse (and defaults are wired up in config.py). The most important arguments:

  • --train_path, --val_path, --test_path
  • --text_col, --label_col (default to Utterance and Emotion)
  • --tokenizer_name (default: microsoft/deberta-v3-large)
  • --max_epochs, --batch_size, --max_lr, --weight_decay, --dropout
  • --label_smoothing, --flooding_val
  • --accelerator, --devices, --precision, --num_workers

The model itself is defined in model.py (BertClassifier), and data_module.py defines TextClassificationDataModule.
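
Under the hood, the data module tokenizes each utterance with the Hugging Face tokenizer named by --tokenizer_name. Roughly (a sketch, not the repo's exact code; the padding/truncation settings shown are assumptions):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")
batch = tok(
    ["You know what's weird?", "I'm so happy for you!"],
    padding=True,       # pad to the longest utterance in the batch
    truncation=True,    # clip anything past the model's max length
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # (2, max_seq_len_in_batch)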

4.2. Single run (no sweep)

From the repo root, with venv active and a GPU available:

python -u train.py \
  --train_path datasets/train_emotions_aug_500.csv \
  --val_path   datasets/dev_sent_emo.csv \
  --test_path  datasets/test_sent_emo.csv \
  --text_col Utterance \
  --label_col Emotion \
  --tokenizer_name microsoft/deberta-v3-large \
  --accelerator gpu \
  --devices 1 \
  --precision bf16-mixed \
  --num_workers 1 \
  --batch_size 32 \
  --max_epochs 30

This will:

  1. Fine-tune DeBERTa-v3-large on the selected training set.
  2. Run validation and report metrics (especially val_f1_macro).
  3. Run a final test pass and log test_acc and test_loss.
  4. Save:
    • best checkpoint (by val_f1_macro) to
      checkpoints/run_<id-date>/best.ckpt
    • test predictions and probabilities under
      preds/run_<id-date>/.

early_stopping and ModelCheckpoint are configured in train.py using val_f1_macro as the monitor.
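
For reference, the callback wiring looks roughly like this (a sketch; the exact patience, dirpath, and other arguments live in train.py):

from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

checkpoint_cb = ModelCheckpoint(
    monitor="val_f1_macro",
    mode="max",        # higher macro F1 is better
    filename="best",
    save_top_k=1,
)
early_stop_cb = EarlyStopping(monitor="val_f1_macro", mode="max", patience=3)  # patience here is an assumption
# trainer = pl.Trainer(callbacks=[checkpoint_cb, early_stop_cb], ...)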

Be warned: this is a heavy run for most consumer-grade GPUs. For training on a cluster, see section 5.2, "Running the sweep on an HPC node".

5. Hyperparameter sweeps (W&B + Slurm)

Sweeps are configured as YAML files:

  • sweep_deberta.yaml - original MELD train split
  • sweep_deberta_phase2.yaml - sweeps on synthetic-only data
  • sweep_deberta_phase3.yaml - sweeps on the train_emotions_aug_* splits

Each sweep file:

  • specifies the metric (val_f1_macro, goal maximize)
  • defines search spaces for max_lr, dropout, weight_decay, label_smoothing, flooding_val, max_epochs, etc.
  • pins dataset paths via command: ... --train_path ... --val_path ...
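
Put together, a sweep file has roughly this shape (an illustrative sketch in the format W&B expects, not a copy of the repo's configs; the real search spaces live in the sweep_deberta*.yaml files):

program: train.py
method: bayes
metric:
  name: val_f1_macro
  goal: maximize
parameters:
  max_lr:
    values: [0.00001, 0.00002, 0.00005]
  dropout:
    values: [0.1, 0.2, 0.3]
command:
  - ${env}
  - python
  - ${program}
  - --train_path=datasets/train_emotions_aug_500.csv
  - --val_path=datasets/dev_sent_emo.csv
  - ${args}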

5.1. Creating a sweep

From the repo root, with W&B logged in:

wandb sweep sweep_deberta_phase3.yaml

W&B will print a sweep ID of the form:

tar-xvf/modernbert-meld/u1l2r0v1

5.2. Running the sweep on an HPC node

Use slurm_sweep.sh to launch one or more agents. You do not need to copy or modify Python files; Slurm just runs whatever is in the repo at submit time.

Example: start 40 runs on Athena/Hickory (one GPU):

sbatch slurm_sweep.sh tar-xvf/modernbert-meld/u1l2r0v1 40

NOTE: replace u1l2r0v1 with whatever sweep ID is returned by wandb sweep sweep_deberta_phase3.yaml.

slurm_sweep.sh:

  • reserves a single GPU and some RAM
  • sets cache directories (HF_HOME, WANDB_DIR, etc.) on node-local scratch
  • runs wandb agent --count <COUNT> <SWEEP_ID>

You can monitor jobs with:

squeue -u $USER

and cancel with:

scancel <JOBID>

6. Independent evaluation and F1 sanity checks

To double-check metrics, use eval_dev.py. It loads a checkpoint, runs it on a CSV split, and recomputes metrics with scikit-learn.

Basic usage:

python -u eval_dev.py \
  --ckpt_path checkpoints/run_1651577-20251205-120956/best.ckpt \
  --data_path datasets/dev_sent_emo.csv \
  --text_col Utterance \
  --label_col Emotion \
  --tokenizer_name microsoft/deberta-v3-large \
  --batch_size 32 \
  --out_prefix eval_dev_run_1651577

Outputs:

  • eval_dev_run_1651577_metrics.json - per-class precision/recall/F1, macro/micro averages, confusion matrix, etc. (all via scikit-learn)
  • eval_dev_run_1651577_predictions.csv - one row per utterance with:
    • original text
    • gold label
    • predicted label
    • predicted probability for each class

Inside model.py, the Lightning module collects preds and labels as:

  • preds: integer class IDs (argmax over softmax probabilities)
  • labels: gold class IDs from the dataset

Both are fed directly into a clean torchmetrics F1 (macro) during training, and eval_dev.py recomputes F1 from scratch using sklearn.metrics.f1_score for independent verification.
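
The same check is easy to reproduce by hand from the exported predictions file (a sketch; the gold/pred column names below are assumptions, so match them to the actual CSV header):

import pandas as pd
from sklearn.metrics import classification_report, f1_score

df = pd.read_csv("eval_dev_run_1651577_predictions.csv")
# "gold" and "pred" are placeholder column names; check the real header
print("macro F1:", f1_score(df["gold"], df["pred"], average="macro"))
print(classification_report(df["gold"], df["pred"]))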


7. Data-generation scripts (synthetic_gen/MELD_DATA)

The synthetic_gen/MELD_DATA folder documents how the synthetic splits were created. Very high-level overview:

  1. Generate synthetic utterances
    synth_data.py calls a Qwen model to generate new utterances conditioned on existing MELD context and labels, writing them to synth_train_sent_emo.csv; synth_data_fix.py and synth_data_fix_2.py then clean and patch that first pass.

  2. Deduplicate

    • dedupe_synth_tfidf.py removes near-duplicate synthetic rows within the same class using TF-IDF + cosine similarity (see the sketch after this list).
    • dedupe_cross_class_tfidf.py removes cross-class near-duplicates (same text but different label).
  3. Augment MELD training set
    augment_train_with_synth.py merges the cleaned synthetic examples with train_sent_emo.csv to build balanced training sets with +250, +500, or +1000 synthetic examples per class. These are written to:

    • train_emotions_aug_250.csv
    • train_emotions_aug_500.csv
    • train_emotions_aug_1000.csv
  4. Utility scripts
    emotionCounter.py prints per-class counts for any CSV so you can verify that the balancing worked as intended (or just to see what the counts are, if you're curious).
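
The in-class deduplication in step 2 boils down to TF-IDF vectors plus a cosine-similarity threshold. A rough sketch in that spirit (the real script's threshold and bookkeeping may differ; the 0.9 cutoff is an assumption):

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

df = pd.read_csv("datasets/synth_train_sent_emo.csv")
kept = []
for label, group in df.groupby("Emotion"):
    # pairwise similarities within one emotion class
    sims = cosine_similarity(TfidfVectorizer().fit_transform(group["Utterance"]))
    keep_idx = []
    for i in range(len(group)):
        # keep a row only if it isn't too close to an already-kept row
        if all(sims[i, j] < 0.9 for j in keep_idx):
            keep_idx.append(i)
    kept.append(group.iloc[keep_idx])
pd.concat(kept).to_csv("synth_train_dedup_sketch.csv", index=False)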


8. Logs and artifacts

  • Training / sweep logs: text logs under logs/, one .out and .err per Slurm job.
  • Model checkpoints: under checkpoints/run_*. Each run directory contains:
    • best.ckpt - best epoch by val_f1_macro
    • optional last.ckpt if you enable saving the last epoch
  • Predictions / probabilities: under preds/run_*:
    • test_preds.csv
    • test_probs.pkl (float32 numpy array or pandas frame)
  • W&B: all metrics, curves, and hyperparameters are logged to the configured W&B project (modernbert-meld).
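
To poke at a run's saved outputs afterwards (file names as above; loading test_probs.pkl as a standard pickle is an assumption):

import pickle
import pandas as pd

run_dir = "preds/run_1651577-20251205-120956"  # example run ID from section 6
preds = pd.read_csv(f"{run_dir}/test_preds.csv")
with open(f"{run_dir}/test_probs.pkl", "rb") as f:
    probs = pickle.load(f)  # float32 array/frame, one row per test utterance
print(preds.head())
print(probs.shape)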

9. Minimal quick-start

  1. Clone this repo to your home or scratch space.

  2. Create / activate venv:

    python3 -m venv .ftmb           # or any name you like
    source .ftmb/bin/activate
    pip install -r requirements.txt
  3. Run a small verification training (1-3 epochs) on any training split:

    python -u train.py \
      --train_path datasets/train_emotions_aug_250.csv \
      --val_path   datasets/dev_sent_emo.csv \
      --test_path  datasets/test_sent_emo.csv \
      --text_col Utterance \
      --label_col Emotion \
      --max_epochs 3 \
      --batch_size 16 \
      --accelerator gpu --devices 1 --precision bf16-mixed
  4. (Optional) Launch a sweep once the single run works:

    wandb sweep sweep_deberta_phase3.yaml
    sbatch slurm_sweep.sh tar-xvf/modernbert-meld/<SWEEP_ID> 40
  5. Evaluate the best checkpoint with eval_dev.py to confirm F1. On the cluster, run:

sbatch eval_dev.sh <CHECKPOINT_PATH> <DATA_PATH> <OUT_PREFIX>

(paths are relative to the repo root)

  6. Ensemble predictions: move the prediction CSV files produced in the prior step into the ensemble/ folder, make sure the dependencies from requirements.txt are installed, then run:
sbatch ensemble.sh 

10. License and dataset attribution

The code in this repository is released under the MIT License; you're free to use, modify, and distribute it with attribution.

The MELD dataset is a separate work with its own license. MELD is derived from dialogue in the television series Friends and is distributed under the GNU General Public License v3.0. See the official MELD repository for the full license text, dataset terms, and citation requirements. The MIT license on this code does not extend to MELD or to any data derived from it (including the synthetic and augmented splits in datasets/, which are conditioned on MELD content).

If you use MELD via this repo, please cite the original paper:

Poria, S., Hazarika, D., Majumder, N., Naik, G., Cambria, E., & Mihalcea, R. (2019). MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. https://aclanthology.org/P19-1050/

About

EMILY: EMotion Identification with Lightning, Y'all