lab-clip

OpenCLIP fine-tuning toolkit for text-based person re-identification (ReID). Supports contrastive training and triplet loss training with hard negative captions. Designed to be used as a person search engine.

Repository Layout

lab_clip/
  src/
    data.py       # dataset loading, ReIDDataset, TripletReIDDataset, collate functions
    losses.py     # multi-positive InfoNCE loss, symmetric CLIP loss, triplet loss
    retrieval.py  # encode_image_loader, encode_text_loader, retrieval_metrics
    training.py   # contrastive_step, evaluate, optimizer, scheduler, AverageMeter
  train.py              # single-GPU contrastive fine-tuning
  train_ddp.py          # multi-GPU DDP contrastive fine-tuning
  train_triplet.py      # single-GPU triplet loss fine-tuning
  train_ddp_triplet.py  # multi-GPU DDP triplet loss fine-tuning
  tune.py               # Optuna hyperparameter search for contrastive training
  tune_triplet.py       # Optuna hyperparameter search for triplet training
  test.py               # standard retrieval evaluation (Rank-k, mAP)
  test_triplet.py       # hard negative evaluation (triplet accuracy, margin)
  env/.env              # local environment variables (not tracked by git)
  pyproject.toml

Setup

Requires Python 3.12 and uv.

uv sync

Environment

Create env/.env with the following variables:

DATASET_ROOT="/path/to/datasets"
NEGATIVE_REID_DATASET_PATH="/path/to/negative-captions/outputs"
CUDA_VISIBLE_DEVICES=0

DATASET_ROOT must contain the following directory structure:

DATASET_ROOT/
  CUHK-PEDES/
    reid_raw.json
    reid_raw_negative_gemma4:e4b.json   # RTX 5070 Ti
    reid_raw_negative_gemma4:26b.json   # A6000
    imgs/
  RSTPReid/
    data_captions.json
    data_captions_negative_gemma4:e4b.json   # RTX 5070 Ti
    data_captions_negative_gemma4:26b.json   # A6000
    imgs/
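
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch and not part of the toolkit: `EXPECTED_LAYOUT` and `missing_dataset_paths` are hypothetical names, and only the base annotation files and image folders listed above are checked. The negative annotation files are left out because the one that is needed depends on the GPU.

```python
from pathlib import Path

# Hypothetical helper, not part of lab-clip. The required entries mirror
# the directory listing above.
EXPECTED_LAYOUT = {
    "CUHK-PEDES": ["reid_raw.json", "imgs"],
    "RSTPReid": ["data_captions.json", "imgs"],
}


def missing_dataset_paths(root: str) -> list[str]:
    """Return the expected paths (relative to root) that do not exist."""
    base = Path(root)
    missing = []
    for dataset, entries in EXPECTED_LAYOUT.items():
        for entry in entries:
            if not (base / dataset / entry).exists():
                missing.append(f"{dataset}/{entry}")
    return missing
```

Calling `missing_dataset_paths(os.environ["DATASET_ROOT"])` after loading env/.env returns an empty list when everything is present.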

GPU-Based Negative Annotation File Selection

Triplet training automatically detects the current GPU and selects the appropriate negative annotation file.

| GPU | Negative Model | Annotation File Suffix |
| --- | --- | --- |
| RTX 5070 Ti | gemma4:e4b | `*_negative_gemma4:e4b.json` |
| A6000 | gemma4:26b | `*_negative_gemma4:26b.json` |

No manual path configuration is required. To add a new GPU, update GPU_NEGATIVE_MODEL_MAP and NEGATIVE_ANNOTATION_FILES in src/data.py.
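
To make the selection concrete, here is a sketch of what such a lookup could look like. Only the names `GPU_NEGATIVE_MODEL_MAP` and `NEGATIVE_ANNOTATION_FILES` come from this README; their shapes, the substring matching on the device name, and the function `negative_annotation_file` are assumptions for illustration, and the real code in src/data.py may differ. In practice the device name would come from something like `torch.cuda.get_device_name(0)`.

```python
# Assumed shapes for the two tables named above; the real definitions in
# src/data.py may differ.
GPU_NEGATIVE_MODEL_MAP = {
    "RTX 5070 Ti": "gemma4:e4b",
    "A6000": "gemma4:26b",
}
NEGATIVE_ANNOTATION_FILES = {
    "cuhk-pedes": "CUHK-PEDES/reid_raw_negative_{model}.json",
    "rstpreid": "RSTPReid/data_captions_negative_{model}.json",
}


def negative_annotation_file(dataset: str, gpu_name: str) -> str:
    """Hypothetical lookup: map a GPU device name to an annotation path."""
    for gpu_key, model in GPU_NEGATIVE_MODEL_MAP.items():
        # Substring match, since device names usually carry a vendor prefix.
        if gpu_key in gpu_name:
            return NEGATIVE_ANNOTATION_FILES[dataset].format(model=model)
    raise ValueError(f"no negative model registered for GPU {gpu_name!r}")
```

Under these assumptions, supporting a new GPU means adding one entry to the first dictionary and making sure the matching annotation files exist.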

Supported Datasets

| Dataset | Key | Task |
| --- | --- | --- |
| CUHK-PEDES | `cuhk-pedes` | text-based person ReID |
| RSTPReid | `rstpreid` | text-based person ReID |

test.py additionally supports mscoco, cc12m, cifar100, and imagenet for general retrieval evaluation.

Training

Contrastive Training (Single GPU)

Fine-tunes OpenCLIP with multi-positive symmetric InfoNCE loss. Same-person captions within a batch are all treated as positives.

uv run python train.py \
  --dataset cuhk-pedes \
  --epochs 5 \
  --batch-size 64 \
  --lr 1e-5

Contrastive Training (Multi-GPU DDP)

torchrun --nproc_per_node=2 train_ddp.py \
  --dataset cuhk-pedes \
  --epochs 5 \
  --batch-size 256

Triplet Training (Single GPU)

Fine-tunes with triplet loss, using the image as the anchor, the original caption as the positive, and an LLM-generated hard negative caption as the negative.

Loss: mean(relu(sim(img, neg) - sim(img, pos) + margin))

uv run python train_triplet.py \
  --dataset cuhk-pedes \
  --epochs 5 \
  --batch-size 64 \
  --margin 0.2

The negative annotation file is selected automatically based on the current GPU.

Triplet Training (Multi-GPU DDP)

torchrun --nproc_per_node=2 train_ddp_triplet.py \
  --dataset cuhk-pedes \
  --epochs 5 \
  --batch-size 256 \
  --margin 0.2

Common Training Arguments

| Argument | Default | Description |
| --- | --- | --- |
| `--dataset` | required | `cuhk-pedes` or `rstpreid` |
| `--epochs` | `5` | number of training epochs |
| `--batch-size` | `64` | per-GPU batch size |
| `--lr` | `1e-5` | AdamW learning rate |
| `--weight-decay` | `0.2` | AdamW weight decay |
| `--warmup-ratio` | `0.05` | linear warmup fraction of total steps |
| `--accum-steps` | `1` | gradient accumulation steps |
| `--grad-clip-norm` | `1.0` | gradient clipping max norm |
| `--caption-mode` | `all` | `all` expands each caption; `random` samples one per image |
| `--margin` | `0.2` | triplet loss margin (triplet training only) |
| `--model-name` | `ViT-B-16` | OpenCLIP model name |
| `--pretrained` | `laion2b_s34b_b88k` | OpenCLIP pretrained weight tag |
| `--output-dir` | auto | artifact save directory |
| `--resume` | - | checkpoint path to resume training |
| `--save-every` | `0` | save checkpoint every N epochs (0 disables) |
| `--no-amp` | - | disable CUDA mixed precision |
| `--no-grad-checkpointing` | - | disable gradient checkpointing |

The checkpoints `best.pt` (best validation score) and `last.pt` are saved under `--output-dir`.
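
`--warmup-ratio` is described above as the fraction of total steps spent in linear warmup. The sketch below shows one way that fraction could translate into a per-step learning rate. `warmup_lr` is a hypothetical helper, ramping from `base_lr / warmup_steps` rather than from zero is an assumption, and the rate is simply held constant after warmup because this README does not state which schedule follows.

```python
def warmup_lr(step: int, total_steps: int, base_lr: float = 1e-5,
              warmup_ratio: float = 0.05) -> float:
    """Learning rate at a 0-based optimizer step under linear warmup."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Assumption: constant afterwards; the real scheduler may decay instead.
    return base_lr
```

With the defaults and 1000 total steps, the first 50 steps ramp up and every later step uses `1e-5`. A ratio of `0.0` skips warmup entirely.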

Hyperparameter Tuning

Optuna-based tuning that launches training as a subprocess for each trial.

Contrastive

uv run python tune.py \
  --dataset cuhk-pedes \
  --n-trials 100 \
  --epochs 5

Triplet

uv run python tune_triplet.py \
  --dataset cuhk-pedes \
  --n-trials 100 \
  --epochs 5

Tuned parameters and their search spaces:

| Parameter | Search Space |
| --- | --- |
| `batch_size` | `[32, 64, 128, 192, 256, 384, 512, 768, 1024, 1536, 2048]` |
| `accum_steps` | `[1, 2, 4, 8, 16]` (max effective batch = 32768) |
| `lr` | `[1e-6, 2e-6, 5e-6, 1e-5]` |
| `weight_decay` | `[0.05, 0.1, 0.2, 0.3]` |
| `warmup_ratio` | `[0.0, 0.15]` continuous |
| `grad_clip_norm` | `[0.0, 0.5, 1.0, 5.0]` |
| `caption_mode` | `[all, random]` |
| `margin` | `[0.1, 0.2, 0.3, 0.5]` (triplet only) |

When `--lr` is not set, the learning rate is searched only over the small values listed above; a fixed `--lr` above 1e-5 is rejected. For a new study, every `batch_size` x `accum_steps` pair is enqueued once, starting from the largest physical batch size, before normal Optuna sampling begins. The best checkpoint is symlinked to `output-root/best.pt`. Use `--reset-study` to delete the existing study and the trial artifacts under `output-root` before starting again.
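
The effective batch size of a trial is the physical batch size times the accumulation steps, which is where the 32768 ceiling in the table comes from (2048 x 16). The sketch below enumerates the initial grid under one reading of "from the largest physical batch size", namely descending batch size. `initial_grid` is a hypothetical name. With Optuna, such dictionaries could be registered through `study.enqueue_trial`, which is not shown here.

```python
from itertools import product

BATCH_SIZES = [32, 64, 128, 192, 256, 384, 512, 768, 1024, 1536, 2048]
ACCUM_STEPS = [1, 2, 4, 8, 16]


def initial_grid() -> list[dict[str, int]]:
    """Every batch_size x accum_steps pair once, largest batch size first."""
    pairs = product(sorted(BATCH_SIZES, reverse=True), ACCUM_STEPS)
    return [{"batch_size": b, "accum_steps": a} for b, a in pairs]


grid = initial_grid()
largest = max(p["batch_size"] * p["accum_steps"] for p in grid)
print(len(grid), largest)  # 55 32768
```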

Evaluation

Standard Retrieval (Rank-k, mAP)

Evaluates text-to-image retrieval quality over the full gallery. Use this to measure overall person search performance.

# Baseline (pretrained, no fine-tuning)
uv run python test.py --dataset cuhk-pedes --model baseline

# Fine-tuned checkpoint
uv run python test.py --dataset cuhk-pedes --model artifacts/cuhk-pedes_triplet/best.pt

Output metrics: top1, top5, top10, mAP
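
For readers unfamiliar with these metrics, the following plain-Python sketch shows how Rank-k and mAP are commonly computed for text-to-image retrieval. It is not the toolkit's `retrieval_metrics` implementation: `rank_k_and_map` is a hypothetical name, and matching by person ID is an assumption based on the ReID setting.

```python
def rank_k_and_map(sims, query_ids, gallery_ids, ks=(1, 5, 10)):
    """Rank-k and mAP for text-to-image retrieval.

    sims[q][g] is the similarity between text query q and gallery image g.
    A gallery image is relevant when its person ID equals the query's.
    """
    hits = {k: 0 for k in ks}
    ap_sum = 0.0
    for row, qid in zip(sims, query_ids):
        # Gallery indices sorted by descending similarity.
        order = sorted(range(len(row)), key=lambda g: row[g], reverse=True)
        matches = [gallery_ids[g] == qid for g in order]
        for k in ks:
            hits[k] += any(matches[:k])
        # Average precision: mean precision at each rank holding a match.
        found, precisions = 0, []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / rank)
        ap_sum += sum(precisions) / len(precisions) if precisions else 0.0
    n = len(query_ids)
    metrics = {f"top{k}": hits[k] / n for k in ks}
    metrics["mAP"] = ap_sum / n
    return metrics
```

Rank-k counts a query as a hit when any of its top k images is relevant, while mAP also rewards ranking all relevant images early.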

Hard Negative Evaluation (Triplet Accuracy)

Evaluates how well the model distinguishes a positive caption from a hard negative caption for the same image. Use this to measure localized attribute-level discrimination (e.g., "black coat" vs "white coat").

# Baseline
uv run python test_triplet.py --dataset cuhk-pedes --model baseline

# Fine-tuned checkpoint
uv run python test_triplet.py --dataset cuhk-pedes --model artifacts/cuhk-pedes_triplet/best.pt

Output metrics:

| Metric | Description |
| --- | --- |
| `triplet_accuracy` | fraction where `sim(img, pos) > sim(img, neg)` |
| `pos_sim_mean` | mean cosine similarity between image and positive caption |
| `neg_sim_mean` | mean cosine similarity between image and negative caption |
| `margin_mean` | mean of `pos_sim - neg_sim` |
| `margin_std` | standard deviation of the margin |
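
The metrics above can be derived from two lists of per-triplet cosine similarities. This is a minimal sketch, assuming the similarities have already been computed; `triplet_metrics` is a hypothetical name, and whether `test_triplet.py` uses the population or the sample standard deviation is not stated here.

```python
from statistics import fmean, pstdev


def triplet_metrics(pos_sims: list[float], neg_sims: list[float]) -> dict[str, float]:
    """Summarize per-triplet cosine similarities as in the table above."""
    margins = [p - n for p, n in zip(pos_sims, neg_sims)]
    correct = sum(p > n for p, n in zip(pos_sims, neg_sims))
    return {
        "triplet_accuracy": correct / len(margins),
        "pos_sim_mean": fmean(pos_sims),
        "neg_sim_mean": fmean(neg_sims),
        "margin_mean": fmean(margins),
        # Assumption: population standard deviation; the real script may
        # use the sample estimate instead.
        "margin_std": pstdev(margins),
    }
```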

Recommended Evaluation Workflow

Run both evaluations on the baseline and on the fine-tuned model, so that overall retrieval quality and attribute-level discrimination can be compared side by side.

uv run python test.py           --dataset cuhk-pedes --model baseline
uv run python test_triplet.py   --dataset cuhk-pedes --model baseline

uv run python test.py           --dataset cuhk-pedes --model artifacts/cuhk-pedes_triplet/best.pt
uv run python test_triplet.py   --dataset cuhk-pedes --model artifacts/cuhk-pedes_triplet/best.pt

Results are saved as JSON under results/.

Loss Functions

Multi-Positive Symmetric InfoNCE (`train.py`, `train_ddp.py`)

All captions sharing the same person_id within a batch are treated as positives. Computed symmetrically in both image-to-text and text-to-image directions.
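
The description above can be sketched in plain Python as follows. This is one common formulation, written with lists for readability rather than batched tensors, and it is not taken from src/losses.py: the function names, the averaging over positives per row, and the fixed `temperature` of 0.07 are all assumptions.

```python
import math


def _directional_loss(logits, row_ids, col_ids):
    """Mean over rows of the average negative log-softmax of positives."""
    total = 0.0
    for row, rid in zip(logits, row_ids):
        # Numerically stable log-sum-exp over the row.
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        positives = [x for x, cid in zip(row, col_ids) if cid == rid]
        total += -sum(x - log_z for x in positives) / len(positives)
    return total / len(logits)


def multi_positive_infonce(image_feats, text_feats, person_ids, temperature=0.07):
    """Symmetric InfoNCE where same-person pairs in the batch are positives.

    Features are assumed to be L2-normalized lists of floats, and entry i of
    image_feats, text_feats, and person_ids describes the same sample.
    """
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature
               for txt in text_feats] for img in image_feats]
    transposed = [list(col) for col in zip(*logits)]
    i2t = _directional_loss(logits, person_ids, person_ids)
    t2i = _directional_loss(transposed, person_ids, person_ids)
    return 0.5 * (i2t + t2i)
```

When every `person_id` in the batch is unique, each row has exactly one positive and the expression reduces to the standard symmetric CLIP loss.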

Triplet Loss (`train_triplet.py`, `train_ddp_triplet.py`)

loss = mean(relu(sim(image, neg_caption) - sim(image, pos_caption) + margin))

The image is the anchor. Each positive caption is paired one-to-one with a negative caption from the negative annotation JSON. All features are L2-normalized before the similarity is computed.
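
As a worked illustration of the formula, the sketch below evaluates it on plain Python lists. It follows the expression above term by term, but the helper names are hypothetical and the toolkit itself presumably operates on batched tensors instead.

```python
import math


def _l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _cosine(a, b):
    # Dot product of L2-normalized vectors, i.e. cosine similarity.
    a, b = _l2_normalize(a), _l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def triplet_loss(images, pos_captions, neg_captions, margin=0.2):
    """mean(relu(sim(image, neg) - sim(image, pos) + margin)) over a batch."""
    losses = []
    for img, pos, neg in zip(images, pos_captions, neg_captions):
        violation = _cosine(img, neg) - _cosine(img, pos) + margin
        losses.append(max(0.0, violation))  # relu
    return sum(losses) / len(losses)
```

A triplet contributes zero loss once the positive caption beats the negative by at least `margin`, so only triplets that are still confusable drive the gradient.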
