glide-finetune

Fine-tune and evaluate GLIDE text-to-image diffusion models.

Installation

Requires uv and Python 3.12+.

git clone https://github.com/afiaka87/glide-finetune.git
cd glide-finetune/
uv sync
uv pip install -e .

Quick Start

Train Base Model (64x64)

python train_glide.py \
  --data_dir /path/to/dataset \
  --batch_size 4 \
  --learning_rate 1e-4 \
  --uncond_p 0.2 \
  --precision bf16 \
  --wandb_project_name my-project
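
The `--uncond_p` value is most naturally read as the probability of training a sample with an empty caption, which is what lets classifier-free guidance work at sampling time. The sketch below illustrates that reading only. `maybe_drop_caption` is a hypothetical helper, not a function from this repository, and the project's data loader may implement the step differently.

```python
import random


def maybe_drop_caption(caption: str, uncond_p: float, rng: random.Random) -> str:
    """Return an empty caption with probability `uncond_p`, else the caption unchanged."""
    # Hypothetical helper for illustration; not part of glide-finetune.
    return "" if rng.random() < uncond_p else caption


rng = random.Random(0)
captions = [maybe_drop_caption("a photo of a cat", 0.2, rng) for _ in range(10_000)]
dropped_fraction = sum(c == "" for c in captions) / len(captions)
print(f"fraction trained unconditionally: {dropped_fraction:.3f}")
```

Under this reading, `--uncond_p 0.0` (as in the upsampler command) means every sample keeps its caption.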

Train on WebDataset (LAION)

python train_glide.py \
  --data_dir "/mnt/laion/*.tar" \
  --use_webdataset \
  --wds_image_key jpg \
  --wds_caption_key txt \
  --use_captions \
  --batch_size 8 \
  --learning_rate 1e-4 \
  --precision bf16 \
  --activation_checkpointing
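
In the WebDataset convention, files inside a tar shard that share a basename form one sample, and each file extension acts as a key. `--wds_image_key jpg` and `--wds_caption_key txt` name those keys. The sketch below builds a tiny in-memory shard and groups it by basename using only the standard library. It illustrates the shard layout, not the loader this project uses, and `build_shard` and `group_samples` are hypothetical helpers.

```python
import io
import tarfile
from collections import defaultdict


def build_shard(samples: dict[str, dict[str, bytes]]) -> bytes:
    """Write {basename: {extension: payload}} into an in-memory tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for basename, files in samples.items():
            for ext, payload in files.items():
                info = tarfile.TarInfo(name=f"{basename}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def group_samples(shard: bytes) -> dict[str, dict[str, bytes]]:
    """Group tar members by basename so each sample maps extension -> bytes."""
    grouped: dict[str, dict[str, bytes]] = defaultdict(dict)
    with tarfile.open(fileobj=io.BytesIO(shard), mode="r") as tar:
        for member in tar.getmembers():
            # Split at the first dot: "000000.jpg" -> ("000000", "jpg").
            basename, _, ext = member.name.partition(".")
            grouped[basename][ext] = tar.extractfile(member).read()
    return dict(grouped)


shard = build_shard({
    "000000": {"jpg": b"<jpeg bytes>", "txt": b"a red bicycle"},
    "000001": {"jpg": b"<jpeg bytes>", "txt": b"a bowl of fruit"},
})
samples = group_samples(shard)
for key, sample in samples.items():
    print(key, sorted(sample), sample["txt"].decode())
```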

Train Upsampler (64->256)

python train_glide.py \
  --data_dir /path/to/dataset \
  --train_upsample \
  --upscale_factor 4 \
  --uncond_p 0.0 \
  --precision bf16

Evaluate / Generate

python evaluate_glide.py \
  --prompt_file eval_captions.txt \
  --base_model checkpoints/base.pt \
  --sr_model checkpoints/sr.pt \
  --use_clip_rerank \
  --clip_candidates 32 \
  --clip_top_k 8 \
  --sampler euler \
  --cfg 4.0
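
With `--clip_candidates 32` and `--clip_top_k 8`, the natural reading is that 32 images are sampled per prompt and the 8 with the highest CLIP image-text similarity are kept. The following is a minimal sketch of that selection step, assuming the similarity scores have already been computed. The scores here are random placeholders, and `rerank_top_k` is a hypothetical helper rather than a function from `evaluate_glide.py`.

```python
import random


def rerank_top_k(scores: list[float], top_k: int) -> list[int]:
    """Return the indices of the `top_k` highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


rng = random.Random(0)
clip_scores = [rng.random() for _ in range(32)]  # placeholder similarity scores
kept = rerank_top_k(clip_scores, top_k=8)
print("kept candidate indices:", kept)
```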

Latent Diffusion Mode (Experimental)

Experimental support for training GLIDE in latent space. Instead of denoising 64x64 pixel images directly, the model operates on 32x32 latent representations (via a frozen SD 1.5 VAE) and produces 256x256 outputs after decoding. A frozen OpenCLIP ViT-L-14 encoder replaces the GLIDE text transformer for conditioning.
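
The resolution figures follow from the VAE's spatial compression. Stable Diffusion 1.5 style VAEs downsample each side by a factor of 8 into 4 latent channels, so a 256x256 RGB image corresponds to a 4x32x32 latent. The arithmetic can be sketched as follows (`latent_shape` is a hypothetical helper for illustration):

```python
VAE_DOWNSAMPLE = 8   # spatial compression factor of SD 1.5 style VAEs
LATENT_CHANNELS = 4  # latent channel count of SD 1.5 style VAEs


def latent_shape(height: int, width: int) -> tuple[int, int, int]:
    """Return the (channels, height, width) of the latent for an image of the given size."""
    if height % VAE_DOWNSAMPLE or width % VAE_DOWNSAMPLE:
        raise ValueError("image sides must be divisible by the VAE downsample factor")
    return (LATENT_CHANNELS, height // VAE_DOWNSAMPLE, width // VAE_DOWNSAMPLE)


print(latent_shape(256, 256))  # (4, 32, 32)
```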

Latent Training

python train_glide.py \
  --latent_mode \
  --data_dir "/path/to/webdataset/*.tar" \
  --use_webdataset \
  --wds_dataset_name datacomp-clip \
  --wds_image_key jpg \
  --use_captions \
  --batch_size 32 \
  --learning_rate 3e-4 \
  --precision bf16 \
  --uncond_p 0.2

Latent Mode Arguments

Argument	Default	Description
`--latent_mode`	off	Enable latent diffusion training
`--vae_model`	`stabilityai/sd-vae-ft-mse`	HuggingFace VAE model
`--clip_model_name`	`ViT-L-14`	OpenCLIP model architecture
`--clip_pretrained`	`laion2b_s32b_b82k`	OpenCLIP pretrained weights

Model Initialization (`--init`)

Value	Description
(empty)	Auto: `pretrained` for pixel mode, `scratch` for latent mode
`pretrained`	Load OpenAI pretrained weights (pixel mode only)
`scratch`	Random initialization
`checkpoint:<path>`	Resume from a saved checkpoint
`pixel-transfer:<path>`	Transfer pixel-space weights to latent model (latent mode only)
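
The table above implies a small grammar: a bare keyword, or a keyword followed by `:<path>`, with an empty value resolved according to the mode. Below is a sketch of how such a value could be parsed. `InitSpec` and `parse_init` are hypothetical names, and the actual argument handling in `train_glide.py` may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InitSpec:
    kind: str            # "pretrained", "scratch", "checkpoint", or "pixel-transfer"
    path: Optional[str]  # set only for the kinds that take a path


def parse_init(value: str, latent_mode: bool) -> InitSpec:
    """Parse an `--init` value according to the table above (illustrative only)."""
    if value == "":
        # Auto: pretrained for pixel mode, scratch for latent mode.
        return InitSpec("scratch" if latent_mode else "pretrained", None)
    kind, sep, path = value.partition(":")
    if kind in ("checkpoint", "pixel-transfer"):
        if not sep or not path:
            raise ValueError(f"--init {kind} requires a path, e.g. {kind}:<path>")
    elif kind in ("pretrained", "scratch"):
        if sep:
            raise ValueError(f"--init {kind} does not take a path")
    else:
        raise ValueError(f"unknown --init value: {value!r}")
    if kind == "pretrained" and latent_mode:
        raise ValueError("pretrained is pixel mode only")
    if kind == "pixel-transfer" and not latent_mode:
        raise ValueError("pixel-transfer is latent mode only")
    return InitSpec(kind, path or None)


print(parse_init("", latent_mode=True))
print(parse_init("checkpoint:runs/step_1000.pt", latent_mode=False))
```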

Training Scope (`--train`)

Value	Description
`all`	Train everything (default)
`unet`	Train only UNet, freeze text encoder
`unet-scratch`	Reinit UNet from random, freeze text encoder
`transformer`	Train only text encoder, freeze UNet (keeps encoder_kv trainable)
`transformer-scratch`	Reinit text encoder from random, freeze UNet
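
One way to picture the scopes is as a rule that decides, per parameter name, whether gradients stay enabled. The sketch below covers `all`, `unet`, and `transformer` only; the `-scratch` variants additionally reinitialize weights and are left out. The name prefixes and example parameter names are assumptions chosen for illustration, and `is_trainable` is a hypothetical helper, not the repository's implementation.

```python
# Assumed name prefixes for text-encoder parameters; the real model may name them differently.
TEXT_ENCODER_PREFIXES = ("transformer.", "token_embedding.", "positional_embedding.")


def is_trainable(param_name: str, scope: str) -> bool:
    """Decide whether a parameter stays trainable under a `--train` scope (sketch)."""
    in_text_encoder = param_name.startswith(TEXT_ENCODER_PREFIXES)
    if scope == "all":
        return True
    if scope == "unet":
        return not in_text_encoder
    if scope == "transformer":
        # The UNet is frozen, except that encoder_kv projections stay trainable.
        return in_text_encoder or "encoder_kv" in param_name
    raise ValueError(f"unsupported scope in this sketch: {scope!r}")


example_names = [
    "transformer.resblocks.0.attn.c_qkv.weight",
    "input_blocks.4.1.encoder_kv.weight",
    "output_blocks.2.0.in_layers.2.weight",
]
for scope in ("all", "unet", "transformer"):
    print(scope, [name for name in example_names if is_trainable(name, scope)])
```

In a PyTorch training script, a rule like this would typically be applied by setting `requires_grad` on each entry of `model.named_parameters()`.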

Performance Optimization Flags

The following flags enable GPU performance optimizations that are disabled by default. They were removed from the default training path because torch.compile was found to produce subtly wrong gradients through the mixed BF16/FP32 casting in pixel-space GLIDE models, causing training outputs to slowly degrade into abstract blobs over time. The other flags are safe individually but are opt-in for consistency.

Flag	Description
`--use_compile`	Enable `torch.compile`. Not recommended for pixel-space BF16 models — causes gradient issues through mixed-precision ResBlocks. Safe for latent models which have explicit FP32 protections.
`--use_tf32`	Enable TF32 for matmul and cuDNN. Reduces FP32 precision from 23-bit to 10-bit mantissa.
`--use_channels_last`	Enable `channels_last` memory format for model weights and activations.
`--use_fused_adam`	Enable fused AdamW optimizer (single-kernel parameter update).
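
To give a feel for what dropping from a 23-bit to a 10-bit mantissa means, the sketch below zeroes the low 13 mantissa bits of an FP32 value in pure Python. It only imitates the reduced input precision: hardware rounding behavior may differ, and `truncate_to_tf32` is a hypothetical helper written for illustration.

```python
import struct


def truncate_to_tf32(x: float) -> float:
    """Zero the low 13 of FP32's 23 mantissa bits, leaving 10 as in TF32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # keep sign (1 bit), exponent (8 bits), top 10 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


fine_step = 1.0 + 2.0 ** -12    # representable in FP32, below TF32 resolution near 1.0
coarse_step = 1.0 + 2.0 ** -10  # smallest step above 1.0 that a 10-bit mantissa keeps
print(truncate_to_tf32(fine_step))
print(truncate_to_tf32(coarse_step))
```

The first value collapses to 1.0 while the second survives, which shows the relative resolution near 1.0 moving from about 2^-23 to about 2^-10.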

Testing

# Run the tests marked slow (requires GPU and pretrained weights)
uv run pytest tests/ -m slow -v

# Run only the fast tests (no GPU needed)
uv run pytest tests/ -v

Test Suites

tests/test_training_regression.py -- Regression tests that train for a small number of steps on synthetic data and verify loss decreases, outputs are not degenerate, and no NaN/Inf values appear.
tests/test_training_sanity.py -- Ablation tests parameterized by feature flags (torch.compile, TF32, channels_last, fused AdamW). Used to isolate which optimizations cause training degradation.
tests/test_latent_core.py -- Unit tests for the latent diffusion model architecture.
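
The checks described for tests/test_training_regression.py can be pictured as assertions over a short loss history. A minimal sketch of such a check follows. `check_loss_history` and its window size are hypothetical and are not taken from the test suite.

```python
import math


def check_loss_history(losses: list[float], window: int = 3) -> None:
    """Raise AssertionError if losses contain NaN/Inf or fail to decrease on average."""
    assert len(losses) >= 2 * window, "need at least two windows of steps"
    assert all(math.isfinite(v) for v in losses), "found NaN or Inf in losses"
    early = sum(losses[:window]) / window
    late = sum(losses[-window:]) / window
    assert late < early, f"loss did not decrease: {early:.4f} -> {late:.4f}"


# Made-up loss values for illustration; a real test would record them from training steps.
check_loss_history([1.00, 0.95, 0.97, 0.80, 0.78, 0.74])
print("loss history looks healthy")
```

Comparing window averages rather than single steps keeps the check tolerant of the step-to-step noise that short training runs produce.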

Full Usage

Run `python train_glide.py --help` for the complete argument list. Key arguments are documented above.
