Skip to content

Latest commit

 

History

History
28 lines (18 loc) · 933 Bytes

File metadata and controls

28 lines (18 loc) · 933 Bytes

ZeroDiffusion

A zero-shot learning diffusion method.

Before training, train an audio embedding model in the audio embedding folder.

Next, create a pickle of the embeddings in the data folder.

Now you can run the ZeroDiffusion training.

Example:

python train.py ../data/ESC_50_synonyms/fold04.pickle --dataset ESC-50 --split fold0

Tried Variants

The repo has been simplified to the MSE + interpolation-based denoising autoencoder baseline because the following variants were tried and did not improve performance in practice:

  • Additive noise instead of interpolation noise
  • L1 reconstruction loss instead of MSE
  • LayerNorm in the denoising autoencoder
  • Contrastive DAE variants
  • Conditional VAE variants
  • Adaptive-noise variants

The current codebase keeps the best-performing path only: MSE loss with interpolation-based noise in the denoising autoencoder.