Add Jax distributed training guide with RNN/MLP examples by atoniolo76 · Pull Request #77 · modal-labs/multinode-training-guide

atoniolo76 · 2026-04-20T23:06:15Z

Creates a Modal training script with four entry points: mlp_train, mlp_sample, rnn_train, and rnn_sample. Adds a README explaining the advantages of Jax over PyTorch and how to set up a multi-node cluster using mesh/sharding primitives. Requires the third-party library Equinox as a convenience layer for defining neural networks.
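
For readers unfamiliar with the mesh/sharding primitives mentioned above, here is a minimal single-process sketch of data parallelism with `jax.sharding`. It is not taken from the PR: the axis name `"data"`, the toy arrays, and the `mean_prediction` function are illustrative assumptions, and a real multi-node run would additionally need process setup (for example via `jax.distributed.initialize()`), which is not shown.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D device mesh over every device visible to this process.
# The axis name "data" is an illustrative choice, not the PR's.
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("data",))

# Split the batch dimension across the "data" axis; replicate the weights.
batch_sharding = NamedSharding(mesh, P("data"))
replicated = NamedSharding(mesh, P())

n_dev = devices.size
x = jnp.arange(8 * n_dev, dtype=jnp.float32).reshape(8 * n_dev, 1)
w = jnp.ones((1, 1), dtype=jnp.float32)

x = jax.device_put(x, batch_sharding)  # each device holds 8 rows
w = jax.device_put(w, replicated)      # each device holds a full copy


@jax.jit
def mean_prediction(w, x):
    # The compiler inserts whatever cross-device reduction the mean needs.
    return jnp.mean(x @ w)


result = mean_prediction(w, x)
```

On a machine with a single device the mesh has one entry and the code runs unchanged, which makes this pattern easy to try locally before moving to a cluster.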

MLP example:
Fits a basic MLP with hidden_size=64 to the function x^2. The loss function is mean-squared error, and gradients are back-propagated and applied with the Adam optimizer.
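
The MLP setup can be sketched roughly as follows. This is a hedged illustration rather than the PR's code: the PR uses Equinox, whereas this sketch uses plain JAX and a hand-written Adam update to stay dependency-free. The `tanh` activation, the learning rate, the step count, the input range, and all function names are assumptions.

```python
import jax
import jax.numpy as jnp

HIDDEN_SIZE = 64  # matches hidden_size=64 from the description


def init_params(key):
    # One hidden layer; the initialization scales are illustrative.
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (1, HIDDEN_SIZE)) * 0.5,
        "b1": jnp.zeros(HIDDEN_SIZE),
        "w2": jax.random.normal(k2, (HIDDEN_SIZE, 1)) * 0.1,
        "b2": jnp.zeros(1),
    }


def mlp(params, x):
    h = jnp.tanh(x @ params["w1"] + params["b1"])
    return h @ params["w2"] + params["b2"]


def mse_loss(params, x, y):
    return jnp.mean((mlp(params, x) - y) ** 2)


@jax.jit
def adam_step(params, m, v, t, x, y, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    # Back-propagate the loss, then apply a bias-corrected Adam update.
    tmap = jax.tree_util.tree_map
    loss, grads = jax.value_and_grad(mse_loss)(params, x, y)
    m = tmap(lambda m_, g: b1 * m_ + (1 - b1) * g, m, grads)
    v = tmap(lambda v_, g: b2 * v_ + (1 - b2) * g * g, v, grads)
    m_hat = tmap(lambda m_: m_ / (1 - b1 ** t), m)
    v_hat = tmap(lambda v_: v_ / (1 - b2 ** t), v)
    params = tmap(
        lambda p, mh, vh: p - lr * mh / (jnp.sqrt(vh) + eps),
        params, m_hat, v_hat,
    )
    return params, m, v, loss


params = init_params(jax.random.PRNGKey(0))
m = jax.tree_util.tree_map(jnp.zeros_like, params)  # first-moment estimates
v = jax.tree_util.tree_map(jnp.zeros_like, params)  # second-moment estimates

x = jnp.linspace(-1.0, 1.0, 128).reshape(-1, 1)
y = x ** 2

initial_loss = float(mse_loss(params, x, y))
for t in range(1, 501):
    params, m, v, loss = adam_step(params, m, v, t, x, y)
final_loss = float(mse_loss(params, x, y))
```

In practice an optimizer library would replace the hand-written update; it is spelled out here only so the example has no dependencies beyond JAX.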

RNN example:
Next-character prediction on Chapter 32 of Moby Dick, trained with cross-entropy loss. Each character is encoded as a one-hot vector over a vocabulary of size 64.
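
As a rough illustration of one-hot character inputs and a cross-entropy objective, the sketch below runs a vanilla RNN over a short string with `jax.lax.scan`. None of it comes from the PR: the placeholder text, the hidden size, the parameter names, and the bias-free cell are assumptions, and the vocabulary here is whatever characters the placeholder contains rather than the 64 entries the PR describes.

```python
import jax
import jax.numpy as jnp

text = "call me ishmael"  # placeholder string, not the chapter used in the PR
chars = sorted(set(text))
vocab_size = len(chars)   # the PR describes a vocabulary of size 64
char_to_id = {c: i for i, c in enumerate(chars)}
ids = jnp.array([char_to_id[c] for c in text])

inputs = jax.nn.one_hot(ids[:-1], vocab_size)  # (T, V): current characters
targets = ids[1:]                              # (T,): next characters

HIDDEN = 32  # hypothetical hidden size


def init_params(key):
    k1, k2, k3 = jax.random.split(key, 3)
    return {
        "w_xh": jax.random.normal(k1, (vocab_size, HIDDEN)) * 0.1,
        "w_hh": jax.random.normal(k2, (HIDDEN, HIDDEN)) * 0.1,
        "w_hy": jax.random.normal(k3, (HIDDEN, vocab_size)) * 0.1,
    }


def rnn_logits(params, inputs):
    def step(h, x):
        # Vanilla RNN cell: mix the current input with the previous state.
        h = jnp.tanh(x @ params["w_xh"] + h @ params["w_hh"])
        return h, h @ params["w_hy"]

    _, logits = jax.lax.scan(step, jnp.zeros(HIDDEN), inputs)
    return logits  # (T, V)


def cross_entropy(params, inputs, targets):
    log_probs = jax.nn.log_softmax(rnn_logits(params, inputs), axis=-1)
    # Pick the log-probability assigned to each true next character.
    picked = jnp.take_along_axis(log_probs, targets[:, None], axis=-1)
    return -jnp.mean(picked)


params = init_params(jax.random.PRNGKey(0))
loss, grads = jax.value_and_grad(cross_entropy)(params, inputs, targets)
```

With small random weights the initial loss sits near `log(vocab_size)`, the value for a uniform guess, which is a handy sanity check before training.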

Checklist

- [*] Example is documented with comments throughout, in a Literate Programming style.
- [*] Example does not require third-party dependencies to be installed locally
- [*] Example follows the style guide
- [*] Example pins its dependencies
  - [*] Example pins container images to a stable tag, not a dynamic tag like latest
  - [*] Example specifies a python_version for the base image, if it is used
  - [*] Example pins all dependencies to at least minor version, ~=x.y.z or ==x.y
  - [*] Example dependencies with version < 1 are pinned to patch version, ==0.y.z

(Modal's internal guide page for this repo is Multi-node examples guidance.)

Outside contributors

You're great! Thanks for your contribution.

atoniolo76 added 8 commits April 2, 2026 18:30

Create basic MLP training script with Jax

e3dc3f7

Add GroupNorm layer in jax/model.py and adapt train.py script for Modal

94ed66c

Add performance counter to training step

dd4381b

Add state tracking for BatchNorm training statistics and create prediction function

5211024

Add data parallelism to Jax training example

c04df26

Finish RNN module, refactor training script into modal + mlp/rnn specific files, and save metrics data/load weights from checkpoint.

8533344

Fit style guidelines in multinode-training-guide repo

dad6152

Add tokenization

79de4dc

atoniolo76 wants to merge 8 commits into main from alessio/jax-training-example
