Megatron GLM evaluation by pawalt · Pull Request #45 · modal-labs/multinode-training-guide

pawalt · 2026-02-04T19:30:21Z

Adds a native Megatron evaluation pipeline for GLM-4.7 LoRA checkpoints on Modal.

This PR introduces:

- A new eval.py script that performs Megatron-native evaluation, incorporating critical patches to correctly load LoRA adapters from distributed checkpoints, especially for MoE models. This addresses issues encountered when evaluating LoRA models within the Megatron-Bridge framework.
- A new eval_lora Modal function in modal_train.py to orchestrate the distributed evaluation. This function resolves checkpoint paths, ensures a validation dataset is available (by linking to the existing training dataset if validation.jsonl is missing), and launches the torchrun command across the clustered Modal nodes.
- An updated README.md that documents the new evaluation functionality and usage instructions.
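
The validation fallback and the per-node launch described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the helper names `ensure_validation_file` and `build_torchrun_cmd`, the `training.jsonl` filename, and the default port are hypothetical, and any arguments forwarded to eval.py are left to the caller because the script's actual flags are not shown here.

```python
from pathlib import Path
from typing import List, Sequence


def ensure_validation_file(data_dir: Path, train_name: str = "training.jsonl") -> Path:
    """Return data_dir/validation.jsonl, linking it to the training file if absent.

    `train_name` is an assumed filename; adjust it to the real dataset layout.
    """
    val_path = data_dir / "validation.jsonl"
    if val_path.exists():
        return val_path
    train_path = data_dir / train_name
    if not train_path.exists():
        raise FileNotFoundError(f"no training data found at {train_path}")
    # Use a relative target so the link stays valid if the directory is
    # mounted at a different path inside another container.
    val_path.symlink_to(train_path.name)
    return val_path


def build_torchrun_cmd(
    script: str,
    n_nodes: int,
    gpus_per_node: int,
    node_rank: int,
    master_addr: str,
    master_port: int = 29500,  # assumed default, not taken from the PR
    script_args: Sequence[str] = (),
) -> List[str]:
    """Assemble the torchrun argv for one node of a multi-node job."""
    return [
        "torchrun",
        f"--nnodes={n_nodes}",
        f"--nproc_per_node={gpus_per_node}",
        f"--node_rank={node_rank}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        script,
        *script_args,
    ]
```

Each node would run the resulting command (for example through `subprocess.run(cmd, check=True)`) with its own `node_rank`, while all nodes share the same `master_addr`. Linking rather than copying keeps the fallback cheap for large datasets, at the cost of evaluating on data the model has already seen.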

This enables users to evaluate their GLM-4.7 LoRA checkpoints on Modal using the existing LongMIT dataset, providing a crucial diagnostic tool for their training pipelines.

Checklist

- Example is documented with comments throughout, in a Literate Programming style.
- Example does not require third-party dependencies to be installed locally
- Example follows the style guide
- Example pins its dependencies
  - Example pins container images to a stable tag, not a dynamic tag like latest (e.g., nvcr.io/nvidia/nemo:25.11)
  - Example specifies a python_version for the base image, if it is used (Implicit in nemo:25.11 image)
  - Example pins all dependencies to at least minor version, ~=x.y.z or ==x.y (Dependencies are primarily handled by the base NeMo image)
  - Example dependencies with version < 1 are pinned to patch version, ==0.y.z (N/A, dependencies handled by base image)

Co-authored-by: Peyton Walters <pawalt@hey.com>

cursor · 2026-02-04T19:30:23Z

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.

Add GLM-4.7 eval workflow

865a7be

Co-authored-by: Peyton Walters <pawalt@hey.com>

pawalt wants to merge 1 commit into main from cursor/megatron-glm-evaluation-3c3b