Megatron GLM evaluation #45

Draft
pawalt wants to merge 1 commit into main from cursor/megatron-glm-evaluation-3c3b

pawalt commented Feb 4, 2026

Adds a native Megatron evaluation pipeline for GLM-4.7 LoRA checkpoints on Modal.

This PR introduces:

  • A new eval.py script that performs Megatron-native evaluation. It patches LoRA adapter loading so that adapters are restored correctly from distributed checkpoints, particularly for MoE models. This addresses problems encountered when evaluating LoRA models within the Megatron-Bridge framework.
  • A new eval_lora Modal function in modal_train.py that orchestrates the distributed evaluation. It resolves checkpoint paths, ensures a validation dataset is available (by linking to the existing training dataset if validation.jsonl is missing), and launches torchrun across the clustered Modal nodes.
  • An updated README.md that documents the new evaluation functionality and how to use it.
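
The orchestration steps described for eval_lora can be sketched roughly as follows. This is a minimal illustration rather than the code in this PR: the helper names, the `training.jsonl` filename, and the default port are assumptions, and only standard `torchrun` launcher flags are used. Note that when the fallback link is created, evaluation runs on the training data, so the resulting numbers are a sanity check rather than a held-out score.

```python
from pathlib import Path


def ensure_validation_file(data_dir: Path, training_name: str = "training.jsonl") -> Path:
    """Return data_dir/validation.jsonl, linking it to the training file if missing.

    `training_name` is a hypothetical default; the PR does not name the file.
    """
    validation = data_dir / "validation.jsonl"
    if validation.exists():
        return validation

    training = data_dir / training_name
    if not training.exists():
        raise FileNotFoundError(f"no validation or training data found in {data_dir}")

    # A dangling symlink reports exists() == False, so clear it before relinking.
    if validation.is_symlink():
        validation.unlink()

    # Use a relative target so the link stays valid if the directory is remounted.
    validation.symlink_to(training.name)
    return validation


def build_torchrun_command(
    script: str,
    *,
    nnodes: int,
    nproc_per_node: int,
    node_rank: int,
    master_addr: str,
    master_port: int = 29500,  # assumed port, not taken from the PR
    script_args: tuple[str, ...] = (),
) -> list[str]:
    """Assemble the argv for one node's torchrun launch."""
    return [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--nproc_per_node={nproc_per_node}",
        f"--node_rank={node_rank}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        script,
        *script_args,
    ]
```

Each node would call `build_torchrun_command` with its own `node_rank` and pass the result to `subprocess.run(cmd, check=True)`; any arguments that eval.py itself accepts go in `script_args`.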

This lets users evaluate their GLM-4.7 LoRA checkpoints on Modal against the existing LongMIT dataset, giving them a diagnostic for their training pipelines.

Checklist

  • Example is documented with comments throughout, in a Literate Programming style.
  • Example does not require third-party dependencies to be installed locally
  • Example follows the style guide
  • Example pins its dependencies
    • Example pins container images to a stable tag, not a dynamic tag like latest (e.g., nvcr.io/nvidia/nemo:25.11)
    • Example specifies a python_version for the base image, if it is used (Implicit in nemo:25.11 image)
    • Example pins all dependencies to at least minor version, ~=x.y.z or ==x.y (Dependencies are primarily handled by the base NeMo image)
    • Example dependencies with version < 1 are pinned to patch version, ==0.y.z (N/A, dependencies handled by base image)
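
The pinning rules in this checklist can be expressed as a small check. The sketch below is purely illustrative: the two helper functions are hypothetical and are not part of this PR or of any Modal tooling. It treats an image as pinned when it has an explicit tag other than `latest` (or a digest), and a requirement as pinned when it uses `==x.y`, `~=x.y.z`, or, for versions below 1, `==0.y.z`.

```python
import re

# name, optional extras, then an == or ~= operator and a dotted numeric version
_REQ = re.compile(r"^[A-Za-z0-9_.\-]+(?:\[[A-Za-z0-9_,\-]+\])?\s*(==|~=)\s*(\d+(?:\.\d+)*)$")


def image_is_pinned(ref: str) -> bool:
    """True if the image reference has a digest or a tag other than 'latest'."""
    if "@" in ref:
        return True  # digest-pinned
    last = ref.rsplit("/", 1)[-1]  # ignore any registry port such as host:5000
    if ":" not in last:
        return False  # no tag means the registry default, i.e. latest
    return last.split(":", 1)[1] != "latest"


def requirement_is_pinned(spec: str) -> bool:
    """True if a requirement string satisfies the checklist's pinning rules."""
    match = _REQ.match(spec.strip())
    if match is None:
        return False
    operator, version = match.groups()
    parts = version.split(".")
    if parts[0] == "0":
        # Pre-1.0 packages must be pinned exactly, down to the patch version.
        return operator == "==" and len(parts) >= 3
    if operator == "~=":
        # ~=x.y would allow any later minor release, so require ~=x.y.z.
        return len(parts) >= 3
    return len(parts) >= 2
```

For example, `image_is_pinned("nvcr.io/nvidia/nemo:25.11")` is true, while `requirement_is_pinned("fastapi==0.115")` is false because a pre-1.0 dependency needs a patch-level pin.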


Co-authored-by: Peyton Walters <pawalt@hey.com>
cursor Bot commented Feb 4, 2026

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.
Learn more about Cursor Agents
