[DECOUPLED-MODE] grpo_trainer: route goodput/vertex imports through gcloud_stub for decoupled #4180

Open
gulsumgudukbay wants to merge 1 commit into AI-Hypercomputer:main from ROCm:fix-grpo-decoupled-ups

@gulsumgudukbay

Collaborator

Description

The top-level imports of GoodputRecorder (from ml_goodput_measurement.src.goodput) and VertexTensorboardManager (from maxtext.common.vertex_tensorboard, which pulls in cloud_accelerator_diagnostics) hard-require cloud-only packages that are intentionally absent in decoupled environments (DECOUPLE_GCLOUD=TRUE). Because pytest imports test modules during collection, a single ModuleNotFoundError raised by these imports aborted collection for the whole suite (exit 2).
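
A minimal, standard-library-only sketch of the failure mode (illustrative, not code from this PR): importing a package that is not installed raises ModuleNotFoundError, a subclass of ImportError, at the moment the import runs. The package name cloud_only_pkg_that_is_absent and the helpers try_import and is_installed are hypothetical; the package name stands in for a cloud-only dependency such as cloud_accelerator_diagnostics.

```python
import importlib
import importlib.util


def try_import(module_name):
    """Import module_name, returning (module, None) on success or (None, error).

    An unguarded top-level "import module_name" has no try/except, so the
    same error propagates out of the importing module instead.
    """
    try:
        return importlib.import_module(module_name), None
    except ModuleNotFoundError as err:
        return None, err


def is_installed(module_name):
    """Check availability without importing; find_spec returns None if absent."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical stand-in for a cloud-only dependency that is not installed.
MISSING = "cloud_only_pkg_that_is_absent"

module, error = try_import(MISSING)
print(module is None, type(error).__name__)  # prints: True ModuleNotFoundError
print(is_installed(MISSING), is_installed("json"))  # prints: False True
```

In a test module, the unguarded equivalent of try_import(MISSING) is what pytest surfaces as a collection error, which interrupts the run with exit code 2.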

This PR resolves both through the existing decoupled-aware import paths, mirroring pre_train/train.py:

  • GoodputRecorder (used only as a type annotation) now comes from the stub-aware maxtext.common.goodput.goodput namespace.
  • VertexTensorboardManager now comes from gcloud_stub.vertex_tensorboard_modules().

There is no behavior change when the real libraries are installed; decoupled runs fall back to the existing no-op stubs. This fixes test collection in the rocm-decoupled environment (786 of 1157 tests collected, 371 deselected, exit 0).
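
The general shape of a stub-aware import path can be sketched as follows. This is a hypothetical illustration, not the implementation of gcloud_stub.vertex_tensorboard_modules(), whose signature and return value are not shown in this PR. It assumes decoupled mode is signalled by DECOUPLE_GCLOUD=TRUE (compared case-insensitively here, which is an assumption) and also falls back to the stub when the real import fails; NoOpVertexTensorboardManager, is_decoupled and load_vertex_tensorboard_manager are invented names.

```python
import os


class NoOpVertexTensorboardManager:
    """Hypothetical no-op stand-in for the real manager class."""

    def __init__(self, *args, **kwargs):
        pass

    def __getattr__(self, name):
        # Any attribute not defined on the class resolves to a do-nothing function.
        def _noop(*args, **kwargs):
            return None

        return _noop


def is_decoupled():
    """Assumption: DECOUPLE_GCLOUD=TRUE (any case) means decoupled mode."""
    return os.environ.get("DECOUPLE_GCLOUD", "").upper() == "TRUE"


def load_vertex_tensorboard_manager():
    """Return the real manager class, or the no-op stub when decoupled or absent.

    Hypothetical sketch; not the actual gcloud_stub.vertex_tensorboard_modules().
    """
    if is_decoupled():
        return NoOpVertexTensorboardManager
    try:
        # Deferred import: evaluated only outside decoupled mode, so a missing
        # cloud package cannot break importing the module that calls this.
        from maxtext.common.vertex_tensorboard import VertexTensorboardManager

        return VertexTensorboardManager
    except ImportError:
        return NoOpVertexTensorboardManager


VertexTensorboardManager = load_vertex_tensorboard_manager()
```

With DECOUPLE_GCLOUD=TRUE, load_vertex_tensorboard_manager() returns the stub class, and calling any ordinary method on a stub instance simply returns None. Moving the real import inside the function is what keeps module import (and therefore pytest collection) independent of the cloud-only packages.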

Tests

Validated with the downstream decoupled ROCm CI run.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@codecov

codecov Bot commented Jun 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
