fix: localize deferred row admission #743
Code Review: PR #743 (fix: localize deferred row admission)
Summary
Reworks adaptive row-group admission so that deferred retry work (e.g. cooling … Additionally, the helper … Touches one source file ( …
Findings
Correctness
Style / project conventions
Performance
Test coverage
Risks
Suggestions (non-blocking)
Structural Impact
No pre-computed structural impact analysis was available.
Verdict
Approve with minor suggestions. The change is well-scoped, well-tested, …
Greptile Summary
This PR fixes adaptive row-group admission so that a deferred retry task (e.g. a rate-limited model) only blocks the next row group when every candidate task in that row group is entangled with the cooling resource, either by sharing the same request/scheduler resource key or through a graph dependency. Previously, any deferred entry unconditionally blocked all further row-group admission, starving healthy model resources.
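A minimal Python sketch of the entanglement rule described above, assuming a row group's candidate work can be described as columns, each mapped to one resource key, plus an acyclic column dependency graph. `next_row_group_blocked` and its parameters are hypothetical and do not reflect the actual `async_scheduler.py` API; the per-call `lru_cache` only loosely stands in for the graph traversal caches mentioned in the file overview.

```python
from collections.abc import Iterable, Mapping
from functools import lru_cache


def next_row_group_blocked(
    candidates: Iterable[str],
    resources: Mapping[str, str],
    deps: Mapping[str, frozenset[str]],
    cooling_keys: set[str],
) -> bool:
    """Return True only if every candidate column is entangled with a cooling resource.

    candidates:   columns whose tasks could start in the next row group (hypothetical shape)
    resources:    column -> request/scheduler resource key (hypothetical shape)
    deps:         column -> direct upstream columns; the graph is assumed acyclic
    cooling_keys: resource keys that currently have deferred (cooling) retries
    """
    if not cooling_keys:
        # Nothing is deferred, so deferred work cannot block admission.
        return False

    @lru_cache(maxsize=None)
    def upstream(column: str) -> frozenset[str]:
        # Transitive upstream closure, memoized so shared ancestors are walked once per call.
        closure: set[str] = set()
        for parent in deps.get(column, frozenset()):
            closure.add(parent)
            closure |= upstream(parent)
        return frozenset(closure)

    def entangled(column: str) -> bool:
        # Shares a resource key with cooling work...
        if resources.get(column) in cooling_keys:
            return True
        # ...or depends, directly or transitively, on a column served by a cooling resource.
        return any(resources.get(up) in cooling_keys for up in upstream(column))

    # all() over an empty candidate set is True: there is no independent work to expose.
    return all(entangled(column) for column in candidates)
```

With `model-x` cooling, a row group offering an independent `model-y` column is admitted, while one whose only work is on `model-x` or downstream of it stays blocked. Checking every candidate (rather than any) is what keeps a single cooling model from starving healthy ones.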
| Filename | Overview |
|---|---|
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/async_scheduler.py | Core scheduler logic — new deferred-admission analysis, pending-row-group tracking, graph traversal caches, and diagnostics; logic is sound and the `_row_group_admission_pending` invariant is properly guarded. |
| packages/data-designer-engine/tests/engine/dataset_builders/test_async_scheduler.py | Comprehensive regression suite — 9 new unit tests and 1 async integration test cover unrelated-resource passthrough, same-resource blocking, dependency blocking, multi-output siblings, shared scheduler resources, custom-model fallback, local branches, row-guard precedence, and live async progress. |
Reviews (3): Last reviewed commit: "Merge branch 'main' into etramel/fix/742..."
andreatgretel left a comment
Reviewed #743. No blocking findings.
Validated the GitHub/local diff, ran graphify, ran ruff on the changed Python files, and ran the full changed scheduler test file (104 passed). A Claude cross-review also found no correctness blockers. The only note is to preserve the localized deferred-admission behavior when rebasing with the overlapping scheduler PRs.
Fixes NVIDIA-NeMo#742
Signed-off-by: Eric W. Tramel <1223539+eric-tramel@users.noreply.github.com>
0a30df1 to 5a60c2a
Updated to resolve merge conflicts.
📋 Summary
Fixes adaptive row-group admission so deferred retry work only blocks new row groups when the next row group has no independent resource/column work to expose. This keeps healthy model resources busy while preserving the row/frontier guardrails. The PR also adds deferred-admission diagnostics.
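The gating order implied by this summary (guardrails still enforced, deferred work blocking only when nothing independent is available, diagnostics recorded either way) could look roughly like the hypothetical sketch below. `AdmissionDecision`, `decide_row_group_admission`, and the reason strings are invented for illustration and are not the PR's code. The diagnostic field names echo `deferred_tasks` and `blockers` from the testing notes, but what those names mean in the PR is an assumption. Checking the row guard first mirrors the "row-guard precedence" test listed in the Greptile table.

```python
from dataclasses import dataclass, field


@dataclass
class AdmissionDecision:
    admit: bool
    reason: str
    deferred_tasks: int = 0                            # hypothetical diagnostic field
    blockers: list[str] = field(default_factory=list)  # hypothetical diagnostic field


def decide_row_group_admission(
    in_flight_row_groups: int,
    max_in_flight_row_groups: int,
    deferred_tasks: int,
    blockers: list[str],
    independent_work_available: bool,
) -> AdmissionDecision:
    """Hypothetical admission gate: row guard first, then the deferred-work check."""
    # 1. The row/frontier guardrail is checked first and always takes precedence.
    if in_flight_row_groups >= max_in_flight_row_groups:
        return AdmissionDecision(False, "row_guard", deferred_tasks, list(blockers))
    # 2. Deferred work blocks only when the next row group has nothing independent to run.
    if deferred_tasks and not independent_work_available:
        return AdmissionDecision(False, "deferred_entangled", deferred_tasks, list(blockers))
    # 3. Otherwise admit, still recording the deferred state for diagnostics.
    return AdmissionDecision(True, "admitted", deferred_tasks, list(blockers))
```

Returning a small record instead of a bare boolean lets the caller log why a row group was held back, which is the kind of signal deferred-admission diagnostics would need.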
🔗 Related Issue
Fixes #742
🔄 Changes
🧪 Testing
- `uv run --group dev pytest -q packages/data-designer-engine/tests/engine/dataset_builders/test_async_scheduler.py -k 'adaptive_row_group or deferred_cooldown'`: 13 passed
- `uv run --group dev pytest -q packages/data-designer-engine/tests/engine/dataset_builders/test_async_scheduler.py`: 104 passed
- `deferred_tasks` … `blockers` …
- `make check-all-fix`
- `make test`: config 616 passed; engine 2245 passed; interface 955 passed, 1 skipped

✅ Checklist