Skip to content

Fix is_last off-by-one in MaskGenerationPipeline for partial batches#46136

Open
J3r3myPerera wants to merge 5 commits into
huggingface:mainfrom
J3r3myPerera:fix/mask-generation-is-last-partial-batch
Open

Fix is_last off-by-one in MaskGenerationPipeline for partial batches#46136
J3r3myPerera wants to merge 5 commits into
huggingface:mainfrom
J3r3myPerera:fix/mask-generation-is-last-partial-batch

Conversation

@J3r3myPerera
Copy link
Copy Markdown

@J3r3myPerera J3r3myPerera commented May 21, 2026

Fixes #46123

MaskGenerationPipeline.preprocess used i == n_points - points_per_batch to spot the last batch. When n_points isn't a multiple of points_per_batch, that's never true — PipelinePackIterator hits StopIteration and quietly drops the last batch's results.

Fix: i + points_per_batch >= n_points.
Two fast unit tests in test_pipelines_mask_generation.py: one for the partial-batch case (100 points, batch 64), one for an exact multiple (128 points, batch 64).

python -m pytest tests/pipelines/test_pipelines_mask_generation.py::MaskGenerationPipelineTests::test_preprocess_is_last_partial_batch tests/pipelines/test_pipelines_mask_generation.py::MaskGenerationPipelineTests::test_preprocess_is_last_exact_multiple -v
#2 passed

  • I confirm that this is not a pure code agent PR.

Before submitting

Who can review?

cc @Rocketknight1 @yonigozlan @qubvel

@Shashank-Tripathi-07
Copy link
Copy Markdown

Hey bro, the code looks good but you said you didn't use AI agents to make this PR but there are em dashes very visible on the comment you made and also in the original issue. This can be a problem as the repo doesn't like Agent Slop even 1%. Take a look again for safety on this.

@J3r3myPerera
Copy link
Copy Markdown
Author

The CI failures here are pre-existing on main — not caused by this change.
ci/circleci: tests_tensor_parallel_ci — all 3 failures are in Cohere2MoeModelTest (test_tp_forward, test_tp_backward, test_tp_generation), crashing with KeyError: 'rowwise' in distributed/tensor_parallel.py. This PR doesn't touch any of that.

ci/circleci: tests_training_overfit_ci — 1 failure, also Cohere2MoeModelTest::test_training_overfit, loss only drops 27% vs a 90% threshold. Unrelated.

Only two files changed:

src/transformers/pipelines/mask_generation.py (1 line)
tests/pipelines/test_pipelines_mask_generation.py (2 tests)

Neither touches Cohere2MoeModel or anything in the distributed training path.

@J3r3myPerera
Copy link
Copy Markdown
Author

J3r3myPerera commented May 21, 2026

Hey bro, the code looks good but you said you didn't use AI agents to make this PR but there are em dashes very visible on the comment you made and also in the original issue. This can be a problem as the repo doesn't like Agent Slop even 1%. Take a look again for safety on this.

Fair point, and I'll own it. I did use AI to help word the PR description and the issue comment. The fix itself I worked out on my own: i == n_points - points_per_batch only hits when n_points is an exact multiple, so any partial tail batch never gets flagged as last, PipelinePackIterator raises StopIteration and the results are quietly dropped. Replacing it with i + points_per_batch >= n_points handles both cases. I understand what the code does and why the old condition was wrong.

That said, em dashes in prose aren't really a reliable signal for agent-generated code. Plenty of people type them on purpose. The actual thing to check is whether the logic holds up. Which I'd rather be judged on.

@Rocketknight1
Copy link
Copy Markdown
Member

You can ignore those comments, he's just annoyed I wouldn't listen when he claimed his Claude PR was human-written. In this case the actual fix is one line and seems correct, so I don't really care too much whether an agent wrote it or not. You do not actually need to go around hiding all the em-dashes 😅

Replace the two heavily-mocked test methods with a single subTest-driven
check that calls preprocess on a real MaskGenerationPipeline backed by
hf-internal-testing/tiny-random-SamModel. Covers both the partial-batch
(100 points) and exact-multiple (64 points) cases without mocking.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@Rocketknight1 Rocketknight1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I did a little cleanup to shorten the tests a lot - a regression test for the behaviour is nice, but we want to avoid 60 LOC of tests for a one-line fix, it's a bit agent-sloppy. The fix was good, though!

@Rocketknight1 Rocketknight1 enabled auto-merge May 21, 2026 13:18
@Rocketknight1
Copy link
Copy Markdown
Member

@J3r3myPerera looks like there might be some CI instability at the moment. Can you wait a bit and then try rebasing or rerunning tests? Once the CI is green ping me and I'll merge it.

@github-actions
Copy link
Copy Markdown
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=46136&sha=ba5335

@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MaskGenerationPipeline: is_last never True on final partial batch, silently dropping results

4 participants