Arm backend: Generate random conv2d test inputs lazily #19556
Conversation
Conv2d operator tests were creating random inputs at module import time. The Arm test seed is applied later by an autouse pytest fixture, so those tensors were not actually controlled by ARM_TEST_SEED. That made tests nondeterministic across fresh pytest processes and could expose different quantization behavior from run to run.

Generate the affected inputs lazily inside each test case so the existing seed fixture makes them reproducible and ARM_TEST_SEED=RANDOM can re-randomize the intended data.

Signed-off-by: Zingo Andersen <Zingo.Andersen@arm.com>
Change-Id: I48cf24e462000664d50f44fb9bdea9fa188784e3
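The failure mode described above can be sketched with Python's stdlib RNG standing in for torch's generator (names here are illustrative, not the actual test code):

```python
import random

# Illustrative sketch: a value produced at module import time is fixed
# before any pytest fixture gets a chance to seed the RNG.
EAGER_INPUT = random.random()  # unaffected by later random.seed() calls

def lazy_input():
    # Produced inside the test body, after the autouse seed fixture has run.
    return random.random()

random.seed(1234)   # conceptually, what the Arm seed fixture does
first = lazy_input()
random.seed(1234)   # rerun with the same ARM_TEST_SEED
second = lazy_input()
assert first == second  # lazily generated inputs are reproducible
```

The eagerly created `EAGER_INPUT`, by contrast, cannot be reproduced by re-seeding, which is exactly why a fresh pytest process sees different data.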
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19556
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV — there is 1 currently active SEV. If your PR is affected, please view it below.
❌ 2 New Failures, 4 Unrelated Failures as of commit 043751e with merge base 2137894:
- NEW FAILURES — the following jobs have failed.
- FLAKY — the following job failed, but likely due to flakiness present on trunk.
- BROKEN TRUNK — the following jobs failed, but the failures were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid them.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
This PR makes Arm backend conv2d operator tests deterministic by avoiding module-import-time randomness (e.g., Conv2d weight initialization) and instead constructing the Conv2d test modules lazily within each test case so the existing ARM_TEST_SEED fixture can control the RNG.
Changes:
- Replaced module-level `Conv2d(...)` instances with factory functions that return a fresh `Conv2d`.
- Updated the `test_data_FP`/`bf16`/`fp16` test suites (and the derived `test_data_INT`) to store these factories directly.
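The factory-function change can be sketched as follows (assuming torch is available; the names below are illustrative, not the real `test_conv2d.py` identifiers):

```python
import torch

# Before: a module-level instance randomizes its weights at import time.
# After (sketched here): the test-data table stores a zero-argument factory,
# so weight initialization happens under the test's seeded RNG instead.
def conv2d_3x3_example():
    return torch.nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3)

test_data_FP = {
    "3x3_example": conv2d_3x3_example,  # store the factory, not an instance
}

def run_case(name):
    torch.manual_seed(0)          # stands in for the ARM_TEST_SEED fixture
    return test_data_FP[name]()   # instantiate lazily, inside the test
```

Because the module is constructed after seeding, two runs of the same case produce identical weights.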
Comments suppressed due to low confidence (1)
backends/arm/test/ops/test_conv2d.py:163
- Function name includes "pd1" but the Conv2d config uses padding=3. This mismatch makes failing-case IDs misleading; either update the name to reflect the actual padding or change padding to 1 if that’s what the test intends.
def conv2d_3x3_1x3x12x12_st1_pd1_replicate():
    return Conv2d(
        in_channels=3,
        out_channels=4,
        kernel_size=(3, 3),
        stride=1,
        padding=3,
        width=12,
def conv2d_3x3_1x3x12x12_st1_pd1_reflect():
    return Conv2d(
        in_channels=3,
        out_channels=4,
        kernel_size=(3, 3),
        stride=1,
        padding=3,
        width=12,
        width=8,
        height=9,
cc @digantdesai @freddan80 @per @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani