build: rebalance unit test shards to reduce CI critical path (~29m --> ~23m) and use fewer workers (16 --> 10)#38287
Open
build: rebalance unit test shards to reduce CI critical path (~29m --> ~23m) and use fewer workers (16 --> 10)#38287
Conversation
afef03c to
c17633d
Compare
Contributor
Author
|
The fix needs to be made upstream: openedx/openedx-events#559 waiting for that to be merged and released before coming back to this PR. |
676809c to
a89e30c
Compare
a89e30c to
19c5643
Compare
Run pytest with extra reporting enabled to generate files with per-test durations. The file is uploaded as a CI artifact so timing data can be downloaded and used to drive optimal shard rebalancing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Redistribute test paths across 9 shards (down from 16) using a greedy bin-packing optimiser driven by real per-test timing data from pytest-reportlog. Predicted critical path: ~18.7m (down from ~29m). Key changes: - Rename shard groups to reflect semantic meaning: lms-*, shared-with-lms-*, shared-with-cms-*, cms-* (openedx/common/xmodule paths explicitly separated from lms-only and cms-only paths) - Split lms/djangoapps/discussion/ into its 4 subdirectories so the heavy rest_api/ shard (15.7m) can be distributed across bins independently - Remove outdated comment referencing unit-tests-gh-hosted.yml Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ubclasses Three test classes in the certificates app were calling CourseFactory() in setUp() despite extending SharedModuleStoreTestCase. Unlike ModuleStoreTestCase, SharedModuleStoreTestCase shares a single modulestore across all tests in the class and only closes MongoDB connections at tearDownClass. Calling CourseFactory() in setUp() created a new MongoDB course (and opened connections) for every test method without releasing them, causing connection accumulation across the full test run. Affected classes: - CertificateFiltersTest (test_filters.py) - CertificateInvalidationTest (test_models.py) - CertificateAllowlistTest (test_models.py) In each case the course is only read by test methods (test data such as users, enrollments and certificates is written via Django ORM and rolled back between tests), so sharing a single course across the class is correct. See: https://github.com/openedx/openedx-platform/blob/master/xmodule/modulestore/tests/django_utils.py Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…vents.testing openedx_events/tests/utils.py was moved to openedx_events/testing.py in openedx/openedx-events#559 so the test utilities are included in the installed package (setup.py excludes the tests/ subpackage from the wheel). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When OpenEdxEventsTestMixin was listed after a TestCase subclass (e.g. Foo(SharedModuleStoreTestCase, OpenEdxEventsTestMixin)), it landed after unittest.case.TestCase in the MRO. Since unittest.case.TestCase.setUpClass and tearDownClass do not call super(), the mixin's lifecycle methods never ran. The workaround was to manually call cls.start_events_isolation() in each class's setUpClass, but there was no corresponding tearDownClass to restore event state, causing events disabled by one test class to leak into subsequent classes in the same process. Fix by placing OpenEdxEventsTestMixin first in the base class list so it appears before unittest.case.TestCase in the MRO. This lets setUpClass and tearDownClass run automatically through the cooperative super() chain, removing the need for manual start_events_isolation() calls. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This mixin is already included via one of the other mixins on this test class so including it again was messing with the MRO for the test classes.
contentstore/ is large enough that the cms-1 runner was being killed mid-run in CI (OOM or runner-level timeout). Splitting it into its own shard keeps each job under the ~20-25 min target. No changes needed to gha_unit_tests_collector.py — it already classifies any shard whose first path starts with "cms/" as a CMS shard. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
19c5643 to
1422307
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Rebalances the unit test shard configuration to reduce the CI critical path from ~29 minutes down to ~23 minutes, measured across 3 consistent runs.
What changed
Timing data — 3 runs on the new config (all passed)
shared-with-cms-1shared-with-lms-1shared-with-lms-2lms-4lms-1lms-5cms-2lms-2lms-3cms-1For comparison, the old config's critical path (visible in the #38347 run on unmodified master) was ~29m on
lms-4.