ci: run strict-no-cover in scripts/test to catch stale pragmas locally#2305
ci: run strict-no-cover in scripts/test to catch stale pragmas locally#2305
Conversation
PR #2302 passed ./scripts/test (100% coverage) but failed CI when strict-no-cover found lines marked '# pragma: no cover' that the new test actually executed. The claim in CLAUDE.md that ./scripts/test 'matches CI exactly' was false. Adds strict-no-cover as a final step in scripts/test. The tool internally spawns 'uv run coverage json' without --frozen, which rewrites uv.lock on machines with registry overrides; UV_FROZEN=1 propagates to that subprocess. Also converts session.py:426 from 'no cover' to 'lax no cover'. The except-handler during connection cleanup is a genuine race (stream may or may not already be closed depending on timing), so it is nondeterministically covered. strict-no-cover would have flagged it intermittently on high-core machines. CLAUDE.md now documents the fast targeted path (single test file + strict-no-cover, ~4s, no false positives on partial runs), the three pragma types, and the known coverage.py ->exit arc quirk with nested async-with that only the CI matrix catches.
The session.py:426 flag appeared exactly once across 10 runs, and that one run was the only one with a confounding test edit in place. CI has never flagged this line; 8 clean local runs never flagged it either. The original pragma was correct.
There was a problem hiding this comment.
LGTM — straightforward CI/DX improvement with documentation updates and a single pragma annotation change.
Extended reasoning...
Overview
This PR adds strict-no-cover as a final step in scripts/test, updates CLAUDE.md with accurate testing guidance and coverage-pragma documentation, and changes one pragma in src/mcp/shared/session.py from # pragma: no cover to # pragma: lax no cover. The session.py change is purely an annotation change (no logic modification) to prevent flaky failures from a nondeterministically-covered exception handler.
Security risks
None. The changes are limited to a shell script (CI tooling), a documentation file, and a comment annotation. No production code paths, authentication, or data handling are affected.
Level of scrutiny
Low scrutiny warranted. This is a developer-experience improvement: the script change adds one line to run an existing tool, the documentation update is comprehensive and accurate, and the pragma change is well-justified in the PR description (race condition in subprocess coverage on high-core machines).
Other factors
The two bug reports found are both minor: coverage.json missing from .gitignore is a nit, and the unquoted $@ is pre-existing. Neither affects the correctness of this PR. The PR description is thorough, explaining the motivation (CI/local parity gap from PR #2302), testing methodology, and the UV_FROZEN=1 rationale for preventing uv.lock pollution.
Drop the subprocess-flake bullet and the 'add this pragma' prescription for no-branch. Scope lax-no-cover to platform-specific coverage only.
There was a problem hiding this comment.
LGTM — straightforward CI script addition and documentation update.
Extended reasoning...
Overview
This PR adds a single line to scripts/test (UV_FROZEN=1 uv run --frozen strict-no-cover) and updates CLAUDE.md with accurate testing guidance including coverage pragma documentation. Two files changed, both non-production.
Security risks
None. The changes are to a local developer test script and a documentation file. No production code, auth, or data handling is affected.
Level of scrutiny
Low scrutiny appropriate. This is a developer tooling improvement — adding a linting step to a local test script and correcting documentation that previously overclaimed CI parity. The UV_FROZEN=1 environment variable propagation is a thoughtful detail to prevent uv.lock mutation.
Other factors
My two previous inline comments (missing coverage.json in .gitignore and unquoted $@) are both marked resolved. The new nit about the CI matrix description omitting the lowest-direct dimension is a minor documentation accuracy issue — the text already correctly states the script is "Not identical to CI," which is the important message. The PR is well-motivated by a concrete CI failure (PR #2302) and thoroughly tested as described in the PR body.
| - Full check: `./scripts/test` (~23s). Runs coverage + `strict-no-cover` on the | ||
| default Python. Not identical to CI: CI also runs 3.10–3.14 × {ubuntu, windows}, |
There was a problem hiding this comment.
🟡 The CI matrix description says "3.10–14 × {ubuntu, windows}" (10 entries), but .github/workflows/shared.yml defines a three-dimensional matrix: python-version × dep-resolution × os (5×2×2 = 20 entries), omitting the {locked, lowest-direct} dimension. Consider updating to "CI also runs 3.10–14 × {ubuntu, windows} × {locked, lowest-direct}" — the lowest-direct strategy is especially worth mentioning since those failures cannot be reproduced locally with --frozen.
Extended reasoning...
What the bug is
CLAUDE.md lines 32-33 state:
Not identical to CI: CI also runs 3.10–14 × {ubuntu, windows},
and some branch-coverage quirks only surface on specific matrix entries.
This implies the CI matrix has 5 Python versions × 2 OSes = 10 entries. However, the actual CI matrix in .github/workflows/shared.yml (lines 49-56) defines three dimensions:
matrix:
python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]
dep-resolution:
- name: lowest-direct
install-flags: "--upgrade --resolution lowest-direct"
- name: locked
install-flags: "--frozen"
os: [ubuntu-latest, windows-latest]That is 5 × 2 × 2 = 20 matrix entries, not 10.
Why the omission matters
The lowest-direct resolution strategy (--upgrade --resolution lowest-direct) tests compatibility with minimum acceptable dependency versions. This is a meaningfully different CI dimension because:
- A dependency feature used by MCP might not exist in the lowest-direct resolved version, causing a CI failure that cannot be reproduced locally where
--frozenuses the lockfile (latest) versions. - A developer seeing failures only on
lowest-directmatrix entries would have no context from CLAUDE.md to understand why — the documentation only mentions Python versions and OS as CI dimensions.
Step-by-step proof
- Developer reads CLAUDE.md line 32-33: "CI also runs 3.10–14 × {ubuntu, windows}".
- Developer infers the CI matrix is 10 entries varying only by Python version and OS.
- Developer pushes a change that uses a dependency feature added in a recent version.
- CI fails on
lowest-directmatrix entries where the older dependency version lacks that feature. - Developer is confused because the failure does not correlate with any Python version or OS — the two dimensions CLAUDE.md told them about.
Impact
This is a minor documentation inaccuracy. The text already represents a significant improvement over the previous claim that ./scripts/test "matches CI exactly," and the sentence does correctly state that scripts/test is "Not identical to CI." The missing dimension is unlikely to cause real confusion in most cases, but including it would make the documentation precisely accurate and help developers diagnose lowest-direct-specific CI failures.
Suggested fix
Change line 33 from:
CI also runs 3.10–14 × {ubuntu, windows},
to:
CI also runs 3.10–14 × {ubuntu, windows} × {locked, lowest-direct},
Adds
strict-no-coveras a final step in./scripts/testso stale# pragma: no coverannotations are caught locally before pushing. UpdatesCLAUDE.mdwith accurate testing guidance and documents the three coverage-pragma types.Motivation and Context
PR #2302 passed
./scripts/test(100% coverage, all green) but failed CI in two ways:src/mcp/server/stdio.py— the new test executed them, but they still carried# pragma: no cover.coverage reportsaw 100% because the pragma excludes the line;strict-no-coversaw that a "never covered" line was in fact executed.async within the test produced a missing->exitarc, a known coverage.py quirk.CLAUDE.mdclaimed./scripts/test"matches CI exactly." It didn't: CI runsstrict-no-coveras a separate step aftercoverage report(Linux only), and CI runs the full 3.10–3.14 × {ubuntu, windows} matrix. The first gap is closed here; the second is documented.CI does not invoke
scripts/test— it inlines the coverage commands inshared.yml— so this change is purely local and does not runstrict-no-covertwice.How Has This Been Tested?
Reproduced the #2302 failure mode: added
# pragma: no coverto a line thattests/server/test_stdio.pyexecutes, then ran both paths.Full path (
./scripts/test):coverage report→ 100%, exit 0. Stale pragma undetected.coverage report→ 100%, thenstrict-no-cover→❎ 1 lines wrongly marked, exit 1. ~21s wall time (vs ~20s before).Fast targeted path (documented in CLAUDE.md):
uv run --frozen coverage erase uv run --frozen coverage run -m pytest tests/server/test_stdio.py uv run --frozen coverage combine uv run --frozen coverage report --include='src/mcp/server/stdio.py' --fail-under=0 UV_FROZEN=1 uv run --frozen strict-no-cover~4s, catches the stale pragma, no false positives on partial runs (
strict-no-coveronly reportsexcluded ∩ executed— fewer executed lines just means fewer chances to catch, never a wrong flag).uv.lock pollution fix:
strict-no-coverinternally spawnssubprocess.run(['uv', 'run', 'coverage', 'json', ...])— no--frozen. On machines with registry overrides this rewritesuv.lock. SettingUV_FROZEN=1propagates to the child process. Verifieduv.lockstays clean through a full run.Breaking Changes
None.
./scripts/testtakes ~1s longer and exits nonzero when a stale pragma is found.Types of changes
Checklist
Additional context
The
# pragma: no branchquirk for nestedasync withis documented as what-it-is, not what-to-do-about-it; the existing ~100 annotations in the tree are the pattern reference.