Skip to content

test: per-directory coverage + E2E cleanup hardening (Days 5-6)#1123

Open
tejaskash wants to merge 5 commits intomainfrom
worktree-test-cleanup-day5-7
Open

test: per-directory coverage + E2E cleanup hardening (Days 5-6)#1123
tejaskash wants to merge 5 commits intomainfrom
worktree-test-cleanup-day5-7

Conversation

@tejaskash
Copy link
Copy Markdown
Contributor

@tejaskash tejaskash commented May 5, 2026

Summary

Two days of test infrastructure work bundled together since they're orthogonal but small.

Day 5 — Per-directory coverage thresholds

  • New scripts/check-coverage.mjs enforces per-directory line coverage thresholds (ratchet values ~5% below current coverage; target values documented inline)
  • PR-changed-line coverage check (≥50%) with per-file breakdown
  • Switched davelosert/vitest-coverage-report-action to file-coverage-mode: changes so PR comments highlight uncovered new lines
  • Ratchet thresholds: schema 78%, operations 52%, lib 82%, aws 40%, tui/hooks 14%, tui/components 68%, commands 46%

Day 6 — E2E cleanup hardening + sweeper spec

  • teardownE2EProject() retries deploy up to 3× (15s backoff) and throws on failure instead of swallowing silently. Stops silent AWS resource leaks.
  • CDK stacks now tagged with Environment=e2e-test / CreatedAt / CreatedBy when AGENTCORE_E2E_TEST=1 is set (opt-in, user projects unaffected)
  • cleanupStaleCredentialProviders window shrunk from 30min → 5min — catches orphans faster
  • Cognito cleanup in byo-custom-jwt.test.ts retries 3× with 5s delay instead of swallowing errors
  • New docs/test-resource-sweeper-spec.md for the infra team to implement a scheduled sweeper

Test plan

  • node scripts/check-coverage.mjs passes against current coverage locally
  • Coverage job in CI runs and posts per-directory table on this PR
  • Intentionally lower a threshold → CI fails (verify the gate works)
  • Typecheck passes on all packages

@tejaskash tejaskash requested a review from a team May 5, 2026 19:02
@github-actions github-actions Bot added size/m PR size: M agentcore-harness-reviewing AgentCore Harness review in progress and removed size/m PR size: M labels May 5, 2026
@agentcore-cli-automation
Copy link
Copy Markdown

Reducing cleanupStaleCredentialProviders to 5 minutes looks risky for the full E2E suite.

cleanupStaleCredentialProviders() is called in beforeAll of every E2E test file, and it deletes any E2e* provider older than maxAgeMs. In e2e-tests-full.yml, the suite runs as a 6-shard × 2-source matrix (12 parallel jobs) against one AWS account, and within a shard vitest runs multiple test files in parallel — so many tests' credential providers live concurrently in the same namespace.

With a 5-minute window, a test whose deploy takes >5 min (container builds have a 900s/15-min timeout, and regular deploys routinely go past 5 min) can have its active credential provider deleted by another test file's beforeAll that fires later. Net effect: flakes caused by the cleanup itself.

A few options, pick whichever fits best:

  1. Keep the window generous (e.g. 20–30 min) — the stated goal is catching crash-orphaned providers, and those sit around for hours until the sweeper runs, so 5 min buys very little.
  2. Instead of age-based cleanup, track provider names created by the current run (e.g. via a file or env-scoped prefix) and only delete foreign/orphaned names.
  3. Gate the 5-min value on something like maxAgeMs = parseInt(process.env.E2E_STALE_CRED_MAX_AGE_MS ?? '', 10) || 30*60*1000 so the scheduled sweeper can use a tight window but normal CI runs stay safe.

@agentcore-cli-automation
Copy link
Copy Markdown

teardownE2EProject throwing on failure now leaks the test temp directory.

In createE2ESuite's afterAll (e2e-helper.ts:98–103):

afterAll(async () => {
  if (projectPath && hasAws) {
    await teardownE2EProject(projectPath, agentName, cfg.modelProvider);
  }
  if (testDir) await rm(testDir, { recursive: true, force: true, maxRetries: 3, retryDelay: 1000 });
}, 600000);

Previously teardownE2EProject swallowed errors, so the rm(testDir, ...) always ran. After this PR it throws, which means every teardown failure now also leaks testDir under os.tmpdir() on the runner. On top of that, any AWS cleanup the caller (evals-lifecycle, archive-lifecycle, ab-test-*, config-bundle-eval-rec, etc.) does after the teardownE2EProject call in its own afterAll is skipped too.

Options:

  1. Wrap the teardown call in the afterAll in a try { ... } finally { await rm(testDir, ...); } (and similarly in the other suites that call it directly).
  2. In teardownE2EProject, record the failure (console.error + maybe write a marker file) instead of throwing, and fail the run via a different mechanism (e.g. a top-level afterAll that asserts no recorded failures).
  3. Keep the throw but restructure each caller's afterAll so cleanup steps that must always run are in finally blocks.

@agentcore-cli-automation
Copy link
Copy Markdown

The PR-changed-line gate silently ignores new/changed .tsx files, which is a big hole in the coverage gate.

In scripts/check-coverage.mjs:

const coveredSet = coveredLinesByFile[file];
if (!coveredSet) continue;

coveredLinesByFile is built from coverage/coverage-final.json, and vitest.config.ts sets coverage.include: ['src/**/*.ts'] (no .tsx). That glob doesn't match .tsx, so TUI files like src/cli/tui/**/*.tsx never appear in the coverage JSON → coveredSet is undefined → those lines are continued past, contributing 0 to both numerator and denominator.

Meanwhile the git-diff in the same script explicitly includes src/**/*.tsx:

`git diff ${baseSha}...${headSha} --unified=0 --diff-filter=AM -- 'src/**/*.ts' 'src/**/*.tsx'`

So a PR that adds a brand-new .tsx file full of untested code will show 0 changed lines counted and pass the 50% threshold cleanly. Same for any edit to existing .tsx files — they're invisible to the gate. Given src/cli/tui/components/ has its own 68% ratchet, this is almost certainly unintentional.

Options:

  1. Extend coverage.include in vitest.config.ts to ['src/**/*.ts', 'src/**/*.tsx'] so .tsx files show up in the report (preferred — also fixes the per-directory gate for TUI components).
  2. In check-coverage.mjs, treat coveredSet === undefined for a production file as 0% covered (i.e. count all changed lines as uncovered) instead of skipping.
  3. At minimum, drop src/**/*.tsx from the git diff pathspec and document that .tsx isn't gated, so the hole is explicit rather than silent.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label May 5, 2026
@tejaskash
Copy link
Copy Markdown
Contributor Author

Addressed — reverted the default to 30 min and added an opt-in E2E_STALE_CRED_MAX_AGE_MS env var so the scheduled sweeper can still use a tight window without risking normal CI.

@tejaskash
Copy link
Copy Markdown
Contributor Author

Addressed — wrapped the teardownE2EProject call in try/finally across createE2ESuite + all 5 standalone callers (ab-test-target-based, ab-test-config-bundle, archive-lifecycle, config-bundle-eval-rec, evals-lifecycle) so testDir cleanup always runs.

@tejaskash
Copy link
Copy Markdown
Contributor Author

Addressed — extended vitest.config.ts coverage include to src/**/*.tsx (and excluded *.test.tsx), plus hardened check-coverage.mjs to treat a changed production file missing from coverage as 0% covered instead of skipping. Adjusted the src/cli/commands/ ratchet from 46% → 34% to reflect the new denominator (3424 → 4553 lines with .tsx included).

@github-actions github-actions Bot added size/m PR size: M and removed size/m PR size: M labels May 5, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 5, 2026

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 33.39% 9015 / 26999
🔵 Statements 32.76% 9573 / 29214
🔵 Functions 29.2% 1553 / 5318
🔵 Branches 28.38% 5808 / 20465
File CoverageNo changed files found.
Generated in workflow #2471 for commit 9c0156f by the Vitest Coverage Report Action

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 5, 2026

Test Coverage Report

Overall

Metric Coverage
Statements 32.76%
Branches 28.38%
Functions 29.2%
Lines 33.39%

Per-Directory Line Coverage

Directory Coverage Ratchet Target Status
src/schema/ 83.0% (322/388) 78% 85% PASS
src/cli/operations/ 57.0% (2457/4314) 52% 65% PASS
src/lib/ 87.1% (695/798) 82% 85% PASS
src/cli/aws/ 45.6% (579/1270) 40% 60% PASS
src/cli/tui/hooks/ 18.8% (280/1489) 14% 55% PASS
src/cli/tui/components/ 73.9% (886/1199) 68% 75% PASS
src/cli/commands/ 39.0% (1781/4561) 34% 60% PASS

PR Changed Lines

100.0% (0/0) — threshold 50% — PASS

tejaskash added 5 commits May 5, 2026 19:06
Day 5: Coverage gate
- Add scripts/check-coverage.mjs with ratchet thresholds per source directory
  (schema 78%, operations 52%, lib 82%, aws 40%, tui/hooks 14%,
  tui/components 68%, commands 46%). Target thresholds documented in the
  report, ratchet set ~5% below current to prevent regression.
- Also check PR changed-line coverage (>=50%) and post a report comment.
- Switch davelosert/vitest-coverage-report-action to file-coverage-mode: changes
  so PR comments highlight uncovered new lines.

Day 6: E2E cleanup
- teardownE2EProject() now retries deploy up to 3x and throws on failure
  instead of swallowing silently. The remove step logs but does not retry.
- Tag CDK stacks with Environment=e2e-test / CreatedAt / CreatedBy when
  AGENTCORE_E2E_TEST=1 is set. The CLI's e2e helper sets this env var;
  user projects are unaffected.
- Shrink cleanupStaleCredentialProviders() window from 30min -> 5min to
  catch orphans faster.
- byo-custom-jwt.test.ts Cognito cleanup now retries 3x with 5s delay
  instead of swallowing errors on first try.
- Add docs/test-resource-sweeper-spec.md for the infra team to implement
  a scheduled sweeper (CFN stacks, credential providers, log groups, ECR).
1. cleanupStaleCredentialProviders default restored to 30 min, now
   configurable via E2E_STALE_CRED_MAX_AGE_MS env var. 5 min was too
   aggressive given container builds >5 min.
2. afterAll blocks that call teardownE2EProject now wrap in try/finally
   so testDir cleanup always runs even when teardown throws. Updated
   createE2ESuite + 5 standalone suites (ab-test-*, archive-lifecycle,
   config-bundle-eval-rec, evals-lifecycle).
3. vitest coverage include adds src/**/*.tsx so TUI React files are
   gated. Also exclude *.test.tsx from the coverage set. Adjusted
   src/cli/commands/ ratchet from 46% to 34% to reflect the new
   denominator (was 3424 lines, now 4553 with .tsx included).
4. check-coverage.mjs no longer silently skips changed files missing
   from coverage-final.json — counts them as 0% covered instead.
@tejaskash tejaskash force-pushed the worktree-test-cleanup-day5-7 branch from ff38b62 to 9c0156f Compare May 5, 2026 23:17
@github-actions github-actions Bot added size/m PR size: M and removed size/m PR size: M labels May 5, 2026
afterAll(async () => {
if (projectPath && hasAws) {
await teardownE2EProject(projectPath, agentName, 'Bedrock');
try {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: does it make sense to consolidate this cleanup into a common util?

});
}

async function deleteCognitoResourceWithRetry(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this function is called deleteCognitoResourceWithRetry, but is actually a generalized retry with cognito specific logs.

Does it make sense to fully generalize it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/m PR size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants