
Mutation testing, cost visibility, CI, and docs#40

Merged
haasonsaas merged 5 commits into main from improve/mutation-cost-docs-ci
Mar 14, 2026

@haasonsaas
Collaborator

Summary

  • storage_json tests: refresh_summary (get/list with comments → synthesized summary), get_event_stats (empty asserts, single-event exact values, by_repo avg_score), prune (age boundary not expired, max_count removes oldest).
  • docs/mutation-testing.md: How to run mutation testing, equivalent/accepted mutants (cost.rs, storage_json), pre-push vs CI, file globs for -f.
  • CI: New mutation job runs cargo mutants -f '*storage_json*' with a 15-minute timeout; fails if the missed-mutant count exceeds 15 (baseline documented in docs/mutation-testing.md).
  • README/CLAUDE: Note that mutation runs in CI only; optional git push --no-verify for quick pushes.
  • Analytics: Show "Est. cost (reviews): $X.XX" from useEventStats().total_cost_estimate when available.
  • web: cost.test.ts for formatCost, estimateCost (server preferred), totalCost.

Test plan

  • cargo test (all pass)
  • cd web && npm run test (all pass)
  • Pre-commit (fmt, clippy, web lint/tsc/test) passed on commit.

Made with Cursor

- storage_json: tests for refresh_summary, get_event_stats (empty/single/by_repo), prune boundary
- docs/mutation-testing.md: how to run, equivalent mutants, pre-push vs CI, file globs
- CI: mutation job on *storage_json* with timeout and missed-count baseline (15)
- README/CLAUDE: pre-push vs CI mutation note; optional --no-verify
- Analytics: show est. cost (reviews) from useEventStats when available
- web: cost.test.ts for formatCost, estimateCost, totalCost

Made-with: Cursor
@cursor

cursor bot commented Mar 14, 2026

PR Summary

Medium Risk
Moderate risk due to new CI gating (mutation baseline) and a small behavior refactor in JsonStorageBackend::prune to support deterministic tests; could cause unexpected CI failures or pruning edge-case regressions.

Overview
Adds mutation testing to CI via a new mutation job running cargo mutants on storage_json with a 15-minute timeout and a baseline missed-mutant threshold, plus new docs/mutation-testing.md and README/CLAUDE updates clarifying that mutation runs only in CI.

Improves GitHub workflow reliability by publishing ghcr.io/evalops/diffscope images on main and making the DiffScope review workflow skip cleanly when the image or API key is unavailable.

Tightens storage_json with a deterministic prune_at(now_secs) helper and expands Rust tests to assert exact get_event_stats aggregates/percentiles, summary refresh behavior, and prune boundary/max-count behavior; the web UI now displays total estimated review cost from useEventStats().total_cost_estimate and adds unit tests for client-side cost formatting/estimation.

Written by Cursor Bugbot for commit 767a059.

Comment on lines +1479 to +1489

    let removed = backend.prune(max_age, 1000).await.unwrap();
    assert_eq!(
        removed, 0,
        "exactly at boundary (now - max_age) should not be pruned"
    );

    let list = backend.list_reviews(10, 0).await.unwrap();
    assert_eq!(list.len(), 1);
    assert_eq!(list[0].id, "boundary");
}

This comment was marked as outdated.

Collaborator Author

Addressed: extracted prune_at(max_age_secs, max_count, now_secs) so the test can pass a single now and avoid the second-boundary race. prune() calls prune_at(..., SystemTime::now()); the boundary test calls backend.prune_at(max_age, 1000, now).await with one now_ts().
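The refactor described above can be sketched roughly as follows. This is a simplified, synchronous illustration and not the code from the PR: the `Store` and `Review` types, the `created_at_secs` field, and the in-memory `Vec` are all hypothetical stand-ins for the real async `JsonStorageBackend`. Only the shape matters here: `prune` reads the clock once and delegates, while `prune_at` takes `now_secs` as a parameter so a test can pin the boundary exactly.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical stand-in for a stored review; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct Review {
    id: String,
    created_at_secs: u64,
}

/// Hypothetical in-memory stand-in for the JSON storage backend.
struct Store {
    reviews: Vec<Review>,
}

impl Store {
    /// Production entry point: reads the clock once, then delegates.
    fn prune(&mut self, max_age_secs: u64, max_count: usize) -> usize {
        let now_secs = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_secs())
            .unwrap_or(0);
        self.prune_at(max_age_secs, max_count, now_secs)
    }

    /// Deterministic core: the caller supplies `now_secs`.
    /// Returns how many reviews were removed.
    fn prune_at(&mut self, max_age_secs: u64, max_count: usize, now_secs: u64) -> usize {
        let before = self.reviews.len();
        let cutoff = now_secs.saturating_sub(max_age_secs);
        // Strictly older than the cutoff expires; exactly at the cutoff is kept.
        self.reviews.retain(|r| r.created_at_secs >= cutoff);
        // Newest first, then keep at most `max_count` (drops the oldest).
        self.reviews
            .sort_by(|a, b| b.created_at_secs.cmp(&a.created_at_secs));
        self.reviews.truncate(max_count);
        before - self.reviews.len()
    }
}

fn main() {
    let now = 1_000_000;
    let max_age = 3_600;
    let mut store = Store {
        reviews: vec![Review {
            id: "boundary".into(),
            created_at_secs: now - max_age,
        }],
    };

    // One `now` value drives both the fixture and the prune call,
    // so the test cannot straddle a second boundary.
    let removed = store.prune_at(max_age, 1000, now);
    assert_eq!(removed, 0);
    assert_eq!(store.reviews[0].id, "boundary");

    // The clock-reading wrapper still works; u64::MAX keeps everything.
    assert_eq!(store.prune(u64::MAX, 1000), 0);
}
```

Passing the timestamp in, rather than mocking the clock, keeps the production path a one-line wrapper and leaves the boundary comparison in a single place that both test and production exercise.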

…available

- publish-image.yml: on push to main, build and push ghcr.io/evalops/diffscope:latest and :sha-<sha>
- diffscope.yml: check image available before Run DiffScope; skip gracefully with notice if pull fails (job no longer fails)

Made-with: Cursor
…re-pull failing job

GitHub pre-pulls container actions before any steps run; that pull was failing the job.
Run DiffScope via 'docker run' only when image is already available from Check image step.

Made-with: Cursor
… single timestamp

- Extract prune_at(max_age_secs, max_count, now_secs) so test and production share logic
- prune() calls prune_at(..., SystemTime::now()); test calls prune_at(..., now_ts()) once

Made-with: Cursor

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.

- name: Mutation test (storage_json)
  run: |
    timeout 900 cargo mutants -f '*storage_json*' 2>&1 | tee mutation.log || true
    MISSED=$(grep -E '[0-9]+ missed' mutation.log | tail -1 | grep -oE '[0-9]+' | head -1 || echo "0")

Grep extracts wrong number from cargo-mutants summary

High Severity

The grep chain parses the wrong number from the cargo-mutants summary line. The summary format is e.g. 14 mutants tested in 0:08: 2 missed, 9 caught, 3 unviable. The first grep -E '[0-9]+ missed' matches the full line, then grep -oE '[0-9]+' extracts all numbers (14, 0, 08, 2, 9, 3), and head -1 picks "14" (total mutants) instead of "2" (the actual missed count). This causes MISSED to be set to the total mutant count, making the baseline check nearly always fail spuriously.
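To make the difference concrete, here is a small Rust sketch of the two extraction strategies, using the example summary line quoted in the comment above. Both function names are hypothetical and neither appears in the PR; `first_number` mirrors what `grep -oE '[0-9]+' | head -1` does, while `missed_count` takes only the number that directly precedes the word "missed". It assumes the summary keeps the `N missed` shape shown in the comment.

```rust
/// Mirrors the buggy pipeline: the first run of digits anywhere in the line.
fn first_number(summary: &str) -> Option<u32> {
    summary
        .split(|c: char| !c.is_ascii_digit())
        .find(|s| !s.is_empty())?
        .parse()
        .ok()
}

/// Intended behavior: the number immediately before the word "missed".
fn missed_count(summary: &str) -> Option<u32> {
    let tokens: Vec<&str> = summary.split_whitespace().collect();
    tokens.windows(2).find_map(|pair| {
        // Strip trailing punctuation such as the comma in "missed,".
        let word = pair[1].trim_end_matches(|c: char| !c.is_alphanumeric());
        if word == "missed" {
            pair[0].parse().ok()
        } else {
            None
        }
    })
}

fn main() {
    // Example line taken from the review comment.
    let line = "14 mutants tested in 0:08: 2 missed, 9 caught, 3 unviable";

    assert_eq!(first_number(line), Some(14)); // total mutants, not missed
    assert_eq!(missed_count(line), Some(2)); // the value the baseline needs
    assert_eq!(missed_count("no summary here"), None);
}
```

In the workflow's shell, one equivalent would be to narrow the match first with `grep -oE '[0-9]+ missed'` and only then extract the digits, so the final `grep -oE '[0-9]+'` sees just `2 missed`.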

@haasonsaas merged commit bdb0533 into main on Mar 14, 2026
13 checks passed
@haasonsaas deleted the improve/mutation-cost-docs-ci branch on March 14, 2026 17:27