Mutation testing, cost visibility, CI, and docs by haasonsaas · Pull Request #40 · evalops/diffscope

haasonsaas · 2026-03-14T17:10:52Z

Summary

storage_json tests: refresh_summary (get/list with comments → synthesized summary), get_event_stats (empty asserts, single-event exact values, by_repo avg_score), prune (age boundary not expired, max_count removes oldest).
docs/mutation-testing.md: How to run mutation testing, equivalent/accepted mutants (cost.rs, storage_json), pre-push vs CI, file globs for -f.
CI: New mutation job runs cargo mutants -f '*storage_json*' with 15min timeout; fails if missed count > 15 (baseline documented in doc).
README/CLAUDE: Note that mutation runs in CI only; optional git push --no-verify for quick pushes.
Analytics: Show "Est. cost (reviews): $X.XX" from useEventStats().total_cost_estimate when available.
web: cost.test.ts for formatCost, estimateCost (server preferred), totalCost.

Test plan

cargo test (all pass)
cd web && npm run test (all pass)
Pre-commit (fmt, clippy, web lint/tsc/test) passed on commit.

Made with Cursor

- storage_json: tests for refresh_summary, get_event_stats (empty/single/by_repo), prune boundary
- docs/mutation-testing.md: how to run, equivalent mutants, pre-push vs CI, file globs
- CI: mutation job on *storage_json* with timeout and missed-count baseline (15)
- README/CLAUDE: pre-push vs CI mutation note; optional --no-verify
- Analytics: show est. cost (reviews) from useEventStats when available
- web: cost.test.ts for formatCost, estimateCost, totalCost

Made-with: Cursor

cursor · 2026-03-14T17:10:57Z

PR Summary

Medium Risk
Moderate risk due to new CI gating (mutation baseline) and a small behavior refactor in JsonStorageBackend::prune to support deterministic tests; could cause unexpected CI failures or pruning edge-case regressions.

Overview
Adds mutation testing to CI via a new mutation job running cargo mutants on storage_json with a 15-minute timeout and a baseline missed-mutant threshold, plus new docs/mutation-testing.md and README/CLAUDE updates clarifying that mutation runs only in CI.

Improves GitHub workflow reliability by publishing ghcr.io/evalops/diffscope images on main and making the DiffScope review workflow skip cleanly when the image or API key is unavailable.

Tightens storage_json with a deterministic prune_at(now_secs) helper and expands Rust tests to assert exact get_event_stats aggregates/percentiles, summary refresh behavior, and prune boundary/max-count behavior; the web UI now displays total estimated review cost from useEventStats().total_cost_estimate and adds unit tests for client-side cost formatting/estimation.

Written by Cursor Bugbot for commit 767a059. This will update automatically on new commits.

haasonsaas · 2026-03-14T17:23:00Z

src/server/storage_json.rs

+
+        let removed = backend.prune(max_age, 1000).await.unwrap();
+        assert_eq!(
+            removed, 0,
+            "exactly at boundary (now - max_age) should not be pruned"
+        );
+
+        let list = backend.list_reviews(10, 0).await.unwrap();
+        assert_eq!(list.len(), 1);
+        assert_eq!(list[0].id, "boundary");
+    }


Addressed: extracted prune_at(max_age_secs, max_count, now_secs) so the test can pass a single now and avoid the second-boundary race. prune() calls prune_at(..., SystemTime::now()); the boundary test calls backend.prune_at(max_age, 1000, now).await with one now_ts().
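The clock-injection pattern described above can be sketched roughly as follows. This is a minimal, synchronous, in-memory illustration and not the actual `JsonStorageBackend` code: the `Review` and `Store` types and their fields are hypothetical, and the real backend is async and persists to disk. Only the `prune` / `prune_at(max_age_secs, max_count, now_secs)` split and the boundary rule (an entry created exactly at `now - max_age` is kept) come from the discussion.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical stand-in for a stored review; only the fields pruning needs.
#[derive(Debug, Clone, PartialEq)]
struct Review {
    id: String,
    created_at: u64, // seconds since the Unix epoch
}

/// Hypothetical in-memory store used purely for illustration.
struct Store {
    reviews: Vec<Review>,
}

impl Store {
    /// Production entry point: reads the clock once, then delegates.
    fn prune(&mut self, max_age_secs: u64, max_count: usize) -> usize {
        let now_secs = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_secs())
            .unwrap_or(0);
        self.prune_at(max_age_secs, max_count, now_secs)
    }

    /// Deterministic core: the caller supplies `now_secs`, so a test can use
    /// the same timestamp for both the fixture and the prune call.
    /// Returns the number of reviews removed.
    fn prune_at(&mut self, max_age_secs: u64, max_count: usize, now_secs: u64) -> usize {
        let before = self.reviews.len();
        let cutoff = now_secs.saturating_sub(max_age_secs);

        // Only entries strictly older than the cutoff expire;
        // created_at == cutoff sits on the boundary and is kept.
        self.reviews.retain(|r| r.created_at >= cutoff);

        // Enforce max_count by dropping the oldest survivors first.
        self.reviews.sort_by_key(|r| r.created_at);
        if self.reviews.len() > max_count {
            let excess = self.reviews.len() - max_count;
            self.reviews.drain(..excess);
        }

        before - self.reviews.len()
    }
}
```

With this shape, a boundary test can create a review at `now - max_age` and call `prune_at(max_age, 1000, now)` with the same `now`, so the result does not depend on whether the wall clock ticks over to the next second between the two steps.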

…available
- publish-image.yml: on push to main, build and push ghcr.io/evalops/diffscope:latest and :sha-<sha>
- diffscope.yml: check image available before Run DiffScope; skip gracefully with notice if pull fails (job no longer fails)

Made-with: Cursor

…ilure never fails job Made-with: Cursor

…re-pull failing job

GitHub pre-pulls container actions before any steps run; that pull was failing the job. Run DiffScope via 'docker run' only when image is already available from Check image step.

Made-with: Cursor

… single timestamp
- Extract prune_at(max_age_secs, max_count, now_secs) so test and production share logic
- prune() calls prune_at(..., SystemTime::now()); test calls prune_at(..., now_ts()) once

Made-with: Cursor

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.

cursor · 2026-03-14T17:23:19Z

.github/workflows/ci.yml

+      - name: Mutation test (storage_json)
+        run: |
+          timeout 900 cargo mutants -f '*storage_json*' 2>&1 | tee mutation.log || true
+          MISSED=$(grep -E '[0-9]+ missed' mutation.log | tail -1 | grep -oE '[0-9]+' | head -1 || echo "0")


Grep extracts wrong number from cargo-mutants summary

High Severity

The grep chain parses the wrong number from the cargo-mutants summary line. The summary format is e.g. 14 mutants tested in 0:08: 2 missed, 9 caught, 3 unviable. The first grep -E '[0-9]+ missed' matches the full line, then grep -oE '[0-9]+' extracts all numbers (14, 0, 08, 2, 9, 3), and head -1 picks "14" (total mutants) instead of "2" (the actual missed count). This causes MISSED to be set to the total mutant count, making the baseline check nearly always fail spuriously.

Additional Locations (1)

.github/workflows/ci.yml#L87-L88
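To make the reported behaviour concrete, the sketch below contrasts the two extraction strategies on the example summary line quoted in the review. It is written in Rust only so the logic can be checked in isolation; the function names are hypothetical and this is not code from the PR. In the workflow itself the same idea would be expressed in shell, for instance by narrowing the match to `[0-9]+ missed` with `grep -oE` before extracting the digits.

```rust
/// Returns the integer immediately preceding the word "missed" in a
/// cargo-mutants style summary line, or None if no such token exists.
fn missed_count(summary: &str) -> Option<u32> {
    let mut prev: Option<&str> = None;
    for token in summary.split_whitespace() {
        // Summary fields are comma-separated, so strip trailing punctuation.
        let word = token.trim_end_matches(|c: char| c == ',' || c == '.');
        if word == "missed" {
            return prev.and_then(|p| p.parse().ok());
        }
        prev = Some(token);
    }
    None
}

/// Mirrors what the reviewed pipeline effectively does: take the first
/// run of digits anywhere on the line, regardless of what it counts.
fn first_number(summary: &str) -> Option<u32> {
    summary
        .split(|c: char| !c.is_ascii_digit())
        .find(|s| !s.is_empty())
        .and_then(|s| s.parse().ok())
}
```

On the line `14 mutants tested in 0:08: 2 missed, 9 caught, 3 unviable`, `first_number` yields the total (14) while `missed_count` yields the value the baseline check is meant to compare against (2).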

sentry bot reviewed Mar 14, 2026


haasonsaas added 4 commits March 14, 2026 10:14

diffscope review: continue-on-error on Run DiffScope so image pull failure never fails job

52e44da

Made-with: Cursor

cursor bot reviewed Mar 14, 2026


haasonsaas merged commit bdb0533 into main Mar 14, 2026
13 checks passed

haasonsaas deleted the improve/mutation-cost-docs-ci branch March 14, 2026 17:27

haasonsaas merged 5 commits into main from improve/mutation-cost-docs-ci
