|
| 1 | +# Test & Benchmark Requirements |
| 2 | + |
| 3 | +## Current State |
| 4 | +- Unit tests: 684 pass / 0 fail / 16 ignored (556 in main crate + 128 across other crates) |
| 5 | +- Integration tests: partial (echidnabot has integration tests) |
| 6 | +- E2E tests: NONE |
| 7 | +- Benchmarks: 1,035 benchmark files (likely V-lang prover benchmarks) |
| 8 | +- panic-attack scan: NEVER RUN |
| 9 | + |
| 10 | +## What's Missing |
| 11 | +### Point-to-Point (P2P) |
| 12 | +**Source counts:** 146 Rust + 6,694 V + 34 Zig + 42 Idris2 + 33 ReScript + 52 Julia + 39 JS + 226 Shell |
| 13 | + |
| 14 | +#### Rust crate (146 files, ~684 tests — BEST tested component): |
| 15 | +- 79 files with #[test] in main crate — good inline test coverage |
| 16 | +- 3 files with #[test] in echidnabot — minimal |
| 17 | +- fuzz/ crate exists (Cargo.toml) — verify fuzzing actually runs |
| 18 | +- 9 ignored tests need investigation |
| 19 | + |
| 20 | +#### V-lang prover (6,694 files — ZERO tests): |
| 21 | +- This is the bulk of the codebase |
| 22 | +- Prover correctness is critical — needs property-based testing |
| 23 | +- All proof dispatch, verification, and formal reasoning untested |
| 24 | + |
| 25 | +#### Zig FFI (34 files — 7 test files): |
| 26 | +- Reasonable ratio but verify coverage |
| 27 | + |
| 28 | +#### Idris2 ABI (42 files — ZERO tests): |
| 29 | +- Formal verification definitions should be self-verifying |
| 30 | +- But runtime behavior still needs integration testing |
| 31 | + |
| 32 | +#### Julia (52 files — ZERO tests): |
| 33 | +- Likely analysis/reporting scripts — need at least smoke tests |
| 34 | + |
| 35 | +#### ReScript (33 files — ZERO tests): |
| 36 | +- Dashboard/UI components — need render and interaction tests |
| 37 | + |
| 38 | +#### Shell (226 files — ZERO tests): |
| 39 | +- Scripts need at least --help and dry-run validation |
| 40 | + |
| 41 | +### End-to-End (E2E) |
| 42 | +- Full prover workflow: input problem -> preprocess -> prove -> verify result |
| 43 | +- Echidnabot: receive repo event -> analyze -> report findings |
| 44 | +- Dashboard: load data -> display results -> filter/search |
| 45 | +- Proof dispatch: select prover backend -> route -> collect results |
| 46 | +- VeriSimDB integration: write proof results -> query -> aggregate |
| 47 | +- Dodeca-API interaction |
| 48 | +- Report generation pipeline |
| 49 | + |
| 50 | +### Aspect Tests |
| 51 | +- [ ] Security (proof validation bypass, malicious input to prover, echidnabot webhook injection) |
| 52 | +- [ ] Performance (prover on large problems, proof verification latency) |
| 53 | +- [ ] Concurrency (parallel proof attempts, concurrent echidnabot webhook processing) |
| 54 | +- [ ] Error handling (unsolvable problems, prover timeout, malformed input) |
| 55 | +- [ ] Accessibility (dashboard UI) |
| 56 | + |
| 57 | +### Build & Execution |
| 58 | +- [x] cargo build — compiles |
| 59 | +- [x] cargo test — 684 pass, 0 fail, 16 ignored |
| 60 | +- [ ] V build — not verified |
| 61 | +- [ ] Zig build — not verified |
| 62 | +- [ ] echidnabot binary — not verified |
| 63 | +- [ ] Self-diagnostic — none |
| 64 | + |
| 65 | +### Benchmarks Needed |
| 66 | +- Prover throughput on standard benchmark suites |
| 67 | +- Proof verification latency |
| 68 | +- VeriSimDB write/query performance |
| 69 | +- Echidnabot analysis speed per repo |
| 70 | +- Verify 1,035 existing benchmark files actually run (V-lang prover benchmarks?) |
| 71 | + |
| 72 | +### Self-Tests |
| 73 | +- [ ] panic-attack assail on own repo |
| 74 | +- [ ] Echidnabot self-scan |
| 75 | +- [ ] Prover correctness validation suite |
| 76 | + |
| 77 | +## Priority |
| 78 | +- **HIGH** — The Rust crate is well-tested (684 tests) but the V-lang prover (6,694 files) has ZERO tests. This is a formal verification tool where prover correctness is the entire value proposition. The 1,035 benchmark files need verification — if they are actually prover benchmarks, the situation is better than it appears for performance testing. But correctness testing for the prover logic is non-negotiable and completely absent. |
0 commit comments