Skip to content

Commit e1ae4f7

Browse files
hyperpolymathclaude
andcommitted
Remove template-only ABI files that falsely implied formal verification
Template Idris2 ABI files (Types.idr, Layout.idr, Foreign.idr) contained only RSR template scaffolding with unresolved placeholders and no domain-specific proofs. Removed to prevent false impression of formal verification coverage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 0ae79b9 commit e1ae4f7

5 files changed

Lines changed: 110 additions & 623 deletions

File tree

PROOF-NEEDS.md

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
1+
# Proof Requirements
2+
3+
## Current state
4+
- Extensive Idris2 ABI: `EchidnaABI/Types.idr` (655 lines), `EchidnaABI/Foreign.idr` (445 lines), `EchidnaABI/Layout.idr` (236 lines)
5+
- Prover-specific ABI: `EchidnaABI/Provers/` — InteractiveAssistants, SmtSolvers, FirstOrderAtp, DeclarativeProvers, AutoActive, ConstraintSolvers
6+
- `EchidnaABI/VqlUt.idr`, `EchidnaABI/Gnn.idr` — VQL and GNN type definitions
7+
- `src/rust/verification/axiom_tracker.rs` — Tracks `sorry`/`Admitted` patterns (detection, not usage)
8+
- `src/rust/provers/lean.rs` — Generates `sorry` as placeholder in Lean proof scaffolding (intentional — feeds goals to provers)
9+
- `src/rust/provers/coq.rs` — Parses `Admitted` in Coq output (detection, not usage)
10+
- No `believe_me` or `sorry` in the Idris2 ABI layer itself
11+
12+
## What needs proving
13+
- **Confidence scoring soundness**: Prove that confidence levels in `confidence.rs` form a valid lattice and that `sorry` detection correctly downgrades confidence
14+
- **Axiom tracking completeness**: Prove `axiom_tracker.rs` detects ALL dangerous patterns (no false negatives for `sorry`, `Admitted`, `believe_me`, `postulate`, `assert_total`, `unsafeCoerce`)
15+
- **Prover dispatch correctness**: Prove that proof goals are dispatched to compatible provers (e.g., linear logic goals do not go to first-order ATP)
16+
- **GNN embedding faithfulness**: Prove that proof-graph GNN embeddings preserve structural properties of the original proof tree
17+
- **VQL-UT query safety**: Prove VQL queries are injection-free and type-safe at the ABI boundary
18+
- **Proof composition**: Prove that combining sub-proofs from different provers preserves overall soundness (no implicit axiom conflicts)
19+
20+
## Recommended prover
21+
- **Idris2** for ABI-level properties and prover dispatch correctness (dependent types model the prover taxonomy well)
22+
- **Lean4** for confidence lattice algebraic properties
23+
- **Agda** for metatheoretic properties of proof composition
24+
25+
## Priority
26+
- **HIGH** — ECHIDNA is a proof verification orchestrator. A tool that checks proofs MUST itself be provably correct, or its guarantees are hollow. The axiom tracker and confidence scorer are the most critical targets.
27+
28+
## Template ABI Cleanup (2026-03-29)
29+
30+
Template ABI removed -- was creating false impression of formal verification.
31+
The removed files (Types.idr, Layout.idr, Foreign.idr) contained only RSR template
32+
scaffolding with unresolved {{PROJECT}}/{{AUTHOR}} placeholders and no domain-specific proofs.

TEST-NEEDS.md

Lines changed: 78 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,78 @@
1+
# Test & Benchmark Requirements
2+
3+
## Current State
4+
- Unit tests: 684 pass / 0 fail / 16 ignored (556 in main crate + 128 across other crates)
5+
- Integration tests: partial (echidnabot has integration tests)
6+
- E2E tests: NONE
7+
- Benchmarks: 1,035 benchmark files (likely V-lang prover benchmarks)
8+
- panic-attack scan: NEVER RUN
9+
10+
## What's Missing
11+
### Point-to-Point (P2P)
12+
**Source counts:** 146 Rust + 6,694 V + 34 Zig + 42 Idris2 + 33 ReScript + 52 Julia + 39 JS + 226 Shell
13+
14+
#### Rust crate (146 files, ~684 tests — BEST tested component):
15+
- 79 files with #[test] in main crate — good inline test coverage
16+
- 3 files with #[test] in echidnabot — minimal
17+
- fuzz/ crate exists (Cargo.toml) — verify fuzzing actually runs
18+
- 9 ignored tests need investigation
19+
20+
#### V-lang prover (6,694 files — ZERO tests):
21+
- This is the bulk of the codebase
22+
- Prover correctness is critical — needs property-based testing
23+
- All proof dispatch, verification, and formal reasoning untested
24+
25+
#### Zig FFI (34 files — 7 test files):
26+
- Reasonable ratio but verify coverage
27+
28+
#### Idris2 ABI (42 files — ZERO tests):
29+
- Formal verification definitions should be self-verifying
30+
- But runtime behavior still needs integration testing
31+
32+
#### Julia (52 files — ZERO tests):
33+
- Likely analysis/reporting scripts — need at least smoke tests
34+
35+
#### ReScript (33 files — ZERO tests):
36+
- Dashboard/UI components — need render and interaction tests
37+
38+
#### Shell (226 files — ZERO tests):
39+
- Scripts need at least --help and dry-run validation
40+
41+
### End-to-End (E2E)
42+
- Full prover workflow: input problem -> preprocess -> prove -> verify result
43+
- Echidnabot: receive repo event -> analyze -> report findings
44+
- Dashboard: load data -> display results -> filter/search
45+
- Proof dispatch: select prover backend -> route -> collect results
46+
- VeriSimDB integration: write proof results -> query -> aggregate
47+
- Dodeca-API interaction
48+
- Report generation pipeline
49+
50+
### Aspect Tests
51+
- [ ] Security (proof validation bypass, malicious input to prover, echidnabot webhook injection)
52+
- [ ] Performance (prover on large problems, proof verification latency)
53+
- [ ] Concurrency (parallel proof attempts, concurrent echidnabot webhook processing)
54+
- [ ] Error handling (unsolvable problems, prover timeout, malformed input)
55+
- [ ] Accessibility (dashboard UI)
56+
57+
### Build & Execution
58+
- [x] cargo build — compiles
59+
- [x] cargo test — 684 pass, 0 fail, 16 ignored
60+
- [ ] V build — not verified
61+
- [ ] Zig build — not verified
62+
- [ ] echidnabot binary — not verified
63+
- [ ] Self-diagnostic — none
64+
65+
### Benchmarks Needed
66+
- Prover throughput on standard benchmark suites
67+
- Proof verification latency
68+
- VeriSimDB write/query performance
69+
- Echidnabot analysis speed per repo
70+
- Verify 1,035 existing benchmark files actually run (V-lang prover benchmarks?)
71+
72+
### Self-Tests
73+
- [ ] panic-attack assail on own repo
74+
- [ ] Echidnabot self-scan
75+
- [ ] Prover correctness validation suite
76+
77+
## Priority
78+
- **HIGH** — The Rust crate is well-tested (684 tests) but the V-lang prover (6,694 files) has ZERO tests. This is a formal verification tool where prover correctness is the entire value proposition. The 1,035 benchmark files need verification — if they are actually prover benchmarks, the situation is better than it appears for performance testing. But correctness testing for the prover logic is non-negotiable and completely absent.

echidna-playground/src/abi/Foreign.idr

Lines changed: 0 additions & 216 deletions
This file was deleted.

0 commit comments

Comments
 (0)