feat(formal): stand up Coq formal/ track + mechanize the K-1 Wave-0 seed #620
Merged
… seed
Wave 0 of the PROOF-NEEDS.adoc programme, on the keystone obligation K-1
(codegen -> typed-WASM semantic-preservation). Prover: Coq/Rocq 8.18 —
chosen for typed-wasm / ephapax interop (both Coq `Semantics.v`).
formal/K1_CodegenPreservation.v proves, with NO `Admitted` and NO axioms
(`Print Assumptions`: "Closed under the global context"), a complete
compiler-correctness theorem for a minimal AffineScript fragment:

    Definition K1_preservation : Prop :=
      forall e v, seval e = Some v -> wexec (compile e) [] = Some [obs v].
    Theorem k1_preservation_holds : K1_preservation. (* proven *)

i.e. source big-step eval ⇒ the compiled stack-machine code yields the
corresponding wasm value. The fragment (nat/bool · add/and → a little
stack machine) is deliberately tiny; the real AST + real typed-WASM
semantics remain the open obligation, expanded later the way solo-core's
Duet/Ensemble tracks expand Solo. This mirrors how SameCube.agda grounds
F-2 with a real proof rather than a hole.
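
To make the shape of the statement concrete, here is a small executable model of the fragment. It is a sketch in Python, not the Coq development: the constructor names (`Nat`, `Bool`, `Add`, `And`), the tagged-tuple value encoding, and the instruction format are all assumptions made for illustration. Only `seval`, `compile` (here `compile_expr`), `wexec`, and `obs` echo names from the theorem. It checks the preservation property by bounded enumeration, which is testing, not proof.

```python
from dataclasses import dataclass
from itertools import product

# Source fragment: nat/bool literals with add/and.
# Constructor names are hypothetical, chosen for this sketch only.
@dataclass(frozen=True)
class Nat:
    n: int

@dataclass(frozen=True)
class Bool:
    b: bool

@dataclass(frozen=True)
class Add:
    l: object
    r: object

@dataclass(frozen=True)
class And:
    l: object
    r: object

def seval(e):
    """Big-step source evaluator; None models a stuck (ill-typed) term."""
    if isinstance(e, (Nat, Bool)):
        return e
    l, r = seval(e.l), seval(e.r)
    if isinstance(e, Add) and isinstance(l, Nat) and isinstance(r, Nat):
        return Nat(l.n + r.n)
    if isinstance(e, And) and isinstance(l, Bool) and isinstance(r, Bool):
        return Bool(l.b and r.b)
    return None

def obs(v):
    """Observation map from source values to target values (tagged tuples)."""
    return ("i32", v.n) if isinstance(v, Nat) else ("bool", v.b)

def compile_expr(e):
    """Post-order compilation: operands first, then the operator."""
    if isinstance(e, (Nat, Bool)):
        return [("push", obs(e))]
    op = "add" if isinstance(e, Add) else "and"
    return compile_expr(e.l) + compile_expr(e.r) + [(op,)]

def wexec(code, stack):
    """Run the stack machine; None models a trap."""
    stack = list(stack)
    for ins in code:
        if ins[0] == "push":
            stack.append(ins[1])
            continue
        if len(stack) < 2:
            return None
        b, a = stack.pop(), stack.pop()
        if ins[0] == "add" and a[0] == b[0] == "i32":
            stack.append(("i32", a[1] + b[1]))
        elif ins[0] == "and" and a[0] == b[0] == "bool":
            stack.append(("bool", a[1] and b[1]))
        else:
            return None
    return stack

def preservation_holds(e):
    """seval e = Some v  implies  wexec (compile e) [] = Some [obs v]."""
    v = seval(e)
    return v is None or wexec(compile_expr(e), []) == [obs(v)]

def small_exprs(depth):
    """All expressions up to the given nesting depth over four leaves."""
    leaves = [Nat(0), Nat(2), Bool(True), Bool(False)]
    if depth == 0:
        return leaves
    sub = small_exprs(depth - 1)
    return leaves + [op(l, r) for op in (Add, And) for l, r in product(sub, sub)]
```

Calling `all(preservation_holds(e) for e in small_exprs(2))` exercises every expression up to depth 2, including ill-typed ones, for which the premise `seval e = Some v` is false and the property holds vacuously.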
Coq `.v` policy carve-out (the `.v` extension is shared with the banned
V-lang and with Verilog — Coq is neither):
- `.hypatia-ignore`: explicit `formal/*.v` exemption from
`cicd_rules/vlang_detected` (+ banned_language_file), so no sweep can
mis-flag Coq as V-lang. Coq has no `v.mod` → `vmod_detected` never fires.
- `.claude/CLAUDE.md`: new "Formal-methods Coq `.v` (NOT V-lang)" note
documenting the carve-out, the no-Admitted/no-axiom rule, and that
these files must not be migrated/deleted as V-lang.
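
The effect of the exemption can be pictured as a path filter sitting in front of a naive `.v` sweep. The sketch below is an assumption about the general mechanism, not Hypatia's actual matcher: `EXEMPT_GLOBS` and `is_flagged_as_vlang` are hypothetical names, and the real `.hypatia-ignore` syntax and glob semantics may differ.

```python
from fnmatch import fnmatch

# Hypothetical exemption list, modelled on the `formal/*.v` entry described above.
EXEMPT_GLOBS = ["formal/*.v"]

def is_flagged_as_vlang(path, exempt_globs=EXEMPT_GLOBS):
    """Return True if a naive extension-based sweep would flag this path.

    Note: fnmatch's `*` also matches `/`, so `formal/*.v` covers nested
    directories in this sketch; a real ignore file may be stricter.
    """
    if not path.endswith(".v"):
        return False
    return not any(fnmatch(path, glob) for glob in exempt_globs)
```

With this filter, `formal/K1_CodegenPreservation.v` passes through untouched while a `.v` file anywhere else is still reported.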
Track scaffolding: README.adoc, _CoqProject, justfile (`check` recipe
type-checks and asserts the proof is axiom-free), .gitignore for Coq
artifacts.
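
The justfile recipe itself is not shown here. As a rough sketch of what "asserts the proof is axiom-free" could mean in practice, the two helpers below inspect captured `Print Assumptions` output and the `.v` source text. Both function names are hypothetical, and the comment stripping handles only non-nested Coq comments.

```python
import re

def is_axiom_free(print_assumptions_output: str) -> bool:
    """True when Coq reported no assumptions for the theorem.

    Coq prints "Closed under the global context" in that case; otherwise
    it lists the axioms the proof depends on.
    """
    return "Closed under the global context" in print_assumptions_output

def has_admitted(coq_source: str) -> bool:
    """True if the source contains `Admitted.` outside comments.

    Assumption: comments are not nested. Coq allows nesting, so a real
    check would need a small parser rather than this regex.
    """
    without_comments = re.sub(r"\(\*.*?\*\)", "", coq_source, flags=re.DOTALL)
    return re.search(r"\bAdmitted\s*\.", without_comments) is not None
```

A recipe built this way would fail the build when `is_axiom_free` is false or `has_admitted` is true.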
Docs synced: PROOF-NEEDS.adoc K-1 prose→partial + `formal/` now exists;
FRG-PROFILE.adoc "no formalisation directory" gap met (grade stays E —
D needs type-preservation/progress for the affine calculus, distinct
from this codegen-preservation seed).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01KPG9mEQXFyA3k7NWAzMNMr
🔍 Hypatia Security Scan
Findings: 41 issues detected
[
{
"reason": "Action denoland/setup-deno@v2 needs attention",
"type": "unpinned_action",
"file": "publish-jsr.yml",
"action": "pin_sha",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in instant-sync.yml",
"type": "secret_action_without_presence_gate",
"file": "instant-sync.yml",
"action": "peter-evans/repository-dispatch",
"rule_module": "workflow_audit",
"severity": "high"
},
{
"reason": "Shell execution -- validate input before passing to shell (1 occurrences, CWE-78)",
"type": "js_exec_sync",
"file": "/home/runner/work/affinescript/affinescript/packages/affinescript-cli/mod.js",
"action": "flag",
"rule_module": "code_safety",
"severity": "high"
},
{
"reason": "Shell execution -- validate input before passing to shell (2 occurrences, CWE-78)",
"type": "js_exec_sync",
"file": "/home/runner/work/affinescript/affinescript/packages/affine-vscode/mod.js",
"action": "flag",
"rule_module": "code_safety",
"severity": "high"
},
{
"reason": "Shell execution -- validate input before passing to shell (1 occurrences, CWE-78)",
"type": "js_exec_sync",
"file": "/home/runner/work/affinescript/affinescript/affinescript-vite/src/affine-plugin-improved.js",
"action": "flag",
"rule_module": "code_safety",
"severity": "high"
},
{
"reason": "expect() in hot path (32 occurrences, CWE-754)",
"type": "expect_in_hot_path",
"file": "/home/runner/work/affinescript/affinescript/affinescriptiser/src/codegen/wasm_gen.rs",
"action": "flag",
"rule_module": "code_safety",
"severity": "medium"
},
{
"reason": "expect() in hot path (29 occurrences, CWE-754)",
"type": "expect_in_hot_path",
"file": "/home/runner/work/affinescript/affinescript/affinescriptiser/src/codegen/affine_gen.rs",
"action": "flag",
"rule_module": "code_safety",
"severity": "medium"
},
{
"reason": "unsafe block -- requires SAFETY comment (2 occurrences, CWE-676)",
"type": "unsafe_block",
"file": "/home/runner/work/affinescript/affinescript/runtime/src/panic.rs",
"action": "flag",
"rule_module": "code_safety",
"severity": "medium"
},
{
"reason": "unsafe block -- requires SAFETY comment (1 occurrences, CWE-676)",
"type": "unsafe_block",
"file": "/home/runner/work/affinescript/affinescript/runtime/src/alloc.rs",
"action": "flag",
"rule_module": "code_safety",
"severity": "medium"
},
{
"reason": "unsafe block -- requires SAFETY comment (3 occurrences, CWE-676)",
"type": "unsafe_block",
"file": "/home/runner/work/affinescript/affinescript/runtime/src/ffi.rs",
"action": "flag",
"rule_module": "code_safety",
"severity": "medium"
}
]
Powered by Hypatia Neurosymbolic CI/CD Intelligence
hyperpolymath added a commit that referenced this pull request on Jun 21, 2026
…622)

## Why

A reader (human or agent) asking "is AffineScript sound / what's broken?" could open any of ~6 docs and get a **stale answer in the dangerous direction**. The known holes `#554/#555/#556/#558/#559` were fixed, fenced, or removed across 2026-05/06, but their status was duplicated across the capability matrix, a `CLAUDE.md` survey block, `STATE-*` snapshots, the wiki, etc.: surfaces that drift independently. (This PR was prompted by exactly that: a stale answer sourced from `CAPABILITY-MATRIX.adoc` + the `CLAUDE.md` survey.)

This fixes the **content and the layout** so it can't silently rot again.

## What

**New single source of truth: `docs/SOUNDNESS.adoc`**

- The one place soundness-hole status lives. **Test-anchored**: every row names the fixture/test that proves it.
- Carries a freshness stamp (`:ground-truth-sha:` + date). Ground-truthed against a green `dune build` / `dune runtest` at `d55e22c`.
- Honest about residuals (interpreter non-tail resume, Lean/Why3 `return`-drop, #559 generic overlap) and about the implementation-vs-proof distinction.

**New anti-staleness gate: `tools/check-soundness-ledger.sh`** (wired into `just guard` + CI)

- Fails the build if the ledger loses its primacy declaration or freshness stamp, if any **anchor fixture goes missing**, or if a status surface stops linking back to the ledger. This binds prose to executable truth.
- Verified it *bites*: returns non-zero and names the offender on a missing anchor.

**Corrected every live status surface** to ground-truth + made them defer to the ledger: `README`, `CAPABILITY-MATRIX` (borrow/effects/refinement/traits rows + anti-over-claim bullet + See-also), `PROOF-NEEDS` (holes block + P-9/P-10), `NAVIGATION`, `reference/COMPILER-CAPABILITIES`, `TECH-DEBT` (CORE-04/05), the wiki (README + traits + dependent-types), `STATE.a2ml`, agent debt, and the `CLAUDE.md` survey. Dated snapshot `STATE-2026-06-11` is **capped with a superseded banner**, not rewritten.

## Ground-truth recorded (verified in source + green suite)

| Issue | Was documented as | Actually |
|---|---|---|
| #554 | open use-after-move | **fixed**: rejected `MoveWhileBorrowed` |
| #555 | silently mis-lowered | **fenced loud** on every compiled backend; 1 pinned interp residual |
| #556 | silent sync fallback | **fixed**: fails loud |
| #558 | parse-only/unenforced | **removed** in v1; `assume(...)` rejected at parse |
| #559 | coherence unchecked | **fixed** for concrete overlaps (wired in `typecheck.ml`) |
| #553 | "0% implemented" | M1–M3, **test-only/unwired** |

Closing these *implementation* holes is **not** the same as *proving* soundness: the metatheory is still prose (one Wave-0 Coq seed from #620 noted), per `PROOF-NEEDS.adoc`.

## Note for review

- The `.claude/CLAUDE.md` change is **body-only** (the stale survey → a deferral to the ledger). I did **not** add or alter its license header; that's the owner-gated act flagged in the soundness handoff.
- No code touched; docs + one shell gate + CI/justfile wiring. `dune build` and `dune runtest` are green at `d55e22c`.
- `AFFIRMATION.adoc` (a parked, dated attestation) was deliberately left untouched.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa

---

_Generated by [Claude Code](https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa)_

Co-authored-by: Claude <noreply@anthropic.com>
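
The two checks this commit describes (anchor fixtures exist, status surfaces link back to the ledger) can be sketched as below. The real gate is a shell script whose contents are not shown here, so this Python version is only an illustration of the idea: `check_ledger`, `demo`, and the fixture path `test/fixtures/example_fixture.txt` are hypothetical.

```python
import tempfile
from pathlib import Path

def check_ledger(root, anchors, surfaces, ledger="docs/SOUNDNESS.adoc"):
    """Return a list of error strings; an empty list means the gate passes."""
    root = Path(root)
    errors = []
    # Every anchor named by a ledger row must exist on disk.
    for rel in anchors:
        if not (root / rel).is_file():
            errors.append(f"missing anchor: {rel}")
    # Every status surface must mention the ledger path (a back-link).
    for rel in surfaces:
        path = root / rel
        if not path.is_file() or ledger not in path.read_text(encoding="utf-8"):
            errors.append(f"no back-link to ledger: {rel}")
    return errors

def demo():
    """Build a throwaway tree with one missing anchor and one unlinked surface."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "docs").mkdir()
        (root / "docs" / "SOUNDNESS.adoc").write_text("ledger\n", encoding="utf-8")
        (root / "README.adoc").write_text("See docs/SOUNDNESS.adoc\n", encoding="utf-8")
        (root / "NAVIGATION.adoc").write_text("no link here\n", encoding="utf-8")
        return check_ledger(
            root,
            anchors=["test/fixtures/example_fixture.txt"],
            surfaces=["README.adoc", "NAVIGATION.adoc"],
        )
```

Returning the full error list rather than stopping at the first failure matches the "names the offender" behaviour described above; a wrapper would exit non-zero whenever the list is non-empty.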
This was referenced Jun 21, 2026
hyperpolymath added a commit that referenced this pull request on Jun 21, 2026
… xfail pin-liveness (#631)

Makes `docs/SOUNDNESS.adoc` keep every promise it makes. The ledger on `main` is "prose ahead of mechanism" (it claims content-binding / stamp-enforcement / pinned xfails, but the gate enforced only 2 of those). This builds the missing mechanism, and folds in the closed-#625 capability-matrix anchoring.

## The five properties (each maps to a function in the gate)

| # | Property | Function | Provenance |
|---|----------|----------|-----------|
| 1 | Anchors exist | `check_anchors_exist` | Jonathan's #622 design (kept) |
| 2 | Back-links | `check_backlinks` | Jonathan's #622 design (kept) |
| 3 | **Content-binding** | `check_content_binding` + `tools/soundness-anchors.sha256` + `--reseal` | **new** |
| 4 | **Stamp-enforcement** | `check_stamp` | **new** |
| 5 | **Pin-liveness (xfail)** | `check_pins` + `test/xfail/test_xfail_pins.ml` | **new** |

`## What this gate enforces` is documented at the top of the script. Everything **fails closed**.

## Ground-truth correction (compiler wins)

Running the compiler showed **#559 generic-subsumption is already detected/rejected** (`impl[T] Greet for Box[T]` vs `impl Greet for Box[Int]` → "Trait coherence violation"). So the ledger's `open (tracked)` "not yet detected" was stale **in the dangerous direction**. Corrected to `fixed` with a positive test; the stale `test_e2e.ml` comment fixed. → one fewer xfail pin than the spec assumed.

Also: the stub-return row uses **#624** (the real tracker); #560 is *variable-string wasm ops*, unrelated. This change supplies the pin #628 couldn't (the fixture/test now exist). Stamp re-pointed to `dd6c19e` (a real main-ancestor; the old `d55e22c` was squash-orphaned). Metatheory note updated for the new `formal/` proofs (#620–#627).

## Self-tests: each new check watched failing

```
SELF-TEST 1 — Property 3 (mutate a fixture by one token):
ERROR (property 3): anchor content drift vs tools/soundness-anchors.sha256 ...
SELF-TEST 2 — Property 4 (un-advanced/orphaned stamp + soundness change):
ERROR (property 4): stamp d55e22c is not an ancestor of HEAD; re-point :ground-truth-sha: ...
SELF-TEST (5a) — Property 5 (pinned row names a missing pin):
ERROR (property 1): test anchor not defined: test_stub_backend_return_DELETED
FATAL: anchor test:test_stub_backend_return_DELETED: expected exactly one defining file, found 0 (fail closed)
SELF-TEST (5b) — Property 5 (an xfail pin flips to XPASS):
ALARM (property 5): pin test_resume_nontail_xfail is PASSING — the hole may be fixed. Open docs/SOUNDNESS.adoc and update the row to 'fixed' (do NOT just silence the pin).
```

Full suite green (534 tests; xfail harness reports both pins `XFAIL-OK`), all four guard gates green, `dune build`/`dune runtest` green at `dd6c19e`.

## Claims I could not make fully mechanical (named, not silently softened)

1. **Content-binding scope.** Fixtures + pinned-test *bodies* are digest-bound (11/12 anchors); the one SUITE-file anchor (`#553` → `test/test_borrow_polonius.ml`) is existence+stamp-checked only, because a whole-file hash is too coarse. The ledger sentence was tightened to say exactly this.
2. **Stamp "advanced-in-this-change" detection** is robust for the normal *branch-off-fresh-main* workflow (and the orphaned-stamp case fails closed, self-test 2). It has a known edge in a *multi-commit-since-stamp* history (stamp bumped in an earlier commit, soundness changed again later without re-bump could read as "advanced"); decision-2's full "diff-on-main" freshness check is not separately implemented. Flagged for your call.

## CI

`build` job now checks out `fetch-depth: 0` so property 4 can resolve the stamp; the xfail harness is in `.ocamlformat-ignore` (authored without ocamlformat available).
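
Property 3 (content-binding) amounts to sealing each anchor's bytes into a digest manifest and failing when the current bytes no longer match. A minimal in-memory sketch follows. It assumes SHA-256 (suggested by the `.sha256` manifest name) and uses hypothetical helper names `digest` and `seal`; the real `check_content_binding` lives in the shell gate and may hash pinned-test bodies differently.

```python
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 hex digest of an anchor's content."""
    return hashlib.sha256(data).hexdigest()

def seal(files: dict) -> dict:
    """Build a manifest mapping anchor name to digest (the reseal step)."""
    return {name: digest(body) for name, body in files.items()}

def check_content_binding(files: dict, manifest: dict) -> list:
    """Fail closed: report drift, sealed anchors that vanished, and unsealed anchors."""
    errors = []
    for name, expected in manifest.items():
        if name not in files:
            errors.append(f"missing anchor: {name}")
        elif digest(files[name]) != expected:
            errors.append(f"anchor content drift: {name}")
    for name in files:
        if name not in manifest:
            errors.append(f"unsealed anchor: {name}")
    return errors
```

Changing a single token in a sealed fixture changes its digest, which is the situation self-test 1 exercises.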

🤖 Generated with [Claude Code](https://claude.com/claude-code)

https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa

---

_Generated by [Claude Code](https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa)_

Co-authored-by: Claude <noreply@anthropic.com>
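
Property 5 (pin-liveness) inverts the usual test outcome: a pinned test is expected to fail, and a pass is the alarm. The sketch below shows that inversion in Python. The real harness is `test/xfail/test_xfail_pins.ml` and is not reproduced here, so `classify_pin` and this `check_pins` are hypothetical stand-ins that reuse only the `XFAIL-OK` / `XPASS` labels from the commit message.

```python
def classify_pin(test_fn):
    """Run an expected-failure pin and classify the outcome.

    Only AssertionError counts as the expected failure; any other
    exception propagates, so a broken harness fails closed.
    """
    try:
        test_fn()
    except AssertionError:
        return "XFAIL-OK"
    return "XPASS"

def check_pins(pins: dict) -> list:
    """Return one alarm per pin that unexpectedly passes."""
    return [
        f"pin {name} is PASSING; update the ledger row instead of silencing the pin"
        for name, fn in pins.items()
        if classify_pin(fn) == "XPASS"
    ]
```

An `XPASS` here means the documented hole may have been fixed, so the right response is to update the ledger row, not to delete the pin.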
## What

Wave 0 of the `docs/PROOF-NEEDS.adoc` programme, on the keystone obligation K-1 (codegen → typed-WASM semantic-preservation). Stands up the `formal/` directory (the dir #513 names) and lands a fully mechanized, axiom-free Coq proof for a minimal fragment.

Prover: Coq/Rocq 8.18, chosen for K-1 because the typed-WASM target semantics interoperate with `typed-wasm` and ephapax, both of which use Coq `Semantics.v`.

## The proof: `formal/K1_CodegenPreservation.v`

A complete compiler-correctness theorem with no `Admitted`, no axioms (`Print Assumptions` → "Closed under the global context"): `forall e v, seval e = Some v -> wexec (compile e) [] = Some [obs v]`, i.e. whenever the source big-step evaluates to `v`, the compiled stack-machine code run on the empty operand stack yields exactly the corresponding wasm value `[obs v]`. The fragment (nat/bool · `add`/`and` → a little stack machine standing in for typed-WASM) is deliberately tiny; the real AST + real typed-WASM operational semantics remain the open obligation, expanded later the way solo-core's Duet/Ensemble tracks expand Solo. This mirrors how `invariant-path/proofs/SameCube.agda` grounds F-2 with a real proof rather than a hole.

## Coq `.v` policy carve-out (the point you raised)

`.v` is shared by Coq, Verilog, and the estate-banned V-lang (→ Zig). Coq is neither Verilog nor V-lang, so this PR makes the distinction explicit so nothing can sweep it up:

- `.hypatia-ignore`: explicit `formal/*.v` exemption from `cicd_rules/vlang_detected` (+ `banned_language_file`). Coq ships no `v.mod`, so `vmod_detected` never fires.
- `.claude/CLAUDE.md`: new "Formal-methods Coq `.v` (NOT V-lang)" note: documents the estate `path_allow_prefixes` carve-out for Coq proof scripts, the no-`Admitted`/no-axiom rule, the Coq-vs-Idris2 prover split (Coq here for typed-wasm interop; solo-core stays Idris2), and "do not migrate/delete these as V-lang."

## Track scaffolding

`README.adoc`, `_CoqProject`, `justfile` (`check` recipe type-checks and asserts the proof is axiom-free), `.gitignore` for Coq artifacts.

## Docs synced

- `PROOF-NEEDS.adoc`: K-1 `prose` → `partial`; `formal/` now exists; Wave-0 row marked in progress.
- `FRG-PROFILE.adoc`: the "no formalisation directory" honest-gap is met (grade stays E; D needs type-preservation/progress for the affine calculus, a theorem distinct from this codegen-preservation seed).

## How to check

    just -f formal/justfile check   # or: cd formal && coqc K1_CodegenPreservation.v

Requires Coq 8.18+. Verified locally: compiles clean, `Print Assumptions` closed.

🤖 Generated with Claude Code

https://claude.ai/code/session_01KPG9mEQXFyA3k7NWAzMNMr

Generated by Claude Code