
feat(formal): stand up Coq formal/ track + mechanize the K-1 Wave-0 seed #620

Merged · hyperpolymath merged 1 commit into main from claude/lucid-cray-4a22dp on Jun 21, 2026



@hyperpolymath (Owner)

What

Wave 0 of the docs/PROOF-NEEDS.adoc programme, on the keystone obligation K-1 (codegen → typed-WASM semantic preservation). Stands up the formal/ directory (the directory named in #513) and lands a fully mechanized, axiom-free Coq proof for a minimal fragment.

Prover: Coq/Rocq 8.18, chosen for K-1 because the typed-WASM target semantics interoperate with typed-wasm and ephapax, both of which define their semantics in Coq (Semantics.v).

The proof — formal/K1_CodegenPreservation.v

A complete compiler-correctness theorem with no Admitted and no axioms (Print Assumptions reports "Closed under the global context"):

```coq
Definition K1_preservation : Prop :=
  forall e v, seval e = Some v -> wexec (compile e) [] = Some [obs v].

Theorem k1_preservation_holds : K1_preservation.   (* proven *)
```

i.e. whenever the source big-step semantics evaluates e to v, running the compiled stack-machine code on the empty operand stack yields a stack holding exactly the corresponding wasm value, [obs v]. The fragment (nat/bool values with add/and, compiled to a small stack machine standing in for typed-WASM) is deliberately tiny; the real AST and the real typed-WASM operational semantics remain the open obligation, to be expanded later the way solo-core's Duet/Ensemble tracks expand Solo. This mirrors how invariant-path/proofs/SameCube.agda grounds F-2 with a real proof rather than a hole.
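To make the shape of the theorem concrete, here is a minimal executable model of such a fragment. It is an assumption-laden sketch, not a transcription of K1_CodegenPreservation.v: the constructor names, the tagged-tuple value encoding, the instruction set, `obs`, and `compile_expr` (renamed to avoid shadowing Python's builtin `compile`) are all hypothetical. Unlike the Coq theorem, which is proved for every expression, `check_preservation` only tests the statement on a finite set of small expressions.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, List, Optional, Tuple

# Hypothetical source AST: nat/bool values combined with add/and.
@dataclass(frozen=True)
class Nat:
    n: int

@dataclass(frozen=True)
class Bool:
    b: bool

@dataclass(frozen=True)
class Add:
    l: object
    r: object

@dataclass(frozen=True)
class And:
    l: object
    r: object

# Source values: ("nat", n) / ("bool", b); machine values: ("wnat", n) / ("wbool", b).
Val = Tuple[str, object]
LEAVES = [Nat(0), Nat(3), Bool(True), Bool(False)]

def seval(e) -> Optional[Val]:
    """Big-step source evaluation; None models a stuck (ill-typed) expression."""
    if isinstance(e, Nat):
        return ("nat", e.n)
    if isinstance(e, Bool):
        return ("bool", e.b)
    a, b = seval(e.l), seval(e.r)
    if a is None or b is None:
        return None
    if isinstance(e, Add) and a[0] == b[0] == "nat":
        return ("nat", a[1] + b[1])
    if isinstance(e, And) and a[0] == b[0] == "bool":
        return ("bool", a[1] and b[1])
    return None

def obs(v: Val) -> Val:
    """Injection of source values into machine values (hypothetical)."""
    return ("wnat", v[1]) if v[0] == "nat" else ("wbool", v[1])

def compile_expr(e) -> List[tuple]:
    """Post-order compilation: both operands first, then the operator."""
    if isinstance(e, Nat):
        return [("push", ("wnat", e.n))]
    if isinstance(e, Bool):
        return [("push", ("wbool", e.b))]
    op = "add" if isinstance(e, Add) else "and"
    return compile_expr(e.l) + compile_expr(e.r) + [(op,)]

def wexec(code: List[tuple], stack: List[Val]) -> Optional[List[Val]]:
    """Run code on an operand stack (top = end of list); None on underflow or type error."""
    st = list(stack)
    for ins in code:
        if ins[0] == "push":
            st.append(ins[1])
            continue
        if len(st) < 2:
            return None
        b, a = st.pop(), st.pop()  # a was pushed first, so it is the left operand
        if ins[0] == "add" and a[0] == b[0] == "wnat":
            st.append(("wnat", a[1] + b[1]))
        elif ins[0] == "and" and a[0] == b[0] == "wbool":
            st.append(("wbool", a[1] and b[1]))
        else:
            return None
    return st

def exprs(depth: int) -> Iterator[object]:
    """Enumerate every expression up to `depth` over a few fixed leaves."""
    if depth == 0:
        yield from LEAVES
        return
    subs = list(exprs(depth - 1))
    yield from subs
    for l, r in product(subs, repeat=2):
        yield Add(l, r)
        yield And(l, r)

def check_preservation(depth: int = 2) -> int:
    """Test (not prove) K1_preservation on every well-typed expression up to `depth`."""
    checked = 0
    for e in exprs(depth):
        v = seval(e)
        if v is not None:
            assert wexec(compile_expr(e), []) == [obs(v)], e
            checked += 1
    return checked
```

With these leaves, `check_preservation(2)` exercises 84 well-typed expressions; ill-typed ones such as `Add(Nat(1), Bool(True))` evaluate to `None` and fall outside the theorem's premise, exactly as `seval e = Some v` guards the Coq statement.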

Coq .v policy carve-out (the point you raised)

.v is shared by Coq, Verilog, and the estate-banned V-lang (→ Zig). Coq proof scripts are neither Verilog nor V-lang, so this PR makes the distinction explicit so that no automated sweep can pick them up:

  • .hypatia-ignore — explicit formal/*.v exemption from cicd_rules/vlang_detected (+ banned_language_file). Coq ships no v.mod, so vmod_detected never fires.
  • .claude/CLAUDE.md — new "Formal-methods Coq .v (NOT V-lang)" note: documents the estate path_allow_prefixes carve-out for Coq proof scripts, the no-Admitted/no-axiom rule, the Coq-vs-Idris2 prover split (Coq here for typed-wasm interop; solo-core stays Idris2), and "do not migrate/delete these as V-lang."
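How a scanner might honour such a rule-scoped exemption can be sketched as follows. This is a hypothetical model only: the actual `.hypatia-ignore` syntax and Hypatia's matching logic are not shown in this PR, so the `(glob, rules)` pairs below are an assumed in-memory representation. Note that `fnmatch`'s `*` also matches `/`, so in this sketch `formal/*.v` would cover nested directories too.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Set, Tuple

# Assumed form of ignore entries: (path glob, rule ids it silences). Hypothetical.
IGNORES: List[Tuple[str, Set[str]]] = [
    ("formal/*.v", {"cicd_rules/vlang_detected", "banned_language_file"}),
]

def is_exempt(path: str, rule: str, ignores=IGNORES) -> bool:
    """True if `rule` is silenced for `path` by some ignore entry."""
    return any(fnmatchcase(path, glob) and rule in rules for glob, rules in ignores)

def vlang_findings(paths: Iterable[str], ignores=IGNORES) -> List[str]:
    """Naive extension-based V-lang rule: flag .v files unless exempted."""
    rule = "cicd_rules/vlang_detected"
    return [p for p in paths if p.endswith(".v") and not is_exempt(p, rule, ignores)]
```

Scoping the exemption to named rules, rather than ignoring `formal/` wholesale, keeps every other scanner rule active on the proof directory.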

Track scaffolding

README.adoc, _CoqProject, justfile (check recipe type-checks and asserts the proof is axiom-free), .gitignore for Coq artifacts.
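The axiom-free assertion can be approximated as below, assuming the proof file ends with `Print Assumptions k1_preservation_holds.` so that `coqc` prints the result to standard output. The function names are hypothetical and the PR does not show the actual justfile recipe. Admitted lemmas are reported under `Axioms:` by `Print Assumptions`, so a closed result also rules out an `Admitted` on the theorem's dependency path; the source scan is a cruder belt-and-braces check.

```python
import re
import subprocess

CLOSED = "Closed under the global context"

def assumptions_closed(coqc_stdout: str) -> bool:
    """True if Print Assumptions reported no axioms (admitted lemmas count as axioms)."""
    return CLOSED in coqc_stdout and "Axioms:" not in coqc_stdout

def mentions_admitted(source: str) -> bool:
    """Crude scan for Admitted/admit outside (* ... *) comments; nested comments are not handled."""
    without_comments = re.sub(r"\(\*.*?\*\)", "", source, flags=re.S)
    return re.search(r"\b(Admitted|admit)\b", without_comments) is not None

def check_axiom_free(vfile: str) -> bool:
    """Compile `vfile` with coqc (must be on PATH) and apply both checks."""
    out = subprocess.run(["coqc", vfile], capture_output=True, text=True, check=True).stdout
    with open(vfile, encoding="utf-8") as f:
        return assumptions_closed(out) and not mentions_admitted(f.read())
```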

Docs synced

  • PROOF-NEEDS.adoc: K-1 prose → partial; formal/ now exists; Wave-0 row marked in progress.
  • FRG-PROFILE.adoc: the "no formalisation directory" honest-gap is met (grade stays E — D needs type-preservation/progress for the affine calculus, a theorem distinct from this codegen-preservation seed).

How to check

```shell
just -f formal/justfile check        # or:  cd formal && coqc K1_CodegenPreservation.v
```

Requires Coq 8.18+. Verified locally: compiles clean, Print Assumptions closed.

🤖 Generated with Claude Code

https://claude.ai/code/session_01KPG9mEQXFyA3k7NWAzMNMr



… seed

Wave 0 of the PROOF-NEEDS.adoc programme, on the keystone obligation K-1
(codegen -> typed-WASM semantic-preservation). Prover: Coq/Rocq 8.18 —
chosen for typed-wasm / ephapax interop (both Coq `Semantics.v`).

formal/K1_CodegenPreservation.v proves, with NO `Admitted` and NO axioms
(`Print Assumptions`: "Closed under the global context"), a complete
compiler-correctness theorem for a minimal AffineScript fragment:

  Definition K1_preservation : Prop :=
    forall e v, seval e = Some v -> wexec (compile e) [] = Some [obs v].
  Theorem k1_preservation_holds : K1_preservation.   (* proven *)

i.e. source big-step eval ⇒ the compiled stack-machine code yields the
corresponding wasm value. The fragment (nat/bool · add/and → a little
stack machine) is deliberately tiny; the real AST + real typed-WASM
semantics remain the open obligation, expanded later the way solo-core's
Duet/Ensemble tracks expand Solo. This mirrors how SameCube.agda grounds
F-2 with a real proof rather than a hole.

Coq `.v` policy carve-out (the `.v` extension is shared with the banned
V-lang and with Verilog — Coq is neither):
  - `.hypatia-ignore`: explicit `formal/*.v` exemption from
    `cicd_rules/vlang_detected` (+ banned_language_file), so no sweep can
    mis-flag Coq as V-lang. Coq has no `v.mod` → `vmod_detected` never fires.
  - `.claude/CLAUDE.md`: new "Formal-methods Coq `.v` (NOT V-lang)" note
    documenting the carve-out, the no-Admitted/no-axiom rule, and that
    these files must not be migrated/deleted as V-lang.

Track scaffolding: README.adoc, _CoqProject, justfile (`check` recipe
type-checks and asserts the proof is axiom-free), .gitignore for Coq
artifacts.

Docs synced: PROOF-NEEDS.adoc K-1 prose→partial + `formal/` now exists;
FRG-PROFILE.adoc "no formalisation directory" gap met (grade stays E —
D needs type-preservation/progress for the affine calculus, distinct
from this codegen-preservation seed).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01KPG9mEQXFyA3k7NWAzMNMr

@github-actions

🔍 Hypatia Security Scan

Findings: 41 issues detected

| Severity | Count |
|---|---|
| 🔴 Critical | 2 |
| 🟠 High | 23 |
| 🟡 Medium | 16 |

⚠️ Action Required: Critical security issues found!

View findings
[
  {
    "reason": "Action denoland/setup-deno@v2 needs attention",
    "type": "unpinned_action",
    "file": "publish-jsr.yml",
    "action": "pin_sha",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in instant-sync.yml",
    "type": "secret_action_without_presence_gate",
    "file": "instant-sync.yml",
    "action": "peter-evans/repository-dispatch",
    "rule_module": "workflow_audit",
    "severity": "high"
  },
  {
    "reason": "Shell execution -- validate input before passing to shell (1 occurrences, CWE-78)",
    "type": "js_exec_sync",
    "file": "/home/runner/work/affinescript/affinescript/packages/affinescript-cli/mod.js",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "high"
  },
  {
    "reason": "Shell execution -- validate input before passing to shell (2 occurrences, CWE-78)",
    "type": "js_exec_sync",
    "file": "/home/runner/work/affinescript/affinescript/packages/affine-vscode/mod.js",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "high"
  },
  {
    "reason": "Shell execution -- validate input before passing to shell (1 occurrences, CWE-78)",
    "type": "js_exec_sync",
    "file": "/home/runner/work/affinescript/affinescript/affinescript-vite/src/affine-plugin-improved.js",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "high"
  },
  {
    "reason": "expect() in hot path (32 occurrences, CWE-754)",
    "type": "expect_in_hot_path",
    "file": "/home/runner/work/affinescript/affinescript/affinescriptiser/src/codegen/wasm_gen.rs",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "medium"
  },
  {
    "reason": "expect() in hot path (29 occurrences, CWE-754)",
    "type": "expect_in_hot_path",
    "file": "/home/runner/work/affinescript/affinescript/affinescriptiser/src/codegen/affine_gen.rs",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "medium"
  },
  {
    "reason": "unsafe block -- requires SAFETY comment (2 occurrences, CWE-676)",
    "type": "unsafe_block",
    "file": "/home/runner/work/affinescript/affinescript/runtime/src/panic.rs",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "medium"
  },
  {
    "reason": "unsafe block -- requires SAFETY comment (1 occurrences, CWE-676)",
    "type": "unsafe_block",
    "file": "/home/runner/work/affinescript/affinescript/runtime/src/alloc.rs",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "medium"
  },
  {
    "reason": "unsafe block -- requires SAFETY comment (3 occurrences, CWE-676)",
    "type": "unsafe_block",
    "file": "/home/runner/work/affinescript/affinescript/runtime/src/ffi.rs",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "medium"
  }
]

Powered by Hypatia Neurosymbolic CI/CD Intelligence

@hyperpolymath hyperpolymath marked this pull request as ready for review June 21, 2026 11:25
@hyperpolymath hyperpolymath merged commit ac6e574 into main Jun 21, 2026
16 checks passed
@hyperpolymath hyperpolymath deleted the claude/lucid-cray-4a22dp branch June 21, 2026 11:25
hyperpolymath added a commit that referenced this pull request Jun 21, 2026
…622)

## Why

A reader (human or agent) asking "is AffineScript sound / what's
broken?" could open any of ~6 docs and get a **stale answer in the
dangerous direction**. The known holes `#554/#555/#556/#558/#559` were
fixed, fenced, or removed across 2026-05/06, but their status was
duplicated across the capability matrix, a `CLAUDE.md` survey block,
`STATE-*` snapshots, the wiki, etc. — surfaces that drift independently.
(This PR was prompted by exactly that: a stale answer sourced from
`CAPABILITY-MATRIX.adoc` + the `CLAUDE.md` survey.)

This fixes the **content and the layout** so it can't silently rot
again.

## What

**New single source of truth — `docs/SOUNDNESS.adoc`**
- The one place soundness-hole status lives. **Test-anchored**: every
row names the fixture/test that proves it.
- Carries a freshness stamp (`:ground-truth-sha:` + date).
Ground-truthed against a green `dune build` / `dune runtest` at
`d55e22c`.
- Honest about residuals (interpreter non-tail resume, Lean/Why3
`return`-drop, #559 generic overlap) and about the
implementation-vs-proof distinction.

**New anti-staleness gate — `tools/check-soundness-ledger.sh`** (wired
into `just guard` + CI)
- Fails the build if the ledger loses its primacy declaration or
freshness stamp, if any **anchor fixture goes missing**, or if a status
surface stops linking back to the ledger. This binds prose to executable
truth.
- Verified it *bites*: returns non-zero and names the offender on a
missing anchor.
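The gate itself is a shell script; purely to illustrate the checks listed above, here is a Python model over an in-memory `{path: text}` view of the repository. The primacy marker phrase, the ledger-path default and all function names are assumptions for illustration, not the script's actual logic. A real gate would load the files from disk and exit non-zero whenever the returned list is non-empty.

```python
import re
from typing import Dict, Iterable, List

# AsciiDoc attribute entry carrying the freshness stamp, e.g. ":ground-truth-sha: d55e22c".
STAMP_RE = re.compile(r"^:ground-truth-sha:\s*[0-9a-f]{7,40}\s*$", re.MULTILINE)
PRIMACY_MARKER = "single source of truth"  # hypothetical phrase the ledger must keep

def check_ledger(
    files: Dict[str, str],
    anchors: Iterable[str],
    surfaces: Iterable[str],
    ledger: str = "docs/SOUNDNESS.adoc",
) -> List[str]:
    """Return one error per violated property; an empty list means the gate passes."""
    errors = []
    text = files.get(ledger, "")
    if PRIMACY_MARKER not in text.lower():
        errors.append(f"{ledger}: primacy declaration missing")
    if not STAMP_RE.search(text):
        errors.append(f"{ledger}: :ground-truth-sha: freshness stamp missing")
    for anchor in anchors:
        if anchor not in files:
            errors.append(f"anchor fixture missing: {anchor}")
    for surface in surfaces:
        if "SOUNDNESS.adoc" not in files.get(surface, ""):
            errors.append(f"{surface}: no back-link to the ledger")
    return errors
```

A missing ledger yields both ledger errors at once, so the model fails closed rather than silently passing.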

**Corrected every live status surface** to ground-truth + made them
defer to the ledger: `README`, `CAPABILITY-MATRIX`
(borrow/effects/refinement/traits rows + anti-over-claim bullet +
See-also), `PROOF-NEEDS` (holes block + P-9/P-10), `NAVIGATION`,
`reference/COMPILER-CAPABILITIES`, `TECH-DEBT` (CORE-04/05), the wiki
(README + traits + dependent-types), `STATE.a2ml`, agent debt, and the
`CLAUDE.md` survey. Dated snapshot `STATE-2026-06-11` is **capped with a
superseded banner**, not rewritten.

## Ground-truth recorded (verified in source + green suite)

| Issue | Was documented as | Actually |
|---|---|---|
| #554 | open use-after-move | **fixed**: rejected `MoveWhileBorrowed` |
| #555 | silently mis-lowered | **fenced loud** on every compiled backend; 1 pinned interp residual |
| #556 | silent sync fallback | **fixed**: fails loud |
| #558 | parse-only/unenforced | **removed** in v1; `assume(...)` rejected at parse |
| #559 | coherence unchecked | **fixed** for concrete overlaps (wired in `typecheck.ml`) |
| #553 | "0% implemented" | M1–M3, **test-only/unwired** |

Closing these *implementation* holes is **not** the same as *proving*
soundness — the metatheory is still prose (one Wave-0 Coq seed from #620
noted), per `PROOF-NEEDS.adoc`.

## Note for review

- The `.claude/CLAUDE.md` change is **body-only** (the stale survey → a
deferral to the ledger). I did **not** add or alter its license header —
that's the owner-gated act flagged in the soundness handoff.
- No code touched; docs + one shell gate + CI/justfile wiring. `dune
build` and `dune runtest` are green at `d55e22c`.
- `AFFIRMATION.adoc` (a parked, dated attestation) was deliberately left
untouched.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa

---
_Generated by [Claude
Code](https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa)_

Co-authored-by: Claude <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request Jun 21, 2026
… xfail pin-liveness (#631)

Makes `docs/SOUNDNESS.adoc` keep every promise it makes. The ledger on
`main` is "prose ahead of mechanism" (it claims content-binding /
stamp-enforcement / pinned xfails, but the gate enforced only 2 of
those). This builds the missing mechanism, and folds in the closed-#625
capability-matrix anchoring.

## The five properties (each maps to a function in the gate)

| # | Property | Function | Provenance |
|---|----------|----------|-----------|
| 1 | Anchors exist | `check_anchors_exist` | Jonathan's #622 design (kept) |
| 2 | Back-links | `check_backlinks` | Jonathan's #622 design (kept) |
| 3 | **Content-binding** | `check_content_binding` + `tools/soundness-anchors.sha256` + `--reseal` | **new** |
| 4 | **Stamp-enforcement** | `check_stamp` | **new** |
| 5 | **Pin-liveness (xfail)** | `check_pins` + `test/xfail/test_xfail_pins.ml` | **new** |

`## What this gate enforces` is documented at the top of the script.
Everything **fails closed**.
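Property 3 can be sketched as a digest manifest plus a reseal step. This Python model assumes a `sha256sum`-style manifest (`<hex digest>  <path>` per line); the PR does not show the real format of `tools/soundness-anchors.sha256`, and the helper names are hypothetical. Drift, unsealed anchors and sealed-but-missing anchors all produce errors, so the check fails closed.

```python
import hashlib
from typing import Dict, List

def digest(body: str) -> str:
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def reseal(anchors: Dict[str, str]) -> str:
    """Regenerate the manifest from current anchor contents (the `--reseal` step)."""
    return "".join(f"{digest(body)}  {path}\n" for path, body in sorted(anchors.items()))

def parse_manifest(manifest: str) -> Dict[str, str]:
    sealed = {}
    for line in manifest.splitlines():
        if line.strip():
            hexdigest, path = line.split(None, 1)
            sealed[path.strip()] = hexdigest
    return sealed

def check_content_binding(anchors: Dict[str, str], manifest: str) -> List[str]:
    """Compare current anchor contents against sealed digests; any gap is an error."""
    sealed = parse_manifest(manifest)
    errors = []
    for path, body in sorted(anchors.items()):
        if path not in sealed:
            errors.append(f"unsealed anchor: {path}")
        elif sealed[path] != digest(body):
            errors.append(f"anchor content drift: {path}")
    for path in sorted(set(sealed) - set(anchors)):
        errors.append(f"sealed anchor missing: {path}")
    return errors
```

Because resealing is a separate, explicit step, a one-token edit to a fixture cannot pass unnoticed: someone has to rerun `--reseal` and commit the new digest alongside the change.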

## Ground-truth correction (compiler wins)

Running the compiler showed **#559 generic-subsumption is already
detected/rejected** (`impl[T] Greet for Box[T]` vs `impl Greet for
Box[Int]` → "Trait coherence violation"). So the ledger's `open
(tracked)` "not yet detected" was stale **in the dangerous direction**.
Corrected to `fixed` with a positive test; the stale `test_e2e.ml`
comment fixed. → one fewer xfail pin than the spec assumed.
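The rejected pair is a textbook case of overlapping impls: the two heads unify under `T := Int`, so some type would have two candidate impls. A minimal unification-based overlap test is sketched below; it is a generic illustration, not the coherence check actually wired into `typecheck.ml`, and the `TVar`/`TCon` encoding is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass(frozen=True)
class TVar:
    name: str

@dataclass(frozen=True)
class TCon:
    name: str
    args: Tuple[object, ...] = ()

Subst = Dict[TVar, object]

def resolve(t, s: Subst):
    """Follow variable bindings in the substitution."""
    while isinstance(t, TVar) and t in s:
        t = s[t]
    return t

def occurs(v: TVar, t, s: Subst) -> bool:
    t = resolve(t, s)
    if t == v:
        return True
    return isinstance(t, TCon) and any(occurs(v, a, s) for a in t.args)

def unify(a, b, s: Subst) -> Optional[Subst]:
    """First-order syntactic unification; None if the types cannot be made equal."""
    a, b = resolve(a, s), resolve(b, s)
    if a == b:
        return s
    if isinstance(a, TVar):
        return None if occurs(a, b, s) else {**s, a: b}
    if isinstance(b, TVar):
        return unify(b, a, s)
    if a.name != b.name or len(a.args) != len(b.args):
        return None
    for x, y in zip(a.args, b.args):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

def rename(t, tag: str):
    """Rename type variables apart so two impls never share a variable by accident."""
    if isinstance(t, TVar):
        return TVar(f"{tag}.{t.name}")
    return TCon(t.name, tuple(rename(a, tag) for a in t.args))

def overlaps(impl1: Tuple[str, object], impl2: Tuple[str, object]) -> bool:
    """Two impls of the same trait overlap iff their renamed-apart heads unify."""
    (trait1, head1), (trait2, head2) = impl1, impl2
    return trait1 == trait2 and unify(rename(head1, "1"), rename(head2, "2"), {}) is not None
```

With this encoding, `("Greet", TCon("Box", (TVar("T"),)))` overlaps `("Greet", TCon("Box", (TCon("Int"),)))`, while `Box[Int]` and `Box[String]` impls of the same trait do not.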

Also: the stub-return row uses **#624** (the real tracker); #560 is
*variable-string wasm ops*, unrelated — this change supplies the pin
#628 couldn't (the fixture/test now exist). Stamp re-pointed to
`dd6c19e` (a real main-ancestor; the old `d55e22c` was squash-orphaned).
Metatheory note updated for the new `formal/` proofs (#620, #627).
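Property 4's core question, "is the stamp an ancestor of HEAD?", is what `git merge-base --is-ancestor <stamp> HEAD` answers (exit status 0 if yes, 1 if not, anything else on error). It needs full history, which is why a shallow CI checkout cannot resolve it. Below is a hypothetical Python sketch: a thin git wrapper that fails closed on errors, plus a pure model over a parent map showing why a squash merge orphans a stamp taken on the feature branch. The commit ids in the model are placeholders.

```python
import subprocess
from typing import Dict, List

def git_is_ancestor(stamp: str, head: str = "HEAD") -> bool:
    """True if `stamp` is an ancestor of `head` in the current repository."""
    r = subprocess.run(["git", "merge-base", "--is-ancestor", stamp, head], capture_output=True)
    if r.returncode not in (0, 1):  # unknown commit, shallow history, etc.: fail closed
        raise RuntimeError(r.stderr.decode(errors="replace").strip())
    return r.returncode == 0

def is_ancestor(parents: Dict[str, List[str]], stamp: str, head: str) -> bool:
    """Walk parent links from `head`; as in git, a commit counts as its own ancestor."""
    seen, todo = set(), [head]
    while todo:
        c = todo.pop()
        if c == stamp:
            return True
        if c not in seen:
            seen.add(c)
            todo.extend(parents.get(c, []))
    return False
```

In the model, a squash merge creates a new main commit whose only parent is the old main tip, so the branch commit that held the stamp is no longer reachable from HEAD.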

## Self-tests — each new check watched failing

```
SELF-TEST 1 — Property 3 (mutate a fixture by one token):
  ERROR (property 3): anchor content drift vs tools/soundness-anchors.sha256 ...

SELF-TEST 2 — Property 4 (un-advanced/orphaned stamp + soundness change):
  ERROR (property 4): stamp d55e22c is not an ancestor of HEAD; re-point :ground-truth-sha: ...

SELF-TEST (5a) — Property 5 (pinned row names a missing pin):
  ERROR (property 1): test anchor not defined: test_stub_backend_return_DELETED
  FATAL: anchor test:test_stub_backend_return_DELETED: expected exactly one defining file, found 0 (fail closed)

SELF-TEST (5b) — Property 5 (an xfail pin flips to XPASS):
  ALARM (property 5): pin test_resume_nontail_xfail is PASSING — the hole may be fixed.
  Open docs/SOUNDNESS.adoc and update the row to 'fixed' (do NOT just silence the pin).
```

Full suite green (534 tests; xfail harness reports both pins
`XFAIL-OK`), all four guard gates green, `dune build`/`dune runtest`
green at `dd6c19e`.
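The pin-liveness rule amounts to a three-way classification of each pinned test's outcome. This is a hypothetical Python sketch; the real harness is `test/xfail/test_xfail_pins.ml`, and the message wording below only loosely follows the self-test output. Every outcome other than "the pin still fails as expected" makes the check fail.

```python
from typing import Dict, List, Tuple

def check_pins(pinned: List[str], passed: Dict[str, bool]) -> Tuple[List[str], bool]:
    """Classify each pinned xfail test: missing -> FATAL, passing -> ALARM, failing -> XFAIL-OK."""
    report, ok = [], True
    for pin in pinned:
        if pin not in passed:
            report.append(f"FATAL: pin {pin} not found (fail closed)")
            ok = False
        elif passed[pin]:
            report.append(f"ALARM: pin {pin} is PASSING; update the ledger row, do not silence the pin")
            ok = False
        else:
            report.append(f"XFAIL-OK: {pin}")
    return report, ok
```

Treating an unexpected pass as a failure is the point: it forces the ledger row to be revisited the moment the underlying hole closes.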

## Claims I could not make fully mechanical (named, not silently softened)

1. **Content-binding scope.** Fixtures + pinned-test *bodies* are
digest-bound (11/12 anchors); the one SUITE-file anchor (`#553` →
`test/test_borrow_polonius.ml`) is existence+stamp-checked only — a
whole-file hash is too coarse. The ledger sentence was tightened to say
exactly this.
2. **Stamp "advanced-in-this-change" detection** is robust for the
normal *branch-off-fresh-main* workflow (and the orphaned-stamp case
fails closed, self-test 2). It has a known edge in a
*multi-commit-since-stamp* history (stamp bumped in an earlier commit,
soundness changed again later without re-bump could read as "advanced");
decision-2's full "diff-on-main" freshness check is not separately
implemented. Flagged for your call.

## CI
`build` job now checks out `fetch-depth: 0` so property 4 can resolve
the stamp; the xfail harness is in `.ocamlformat-ignore` (authored
without ocamlformat available).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa

---
_Generated by [Claude
Code](https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa)_

Co-authored-by: Claude <noreply@anthropic.com>