docs: make soundness status a single, test-anchored source of truth #622

Merged
hyperpolymath merged 2 commits into main from claude/modest-cori-kzjtcy on Jun 21, 2026
@hyperpolymath (Owner)

Why

A reader (human or agent) asking "is AffineScript sound / what's broken?" could open any of ~6 docs and get a stale answer in the dangerous direction. The known holes #554/#555/#556/#558/#559 were fixed, fenced, or removed across 2026-05/06, but their status was duplicated across the capability matrix, a CLAUDE.md survey block, STATE-* snapshots, the wiki, etc. — surfaces that drift independently. (This PR was prompted by exactly that: a stale answer sourced from CAPABILITY-MATRIX.adoc + the CLAUDE.md survey.)

This fixes the content and the layout so it can't silently rot again.

What

New single source of truth — docs/SOUNDNESS.adoc

New anti-staleness gate — tools/check-soundness-ledger.sh (wired into just guard + CI)

  • Fails the build if the ledger loses its primacy declaration or freshness stamp, if any anchor fixture goes missing, or if a status surface stops linking back to the ledger. This binds prose to executable truth.
  • Verified it bites: returns non-zero and names the offender on a missing anchor.
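The four failure conditions listed above can be sketched in a few lines. The real gate is a shell script whose contents are not shown in this PR, so the following is only a Python illustration of the same idea. The primacy wording and the `anchor:<path>` citation syntax are assumptions made for the sketch; `:ground-truth-sha:` is the stamp attribute named in the follow-up commit.

```python
import re
from pathlib import Path

# Hypothetical markers: the ledger's actual wording and anchor syntax are assumptions.
PRIMACY_MARKER = "single source of truth"
STAMP_PATTERN = re.compile(r"^:ground-truth-sha:\s*[0-9a-f]{7,40}\s*$", re.MULTILINE)
ANCHOR_PATTERN = re.compile(r"anchor:(\S+)")


def check_ledger(root: Path, ledger: str, surfaces: list[str]) -> list[str]:
    """Return human-readable failures; an empty list means the gate passes."""
    ledger_path = root / ledger
    if not ledger_path.is_file():
        return [f"ledger missing: {ledger}"]
    text = ledger_path.read_text(encoding="utf-8")
    errors: list[str] = []

    # 1. The ledger must still declare itself the primary source.
    if PRIMACY_MARKER not in text:
        errors.append("ledger lost its primacy declaration")
    # 2. The ledger must still carry a freshness stamp.
    if not STAMP_PATTERN.search(text):
        errors.append("ledger lost its freshness stamp")
    # 3. Every cited anchor fixture must exist on disk.
    for anchor in ANCHOR_PATTERN.findall(text):
        if not (root / anchor).exists():
            errors.append(f"missing anchor: {anchor}")
    # 4. Every status surface must mention the ledger path.
    for surface in surfaces:
        path = root / surface
        if not path.is_file() or ledger not in path.read_text(encoding="utf-8"):
            errors.append(f"surface does not link back to the ledger: {surface}")
    return errors
```

A caller would exit non-zero whenever the returned list is non-empty and print each entry, which matches the "names the offender" behaviour described above.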

Corrected every live status surface to ground-truth + made them defer to the ledger: README, CAPABILITY-MATRIX (borrow/effects/refinement/traits rows + anti-over-claim bullet + See-also), PROOF-NEEDS (holes block + P-9/P-10), NAVIGATION, reference/COMPILER-CAPABILITIES, TECH-DEBT (CORE-04/05), the wiki (README + traits + dependent-types), STATE.a2ml, agent debt, and the CLAUDE.md survey. Dated snapshot STATE-2026-06-11 is capped with a superseded banner, not rewritten.

Ground-truth recorded (verified in source + green suite)

| Issue | Was documented as | Actually |
|-------|-------------------|----------|
| #554 | open use-after-move | fixed; rejected (MoveWhileBorrowed) |
| #555 | silently mis-lowered | fenced loud on every compiled backend; 1 pinned interp residual |
| #556 | silent sync fallback | fixed; fails loud |
| #558 | parse-only/unenforced | removed in v1; assume(...) rejected at parse |
| #559 | coherence unchecked | fixed for concrete overlaps (wired in typecheck.ml) |
| #553 | "0% implemented" | M1–M3, test-only/unwired |

Closing these implementation holes is not the same as proving soundness — the metatheory is still prose (one Wave-0 Coq seed from #620 noted), per PROOF-NEEDS.adoc.

Note for review

  • The .claude/CLAUDE.md change is body-only (the stale survey → a deferral to the ledger). I did not add or alter its license header — that's the owner-gated act flagged in the soundness handoff.
  • No code touched; docs + one shell gate + CI/justfile wiring. dune build and dune runtest are green at d55e22c.
  • AFFIRMATION.adoc (a parked, dated attestation) was deliberately left untouched.

🤖 Generated with Claude Code

https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa


Generated by Claude Code

The known soundness holes (#554/#555/#556/#558/#559) were fixed, fenced, or
removed across 2026-05/2026-06, but their status was duplicated across ~6 docs
that drifted independently. A reader who opened the stale one (the
CAPABILITY-MATRIX rows, the CLAUDE.md survey block, a STATE snapshot) got a
stale "is it sound?" answer — and in the dangerous direction.

Structural fix, not just prose:

- Add docs/SOUNDNESS.adoc: the single source of truth for soundness-hole
  status, test-anchored (every row names the fixture/test that proves it),
  with a freshness stamp (SHA + date). Ground-truthed against a green
  `dune build` / `dune runtest` at 85e3f0d on 2026-06-21.
- Add tools/check-soundness-ledger.sh, wired into `just guard` + CI: fails the
  build if the ledger loses its primacy declaration or freshness stamp, if any
  anchor fixture it cites goes missing, or if a status surface stops linking
  back to it. This binds prose to executable truth so the ledger cannot rot
  silently (verified: the gate fails on a missing anchor).
- Correct every live status surface to ground-truth and point at the ledger:
  README, CAPABILITY-MATRIX (borrow / effects / refinement / traits rows + the
  anti-over-claim bullet + See-also), PROOF-NEEDS (holes block + P-9/P-10),
  NAVIGATION, reference/COMPILER-CAPABILITIES, TECH-DEBT (CORE-04/CORE-05),
  the wiki (README + traits + dependent-types), STATE.a2ml, agent debt, and
  the CLAUDE.md survey (converted to a deferral; no header/license change).
- Cap the dated STATE-2026-06-11 snapshot with a superseded banner rather than
  rewriting history.

Ground-truth recorded: #554 fixed (use-after-move via a callee-returned borrow
rejected), #555 fenced loud on every compiled backend (one pinned interpreter
non-tail-resume residual), #556 fixed, #558 removed, #559 fixed for concrete
overlaps, #553 Polonius M1-M3 but test-only/unwired. Closing these
implementation holes is not the same as proving soundness; the metatheory
remains prose (PROOF-NEEDS).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa
@github-actions

🔍 Hypatia Security Scan

Findings: 41 issues detected

| Severity | Count |
|----------|-------|
| 🔴 Critical | 2 |
| 🟠 High | 23 |
| 🟡 Medium | 16 |

⚠️ Action Required: Critical security issues found!

View findings
[
  {
    "reason": "Action denoland/setup-deno@v2 needs attention",
    "type": "unpinned_action",
    "file": "publish-jsr.yml",
    "action": "pin_sha",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in instant-sync.yml",
    "type": "secret_action_without_presence_gate",
    "file": "instant-sync.yml",
    "action": "peter-evans/repository-dispatch",
    "rule_module": "workflow_audit",
    "severity": "high"
  },
  {
    "reason": "Shell execution -- validate input before passing to shell (1 occurrences, CWE-78)",
    "type": "js_exec_sync",
    "file": "/home/runner/work/affinescript/affinescript/packages/affinescript-cli/mod.js",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "high"
  },
  {
    "reason": "Shell execution -- validate input before passing to shell (2 occurrences, CWE-78)",
    "type": "js_exec_sync",
    "file": "/home/runner/work/affinescript/affinescript/packages/affine-vscode/mod.js",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "high"
  },
  {
    "reason": "Shell execution -- validate input before passing to shell (1 occurrences, CWE-78)",
    "type": "js_exec_sync",
    "file": "/home/runner/work/affinescript/affinescript/affinescript-vite/src/affine-plugin-improved.js",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "high"
  },
  {
    "reason": "expect() in hot path (32 occurrences, CWE-754)",
    "type": "expect_in_hot_path",
    "file": "/home/runner/work/affinescript/affinescript/affinescriptiser/src/codegen/wasm_gen.rs",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "medium"
  },
  {
    "reason": "expect() in hot path (29 occurrences, CWE-754)",
    "type": "expect_in_hot_path",
    "file": "/home/runner/work/affinescript/affinescript/affinescriptiser/src/codegen/affine_gen.rs",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "medium"
  },
  {
    "reason": "unsafe block -- requires SAFETY comment (2 occurrences, CWE-676)",
    "type": "unsafe_block",
    "file": "/home/runner/work/affinescript/affinescript/runtime/src/panic.rs",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "medium"
  },
  {
    "reason": "unsafe block -- requires SAFETY comment (1 occurrences, CWE-676)",
    "type": "unsafe_block",
    "file": "/home/runner/work/affinescript/affinescript/runtime/src/alloc.rs",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "medium"
  },
  {
    "reason": "unsafe block -- requires SAFETY comment (3 occurrences, CWE-676)",
    "type": "unsafe_block",
    "file": "/home/runner/work/affinescript/affinescript/runtime/src/ffi.rs",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "medium"
  }
]

Powered by Hypatia Neurosymbolic CI/CD Intelligence

@hyperpolymath hyperpolymath marked this pull request as ready for review June 21, 2026 12:08
@hyperpolymath hyperpolymath enabled auto-merge (squash) June 21, 2026 12:08
@hyperpolymath hyperpolymath disabled auto-merge June 21, 2026 12:09
@hyperpolymath hyperpolymath merged commit e6c155f into main Jun 21, 2026
17 checks passed
@hyperpolymath hyperpolymath deleted the claude/modest-cori-kzjtcy branch June 21, 2026 12:09
hyperpolymath added a commit that referenced this pull request Jun 21, 2026
… xfail pin-liveness (#631)

Makes `docs/SOUNDNESS.adoc` keep every promise it makes. The ledger on
`main` is "prose ahead of mechanism" (it claims content-binding /
stamp-enforcement / pinned xfails, but the gate enforced only 2 of
those). This builds the missing mechanism, and folds in the closed-#625
capability-matrix anchoring.

## The five properties (each maps to a function in the gate)

| # | Property | Function | Provenance |
|---|----------|----------|-----------|
| 1 | Anchors exist | `check_anchors_exist` | Jonathan's #622 design (kept) |
| 2 | Back-links | `check_backlinks` | Jonathan's #622 design (kept) |
| 3 | **Content-binding** | `check_content_binding` + `tools/soundness-anchors.sha256` + `--reseal` | **new** |
| 4 | **Stamp-enforcement** | `check_stamp` | **new** |
| 5 | **Pin-liveness (xfail)** | `check_pins` + `test/xfail/test_xfail_pins.ml` | **new** |

`## What this gate enforces` is documented at the top of the script.
Everything **fails closed**.
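
As an illustration of property 3, here is a minimal Python sketch of digest-based content binding with a reseal step. It is not the gate's shell code. It assumes a `sha256sum`-style manifest (`<hex digest>  <path>` per line) and hashes whole files for simplicity, whereas the gate binds pinned-test bodies rather than whole files. The Python names below are stand-ins chosen to echo the table.

```python
import hashlib
from pathlib import Path


def digest(path: Path) -> str:
    """SHA-256 of the file's raw bytes, as lowercase hex."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def parse_manifest(text: str) -> dict[str, str]:
    """Parse assumed sha256sum-style lines into {relative path: digest}."""
    entries: dict[str, str] = {}
    for line in text.splitlines():
        if line.strip():
            hex_digest, rel_path = line.split(maxsplit=1)
            entries[rel_path.strip()] = hex_digest
    return entries


def check_content_binding(root: Path, manifest_text: str) -> list[str]:
    """Fail closed: a missing file and a changed file are both errors."""
    errors: list[str] = []
    for rel_path, expected in parse_manifest(manifest_text).items():
        target = root / rel_path
        if not target.is_file():
            errors.append(f"anchor missing: {rel_path}")
        elif digest(target) != expected:
            errors.append(f"anchor content drift: {rel_path}")
    return errors


def reseal(root: Path, rel_paths: list[str]) -> str:
    """Regenerate the manifest after an intentional, reviewed fixture change."""
    return "".join(f"{digest(root / p)}  {p}\n" for p in sorted(rel_paths))
```

Keeping reseal as a separate, explicit step means a fixture edit shows up in review as a manifest diff rather than passing silently.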

## Ground-truth correction (compiler wins)

Running the compiler showed **#559 generic-subsumption is already
detected/rejected** (`impl[T] Greet for Box[T]` vs `impl Greet for
Box[Int]` → "Trait coherence violation"). So the ledger's `open
(tracked)` "not yet detected" status was stale **in the dangerous
direction**. The row is corrected to `fixed` with a positive test, and
the stale `test_e2e.ml` comment is fixed. Net effect: one fewer xfail
pin than the spec assumed.

Also: the stub-return row uses **#624** (the real tracker); #560 is
*variable-string wasm ops*, unrelated — this change supplies the pin
#628 couldn't (the fixture/test now exist). Stamp re-pointed to
`dd6c19e` (a real main-ancestor; the old `d55e22c` was squash-orphaned).
Metatheory note updated for the new `formal/` proofs (#620#627).
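
For readers unfamiliar with why `impl[T] Greet for Box[T]` and `impl Greet for Box[Int]` conflict, the sketch below shows the textbook approach: two impl heads overlap when they unify after their type variables are renamed apart. This is a generic illustration in Python, not the logic in `typecheck.ml`. The `Var` and `Con` classes and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Con:
    name: str
    args: tuple = ()


def resolve(t, subst):
    """Follow variable bindings until reaching an unbound variable or a constructor."""
    while isinstance(t, Var) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    """True if variable v appears inside t (prevents infinite types)."""
    t = resolve(t, subst)
    if t == v:
        return True
    return isinstance(t, Con) and any(occurs(v, a, subst) for a in t.args)


def unify(a, b, subst):
    """Return an extended substitution, or None if a and b cannot be equal."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if isinstance(a, Var):
        return None if occurs(a, b, subst) else {**subst, a: b}
    if isinstance(b, Var):
        return unify(b, a, subst)
    if a.name != b.name or len(a.args) != len(b.args):
        return None
    for x, y in zip(a.args, b.args):
        subst = unify(x, y, subst)
        if subst is None:
            return None
    return subst


def rename(t, suffix):
    """Rename variables apart so each impl has its own type parameters."""
    if isinstance(t, Var):
        return Var(t.name + suffix)
    return Con(t.name, tuple(rename(a, suffix) for a in t.args))


def impls_overlap(head_a, head_b):
    return unify(rename(head_a, "#a"), rename(head_b, "#b"), {}) is not None
```

With this, `impls_overlap(Con("Box", (Var("T"),)), Con("Box", (Con("Int"),)))` reports an overlap, while `Box[Int]` against `Box[String]` does not.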

## Self-tests — each new check watched failing

```
SELF-TEST 1 — Property 3 (mutate a fixture by one token):
  ERROR (property 3): anchor content drift vs tools/soundness-anchors.sha256 ...

SELF-TEST 2 — Property 4 (un-advanced/orphaned stamp + soundness change):
  ERROR (property 4): stamp d55e22c is not an ancestor of HEAD; re-point :ground-truth-sha: ...

SELF-TEST (5a) — Property 5 (pinned row names a missing pin):
  ERROR (property 1): test anchor not defined: test_stub_backend_return_DELETED
  FATAL: anchor test:test_stub_backend_return_DELETED: expected exactly one defining file, found 0 (fail closed)

SELF-TEST (5b) — Property 5 (an xfail pin flips to XPASS):
  ALARM (property 5): pin test_resume_nontail_xfail is PASSING — the hole may be fixed.
  Open docs/SOUNDNESS.adoc and update the row to 'fixed' (do NOT just silence the pin).
```
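
The pin-liveness idea behind self-test 5b can be sketched as follows. This is an assumed Python model, not the OCaml harness in `test/xfail/test_xfail_pins.ml`: a pin is modelled as a callable that raises `AssertionError` while the hole still reproduces and returns normally once it no longer does.

```python
from typing import Callable


def classify_pin(name: str, run_pin: Callable[[], object]) -> str:
    """Expected failure is the healthy state; an unexpected pass is the alarm."""
    try:
        run_pin()
    except AssertionError:
        # The hole still reproduces, so the ledger row remains accurate.
        return f"XFAIL-OK {name}"
    # Any other exception propagates on purpose: an unrelated crash must not
    # be mistaken for the expected failure (fail closed).
    return f"XPASS {name}"


def check_pins(pins: dict[str, Callable[[], object]]) -> tuple[bool, list[str]]:
    """Return (all pins still failing as expected, per-pin report lines)."""
    report = [classify_pin(name, run) for name, run in pins.items()]
    ok = all(line.startswith("XFAIL-OK") for line in report)
    return ok, report
```

Treating an XPASS as a gate failure forces someone to update the ledger row when a hole gets fixed, rather than leaving a stale "open" claim behind.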

Full suite green (534 tests; xfail harness reports both pins
`XFAIL-OK`), all four guard gates green, `dune build`/`dune runtest`
green at `dd6c19e`.

## Claims I could not make fully mechanical (named, not silently softened)

1. **Content-binding scope.** Fixtures + pinned-test *bodies* are
digest-bound (11/12 anchors); the one SUITE-file anchor (`#553` →
`test/test_borrow_polonius.ml`) is existence+stamp-checked only — a
whole-file hash is too coarse. The ledger sentence was tightened to say
exactly this.
2. **Stamp "advanced-in-this-change" detection** is robust for the
normal *branch-off-fresh-main* workflow, and the orphaned-stamp case
fails closed (self-test 2). It has a known edge case in a
*multi-commit-since-stamp* history: if the stamp was bumped in an
earlier commit and soundness content changed again later without a
re-bump, the check could still read the stamp as "advanced".
Decision-2's full "diff-on-main" freshness check is not separately
implemented. Flagged for your call.
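
The orphaned-stamp half of property 4 reduces to an ancestry query. The sketch below uses `git merge-base --is-ancestor`, which exits 0 when the first commit is an ancestor of the second and 1 when it is not. Whether the shell gate uses this exact command is an assumption. The `run` parameter is a hypothetical injection point so the logic can be exercised without a repository.

```python
import subprocess
from typing import Callable, Optional, Sequence


def git_exit_code(args: Sequence[str]) -> int:
    """Run a git subcommand in the current directory and return its exit status."""
    return subprocess.run(["git", *args], capture_output=True).returncode


def check_stamp(
    stamp_sha: str,
    run: Callable[[Sequence[str]], int] = git_exit_code,
) -> Optional[str]:
    """Return None if the stamp is reachable from HEAD, else an error message."""
    code = run(["merge-base", "--is-ancestor", stamp_sha, "HEAD"])
    if code == 0:
        return None
    if code == 1:
        return f"stamp {stamp_sha} is not an ancestor of HEAD; re-point :ground-truth-sha:"
    # Any other status (for example an unknown object in a shallow clone)
    # is treated as a failure rather than skipped.
    return f"could not resolve stamp {stamp_sha} (git exit {code}); failing closed"
```

This covers only reachability. It does not address the multi-commit-since-stamp edge case described in item 2.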

## CI
`build` job now checks out `fetch-depth: 0` so property 4 can resolve
the stamp; the xfail harness is in `.ocamlformat-ignore` (authored
without ocamlformat available).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa

---
_Generated by [Claude
Code](https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa)_

Co-authored-by: Claude <noreply@anthropic.com>