docs: extend test-anchoring to the capability matrix + file residual issues#625
Closed
hyperpolymath wants to merge 1 commit into
Closed
docs: extend test-anchoring to the capability matrix + file residual issues#625hyperpolymath wants to merge 1 commit into
hyperpolymath wants to merge 1 commit into
Conversation
…issues Follow-up to #622. Applies the same "claims bound to executable tests, gated in CI" discipline to the *feature*-readiness matrix (not just the soundness ledger), and wires in the now-tracked residuals. - docs/CAPABILITY-MATRIX.adoc: add a "== Test anchors" section mapping each load-bearing component (lexer, parser, type checker, effects, borrow checker, QTT, interpreter, traits, stdlib, typed-wasm, codegen, JS-host backends, Solo core) to the test(s) that exercise it. - tools/check-capability-anchors.sh (wired into `just guard` + CI): fails the build if the Test-anchors section disappears or if any test path the matrix names goes missing — so a "works"/"enforced" row cannot outlive its test. Verified it bites (non-zero on a missing anchor). - docs/SOUNDNESS.adoc: wire the residual rows to their new tracking issues (#623 interp non-tail resume, #624 Lean/Why3 return-drop) and refresh the metatheory note to reflect the grown Coq formal/ track (#620/#621: K-1, K-1-let, F-1). Residuals filed as tracked issues so they aren't just prose: - #623 — interpreter non-tail single-shot resume (shallow continuation; needs CPS rewrite, owner-steer). - #624 — Lean/Why3 experimental backends drop `return` control flow. - #559 (already open) updated by comment: concrete-overlap coherence landed; residual narrowed to generic-subsumption overlap. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa
Owner
Author
|
Closing in favour of a single combined PR (owner's call: fold the capability-anchors work together with the soundness-ledger hardening into one change). The hardening PR will carry all of this PR's content — the Superseding PR to follow. Generated by Claude Code |
hyperpolymath
pushed a commit
that referenced
this pull request
Jun 21, 2026
…ility anchors Fold of the capability-matrix test-anchoring (closed #625) and the soundness-gate hardening into one change. Makes docs/SOUNDNESS.adoc's claims true by building the mechanism the prose-ahead ledger already promises. Gate tools/check-soundness-ledger.sh: 2 -> 5 enforced properties (anchors exist, back-links, content-binding via tools/soundness-anchors.sha256 + --reseal, stamp-enforcement, xfail pin-liveness). New xfail harness test/xfail/, new fixture stub_backend_return_dropped.affine, capability-matrix Test-anchors section + tools/check-capability-anchors.sh. CI build job fetch-depth: 0 for the stamp diff. Ground-truth corrections (compiler wins): #559 generic-subsumption IS detected (positive test); stub-return uses #624; interp non-tail uses #623. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa
hyperpolymath
added a commit
that referenced
this pull request
Jun 21, 2026
… xfail pin-liveness (#631) Makes `docs/SOUNDNESS.adoc` keep every promise it makes. The ledger on `main` is "prose ahead of mechanism" (it claims content-binding / stamp-enforcement / pinned xfails, but the gate enforced only 2 of those). This builds the missing mechanism, and folds in the closed-#625 capability-matrix anchoring. ## The five properties (each maps to a function in the gate) | # | Property | Function | Provenance | |---|----------|----------|-----------| | 1 | Anchors exist | `check_anchors_exist` | Jonathan's #622 design (kept) | | 2 | Back-links | `check_backlinks` | Jonathan's #622 design (kept) | | 3 | **Content-binding** | `check_content_binding` + `tools/soundness-anchors.sha256` + `--reseal` | **new** | | 4 | **Stamp-enforcement** | `check_stamp` | **new** | | 5 | **Pin-liveness (xfail)** | `check_pins` + `test/xfail/test_xfail_pins.ml` | **new** | `## What this gate enforces` is documented at the top of the script. Everything **fails closed**. ## Ground-truth correction (compiler wins) Running the compiler showed **#559 generic-subsumption is already detected/rejected** (`impl[T] Greet for Box[T]` vs `impl Greet for Box[Int]` → "Trait coherence violation"). So the ledger's `open (tracked)` "not yet detected" was stale **in the dangerous direction**. Corrected to `fixed` with a positive test; the stale `test_e2e.ml` comment fixed. → one fewer xfail pin than the spec assumed. Also: the stub-return row uses **#624** (the real tracker); #560 is *variable-string wasm ops*, unrelated — this change supplies the pin #628 couldn't (the fixture/test now exist). Stamp re-pointed to `dd6c19e` (a real main-ancestor; the old `d55e22c` was squash-orphaned). Metatheory note updated for the new `formal/` proofs (#620–#627). ## Self-tests — each new check watched failing ``` SELF-TEST 1 — Property 3 (mutate a fixture by one token): ERROR (property 3): anchor content drift vs tools/soundness-anchors.sha256 ... SELF-TEST 2 — Property 4 (un-advanced/orphaned stamp + soundness change): ERROR (property 4): stamp d55e22c is not an ancestor of HEAD; re-point :ground-truth-sha: ... SELF-TEST (5a) — Property 5 (pinned row names a missing pin): ERROR (property 1): test anchor not defined: test_stub_backend_return_DELETED FATAL: anchor test:test_stub_backend_return_DELETED: expected exactly one defining file, found 0 (fail closed) SELF-TEST (5b) — Property 5 (an xfail pin flips to XPASS): ALARM (property 5): pin test_resume_nontail_xfail is PASSING — the hole may be fixed. Open docs/SOUNDNESS.adoc and update the row to 'fixed' (do NOT just silence the pin). ``` Full suite green (534 tests; xfail harness reports both pins `XFAIL-OK`), all four guard gates green, `dune build`/`dune runtest` green at `dd6c19e`. ## Claims I could not make fully mechanical (named, not silently softened) 1. **Content-binding scope.** Fixtures + pinned-test *bodies* are digest-bound (11/12 anchors); the one SUITE-file anchor (`#553` → `test/test_borrow_polonius.ml`) is existence+stamp-checked only — a whole-file hash is too coarse. The ledger sentence was tightened to say exactly this. 2. **Stamp "advanced-in-this-change" detection** is robust for the normal *branch-off-fresh-main* workflow (and the orphaned-stamp case fails closed, self-test 2). It has a known edge in a *multi-commit-since-stamp* history (stamp bumped in an earlier commit, soundness changed again later without re-bump could read as "advanced"); decision-2's full "diff-on-main" freshness check is not separately implemented. Flagged for your call. ## CI `build` job now checks out `fetch-depth: 0` so property 4 can resolve the stamp; the xfail harness is in `.ocamlformat-ignore` (authored without ocamlformat available). 🤖 Generated with [Claude Code](https://claude.com/claude-code) https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa --- _Generated by [Claude Code](https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa)_ Co-authored-by: Claude <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Follow-up to #622, doing the two things offered there.
1. Test-anchor the feature matrix (not just soundness)
#622madedocs/SOUNDNESS.adoctest-anchored + CI-gated. This applies the same discipline to the feature-readiness matrix:docs/CAPABILITY-MATRIX.adoc→ new== Test anchorssection mapping each load-bearing component (lexer, parser/AST/name-resolution, type checker, effects, borrow checker, QTT/affine, interpreter, traits incl. tracking: CORE-04 traits residual — associated-type substitution, where-clause supertraits, coherence checking #559 coherence, stdlib, typed-wasm, codegen golden, JS-host backends, Solo core) to the test(s) that exercise it.tools/check-capability-anchors.sh(wired intojust guard+ CI): fails the build if that section disappears or if any test path the matrix names goes missing — so aworks/enforcedrow can't outlive the test that backs it. Verified it bites (rc=1 on a planted missing anchor) and then reverted.Each gate now owns one doc:
check-soundness-ledger.sh→ the soundness ledger,check-capability-anchors.sh→ the feature matrix. Same anti-rot principle, symmetric.2. File the residuals as tracked issues
The "Still open" items in the ledger were prose; now they're tracked:
eval, blocked on OCaml 4.14 / jsoo — owner-steer).returncontrol flow — early returns silently mis-emitted #624 — Lean/Why3 experimental backends dropreturncontrol flow (ExprReturn (Some e) -> gen_expr e).docs/SOUNDNESS.adocis updated to cite #623/#624 on the residual rows, and its metatheory note now reflects the grown Coqformal/track (#620/#621: K-1, K-1-let, F-1) rather than just the original #620 seed — keeping the ledger itself fresh.Verification
All four guard gates green (
check-no-extension-ts,check-doc-truthing,check-soundness-ledger,check-capability-anchors). No code touched — docs + one shell gate + justfile/CI wiring.🤖 Generated with Claude Code
https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa
Generated by Claude Code