Skip to content

docs: extend test-anchoring to the capability matrix + file residual issues#625

Closed
hyperpolymath wants to merge 1 commit into
mainfrom
claude/capability-test-anchors
Closed

docs: extend test-anchoring to the capability matrix + file residual issues#625
hyperpolymath wants to merge 1 commit into
mainfrom
claude/capability-test-anchors

Conversation

@hyperpolymath

Copy link
Copy Markdown
Owner

Follow-up to #622, doing the two things offered there.

1. Test-anchor the feature matrix (not just soundness)

#622 made docs/SOUNDNESS.adoc test-anchored + CI-gated. This applies the same discipline to the feature-readiness matrix:

  • docs/CAPABILITY-MATRIX.adoc → new == Test anchors section mapping each load-bearing component (lexer, parser/AST/name-resolution, type checker, effects, borrow checker, QTT/affine, interpreter, traits incl. tracking: CORE-04 traits residual — associated-type substitution, where-clause supertraits, coherence checking #559 coherence, stdlib, typed-wasm, codegen golden, JS-host backends, Solo core) to the test(s) that exercise it.
  • tools/check-capability-anchors.sh (wired into just guard + CI): fails the build if that section disappears or if any test path the matrix names goes missing — so a works/enforced row can't outlive the test that backs it. Verified it bites (rc=1 on a planted missing anchor) and then reverted.

Each gate now owns one doc: check-soundness-ledger.sh → the soundness ledger, check-capability-anchors.sh → the feature matrix. Same anti-rot principle, symmetric.

2. File the residuals as tracked issues

The "Still open" items in the ledger were prose; now they're tracked:

docs/SOUNDNESS.adoc is updated to cite #623/#624 on the residual rows, and its metatheory note now reflects the grown Coq formal/ track (#620/#621: K-1, K-1-let, F-1) rather than just the original #620 seed — keeping the ledger itself fresh.

Verification

All four guard gates green (check-no-extension-ts, check-doc-truthing, check-soundness-ledger, check-capability-anchors). No code touched — docs + one shell gate + justfile/CI wiring.

🤖 Generated with Claude Code

https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa


Generated by Claude Code

…issues

Follow-up to #622. Applies the same "claims bound to executable tests, gated
in CI" discipline to the *feature*-readiness matrix (not just the soundness
ledger), and wires in the now-tracked residuals.

- docs/CAPABILITY-MATRIX.adoc: add a "== Test anchors" section mapping each
  load-bearing component (lexer, parser, type checker, effects, borrow checker,
  QTT, interpreter, traits, stdlib, typed-wasm, codegen, JS-host backends, Solo
  core) to the test(s) that exercise it.
- tools/check-capability-anchors.sh (wired into `just guard` + CI): fails the
  build if the Test-anchors section disappears or if any test path the matrix
  names goes missing — so a "works"/"enforced" row cannot outlive its test.
  Verified it bites (non-zero on a missing anchor).
- docs/SOUNDNESS.adoc: wire the residual rows to their new tracking issues
  (#623 interp non-tail resume, #624 Lean/Why3 return-drop) and refresh the
  metatheory note to reflect the grown Coq formal/ track (#620/#621: K-1,
  K-1-let, F-1).

Residuals filed as tracked issues so they aren't just prose:
- #623 — interpreter non-tail single-shot resume (shallow continuation; needs
  CPS rewrite, owner-steer).
- #624 — Lean/Why3 experimental backends drop `return` control flow.
- #559 (already open) updated by comment: concrete-overlap coherence landed;
  residual narrowed to generic-subsumption overlap.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa
@hyperpolymath hyperpolymath marked this pull request as ready for review June 21, 2026 13:03
@hyperpolymath hyperpolymath enabled auto-merge (squash) June 21, 2026 13:07
@hyperpolymath hyperpolymath disabled auto-merge June 21, 2026 13:14

Copy link
Copy Markdown
Owner Author

Closing in favour of a single combined PR (owner's call: fold the capability-anchors work together with the soundness-ledger hardening into one change).

The hardening PR will carry all of this PR's content — the == Test anchors section + tools/check-capability-anchors.sh — plus the three new gate properties (content-binding manifest, stamp-enforcement, table-completeness/pin-liveness via a real xfail harness) and the reconciled ledger rewrite. Nothing here is dropped; it's superseded by the bigger change so the ledger rewrite and the gate land atomically (the rewritten ledger claims enforcement that must exist in the same commit).

Superseding PR to follow.


Generated by Claude Code

hyperpolymath pushed a commit that referenced this pull request Jun 21, 2026
…ility anchors

Fold of the capability-matrix test-anchoring (closed #625) and the soundness-gate
hardening into one change. Makes docs/SOUNDNESS.adoc's claims true by building the
mechanism the prose-ahead ledger already promises.

Gate tools/check-soundness-ledger.sh: 2 -> 5 enforced properties (anchors exist,
back-links, content-binding via tools/soundness-anchors.sha256 + --reseal,
stamp-enforcement, xfail pin-liveness). New xfail harness test/xfail/, new fixture
stub_backend_return_dropped.affine, capability-matrix Test-anchors section +
tools/check-capability-anchors.sh. CI build job fetch-depth: 0 for the stamp diff.

Ground-truth corrections (compiler wins): #559 generic-subsumption IS detected
(positive test); stub-return uses #624; interp non-tail uses #623.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa
hyperpolymath added a commit that referenced this pull request Jun 21, 2026
… xfail pin-liveness (#631)

Makes `docs/SOUNDNESS.adoc` keep every promise it makes. The ledger on
`main` is "prose ahead of mechanism" (it claims content-binding /
stamp-enforcement / pinned xfails, but the gate enforced only 2 of
those). This builds the missing mechanism, and folds in the closed-#625
capability-matrix anchoring.

## The five properties (each maps to a function in the gate)

| # | Property | Function | Provenance |
|---|----------|----------|-----------|
| 1 | Anchors exist | `check_anchors_exist` | Jonathan's #622 design
(kept) |
| 2 | Back-links | `check_backlinks` | Jonathan's #622 design (kept) |
| 3 | **Content-binding** | `check_content_binding` +
`tools/soundness-anchors.sha256` + `--reseal` | **new** |
| 4 | **Stamp-enforcement** | `check_stamp` | **new** |
| 5 | **Pin-liveness (xfail)** | `check_pins` +
`test/xfail/test_xfail_pins.ml` | **new** |

`## What this gate enforces` is documented at the top of the script.
Everything **fails closed**.

## Ground-truth correction (compiler wins)

Running the compiler showed **#559 generic-subsumption is already
detected/rejected** (`impl[T] Greet for Box[T]` vs `impl Greet for
Box[Int]` → "Trait coherence violation"). So the ledger's `open
(tracked)` "not yet detected" was stale **in the dangerous direction**.
Corrected to `fixed` with a positive test; the stale `test_e2e.ml`
comment fixed. → one fewer xfail pin than the spec assumed.

Also: the stub-return row uses **#624** (the real tracker); #560 is
*variable-string wasm ops*, unrelated — this change supplies the pin
#628 couldn't (the fixture/test now exist). Stamp re-pointed to
`dd6c19e` (a real main-ancestor; the old `d55e22c` was squash-orphaned).
Metatheory note updated for the new `formal/` proofs (#620#627).

## Self-tests — each new check watched failing

```
SELF-TEST 1 — Property 3 (mutate a fixture by one token):
  ERROR (property 3): anchor content drift vs tools/soundness-anchors.sha256 ...

SELF-TEST 2 — Property 4 (un-advanced/orphaned stamp + soundness change):
  ERROR (property 4): stamp d55e22c is not an ancestor of HEAD; re-point :ground-truth-sha: ...

SELF-TEST (5a) — Property 5 (pinned row names a missing pin):
  ERROR (property 1): test anchor not defined: test_stub_backend_return_DELETED
  FATAL: anchor test:test_stub_backend_return_DELETED: expected exactly one defining file, found 0 (fail closed)

SELF-TEST (5b) — Property 5 (an xfail pin flips to XPASS):
  ALARM (property 5): pin test_resume_nontail_xfail is PASSING — the hole may be fixed.
  Open docs/SOUNDNESS.adoc and update the row to 'fixed' (do NOT just silence the pin).
```

Full suite green (534 tests; xfail harness reports both pins
`XFAIL-OK`), all four guard gates green, `dune build`/`dune runtest`
green at `dd6c19e`.

## Claims I could not make fully mechanical (named, not silently
softened)

1. **Content-binding scope.** Fixtures + pinned-test *bodies* are
digest-bound (11/12 anchors); the one SUITE-file anchor (`#553` →
`test/test_borrow_polonius.ml`) is existence+stamp-checked only — a
whole-file hash is too coarse. The ledger sentence was tightened to say
exactly this.
2. **Stamp "advanced-in-this-change" detection** is robust for the
normal *branch-off-fresh-main* workflow (and the orphaned-stamp case
fails closed, self-test 2). It has a known edge in a
*multi-commit-since-stamp* history (stamp bumped in an earlier commit,
soundness changed again later without re-bump could read as "advanced");
decision-2's full "diff-on-main" freshness check is not separately
implemented. Flagged for your call.

## CI
`build` job now checks out `fetch-depth: 0` so property 4 can resolve
the stamp; the xfail harness is in `.ocamlformat-ignore` (authored
without ocamlformat available).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa

---
_Generated by [Claude
Code](https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa)_

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants