From a6b187bd3b78dc751ae81f306495fd0291577be2 Mon Sep 17 00:00:00 2001 From: Claude Date: Sat, 27 Jun 2026 20:00:45 +0000 Subject: [PATCH] =?UTF-8?q?feat(formal):=20real-lift=20R-mem=20(kernel)=20?= =?UTF-8?q?=E2=80=94=20linear=20memory=20+=20heap=20tuple=20round-trips?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fifth rung of the real lift (formal/REAL-LIFT.adoc): linear memory. Establishes the word-addressed heap and the deterministic cell layout the codegen's elaboration nodes assume — the precondition for arrays/records/strings/floats. Axiom-free. Target IR (RealWasm.v): - adds the faithful lib/wasm.ml memory instructions `I32Load (off)` / `I32Store (off)` to `instr` (+ dead `step1` arms). Zero blast radius — the proven wexec/cexec/compile_correct/compile_stmt_correct all still compile. R-mem kernel (new file RealMem.v): - word-addressed memory `mem := list Z` (cell = one i32), `mem_get`/`mem_set` with the heap round-trip lemmas (get-after-set same/other, length) reused directly from RealWasm's `set_nth_eq`/`set_nth_neq`/`set_nth_length`. - a bump allocator `alloc n m = (length m, m ++ repeat 0 n)` (base = current size) with base/length lemmas. - the memory-aware executor `mexec`: because memory ops are *straight-line* (no nested control), `mexec` is a **structural** Fixpoint on the instruction list — no fuel needed, unlike R2's wexec / R2-loops' cexec. `I32Load off` pops addr a and pushes mem[Z.to_nat a + off]; `I32Store off` pops value then addr and writes mem[Z.to_nat a + off]; everything else defers to the memory-free `step1`. Plus the clean structural sequencing lemma `mexec_app`. - `mexec_store_then_load` — storing v at a fresh in-range cell then loading it back yields v (set_nth's get-after-set lifted through mexec). - `mexec_pair_build_proj` — build a 2-cell tuple at `base` (store v0 at base+0, v1 at base+1) then project BOTH fields back: the `field i at base+i` heap layout, proved as a get-after-set round-trip (Nat2Z.id discharges the Z.of_nat/Z.to_nat address round-trip). Whole formal/ track re-audited: **20 files, 35 `Print Assumptions` closure reports, all "Closed under the global context" — zero axioms, no Admitted.** Remaining R-mem (next sub-rungs): runtime `memory.grow`/`size` allocator instruction, byte-granular layout, control-bearing field expressions (fuse `mexec` with the R2-loops `cexec`), and the full source `tuple`/`array`/`record` + indexing/field-access compile-correctness. Then R-float / R-str / R-call per the ladder. Refs REAL-LIFT.adoc, docs/PROOF-NEEDS.adoc (K-1 row). Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01KPG9mEQXFyA3k7NWAzMNMr --- .hypatia-ignore | 4 + docs/PROOF-NEEDS.adoc | 10 ++- formal/README.adoc | 23 +++++- formal/REAL-LIFT.adoc | 28 +++++-- formal/RealMem.v | 165 ++++++++++++++++++++++++++++++++++++++++++ formal/RealWasm.v | 5 +- formal/_CoqProject | 1 + formal/justfile | 2 +- 8 files changed, 223 insertions(+), 15 deletions(-) create mode 100644 formal/RealMem.v diff --git a/.hypatia-ignore b/.hypatia-ignore index ced910f..639d179 100644 --- a/.hypatia-ignore +++ b/.hypatia-ignore @@ -87,3 +87,7 @@ cicd_rules/banned_language_file:formal/RealCompile.v # script, wired into formal/_CoqProject; exempt from the V-lang ban. cicd_rules/vlang_detected:formal/RealLoop.v cicd_rules/banned_language_file:formal/RealLoop.v +# RealMem.v — REAL-LIFT rung R-mem (linear memory: I32Load/I32Store + heap +# tuples). Coq proof script, wired into formal/_CoqProject; exempt from V-lang. +cicd_rules/vlang_detected:formal/RealMem.v +cicd_rules/banned_language_file:formal/RealMem.v diff --git a/docs/PROOF-NEEDS.adoc b/docs/PROOF-NEEDS.adoc index ac56ebc..fc43a25 100644 --- a/docs/PROOF-NEEDS.adoc +++ b/docs/PROOF-NEEDS.adoc @@ -12,7 +12,7 @@ This is the single document that enumerates *what AffineScript must prove*, at gap between the soundness *arguments* the project relies on (prose in `docs/academic/proofs/`, the borrow-checker comments in `lib/borrow.ml`, `docs/CAPABILITY-MATRIX.adoc`) and the soundness *proofs* that are actually -mechanized today: a growing axiom-free Coq core (19 files) that pins each +mechanized today: a growing axiom-free Coq core (20 files) that pins each obligation on a deliberately *small model* — no full-language theorem is discharged yet (see <>). @@ -70,7 +70,7 @@ than the prose corpus suggests: `TypeSafety.v` is example lemmas about list length, not AffineScript. Not core-metatheory. * **`formal/`** — the directory #513 names as the mechanized-proof target now - **exists** (Coq/Rocq 8.18), axiom-free throughout — **19 files, 33 `Print + **exists** (Coq/Rocq 8.18), axiom-free throughout — **20 files, 35 `Print Assumptions` closure reports**, all "Closed under the global context". Codegen keystone: `K1_CodegenPreservation.v` (K-1 minimal) + `K1Let_CodegenPreservation.v` (K-1 grown with variables/`let`/environment). @@ -90,7 +90,11 @@ than the prose corpus suggests: branch-aware executor `cexec` over backward jumps, a source `while`/`seq`/`set` statement language, and `compile_stmt_correct` — the lowered terminating `while` simulates the reference run, settling the #601 value-returning-tail - question concretely: the statement tail leaves an empty value stack). + question concretely: the statement tail leaves an empty value stack) + + `RealMem.v` (R-mem: word-addressed linear memory with `I32Load`/`I32Store`, a + structural memory executor `mexec`, a bump allocator, and the store→load / + 2-cell tuple-layout round-trips — `field i at base+i` proved via `set_nth`'s + get-after-set; the heap precondition for arrays/records/strings/floats). Record rows: `Rows.v` (P-11 — progress + preservation for extensible records). See <>, <>. * **Research tracks** (not core soundness): `docs/academic/tropical-session-types/` diff --git a/formal/README.adoc b/formal/README.adoc index 3859b17..71aa4bd 100644 --- a/formal/README.adoc +++ b/formal/README.adoc @@ -114,6 +114,12 @@ cannot mistake it. Coq has no `v.mod` manifest, so `vmod_detected` never fires. empty-stack unit tail — settles #601) | **mechanized**, axiom-free +| `RealMem.v` +| **Real lift R-mem (kernel)** — `I32Load`/`I32Store`; word-addressed memory + + structural executor `mexec`; bump allocator; store→load and 2-cell + tuple build+project round-trips (`field i at base+i`) +| **mechanized**, axiom-free + | `Rows.v` | **P-11** — record-row soundness: progress + preservation for STLC + extensible records (empty/extend/select/restrict over a `lacks`-checked row), funext-free @@ -239,11 +245,11 @@ the same canonical AST emit identical code). The observational version for the trailing-statement/tail split is mechanized separately in `invariant-path/proofs/SameCube.agda` (F-2). -== The real lift (R0 → R2-loops) +== The real lift (R0 → R-mem) The *real lift* replaces the toy stack machine with the actual `lib/wasm.ml` IR and the toy source with the real (resolved) AST, climbing a strict-superset -fragment ladder. Four rungs are down: +fragment ladder. Five rungs are down (R-mem in kernel form): * **R0** (`RealWasm.v`) — the pure i32 numeric core (real `instr` / `value_type` names) and a stack-machine `wexec`. @@ -271,8 +277,17 @@ fragment ladder. Four rungs are down: proves a terminating `while` simulates the reference `run_stmt`, ending with an **empty value stack** — settling the #601 value-returning-tail question (statement tail = unit) concretely. - -The full plan — the rest of the ladder (R-mem → R-float → R-str → R-call → +* **R-mem** (kernel) (`RealWasm.v` grown with `I32Load`/`I32Store`; `RealMem.v`) + — linear memory. A **word-addressed** heap (`mem := list Z`, cell = i32) with a + *structural* executor `mexec` (memory ops are straight-line, so no fuel) + + `mexec_app`, a bump allocator (`base = current size`), and the heap round-trips: + store→load (`mexec_store_then_load`) and a 2-cell tuple build+project + (`mexec_pair_build_proj` — `field i at base+i`, both projections), all reusing + `set_nth`'s get-after-set. Remaining R-mem: runtime `memory.grow`, byte layout, + control-bearing fields (fuse `mexec` with `cexec`), and the full source + tuple/array/record compile-correctness. + +The full plan — the rest of the ladder (finish R-mem → R-float → R-str → R-call → R-wrap → R-eff), the Coq module structure, and the strategy for each hard part (linear memory, floats, the elaboration nodes, effects) — is `formal/REAL-LIFT.adoc`. diff --git a/formal/REAL-LIFT.adoc b/formal/REAL-LIFT.adoc index 4dd4e2c..435ed9b 100644 --- a/formal/REAL-LIFT.adoc +++ b/formal/REAL-LIFT.adoc @@ -188,9 +188,17 @@ next. The rung names are stable references the `PROOF-NEEDS.adoc` rows will cite and its lowering's `cexec` ends with an **empty value stack** (the unit tail) — a theorem — versus the expression tail, which keeps its value on the stack. -| **R-mem** -| Linear memory: load/store family, `MemorySize/Grow`, a byte-addressed memory - model; `ExprTuple`/`ExprArray`/`ExprRecord` + indexing/field access. +| **R-mem** ⏳ (kernel landed) +| Linear memory. **Done (`RealMem.v`):** `I32Load`/`I32Store` added to the real + `instr` (zero blast radius); a **word-addressed** memory (`mem := list Z`, cell + = i32) with a structural memory executor `mexec` (straight-line ⇒ no fuel) + + `mexec_app`; a bump allocator (`base = current size`); and the heap round-trips + — store→load (`mexec_store_then_load`) and a 2-cell **tuple build+project** + (`mexec_pair_build_proj`: `field i at base+i`, both projections) — reusing + `set_nth`'s get-after-set. **Remaining:** `MemorySize`/`Grow` as a runtime + allocator instruction, byte-granular layout, control-bearing field + expressions (combine `mexec` with the R2-loops `cexec`), and the full source + `ExprTuple`/`ExprArray`/`ExprRecord` + indexing/field-access compile-correctness. | Heap layout — the precondition for strings and the "cell" elaboration nodes. | **R-float** @@ -324,8 +332,16 @@ already has the front-end scaffolding (`F1_TransformerPreservation.v`, terminating `while` lowered to `Block[Loop[…;BrIf 1;…;Br 0]]` simulates the reference run, ending with an **empty value stack** — the unit/statement tail, settling **#601** concretely. Axiom-free; 19 files / 33 closure reports. -* Next: **R-mem** — linear memory (load/store, byte model) for tuples/arrays/ - records; then R-float / R-str / R-call per the ladder. (`for`/`break`/ - `continue` and a general source-divergence model reuse the R2-loops `cexec`.) +* **R-mem kernel landed** — `RealWasm.v` grown with `I32Load`/`I32Store` (zero + blast radius). `RealMem.v` gives a word-addressed memory (`mem := list Z`), a + *structural* memory executor `mexec` (straight-line ⇒ no fuel) + `mexec_app`, a + bump allocator, and the heap round-trips: store→load (`mexec_store_then_load`) + and the 2-cell tuple build+project (`mexec_pair_build_proj`, `field i at + base+i`), reusing `set_nth`'s get-after-set. Axiom-free; 20 files / 35 closure + reports. **Remaining R-mem:** runtime `MemorySize`/`Grow`, byte layout, + control-bearing fields (fuse `mexec` with `cexec`), full source + tuple/array/record compile-correctness. +* Next: finish R-mem (above), then **R-float** / **R-str** / **R-call** per the + ladder. Tracked against `docs/PROOF-NEEDS.adoc` §6 Wave 3 and #513. diff --git a/formal/RealMem.v b/formal/RealMem.v new file mode 100644 index 0000000..c1a21f1 --- /dev/null +++ b/formal/RealMem.v @@ -0,0 +1,165 @@ +(* SPDX-License-Identifier: MPL-2.0 *) +(* SPDX-FileCopyrightText: 2026 Jonathan D.A. Jewell (hyperpolymath) *) + +(* + RealMem.v + ═════════ + REAL-LIFT rung **R-mem** (see formal/REAL-LIFT.adoc): linear memory. + Adds the load/store execution for `lib/wasm.ml`'s `I32Load`/`I32Store` (the + instructions were added to `instr` in RealWasm.v) and a source heap layer — + `tuple`/`record`-style cells laid out at consecutive addresses — with the + build-then-project round-trip proved. + + Memory is **word-addressed** (`mem := list Z`, address = cell index): the + deterministic layout the codegen's elaboration nodes assume, where tuples / + records / arrays are runs of i32 cells. (Byte-granular `[len][utf8]` layout + and IEEE lanes are the later R-str / R-float rungs; bit-exact wrap is R-wrap.) + + Memory ops are *straight-line* (no control), so unlike R2's `wexec` / the + R2-loops `cexec`, the memory executor `mexec` is a **structural** `Fixpoint` + on the instruction list — no fuel needed. The heap round-trip is then exactly + `set_nth`'s get-after-set, reused from RealWasm. + + Axiom-free, no Admitted. + `.v` is Coq, not V-lang — see formal/README.adoc and .hypatia-ignore. +*) + +Require Import List. +Require Import ZArith. +Require Import PeanoNat. +Require Import Lia. +Require Import ASFormal.RealWasm. +Require Import ASFormal.RealCompile. +Import ListNotations. + +(* ── word-addressed linear memory ─────────────────────────────────────────── *) +Definition mem := list Z. +Definition mem_get (m : mem) (a : nat) : option Z := nth_error m a. +Definition mem_set (a : nat) (v : Z) (m : mem) : mem := set_nth a v m. + +Lemma mem_set_length : forall a v m, length (mem_set a v m) = length m. +Proof. intros; apply set_nth_length. Qed. + +Lemma mem_get_set_eq : forall a v m, a < length m -> mem_get (mem_set a v m) a = Some v. +Proof. intros; apply set_nth_eq; assumption. Qed. + +Lemma mem_get_set_neq : forall a a' v m, + a <> a' -> mem_get (mem_set a' v m) a = mem_get m a. +Proof. intros; apply set_nth_neq; assumption. Qed. + +(* bump allocator: base = current size; reserve n zeroed cells. *) +Definition alloc (n : nat) (m : mem) : nat * mem := (length m, m ++ repeat 0%Z n). + +Lemma alloc_base : forall n m, fst (alloc n m) = length m. +Proof. reflexivity. Qed. + +Lemma alloc_length : forall n m, length (snd (alloc n m)) = length m + n. +Proof. intros; cbn. rewrite app_length, repeat_length; reflexivity. Qed. + +(* ── memory-aware executor (structural on the instruction list) ─────────────── + I32Load off : pop addr a, push mem[⌊a⌋+off]. + I32Store off : pop value v then addr a, set mem[⌊a⌋+off] := v. + everything else: the memory-free `step1` (memory unchanged). Control + instructions (Block/Loop/Br/BrIf/IfElse) trap via step1 = None — combining + memory with control is the next sub-rung. *) +Fixpoint mexec (is : list instr) (m : mem) (lo : locals) (st : stack) + : option (mem * locals * stack) := + match is with + | [] => Some (m, lo, st) + | a :: r => + match a with + | I32Load off => + match st with + | addr :: t => + match mem_get m (Z.to_nat addr + off) with + | Some v => mexec r m lo (v :: t) + | None => None + end + | [] => None + end + | I32Store off => + match st with + | v :: addr :: t => mexec r (mem_set (Z.to_nat addr + off) v m) lo t + | _ => None + end + | _ => + match step1 a lo st with + | Some (lo', st') => mexec r m lo' st' + | None => None + end + end + end. + +(* sequencing: straight-line, so a plain structural split (no fuel/monotonicity). *) +Lemma mexec_app : forall is1 m lo st m1 lo1 st1, + mexec is1 m lo st = Some (m1, lo1, st1) -> + forall is2, + mexec (is1 ++ is2) m lo st = mexec is2 m1 lo1 st1. +Proof. + induction is1 as [| a r IH]; intros m lo st m1 lo1 st1 H is2. + - cbn in H. injection H as -> -> ->. reflexivity. + - cbn [app]. destruct a; + try (cbn [mexec] in H |- *; destruct (step1 _ lo st) as [[lo' st']|]; + [ apply IH; exact H | discriminate H ]). + + (* I32Load *) + cbn [mexec] in H |- *. destruct st as [| addr t]; [discriminate H|]. + destruct (mem_get m (Z.to_nat addr + off)) as [v|]; [| discriminate H]. + apply IH; exact H. + + (* I32Store *) + cbn [mexec] in H |- *. destruct st as [| v [| addr t]]; try discriminate H. + apply IH; exact H. +Qed. + +(* ── concrete sanity: store then load the same cell round-trips ─────────────── *) +Example mexec_store_load_demo : + (* mem = [0;0;0]; store 42 at addr 1, then load addr 1 ⇒ 42 *) + mexec [I32Const 1; I32Const 42; I32Store 0; I32Const 1; I32Load 0] + [0;0;0]%Z [] [] + = Some (mem_set 1 42 [0;0;0]%Z, [], [42%Z]). +Proof. reflexivity. Qed. + +(* the executor-level store→load round-trip: storing v at a fresh in-range cell + then loading it back yields v (this is `set_nth`'s get-after-set, lifted + through `mexec`). *) +Lemma mexec_store_then_load : forall m a v, + a < length m -> + mexec [I32Const (Z.of_nat a); I32Const v; I32Store 0; + I32Const (Z.of_nat a); I32Load 0] m [] [] + = Some (mem_set a v m, [], [v]). +Proof. + intros m a v Ha. + cbn [mexec step1]. + rewrite Nat.add_0_r, Nat2Z.id. + rewrite mem_get_set_eq by exact Ha. + reflexivity. +Qed. + +(* ── heap tuples: build a 2-cell tuple at `base`, then project both fields ──── + This is the codegen's deterministic cell layout (field i at base+i) proved as + a get-after-set round-trip — the precondition for arrays / records / strings. + (`base` is the address the bump allocator hands out; a runtime `memory.grow` + that produces it, and field expressions with control, are the next sub-rung.) *) +Theorem mexec_pair_build_proj : forall m base v0 v1, + base + 1 < length m -> + exists m', + (* build: store v0 at base+0, v1 at base+1 *) + mexec [I32Const (Z.of_nat base); I32Const v0; I32Store 0; + I32Const (Z.of_nat base); I32Const v1; I32Store 1] m [] [] + = Some (m', [], []) + (* project field 0 ⇒ v0 *) + /\ mexec [I32Const (Z.of_nat base); I32Load 0] m' [] [] = Some (m', [], [v0]) + (* project field 1 ⇒ v1 *) + /\ mexec [I32Const (Z.of_nat base); I32Load 1] m' [] [] = Some (m', [], [v1]). +Proof. + intros m base v0 v1 Hb. + exists (mem_set (base + 1) v1 (mem_set base v0 m)). split; [| split]. + - cbn [mexec step1]. rewrite !Nat2Z.id, Nat.add_0_r. reflexivity. + - cbn [mexec step1]. rewrite Nat2Z.id, Nat.add_0_r. + rewrite mem_get_set_neq by lia. + rewrite mem_get_set_eq by lia. reflexivity. + - cbn [mexec step1]. rewrite Nat2Z.id. + rewrite mem_get_set_eq by (rewrite mem_set_length; lia). reflexivity. +Qed. + +Print Assumptions mexec_store_then_load. +Print Assumptions mexec_pair_build_proj. diff --git a/formal/RealWasm.v b/formal/RealWasm.v index 4e34780..603f66e 100644 --- a/formal/RealWasm.v +++ b/formal/RealWasm.v @@ -45,7 +45,9 @@ Inductive instr := | Block (body : list instr) (* R2-loops: forward label — Br exits past it *) | Loop (body : list instr) (* R2-loops: backward label — Br re-enters it *) | Br (k : nat) (* R2-loops: branch to the k-th enclosing label *) -| BrIf (k : nat). (* R2-loops: pop; if nonzero, branch to label k *) +| BrIf (k : nat) (* R2-loops: pop; if nonzero, branch to label k *) +| I32Load (off : nat) (* R-mem: pop addr, push mem[addr+off] (word) *) +| I32Store (off : nat). (* R-mem: pop val then addr, mem[addr+off]:=val *) Definition stack := list Z. Definition locals := list Z. @@ -97,6 +99,7 @@ Definition step1 (i : instr) (lo : locals) (st : stack) : option (locals * stack end | IfElse _ _ => None | Block _ | Loop _ | Br _ | BrIf _ => None (* structured control: cexec (RealLoop), not step1 *) + | I32Load _ | I32Store _ => None (* memory ops: mexec (RealMem), not step1 *) end. (* fuel-indexed executor. None = trap OR out of fuel. *) diff --git a/formal/_CoqProject b/formal/_CoqProject index d9f2a85..7b2020f 100644 --- a/formal/_CoqProject +++ b/formal/_CoqProject @@ -17,4 +17,5 @@ F5_RenderFaithful.v RealWasm.v RealCompile.v RealLoop.v +RealMem.v Rows.v diff --git a/formal/justfile b/formal/justfile index a848533..e35ed1f 100644 --- a/formal/justfile +++ b/formal/justfile @@ -13,7 +13,7 @@ check: F1_TransformerPreservation F3_PragmaDecidable F4_ErrorFaithful \ P3_BorrowSound P3_BorrowGraph P2_Progress P2_Stlc \ QttSemiring AffineUsage QttTyping QttDynamic \ - F5_RenderFaithful RealWasm RealCompile RealLoop Rows; do + F5_RenderFaithful RealWasm RealCompile RealLoop RealMem Rows; do echo "== coqc $f.v ==" o="$(coqc -Q . ASFormal "$f.v")" printf '%s\n' "$o"