runtimeverification · cds-amal · Mar 10, 2026
diff --git a/docs/adr/004-interned-index-receipts.md b/docs/adr/004-interned-index-receipts.md
@@ -0,0 +1,167 @@
+# ADR-004: Interned index receipts for golden-file normalization
+
+**Status:** Proposed
+**Date:** 2026-03-10
+
+## Context
+
+Integration test golden files compare serialized Stable MIR JSON across platforms
+(macOS vs. Linux CI). Several values in the output are "interned indices":
+compile-session-specific integers assigned by rustc's internal interning machinery
+for types like `Ty`, `Span`, `AllocId`, `DefId`, and `AdtDef`. These indices are
+consistent within a single rustc invocation but differ across platforms and across
+runs; a `Ty` that interns as index 42 on macOS might be index 37 on Linux.
+
+The normalise filter (`normalise-filter.jq`) strips or zeroes these indices before
+comparison. The trouble is that the filter has to independently know the JSON
+schema: every time Stable MIR adds a new field or array position that carries an
+interned index, the filter needs a corresponding rule. We've been discovering
+these gaps exclusively through CI failures on the other platform; the feedback
+loop is slow and the pattern is pure whack-a-mole. In the span of three commits,
+we hit three distinct categories of missed interned index (bare `span` fields,
+`{"Type": N}` newtype wrappers, and positional indices inside `Cast[2]` and
+`Closure[0]` arrays).
+
+The root cause: the schema knowledge (which values are interned) lives in the
+Rust type definitions, but the normalization rules live in a separate jq script
+that has to reverse-engineer that knowledge from examples. There's no contract
+between the producer (the printer) and the consumer (the normaliser) that says
+"these are the interned paths."
+
+## Decision
+
+The printer emits a companion "receipts" file (`*.smir.receipts.json`) alongside
+each `*.smir.json` output. The receipts file declares which JSON key names,
+newtype wrappers, and array positions carry interned indices. The normalise filter
+reads the receipts and applies them generically, rather than hardcoding per-field
+rules.
+
+The receipts are generated dynamically by observing actual serde serialization
+calls, not by a static list. This is the key property: if upstream adds a new `Ty`
+field somewhere inside `Body` (which we don't control), the receipt generator
+automatically detects it because serde's derive-generated code calls
+`serialize_newtype_struct("Ty", ...)` for the new field, and our serialization
+observer records it.
+
+### Receipt format
+
+```json
+{
+  "interned_keys": ["span", "ty", "def_id", "id", "alloc_id", "adt_def"],
+  "interned_newtypes": ["Type"],
+  "interned_positions": {
+    "Cast": [2],
+    "Closure": [0],
+    "VTable": [0],
+    "Adt": [0],
+    "Field": [1]
+  }
+}
+```
+
+Three categories, mapping directly to the three normalization patterns the jq
+filter needs:
+
+| Category | Meaning | jq action |
+|----------|---------|-----------|
+| `interned_keys` | Object field names whose values are interned indices | `del(.[$key])` or zero the value |
+| `interned_newtypes` | Enum variant names that wrap a bare interned integer (e.g. `{"Type": 42}`) | `.[$name] = 0` when value is a number |
+| `interned_positions` | Parent array name to list of positions carrying interned indices | `.[$name][$pos] = 0` |
+
+### How it works: the spy serializer
+
+The mechanism is a "spy" `serde::Serializer` implementation that mirrors the
+structure of a real serializer but produces no output; it only tracks context
+(which struct field, which array position, which enum variant we're currently
+inside) and records findings.
+
+When the spy encounters a `serialize_newtype_struct` call whose type name matches
+a known interned type (`Ty`, `Span`, `AllocId`, `DefId`, `AdtDef`, `CrateNum`,
+`VariantIdx`), it examines the current context to classify the finding:
+
+- Inside a struct field named `"ty"` → `interned_keys` gets `"ty"`
+- Inside an enum newtype variant named `"Type"` → `interned_newtypes` gets `"Type"`
+- Inside a tuple variant named `"Cast"` at position 2 → `interned_positions["Cast"]` gets `2`
+
+The spy serializer runs as a separate pass before the real `serde_json` serialization.
+This means we serialize twice, which is acceptable: the spy pass is cheap (no I/O,
+no string formatting, just context tracking) and the SmirJson structure is
+typically modest in size. The two passes are:
+
+1. `value.serialize(&mut SpySerializer::new(...))` — collect receipts
+2. `serde_json::to_string(&value)` — produce the actual JSON
+
+### How the normaliser consumes receipts
+
+The normalise filter receives the receipts file via jq's `--slurpfile` mechanism:
+
+```shell
+jq -S -e --slurpfile receipts input.smir.receipts.json \
+   -f normalise-filter.jq input.smir.json
+```
+
+The items walk simplifies from a list of hardcoded rules to a generic application
+of the receipt:
+
+```jq
+# Before (hardcoded):
+walk(if type == "object" then del(.ty) | del(.span) | del(.def_id) | del(.id)
+     | if .Field then .Field[1] = 0 else . end
+     | if .Type and (.Type | type) == "number" then .Type = 0 else . end
+     # ... more rules added with each CI failure ...
+     else . end)
+
+# After (receipt-driven):
+walk(if type == "object" then
+       reduce ($receipts[0].interned_keys[]) as $k (.; del(.[$k]))
+     | reduce ($receipts[0].interned_newtypes[]) as $n (.;
+         if .[$n] and (.[$n] | type) == "number" then .[$n] = 0 else . end)
+     | reduce ($receipts[0].interned_positions | to_entries[]) as $e (.;
+         if .[$e.key] then
+           reduce ($e.value[]) as $p (.; .[$e.key][$p] = 0)
+         else . end)
+     else . end)
+```
+
+The normalise filter no longer needs to know about individual field names or
+array positions. A new interned field upstream is automatically captured by the
+receipts; the filter handles it without any change.
+
+## Consequences
+
+**What improves:**
+
+- The schema knowledge moves from the jq filter to the Rust code, right next to
+  the type definitions. The jq filter becomes a generic consumer.
+- New interned fields in Body (which comes from stable_mir and whose structure we
+  don't control) are automatically detected via the spy serializer observing
+  serde's derive-generated code.
+- The receipts file is itself a useful diagnostic artifact: it tells you exactly
+  which parts of the output carry non-deterministic values.
+
+**What stays the same:**
+
+- The top-level array handling in the normalise filter (stripping alloc_id from
+  the allocs array, removing the Ty key from the types array, etc.) is
+  structurally different from the walk-based normalization and remains as
+  explicit jq code. The receipts cover the Body tree where the whack-a-mole
+  problem lives; the top-level arrays are stable and few.
+- Golden files still need regeneration when the normalise filter changes. The
+  receipts reduce how often the filter needs to change, but they don't eliminate
+  golden file churn entirely.
+
+**What to watch for:**
+
+- The spy serializer depends on stable_mir types using `#[derive(Serialize)]` with
+  standard newtype struct serialization. If a type switches to a custom Serialize
+  impl that doesn't call `serialize_newtype_struct`, the spy won't detect it. In
+  practice this is unlikely for the interned index types (they're all simple
+  newtypes around `usize`), but worth noting.
+- The `INTERNED_TYPES` list (the set of type names the spy recognizes) is
+  maintained in Rust. If stable_mir adds a new interned newtype with a name not
+  in the list, it won't be detected. This is a small, infrequently-changing list
+  (currently 7 entries) and is trivial to update; it's also easy to validate by
+  comparing receipts across platforms.
+- The receipts file adds one more output artifact per compilation. The Makefile
+  and test harness need to account for it (passing the receipts to jq, cleaning
+  up receipts files alongside JSON files).
diff --git a/src/printer/mod.rs b/src/printer/mod.rs
@@ -14,6 +14,7 @@
 //! | [`mir_visitor`] | `BodyAnalyzer`: single-pass MIR body traversal collecting calls, allocs, types, spans |
 //! | [`ty_visitor`] | `TyCollector`: recursively collects reachable types with layout info (some special kinds are traversed but not stored) |
 //! | [`link_map`] | Function resolution map: type + instance kind to symbol name |
+//! | [`receipts`] | Spy serializer that discovers interned-index locations; emits `*.smir.receipts.json` (see ADR-004) |
 //! | [`types`] | Type helpers and [`TypeMetadata`](schema::TypeMetadata) construction |
 //! | [`util`] | Name resolution, attribute queries, and small collection utilities |
 
@@ -49,6 +50,7 @@ mod collect;
 mod items;
 mod link_map;
 mod mir_visitor;
+pub(crate) mod receipts;
 mod schema;
 mod ty_visitor;
 mod types;
@@ -61,19 +63,38 @@ pub use schema::{AllocInfo, FnSymType, Item, LinkMapKey, SmirJson, TypeMetadata}
 pub(crate) use util::hash;
 
 pub fn emit_smir(tcx: TyCtxt<'_>) {
-    let smir_json =
-        serde_json::to_string(&collect_smir(tcx)).expect("serde_json failed to write result");
+    let collected = collect_smir(tcx);
+
+    // Run the spy serializer to discover which JSON paths carry interned
+    // indices, then serialize the receipts alongside the main output.
+    let receipt = receipts::collect_receipts(&collected);
+    let receipt_json =
+        serde_json::to_string(&receipt).expect("serde_json failed to write receipts");
+
+    let smir_json = serde_json::to_string(&collected).expect("serde_json failed to write result");
 
     match crate::compat::output::mir_output_path(tcx, "smir.json") {
         crate::compat::output::OutputDest::Stdout => {
-            write!(&io::stdout(), "{}", smir_json).expect("Failed to write smir.json");
+            write!(&io::stdout(), "{smir_json}").expect("Failed to write smir.json");
+            // Receipts go to stderr when main output goes to stdout,
+            // so they can be captured separately.
+            eprintln!("{receipt_json}");
         }
         crate::compat::output::OutputDest::File(path) => {
             let mut b = io::BufWriter::new(
                 File::create(&path)
                     .unwrap_or_else(|e| panic!("Failed to create {}: {}", path.display(), e)),
             );
-            write!(b, "{}", smir_json).expect("Failed to write smir.json");
+            write!(b, "{smir_json}").expect("Failed to write smir.json");
+
+            // Write the receipts file alongside the JSON output:
+            // foo.smir.json → foo.smir.receipts.json
+            let receipts_path = path.with_extension("receipts.json");
+            let mut rb =
+                io::BufWriter::new(File::create(&receipts_path).unwrap_or_else(|e| {
+                    panic!("Failed to create {}: {}", receipts_path.display(), e)
+                }));
+            write!(rb, "{receipt_json}").expect("Failed to write receipts");
         }
     }
 }