diff --git a/docs/adr/004-interned-index-receipts.md b/docs/adr/004-interned-index-receipts.md new file mode 100644 index 00000000..23f72e70 --- /dev/null +++ b/docs/adr/004-interned-index-receipts.md @@ -0,0 +1,167 @@ +# ADR-004: Interned index receipts for golden-file normalization + +**Status:** Proposed +**Date:** 2026-03-10 + +## Context + +Integration test golden files compare serialized Stable MIR JSON across platforms +(macOS vs. Linux CI). Several values in the output are "interned indices": +compile-session-specific integers assigned by rustc's internal interning machinery +for types like `Ty`, `Span`, `AllocId`, `DefId`, and `AdtDef`. These indices are +consistent within a single rustc invocation but differ across platforms and across +runs; a `Ty` that interns as index 42 on macOS might be index 37 on Linux. + +The normalise filter (`normalise-filter.jq`) strips or zeroes these indices before +comparison. The trouble is that the filter has to independently know the JSON +schema: every time Stable MIR adds a new field or array position that carries an +interned index, the filter needs a corresponding rule. We've been discovering +these gaps exclusively through CI failures on the other platform; the feedback +loop is slow and the pattern is pure whack-a-mole. In the span of three commits, +we hit three distinct categories of missed interned index (bare `span` fields, +`{"Type": N}` newtype wrappers, and positional indices inside `Cast[2]` and +`Closure[0]` arrays). + +The root cause: the schema knowledge (which values are interned) lives in the +Rust type definitions, but the normalization rules live in a separate jq script +that has to reverse-engineer that knowledge from examples. There's no contract +between the producer (the printer) and the consumer (the normaliser) that says +"these are the interned paths." + +## Decision + +The printer emits a companion "receipts" file (`*.smir.receipts.json`) alongside +each `*.smir.json` output. 
The receipts file declares which JSON key names, +newtype wrappers, and array positions carry interned indices. The normalise filter +reads the receipts and applies them generically, rather than hardcoding per-field +rules. + +The receipts are generated dynamically by observing actual serde serialization +calls, not by a static list. This is the key property: if upstream adds a new `Ty` +field somewhere inside `Body` (which we don't control), the receipt generator +automatically detects it because serde's derive-generated code calls +`serialize_newtype_struct("Ty", ...)` for the new field, and our serialization +observer records it. + +### Receipt format + +```json +{ + "interned_keys": ["span", "ty", "def_id", "id", "alloc_id", "adt_def"], + "interned_newtypes": ["Type"], + "interned_positions": { + "Cast": [2], + "Closure": [0], + "VTable": [0], + "Adt": [0], + "Field": [1] + } +} +``` + +Three categories, mapping directly to the three normalization patterns the jq +filter needs: + +| Category | Meaning | jq action | +|----------|---------|-----------| +| `interned_keys` | Object field names whose values are interned indices | `del(.[$key])` or zero the value | +| `interned_newtypes` | Enum variant names that wrap a bare interned integer (e.g. `{"Type": 42}`) | `.[$name] = 0` when value is a number | +| `interned_positions` | Parent array name to list of positions carrying interned indices | `.[$name][$pos] = 0` | + +### How it works: the spy serializer + +The mechanism is a "spy" `serde::Serializer` implementation that mirrors the +structure of a real serializer but produces no output; it only tracks context +(which struct field, which array position, which enum variant we're currently +inside) and records findings. 
+ +When the spy encounters a `serialize_newtype_struct` call whose type name matches +a known interned type (`Ty`, `Span`, `AllocId`, `DefId`, `AdtDef`, `CrateNum`, +`VariantIdx`), it examines the current context to classify the finding: + +- Inside a struct field named `"ty"` → `interned_keys` gets `"ty"` +- Inside an enum newtype variant named `"Type"` → `interned_newtypes` gets `"Type"` +- Inside a tuple variant named `"Cast"` at position 2 → `interned_positions["Cast"]` gets `2` + +The spy serializer runs as a separate pass before the real `serde_json` serialization. +This means we serialize twice, which is acceptable: the spy pass is cheap (no I/O, +no string formatting, just context tracking) and the SmirJson structure is +typically modest in size. The two passes are: + +1. `value.serialize(&mut SpySerializer::new(...))` — collect receipts +2. `serde_json::to_string(&value)` — produce the actual JSON + +### How the normaliser consumes receipts + +The normalise filter receives the receipts file via jq's `--slurpfile` mechanism: + +```shell +jq -S -e --slurpfile receipts input.smir.receipts.json \ + -f normalise-filter.jq input.smir.json +``` + +The items walk simplifies from a list of hardcoded rules to a generic application +of the receipt: + +```jq +# Before (hardcoded): +walk(if type == "object" then del(.ty) | del(.span) | del(.def_id) | del(.id) + | if .Field then .Field[1] = 0 else . end + | if .Type and (.Type | type) == "number" then .Type = 0 else . end + # ... more rules added with each CI failure ... + else . end) + +# After (receipt-driven): +walk(if type == "object" then + reduce ($receipts[0].interned_keys[]) as $k (.; del(.[$k])) + | reduce ($receipts[0].interned_newtypes[]) as $n (.; + if .[$n] and (.[$n] | type) == "number" then .[$n] = 0 else . end) + | reduce ($receipts[0].interned_positions | to_entries[]) as $e (.; + if .[$e.key] then + reduce ($e.value[]) as $p (.; .[$e.key][$p] = 0) + else . end) + else . 
end) +``` + +The normalise filter no longer needs to know about individual field names or +array positions. A new interned field upstream is automatically captured by the +receipts; the filter handles it without any change. + +## Consequences + +**What improves:** + +- The schema knowledge moves from the jq filter to the Rust code, right next to + the type definitions. The jq filter becomes a generic consumer. +- New interned fields in Body (which comes from stable_mir and whose structure we + don't control) are automatically detected via the spy serializer observing + serde's derive-generated code. +- The receipts file is itself a useful diagnostic artifact: it tells you exactly + which parts of the output carry non-deterministic values. + +**What stays the same:** + +- The top-level array handling in the normalise filter (stripping alloc_id from + the allocs array, removing the Ty key from the types array, etc.) is + structurally different from the walk-based normalization and remains as + explicit jq code. The receipts cover the Body tree where the whack-a-mole + problem lives; the top-level arrays are stable and few. +- Golden files still need regeneration when the normalise filter changes. The + receipts reduce how often the filter needs to change, but they don't eliminate + golden file churn entirely. + +**What to watch for:** + +- The spy serializer depends on stable_mir types using `#[derive(Serialize)]` with + standard newtype struct serialization. If a type switches to a custom Serialize + impl that doesn't call `serialize_newtype_struct`, the spy won't detect it. In + practice this is unlikely for the interned index types (they're all simple + newtypes around `usize`), but worth noting. +- The `INTERNED_TYPES` list (the set of type names the spy recognizes) is + maintained in Rust. If stable_mir adds a new interned newtype with a name not + in the list, it won't be detected. 
This is a small, infrequently-changing list + (currently 7 entries) and is trivial to update; it's also easy to validate by + comparing receipts across platforms. +- The receipts file adds one more output artifact per compilation. The Makefile + and test harness need to account for it (passing the receipts to jq, cleaning + up receipts files alongside JSON files). diff --git a/src/printer/mod.rs b/src/printer/mod.rs index f251e27f..efba326b 100644 --- a/src/printer/mod.rs +++ b/src/printer/mod.rs @@ -14,6 +14,7 @@ //! | [`mir_visitor`] | `BodyAnalyzer`: single-pass MIR body traversal collecting calls, allocs, types, spans | //! | [`ty_visitor`] | `TyCollector`: recursively collects reachable types with layout info (some special kinds are traversed but not stored) | //! | [`link_map`] | Function resolution map: type + instance kind to symbol name | +//! | [`receipts`] | Spy serializer that discovers interned-index locations; emits `*.smir.receipts.json` (see ADR-004) | //! | [`types`] | Type helpers and [`TypeMetadata`](schema::TypeMetadata) construction | //! | [`util`] | Name resolution, attribute queries, and small collection utilities | @@ -49,6 +50,7 @@ mod collect; mod items; mod link_map; mod mir_visitor; +pub(crate) mod receipts; mod schema; mod ty_visitor; mod types; @@ -61,19 +63,38 @@ pub use schema::{AllocInfo, FnSymType, Item, LinkMapKey, SmirJson, TypeMetadata} pub(crate) use util::hash; pub fn emit_smir(tcx: TyCtxt<'_>) { - let smir_json = - serde_json::to_string(&collect_smir(tcx)).expect("serde_json failed to write result"); + let collected = collect_smir(tcx); + + // Run the spy serializer to discover which JSON paths carry interned + // indices, then serialize the receipts alongside the main output. 
+ let receipt = receipts::collect_receipts(&collected); + let receipt_json = + serde_json::to_string(&receipt).expect("serde_json failed to write receipts"); + + let smir_json = serde_json::to_string(&collected).expect("serde_json failed to write result"); match crate::compat::output::mir_output_path(tcx, "smir.json") { crate::compat::output::OutputDest::Stdout => { - write!(&io::stdout(), "{}", smir_json).expect("Failed to write smir.json"); + write!(&io::stdout(), "{smir_json}").expect("Failed to write smir.json"); + // Receipts go to stderr when main output goes to stdout, + // so they can be captured separately. + eprintln!("{receipt_json}"); } crate::compat::output::OutputDest::File(path) => { let mut b = io::BufWriter::new( File::create(&path) .unwrap_or_else(|e| panic!("Failed to create {}: {}", path.display(), e)), ); - write!(b, "{}", smir_json).expect("Failed to write smir.json"); + write!(b, "{smir_json}").expect("Failed to write smir.json"); + + // Write the receipts file alongside the JSON output: + // foo.smir.json → foo.smir.receipts.json + let receipts_path = path.with_extension("receipts.json"); + let mut rb = + io::BufWriter::new(File::create(&receipts_path).unwrap_or_else(|e| { + panic!("Failed to create {}: {}", receipts_path.display(), e) + })); + write!(rb, "{receipt_json}").expect("Failed to write receipts"); } } } diff --git a/src/printer/receipts.rs b/src/printer/receipts.rs new file mode 100644 index 00000000..fc31a141 --- /dev/null +++ b/src/printer/receipts.rs @@ -0,0 +1,680 @@ +//! Interned index receipt generation. +//! +//! Emits a companion `*.smir.receipts.json` alongside each `*.smir.json` that +//! declares which JSON paths carry non-deterministic interned indices (`Ty`, +//! `Span`, `AllocId`, etc.). See ADR-004 for the design rationale. +//! +//! The receipt is generated by a "spy" serde [`Serializer`] that mirrors the +//! structure of a real serializer but produces no output; it only tracks which +//! 
struct fields, enum variants, and array positions contain known interned +//! types. The spy runs as a separate pass before the real `serde_json` +//! serialization. + +use crate::compat::serde; + +use serde::ser::{self, Serialize}; +use serde::Serialize as SerializeDerive; +use std::cell::RefCell; +use std::collections::{BTreeMap, BTreeSet, HashSet}; +use std::fmt; +use std::rc::Rc; + +// ── Known interned type names ─────────────────────────────────────────────── +// +// These are the stable_mir newtype struct names whose `#[derive(Serialize)]` +// calls `serialize_newtype_struct("", &inner_usize)`. When the spy +// serializer sees one of these names, it records the enclosing context in +// the receipt. + +const INTERNED_TYPES: &[&str] = &["Ty", "Span", "AllocId", "DefId", "AdtDef", "CrateNum"]; + +// Interned key names that always need normalizing, regardless of whether +// the spy encounters them in a particular program's output. These fields +// carry interned indices used as cross-reference keys by downstream tools +// (joining AggregateKind::Adt in MIR bodies with type metadata entries, +// etc.), so they can't be dropped from the output itself; they're only +// stripped for golden-file comparison. +// +// The spy may miss them because (a) the program under test doesn't +// exercise the code path that produces them, or (b) the DefId is wrapped +// in a domain-specific newtype (TraitDef, StaticDef, ClosureDef) that's +// transparent in JSON but opaque to the spy's immediate-parent check. +const SEEDED_INTERNED_KEYS: &[&str] = &["def_id", "adt_def", "ty", "span", "id", "alloc_id"]; + +// Interned newtype wrappers that always need normalizing. On newer +// nightlies the serialize_index_impl! macro serializes interned types +// as bare integers, so the spy can't discover these dynamically. +const SEEDED_INTERNED_NEWTYPES: &[&str] = &["Type", "Array"]; + +// Interned positions that always need normalizing. 
+const SEEDED_INTERNED_POSITIONS: &[(&str, &[usize])] = &[ + ("Field", &[1]), + ("Cast", &[2]), + ("Closure", &[0]), + ("VTable", &[0]), + ("Adt", &[0]), +]; + +// ── Receipt output ────────────────────────────────────────────────────────── + +/// Schema-level receipt of where interned indices appear in the JSON output. +/// +/// Three categories, mapping to the three normalization patterns the jq filter +/// needs (see ADR-004 for the full mapping table). +#[derive(Debug, Clone, SerializeDerive)] +pub struct Receipts { + /// Object field names whose values are interned indices. + pub interned_keys: BTreeSet<String>, + /// Enum variant / newtype names that wrap a bare interned integer. + pub interned_newtypes: BTreeSet<String>, + /// Parent array name → positions within the array carrying interned indices. + pub interned_positions: BTreeMap<String, BTreeSet<usize>>, +} + +// ── Tracker (mutable state) ───────────────────────────────────────────────── + +/// What kind of container we're currently inside. +#[derive(Debug, Clone)] +#[allow(dead_code)] // SeqElement's index is structurally meaningful even if unused in matching +enum ParentKind { + /// Serializing the value for a struct/map field with this key. + Field(String), + /// Serializing element N of a tuple variant / tuple struct with this name. + TupleEntry(String, usize), + /// Serializing the inner value of a newtype variant / newtype struct. + Newtype(String), + /// Serializing element N of a plain sequence (no named parent). + SeqElement(usize), + /// Top-level or unknown context. + Root, +} + +#[derive(Debug)] +struct Tracker { + /// Stack of enclosing contexts. The spy pushes on entry and pops on exit. + context: Vec<ParentKind>, + /// The set of type names we recognize as interned. 
+ interned_names: HashSet<&'static str>, + // Accumulated receipt data: + keys: BTreeSet<String>, + newtypes: BTreeSet<String>, + positions: BTreeMap<String, BTreeSet<usize>>, +} + +impl Tracker { + fn new() -> Self { + let mut positions = BTreeMap::new(); + for (name, idxs) in SEEDED_INTERNED_POSITIONS { + positions + .entry(name.to_string()) + .or_insert_with(BTreeSet::new) + .extend(idxs.iter().copied()); + } + Self { + context: vec![ParentKind::Root], + interned_names: INTERNED_TYPES.iter().copied().collect(), + keys: SEEDED_INTERNED_KEYS.iter().map(|s| s.to_string()).collect(), + newtypes: SEEDED_INTERNED_NEWTYPES + .iter() + .map(|s| s.to_string()) + .collect(), + positions, + } + } + + /// Called when `serialize_newtype_struct` sees a known interned type. + /// + /// Looks at the immediate parent context to classify the finding. + /// Only struct fields, newtype wrappers, and tuple positions are + /// recorded; sequence elements and the root context are ignored + /// (the interned value is inside a collection, not directly + /// assignable to a named field). + fn record_interned(&mut self) { + let Some(frame) = self.context.last() else { + return; + }; + match frame { + ParentKind::Field(key) => { + self.keys.insert(key.clone()); + } + ParentKind::Newtype(name) => { + self.newtypes.insert(name.clone()); + } + ParentKind::TupleEntry(name, idx) => { + self.positions.entry(name.clone()).or_default().insert(*idx); + } + ParentKind::SeqElement(_) | ParentKind::Root => { + // Inside a sequence or at top level; the enclosing field + // is a collection, not a direct interned-index carrier. 
+ } + } + } + + fn into_receipts(self) -> Receipts { + Receipts { + interned_keys: self.keys, + interned_newtypes: self.newtypes, + interned_positions: self.positions, + } + } +} + +type SharedTracker = Rc<RefCell<Tracker>>; + +// ── Error type ────────────────────────────────────────────────────────────── + +/// The spy serializer never truly fails; this error type exists only to satisfy +/// serde's trait bounds. 
+#[derive(Debug)] +struct SpyError; + +impl fmt::Display for SpyError { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + f.write_str("spy serializer error (should never happen)") + } +} + +impl std::error::Error for SpyError {} + +impl ser::Error for SpyError { + fn custom<T: fmt::Display>(_msg: T) -> Self { + SpyError + } +} + +// ── Spy serializer ────────────────────────────────────────────────────────── + +struct SpySerializer { + tracker: SharedTracker, +} + +impl SpySerializer { + fn new(tracker: SharedTracker) -> Self { + Self { tracker } + } +} + +/// Compound serializer for sequences, tuples, structs, and maps. +/// +/// A single type handles all compound serde traits. The `name` field +/// records the enclosing construct (empty for unnamed sequences and maps); +/// `index` tracks the current element position for tuple/seq types. +struct SpyCompound { + tracker: SharedTracker, + /// Name of the enclosing construct (struct name, variant name, etc.) + name: String, + /// Current element index (for tuples and sequences). + index: usize, + /// Whether this is a struct-like compound (named fields) vs. tuple-like. + #[allow(dead_code)] // reserved for potential future use in context classification + is_struct: bool, +} + +impl SpyCompound { + fn new(tracker: SharedTracker, name: &str, is_struct: bool) -> Self { + Self { + tracker, + name: name.to_string(), + index: 0, + is_struct, + } + } +} + +// ── Serializer trait impl ─────────────────────────────────────────────────── + +macro_rules! 
spy_primitive { + ($method:ident, $ty:ty) => { + fn $method(self, _v: $ty) -> Result<Self::Ok, Self::Error> { + Ok(()) + } + }; +} + +impl ser::Serializer for SpySerializer { + type Ok = (); + type Error = SpyError; + type SerializeSeq = SpyCompound; + type SerializeTuple = SpyCompound; + type SerializeTupleStruct = SpyCompound; + type SerializeTupleVariant = SpyCompound; + type SerializeMap = SpyCompound; + type SerializeStruct = SpyCompound; + type SerializeStructVariant = SpyCompound; + + spy_primitive!(serialize_bool, bool); + spy_primitive!(serialize_i8, i8); + spy_primitive!(serialize_i16, i16); + spy_primitive!(serialize_i32, i32); + spy_primitive!(serialize_i64, i64); + spy_primitive!(serialize_u8, u8); + spy_primitive!(serialize_u16, u16); + spy_primitive!(serialize_u32, u32); + spy_primitive!(serialize_u64, u64); + spy_primitive!(serialize_f32, f32); + spy_primitive!(serialize_f64, f64); + spy_primitive!(serialize_char, char); + spy_primitive!(serialize_str, &str); + spy_primitive!(serialize_bytes, &[u8]); + + fn serialize_none(self) -> Result<Self::Ok, Self::Error> { + Ok(()) + } + + fn serialize_some<T: ?Sized + Serialize>(self, value: &T) -> Result<Self::Ok, Self::Error> { + value.serialize(SpySerializer::new(self.tracker)) + } + + fn serialize_unit(self) -> Result<Self::Ok, Self::Error> { + Ok(()) + } + + fn serialize_unit_struct(self, _name: &'static str) -> Result<Self::Ok, Self::Error> { + Ok(()) + } + + fn serialize_unit_variant( + self, + _name: &'static str, + _variant_index: u32, + _variant: &'static str, + ) -> Result<Self::Ok, Self::Error> { + Ok(()) + } + + fn serialize_newtype_struct<T: ?Sized + Serialize>( + self, + name: &'static str, + value: &T, + ) -> Result<Self::Ok, Self::Error> { + { + let tracker = self.tracker.borrow(); + if tracker.interned_names.contains(name) { + drop(tracker); + self.tracker.borrow_mut().record_interned(); + return Ok(()); + } + } + // Not an interned type; push context and recurse. 
+ self.tracker + .borrow_mut() + .context + .push(ParentKind::Newtype(name.to_string())); + let result = value.serialize(SpySerializer::new(self.tracker.clone())); + self.tracker.borrow_mut().context.pop(); + result + } + + fn serialize_newtype_variant<T: ?Sized + Serialize>( + self, + _name: &'static str, + _variant_index: u32, + variant: &'static str, + value: &T, + ) -> Result<Self::Ok, Self::Error> { + self.tracker + .borrow_mut() + .context + .push(ParentKind::Newtype(variant.to_string())); + let result = value.serialize(SpySerializer::new(self.tracker.clone())); + self.tracker.borrow_mut().context.pop(); + result + } + + fn serialize_seq(self, _len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error> { + Ok(SpyCompound::new(self.tracker, "", false)) + } + + fn serialize_tuple(self, _len: usize) -> Result<Self::SerializeTuple, Self::Error> { + Ok(SpyCompound::new(self.tracker, "", false)) + } + + fn serialize_tuple_struct( + self, + name: &'static str, + _len: usize, + ) -> Result<Self::SerializeTupleStruct, Self::Error> { + // On newer nightlies (post-ThreadLocalIndex), interned types like + // Ty(usize, ThreadLocalIndex) serialize as tuple structs instead of + // newtype structs. Detect them the same way we detect newtypes. + { + let tracker = self.tracker.borrow(); + if tracker.interned_names.contains(name) { + drop(tracker); + self.tracker.borrow_mut().record_interned(); + // Return a compound that ignores its fields; we've already + // recorded the finding and don't need to recurse deeper. 
+ return Ok(SpyCompound::new(self.tracker, "", false)); + } + } + Ok(SpyCompound::new(self.tracker, name, false)) + } + + fn serialize_tuple_variant( + self, + _name: &'static str, + _variant_index: u32, + variant: &'static str, + _len: usize, + ) -> Result<Self::SerializeTupleVariant, Self::Error> { + Ok(SpyCompound::new(self.tracker, variant, false)) + } + + fn serialize_map(self, _len: Option<usize>) -> Result<Self::SerializeMap, Self::Error> { + Ok(SpyCompound::new(self.tracker, "", true)) + } + + fn serialize_struct( + self, + _name: &'static str, + _len: usize, + ) -> Result<Self::SerializeStruct, Self::Error> { + Ok(SpyCompound::new(self.tracker, "", true)) + } + + fn serialize_struct_variant( + self, + _name: &'static str, + _variant_index: u32, + variant: &'static str, + _len: usize, + ) -> Result<Self::SerializeStructVariant, Self::Error> { + Ok(SpyCompound::new(self.tracker, variant, true)) + } +} + +// ── Compound trait impls ──────────────────────────────────────────────────── + +impl ser::SerializeSeq for SpyCompound { + type Ok = (); + type Error = SpyError; + + fn serialize_element<T: ?Sized + Serialize>(&mut self, value: &T) -> Result<(), Self::Error> { + self.tracker + .borrow_mut() + .context + .push(ParentKind::SeqElement(self.index)); + let result = value.serialize(SpySerializer::new(self.tracker.clone())); + self.tracker.borrow_mut().context.pop(); + self.index += 1; + result + } + + fn end(self) -> Result<Self::Ok, Self::Error> { + Ok(()) + } +} + +impl ser::SerializeTuple for SpyCompound { + type Ok = (); + type Error = SpyError; + + fn serialize_element<T: ?Sized + Serialize>(&mut self, value: &T) -> Result<(), Self::Error> { + let parent = if self.name.is_empty() { + ParentKind::SeqElement(self.index) + } else { + ParentKind::TupleEntry(self.name.clone(), self.index) + }; + self.tracker.borrow_mut().context.push(parent); + let result = value.serialize(SpySerializer::new(self.tracker.clone())); + self.tracker.borrow_mut().context.pop(); + 
self.index += 1; + result + } + + fn end(self) -> Result<Self::Ok, Self::Error> { + Ok(()) + } +} + +impl ser::SerializeTupleStruct for SpyCompound { + type Ok = (); + type Error = SpyError; + + fn serialize_field<T: ?Sized + Serialize>(&mut self, value: &T) -> Result<(), 
Self::Error> { + let parent = if self.name.is_empty() { + ParentKind::SeqElement(self.index) + } else { + ParentKind::TupleEntry(self.name.clone(), self.index) + }; + self.tracker.borrow_mut().context.push(parent); + let result = value.serialize(SpySerializer::new(self.tracker.clone())); + self.tracker.borrow_mut().context.pop(); + self.index += 1; + result + } + + fn end(self) -> Result<Self::Ok, Self::Error> { + Ok(()) + } +} + +impl ser::SerializeTupleVariant for SpyCompound { + type Ok = (); + type Error = SpyError; + + fn serialize_field<T: ?Sized + Serialize>(&mut self, value: &T) -> Result<(), Self::Error> { + self.tracker + .borrow_mut() + .context + .push(ParentKind::TupleEntry(self.name.clone(), self.index)); + let result = value.serialize(SpySerializer::new(self.tracker.clone())); + self.tracker.borrow_mut().context.pop(); + self.index += 1; + result + } + + fn end(self) -> Result<Self::Ok, Self::Error> { + Ok(()) + } +} + +impl ser::SerializeMap for SpyCompound { + type Ok = (); + type Error = SpyError; + + fn serialize_key<T: ?Sized + Serialize>(&mut self, _key: &T) -> Result<(), Self::Error> { + // We don't track map keys for now; they'd need to be captured as + // strings for the Field context, which requires serializing the key + // to a string. For SmirJson's purposes, all maps are HashMap + // and the keys aren't interned indices themselves. 
+ Ok(()) + } + + fn serialize_value<T: ?Sized + Serialize>(&mut self, value: &T) -> Result<(), Self::Error> { + value.serialize(SpySerializer::new(self.tracker.clone())) + } + + fn end(self) -> Result<Self::Ok, Self::Error> { + Ok(()) + } +} + +impl ser::SerializeStruct for SpyCompound { + type Ok = (); + type Error = SpyError; + + fn serialize_field<T: ?Sized + Serialize>( + &mut self, + key: &'static str, + value: &T, + ) -> Result<(), Self::Error> { + self.tracker + .borrow_mut() + .context + .push(ParentKind::Field(key.to_string())); + let result = value.serialize(SpySerializer::new(self.tracker.clone())); + self.tracker.borrow_mut().context.pop(); + result + } + + fn end(self) -> Result<Self::Ok, Self::Error> { + Ok(()) + } +} + +impl ser::SerializeStructVariant for SpyCompound { + type Ok = (); + type Error = SpyError; + + fn serialize_field<T: ?Sized + Serialize>( + &mut self, + key: &'static str, + value: &T, + ) -> Result<(), Self::Error> { + self.tracker + .borrow_mut() + .context + .push(ParentKind::Field(key.to_string())); + let result = value.serialize(SpySerializer::new(self.tracker.clone())); + self.tracker.borrow_mut().context.pop(); + result + } + + fn end(self) -> Result<Self::Ok, Self::Error> { + Ok(()) + } +} + +// ── Public API ────────────────────────────────────────────────────────────── + +/// Serialize `value` through the spy serializer to discover which JSON paths +/// carry interned indices. Returns a [`Receipts`] describing the findings. +pub fn collect_receipts<T: Serialize>(value: &T) -> Receipts { + let tracker = Rc::new(RefCell::new(Tracker::new())); + // The spy serializer never fails (SpyError is unreachable in practice), + // but we handle it gracefully just in case. + let _ = value.serialize(SpySerializer::new(tracker.clone())); + let tracker = Rc::try_unwrap(tracker) + .expect("tracker should have a single owner after serialization") + .into_inner(); + tracker.into_receipts() +} + +#[cfg(test)] +mod tests { + use super::*; + + /// Minimal stand-in for a newtype that mimics stable_mir's Ty(usize). 
+ #[derive(serde::Serialize)] + struct Ty(usize); + + /// Minimal stand-in for Span(usize). + #[derive(serde::Serialize)] + struct Span(usize); + + /// Minimal stand-in for AllocId(usize). + #[derive(serde::Serialize)] + struct AllocId(usize); + + /// An enum variant wrapping Ty, like stable_mir's GenericArgKind::Type. + #[derive(serde::Serialize)] + enum GenArg { + Type(Ty), + Const(u64), + } + + /// A tuple variant carrying a Ty at a known position, like Cast or Closure. + #[derive(serde::Serialize)] + enum Rvalue { + Cast(String, String, Ty), + Use(String), + } + + /// A struct with named fields containing interned types. + #[derive(serde::Serialize)] + struct Statement { + kind: String, + span: Span, + } + + #[derive(serde::Serialize)] + struct TestRoot { + items: Vec<Statement>, + args: Vec<GenArg>, + rvalues: Vec<Rvalue>, + alloc_id: AllocId, + } + + #[test] + fn discovers_interned_keys() { + let root = TestRoot { + items: vec![Statement { + kind: "Assign".into(), + span: Span(42), + }], + args: vec![GenArg::Type(Ty(7)), GenArg::Const(99)], + rvalues: vec![ + Rvalue::Cast("Unsize".into(), "op".into(), Ty(13)), + Rvalue::Use("x".into()), + ], + alloc_id: AllocId(5), + }; + + let receipts = collect_receipts(&root); + + // "span" should be in interned_keys (struct field) + assert!( + receipts.interned_keys.contains("span"), + "expected 'span' in interned_keys, got: {:?}", + receipts.interned_keys + ); + + // "alloc_id" should be in interned_keys (struct field) + assert!( + receipts.interned_keys.contains("alloc_id"), + "expected 'alloc_id' in interned_keys, got: {:?}", + receipts.interned_keys + ); + + // "Type" should be in interned_newtypes (enum newtype variant) + assert!( + receipts.interned_newtypes.contains("Type"), + "expected 'Type' in interned_newtypes, got: {:?}", + receipts.interned_newtypes + ); + + // Cast[2] should be in interned_positions + assert!( + receipts + .interned_positions + .get("Cast") + .map_or(false, |s| s.contains(&2)), + "expected Cast[2] in 
interned_positions, got: {:?}", + receipts.interned_positions + ); + + // Seeded values should always be present, even if the test type + // doesn't exercise the code paths that produce them. + for key in SEEDED_INTERNED_KEYS { + assert!( + receipts.interned_keys.contains(*key), + "expected seeded key '{}' in interned_keys, got: {:?}", + key, + receipts.interned_keys + ); + } + for nt in SEEDED_INTERNED_NEWTYPES { + assert!( + receipts.interned_newtypes.contains(*nt), + "expected seeded newtype '{}' in interned_newtypes, got: {:?}", + nt, + receipts.interned_newtypes + ); + } + for (name, idxs) in SEEDED_INTERNED_POSITIONS { + for idx in *idxs { + assert!( + receipts + .interned_positions + .get(*name) + .map_or(false, |s| s.contains(idx)), + "expected seeded position {}[{}] in interned_positions, got: {:?}", + name, + idx, + receipts.interned_positions + ); + } + } + } +}