Added a TLA+/TLC trace-validation baseline with trace export, runner integration, and experiment tooling. #90
Those tables are LeanGuard (Lean checkers) vs. a TLA+/TLC trace-validation baseline on the same Days *_events.csv traces: checker_ms is the Lean checker runtime, tlc_cmd_ms is just the TLC process, and tlc_total_ms includes trace export + staging + TLC; the ratio is tlc_total/checker.
What’s done
- TLC baseline implemented for all 6 protocols you listed: dcqcn, aqm, pfc, wfq, drr, cubic (see task.md#L18).
- CUBIC TLC spec fixed/extended so it accepts configs/cubic_simple.toml (fixed-point CA update + small cwnd_bytes tolerance; tla/CubicTrace.tla#L26).
- Full benchmarks run (5 reps) and recorded in task.md#L108 (raw JSON: logs/bench_leanguard_vs_tlc_2026-01-29.json).
Where the results are
- Combined per-protocol table: task.md#L108.
- TLC baseline overview: tla/README.md#L1.
Re-run
- Bench: python3 utils/bench_leanguard_vs_tlc.py --reps 5
- Tests (all passing on my run): cargo test --features test -- --show-output
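For anyone reproducing the ratio column, here is a minimal sketch of how the three timings could be combined for one protocol. The field names come from the description above; the median aggregation, the `summarize` helper, and the input layout are assumptions for illustration, and the real logic in utils/bench_leanguard_vs_tlc.py may differ.

```python
from statistics import median


def summarize(reps):
    """Combine per-repetition timings for one protocol.

    `reps` is a list of dicts with checker_ms, tlc_cmd_ms and tlc_total_ms.
    Aggregating with the median is an assumption, not taken from the bench script.
    """
    checker = median(r["checker_ms"] for r in reps)
    tlc_cmd = median(r["tlc_cmd_ms"] for r in reps)
    tlc_total = median(r["tlc_total_ms"] for r in reps)
    if checker <= 0:
        raise ValueError("checker_ms must be positive to form a ratio")
    return {
        "checker_ms": checker,
        "tlc_cmd_ms": tlc_cmd,
        "tlc_total_ms": tlc_total,
        # Difference of medians: roughly the trace export + staging cost.
        "export_staging_ms": tlc_total - tlc_cmd,
        # The ratio reported in the tables: tlc_total / checker.
        "ratio": tlc_total / checker,
    }
```

Reporting `export_staging_ms` separately makes it easier to see how much of the ratio comes from TLC itself and how much from preparing its input.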
Bench results (all protocols)
- Updated results are in task.md:110 (raw data: logs/bench_leanguard_vs_tlc_2026-01-30.json).
- We’re benchmarking 6 protocols: dcqcn, aqm, pfc, wfq, drr, cubic (AQM often appears alongside other configs because the switch emits
aqm_events.csv in many runs).
TLC “reject” is now principled (not diameter-only)
- Added ProgressOk == IF l <= LenTrace THEN ENABLED Next ELSE TRUE to each TLC spec (e.g. tla/DcqcnTrace.tla:309) and enabled it via INVARIANT
ProgressOk in each .cfg (e.g. tla/DcqcnTrace.cfg:2).
- Updated leanguard-run to classify TLC invariant violations as reject and to recover the matched prefix even when TLC doesn’t print
depth/diameter (src/bin/leanguard-run.rs:790 and helpers around src/bin/leanguard-run.rs:922).
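To illustrate the classification described above, here is a rough Python sketch of parsing TLC output (the runner itself is Rust and may work differently). It assumes `l` is the 1-based index of the next trace event to consume, so a counterexample ending at `l = k` means `k - 1` events matched; that reading of `l` and the `classify_tlc_output` helper are assumptions, not taken from the code.

```python
import re

# Messages TLC prints for an invariant violation and for a clean run.
INVARIANT_RE = re.compile(r"Error: Invariant (\w+) is violated")
NO_ERROR = "Model checking completed. No error has been found."
# A state line such as `/\ l = 3` (or `l = 3` when it is the only variable).
L_RE = re.compile(r"^[ \t]*(?:/\\[ \t]*)?l[ \t]*=[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def classify_tlc_output(text):
    """Map raw TLC stdout to a verdict plus the matched trace prefix, if known."""
    violated = INVARIANT_RE.search(text)
    if violated:
        values = L_RE.findall(text)
        # Assumption: the last printed state holds the index of the first
        # event that could not be matched, so the prefix is one shorter.
        matched = int(values[-1]) - 1 if values else None
        return {"verdict": "reject", "invariant": violated.group(1), "matched_prefix": matched}
    if NO_ERROR in text:
        return {"verdict": "accept", "invariant": None, "matched_prefix": None}
    return {"verdict": "error", "invariant": None, "matched_prefix": None}
```

Because the regexes search the whole output rather than fixed line positions, extra marker lines added by a wrapper or by TLC's tool mode should not affect the result, as long as each state variable stays on its own line.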
Correctness / test status
- Added runner-parsing coverage for invariant-based rejects and tool-wrapped output (tests/leanguard_run.rs:288, tests/leanguard_run.rs:401,
tests/leanguard_run.rs:485).
- cargo test --features test -- --show-output passes after these changes.
What we still need (for “paper-grade” confidence)
- Fault-injection agreement studies across protocols (Lean reject vs TLC reject + first-failure alignment) and memory reporting are still
listed as in-progress in task.md:45.
…plumbing in preparation for coverage experiments.
The Lean checkers now accept:
- --emit-coverpoint-catalog
This prints the checker's coverpoint catalog as stable, sorted string names, one per line. Example:
```bash
/Users/bli/Playground/days/lean/.lake/build/bin/dcqcn_check --emit-coverpoint-catalog
/Users/bli/Playground/days/lean/.lake/build/bin/aqm_check --emit-coverpoint-catalog
/Users/bli/Playground/days/lean/.lake/build/bin/pfc_check --emit-coverpoint-catalog
```
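A small sketch of how a coverage experiment might consume that catalog: parse the one-name-per-line output, then compare it against the coverpoints a run actually hit. The helpers `parse_catalog` and `coverage` are hypothetical, and checking sortedness by Python's default string ordering is an assumption about how the checker sorts.

```python
def parse_catalog(text):
    """Parse --emit-coverpoint-catalog output: one coverpoint name per line."""
    names = [line.strip() for line in text.splitlines() if line.strip()]
    # Assumption: "stable, sorted" means code-point order with no duplicates.
    if names != sorted(set(names)):
        raise ValueError("catalog should be sorted and free of duplicates")
    return names


def coverage(catalog, hits):
    """Return (fraction of catalog hit, list of coverpoints not yet hit)."""
    hits = set(hits)
    unknown = sorted(hits - set(catalog))
    if unknown:
        raise ValueError(f"hits not in catalog: {unknown}")
    missed = [name for name in catalog if name not in hits]
    fraction = (len(catalog) - len(missed)) / len(catalog) if catalog else 0.0
    return fraction, missed
```

Rejecting hits that are missing from the catalog catches a stale catalog early, which matters when catalogs from different checker builds are mixed.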
…N goal campaign. Extended leanguard-testgen to support AQM goal calibration and targeted mutations in testgen.rs (line 809), testgen.rs (line 1802), testgen.rs (line 2192). New AQM result (seed dcqcn_simple.toml, goal ecn_threshold_drop_overflow): hit rate improves from 8% (iters=0) to 62% (iters=5) (testgen_dcqcn_calibration_summary.csv (line 6)), and the first time the campaign hits that coverpoint drops from 191 accepted tests (random) to 2 accepted tests (goal-calibrated) (experiments.tex (line 206)).
This PR added a reproducible TLA+/TLC trace-validation baseline for Days and integrated it into the LeanGuard runner + test-generation workflows so we can do an apples-to-apples comparison on the same canonicalized event traces (same ordering assumptions, same post-state “witness” fields, and comparable diagnostics).
Motivation
tla2tools.jar) rather than relying on research-only NDJSON ingestion modules, so the comparison stayed reproducible.
Changes
- Introduced a trace export pipeline that canonicalized *_events.csv rows by (time_ns, event_id), rejected duplicate keys, and produced:
  - … TraceData TLA module (rescales large numeric fields to fit TLC’s 32-bit integer limits).
- Added a standalone trace-export CLI that exported a single trace to NDJSON or to a generated TraceData module, enabling manual TLC debugging without the full runner.
- Extended leanguard-run with:
  - simulate-and-check vs check-only modes,
  - … --coverage) and union/per-checker aggregation in the JSON summary.
- Added an optional TLC execution path (--tlc-check) that:
  - … TraceData),
  - … java -cp <tla2tools.jar> tlc2.TLC ... or an explicit wrapper executable,
  - … l = ... fallback),
  - … (index, time_ns, event_id, kind) by cross-referencing the NDJSON export,
- Added optional peak RSS sampling (--measure-rss) for both LeanGuard checkers and TLC runs (via periodic RSS polling) so memory comparisons could be reported alongside runtimes.
- Implemented protocol-specific TLA trace validators and model configurations for:
- Added a coverage-driven test-generation CLI that:
- Added experiment automation scripts that:
- Added/updated documentation describing the baseline, the rescaling rules, TLC runner behavior, the coverage experiment plan, and end-to-end reproduction steps.
- Updated ignore rules to keep local TLC jars, TLC-generated state directories, and Python bytecode out of version control.
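The canonicalization step in the first item can be sketched as follows. This is an illustrative Python version, not the exporter's actual code: it assumes the CSV has `time_ns` and `event_id` columns holding integers, and the `canonicalize` helper is hypothetical.

```python
import csv
import io


def canonicalize(csv_text):
    """Order event rows by (time_ns, event_id) and reject duplicate keys."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    seen = set()
    for row in rows:
        key = (int(row["time_ns"]), int(row["event_id"]))
        if key in seen:
            # Two rows with the same key would make the ordering ambiguous.
            raise ValueError(f"duplicate (time_ns, event_id) key: {key}")
        seen.add(key)
    # Sort numerically, not lexically, so that 9 orders before 10.
    return sorted(rows, key=lambda r: (int(r["time_ns"]), int(r["event_id"])))
```

Rejecting duplicates instead of breaking ties arbitrarily keeps the ordering deterministic, so the Lean checkers and TLC see exactly the same event sequence.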
How to run (examples)
Run LeanGuard checkers only
Run LeanGuard + TLC baseline
Export a single trace for manual TLC runs
Run test generation (seed index + fuzz)
Notes / caveats
- Stored TraceData values were rescaled to fit TLC’s integer range:
- Added small tolerances in the specs where rescaling or float→int rounding made exact equality brittle (most notably for WFQ virtual time/finish time and for CUBIC congestion-avoidance updates).
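A small worked example of why rescaling makes exact equality brittle. The divisor of 1000 and the helpers `rescale` and `close_enough` are hypothetical; the per-field rescaling rules used by the exporter are not reproduced here.

```python
TLC_INT_MAX = 2**31 - 1  # TLC integers are 32-bit signed values


def rescale(value, divisor):
    """Integer-divide a raw value so it fits TLC's range (divisor is illustrative)."""
    scaled = value // divisor
    if not -TLC_INT_MAX - 1 <= scaled <= TLC_INT_MAX:
        raise OverflowError(f"{value} // {divisor} still exceeds TLC's integer range")
    return scaled


def close_enough(a, b, tol):
    """Tolerance check used in place of exact equality after rescaling."""
    return abs(a - b) <= tol


# Exact in raw units: 1999 + 1999 == 3998.
start, delta, end = 1999, 1999, 3998
# After rescaling each term separately, floor division loses the remainders.
lhs = rescale(start, 1000) + rescale(delta, 1000)  # 1 + 1
rhs = rescale(end, 1000)  # 3
```

Here `lhs` and `rhs` differ by one even though the raw values agree exactly, so a spec comparing rescaled values needs a tolerance of at least 1 for this kind of update.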
Tests
cargo test --features test -- --show-output.