feat: add #simulate command by rnbguy · Pull Request #6 · verse-lab/veil

rnbguy · 2026-03-16T03:05:01Z

Random-walk state exploration for Veil: #simulate runs random traces and checks the invariants at each step.

It finds shallow invariant violations faster than #model_check (exhaustive BFS), but it is not complete: a run that finds no violation does not show that none exists.
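
To make the idea concrete, here is a minimal sketch of random-walk checking in plain Lean 4. It is not the code in this PR: `nextSeed`, `randomWalk`, and `simulate` are hypothetical names, the transition system is just a successor function, and the generator is a toy LCG chosen so the sketch needs no IO.

```lean
/-- Toy linear congruential generator (an assumption of this sketch,
not the PR's source of randomness). -/
def nextSeed (s : Nat) : Nat := (s * 1103515245 + 12345) % 2147483648

/-- One random walk of at most `maxSteps` steps from `init`.
Returns the trace (oldest state first) if some visited state violates `inv`. -/
def randomWalk {σ : Type} (next : σ → List σ) (inv : σ → Bool)
    (maxSteps seed : Nat) (init : σ) : Option (List σ) := Id.run do
  let mut st := init
  let mut rng := seed
  let mut trace := [init]
  for _ in [0:maxSteps] do
    -- check the invariant on the current state before stepping
    if !inv st then return some trace.reverse
    match next st with
    | [] => return none  -- deadlock: no successor to pick
    | s :: rest =>
      rng := nextSeed rng
      let succs := s :: rest
      st := succs.getD (rng % succs.length) s
      trace := st :: trace
  -- the last state reached has not been checked yet
  return if inv st then none else some trace.reverse

/-- Run up to `maxTraces` walks, each with a different seed. -/
def simulate {σ : Type} (next : σ → List σ) (inv : σ → Bool) (init : σ)
    (maxTraces maxSteps seed : Nat) : Option (List σ) := Id.run do
  for i in [0:maxTraces] do
    if let some t := randomWalk next inv maxSteps (seed + i) init then
      return some t
  return none

-- A counter that moves by 1 or 2; the "invariant" says it never equals 5.
#eval simulate (fun (n : Nat) => [n + 1, n + 2]) (fun n => n != 5) 0 100 10 42
```

Each walk costs at most `maxSteps` transitions, which is why a shallow violation tends to show up quickly, and why `none` only means that the sampled walks did not hit one.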

Usage

-- basic
#simulate {}

-- with type/theory instantiation
#simulate { node := Fin 4 } { nextNode := fun n => n + 1 }

-- with config
#simulate {} (seed := 42, maxTraces := 1000, maxSteps := 50)

Benchmark

Search time only (Lean loading overhead subtracted) on my machine:

| Example | `#model_check interpreted` | `#simulate` |
| --- | --- | --- |
| DieHard | 1.4s | 111ms |
| RiverCrossing | 2.4s | 5ms |
| BuggyCircularBuffer | 0.9s | 1ms |
| Traffic | 1.2s | 2ms |
| MutexViolation | 22s | 378ms |

All five have known violations.

Disclaimer

Contains LLM generated code.

dranov · 2026-03-16T03:54:56Z

Thank you, @rnbguy! This looks good.

I'll have time to look at this more closely and merge it later in the week, after the OOPSLA deadline. (For future maintainability, I want to make sure #simulate and #model_check share as much code as possible.)

However, I'm wondering whether you're running into a bug with #model_check. We have two modes of operation for the model checker: (1) compiled and (2) interpreted.

By default, #model_check is supposed to run the model checker in interpreted mode while compilation happens in the background (which can take quite long, as you're seeing). If the interpreted mode finds a violation, it is displayed right away, without waiting for compilation to finish.

For me, #model_check finds a violation within 1 second for every benchmark in your table. The timing you're seeing makes me think that somehow only the compiled mode runs for you.

What do you see when you run #model_check? Is it something like this? (This shows the interpreted model checker running, with states being explored, whilst compilation happens in the background.)

rnbguy · 2026-03-16T04:50:51Z

Hey @dranov! Good luck with the OOPSLA deadline 🍀 I am just playing around with Veil 😄 so there is no rush.

You're correct. I was using the CLI (lake lean <example>.lean), so the timings must have included compilation too.

I just ran with #model_check interpreted {} {} and also validated the numbers on VSCode.

| Example | `#model_check interpreted` | `#simulate` |
| --- | --- | --- |
| DieHard | 1.4s | 111ms |
| RiverCrossing | 2.4s | 5ms |
| BuggyCircularBuffer | 0.9s | 1ms |
| Traffic | 1.2s | 2ms |
| MutexViolation | 22s | 378ms |

Thanks for taking the time to point this out. 🙌🏼

dranov · 2026-03-25T02:23:21Z

@rnbguy Apologies for the delay. I'll let @zqy1018 handle integrating this. He developed and is in charge of the model checker in Veil.

We'd want #simulate to have a soundness proof, similar to the soundness and completeness proof of #model_check's new version, and that might require a rewrite. @zqy1018 will look into it.

rnbguy · 2026-04-16T16:33:18Z

hey @zqy1018, the PR is ready for review.

- #simulate now proves soundness of the path.
- I reused the #model_check types for #simulate. Please check whether they make sense.
- I added compiled and interpreted modes for #simulate, just like #model_check.
- Please check whether the #simulate config looks all right.
- I added three #simulate-friendly examples. Let me know if you have any questions about them.
- I added multiple tests for #simulate. Let me know if it's okay to keep them.

I also re-ran the benchmark from above.

| Example | `#model_check interpreted` | `#simulate interpreted` |
| --- | --- | --- |
| DieHard | 1s | 0.5s |
| RiverCrossing | 1s | 0.2s |
| BuggyCircularBuffer | 0.7s | 0.2s |
| Traffic | 1.1s | 0.3s |
| MutexViolation | 18s | 1.2s |
| CheckpointLeaseFailover | t/o at 1m | 1.3s |
| LeaseKeepaliveRace | t/o at 1m | 2.3s |
| ReliableBroadcast | t/o at 1m | 14.5s |

zqy1018 · 2026-04-17T09:19:43Z

Great, thanks! I'll take a look now.

zqy1018 · 2026-04-28T06:43:36Z

+deriving Inhabited, Repr
+
+structure SimulateResult (ρ σ κ : Type) where
+  result : ModelCheckingResult ρ σ κ Unit


The use of Unit here feels a bit awkward. Would it make more sense to introduce a dedicated inductive type instead? For example, like:

inductive SimulationResult (ρ σ κ : Type) where
  | foundViolation (violation : ViolationKind) (viaTrace : Trace ρ σ κ)
  | cancelled

Since according to how ModelCheckingResult ρ σ κ Unit is produced in simulateOnceLoop, the noViolationFound constructor is not used.

Yeah, it's true that I tried to reuse some types from ModelCheck. I will keep things separate in later commits. For now, I will focus on your individual review comments.

zqy1018 · 2026-04-28T06:48:17Z

+@[inline]
+def violatedInvariantNames {ρ σ : Type}
+  (params : SearchParameters ρ σ) (th : ρ) (st : σ) : List Lean.Name :=
+  params.invariants.filterMap fun p =>
+    if !p.holdsOn th st then some p.name else none


There seems to be duplication here: violatedInvariantNames overlaps with how safetyViolations is computed in checkViolationsAndMaybeTerminate from Veil/Core/Tools/ModelChecker/Concrete/Core.lean. Can you move violatedInvariantNames to Veil/Core/Tools/ModelChecker/Interface.lean and use it in checkViolationsAndMaybeTerminate?

Good catch! 0ebceb9

zqy1018 · 2026-04-28T06:51:06Z

+@[inline]
+def filterInitStatesByConstraints {ρ σ κ : Type} {th₀ : ρ}
+  (sys : EnumerableTransitionSystem ρ (List ρ) σ (List σ) Int κ (List (κ × ExecutionOutcome Int σ)) th₀)
+  (params : SearchParameters ρ σ) (th : ρ) : List σ :=
+  if params.stateConstraints.isEmpty then sys.initStates
+  else sys.initStates.filter (params.satisfiesConstraints th)
+
+@[inline]
+def filterOutcomesByConstraints {ρ σ κ : Type} {th₀ : ρ}
+  (sys : EnumerableTransitionSystem ρ (List ρ) σ (List σ) Int κ (List (κ × ExecutionOutcome Int σ)) th₀)
+  (params : SearchParameters ρ σ) (th : ρ) (st : σ) : List (κ × ExecutionOutcome Int σ) :=
+  if params.stateConstraints.isEmpty then
+    sys.tr th st
+  else
+    (sys.tr th st).filter fun (_, outcome) =>
+      match outcome with
+      | .success st' => params.satisfiesConstraints th st'
+      | .assertionFailure _ st' => params.satisfiesConstraints th st'
+      | .divergence => true


This overlaps with the initial `let sys` part of findReachable from Veil/Core/Tools/ModelChecker/Concrete/Checker.lean. The state constraints should be used only once throughout the simulation (and also model checking): at the start, to compute a restricted EnumerableTransitionSystem. The code after that should never touch the state constraints again. findReachable already works this way. Can you do the same for simulation?
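
A minimal sketch of the "restrict once" pattern, under simplifying assumptions: `System` and `System.restrict` are hypothetical stand-ins for EnumerableTransitionSystem and the `let sys` step, and transitions return plain successor states rather than labelled execution outcomes (so the special handling of divergence in the diff above is not modelled).

```lean
/-- Simplified transition system: initial states plus a successor function. -/
structure System (σ : Type) where
  initStates : List σ
  tr : σ → List σ

/-- Apply the state constraint `ok` once, producing a restricted system. -/
def System.restrict {σ : Type} (sys : System σ) (ok : σ → Bool) : System σ where
  initStates := sys.initStates.filter ok
  tr st := (sys.tr st).filter ok

/-- Downstream code only sees the restricted system and never mentions `ok`. -/
def successorsOfInits {σ : Type} (sys : System σ) : List σ :=
  sys.initStates.flatMap sys.tr

-- Counter bounded by the constraint `n ≤ 3`.
def counter : System Nat := { initStates := [0, 5], tr := fun n => [n + 1, n + 4] }

#eval successorsOfInits (counter.restrict (fun n => n ≤ 3))
```

With this shape, the search loop takes a `System` and has no constraint parameter at all, so it cannot apply the constraints inconsistently.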

ad7cc3f & ebfbe4b

zqy1018 · 2026-04-28T06:52:24Z

+private def earlyTerminationReasonToJson (reason : EarlyTerminationReason Unit) : Json :=
+  match reason with
+  | .foundViolatingState _ violates => Json.mkObj [
+      ("kind", "found_violating_state"),
+      ("state_fingerprint", Json.null),
+      ("violates", toJson violates)
+    ]
+  | .deadlockOccurred _ => Json.mkObj [
+      ("kind", "deadlock_occurred"),
+      ("state_fingerprint", Json.null)
+    ]
+  | .assertionFailed _ exId => Json.mkObj [
+      ("kind", "assertion_failed"),
+      ("state_fingerprint", Json.null),
+      ("exception_id", toJson exId)
+    ]
+  | .reachedDepthBound depth => Json.mkObj [
+      ("kind", "reached_depth_bound"),
+      ("depth", toJson depth)
+    ]
+  | .reachedTraceLimit maxTraces => Json.mkObj [
+      ("kind", "reached_trace_limit"),
+      ("max_traces", toJson maxTraces)
+    ]
+  | .cancelled => Json.mkObj [("kind", "cancelled")]
+
+private def terminationReasonToJson (reason : TerminationReason Unit) : Json :=
+  match reason with
+  | .exploredAllReachableStates => Json.mkObj [("kind", "explored_all_reachable_states")]
+  | .earlyTermination condition => Json.mkObj [
+      ("kind", "early_termination"),
+      ("condition", earlyTerminationReasonToJson condition)
+    ]
+
+private def resultToJson {ρ σ κ : Type} [ToJson ρ] [ToJson σ] [ToJson κ]
+  (result : ModelCheckingResult ρ σ κ Unit) : Json :=
+  match result with
+  | .foundViolation _ violation trace => Json.mkObj
+      [ ("result", "found_violation")
+      , ("violation", toJson violation)
+      , ("trace", toJson trace)
+      , ("state_fingerprint", Json.null)
+      ]
+  | .noViolationFound exploredStates reason => Json.mkObj
+      [ ("result", "no_violation_found")
+      , ("explored_states", toJson exploredStates)
+      , ("termination_reason", terminationReasonToJson reason)
+      ]
+  | .cancelled => Json.mkObj [("result", "cancelled")]


These seem to overlap with the ToJson instances from Veil/Core/Tools/ModelChecker/Interface.lean?

zqy1018 · 2026-04-28T06:52:56Z

+instance instToJsonSimulateResult {ρ σ κ : Type} [ToJson ρ] [ToJson σ] [ToJson κ] : ToJson (SimulateResult ρ σ κ) where
+  toJson r := Json.mkObj [
+    ("result", resultToJson r.result),
+    ("traces_run", Lean.toJson r.tracesRun),
+    ("max_traces", Lean.toJson r.maxTraces),
+    ("elapsed_ms", Lean.toJson r.elapsedMs),
+    ("seed", Lean.toJson r.seed),
+    ("depth", Lean.toJson r.depth)
+  ]
+
+def SimulateResult.toDisplayJson {ρ σ κ : Type} [ToJson ρ] [ToJson σ] [ToJson κ]
+  (r : SimulateResult ρ σ κ) : Json :=
+  match resultToJson r.result with
+  | Json.obj kvs =>
+      Json.mkObj <| kvs.toList ++ [
+        ("traces_run", Lean.toJson r.tracesRun),
+        ("max_traces", Lean.toJson r.maxTraces),
+        ("elapsed_ms", Lean.toJson r.elapsedMs),
+        ("seed", Lean.toJson r.seed),
+        ("depth", Lean.toJson r.depth)
+      ]
+  | other =>
+      Json.mkObj [
+        ("result", other),
+        ("traces_run", Lean.toJson r.tracesRun),
+        ("max_traces", Lean.toJson r.maxTraces),
+        ("elapsed_ms", Lean.toJson r.elapsedMs),
+        ("seed", Lean.toJson r.seed),
+        ("depth", Lean.toJson r.depth)
+      ]


What is the intended distinction between instToJsonSimulateResult and SimulateResult.toDisplayJson? They look very similar. Could this duplication be eliminated?
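
One way the duplication could be reduced is sketched below, assuming the run metadata is grouped and emitted by a single helper. `RunStats`, `RunStats.toFields`, `nestedJson`, and `flatJson` are hypothetical names, and all fields are assumed to be `Nat`. The result is passed as a list of fields so that the sketch does not depend on the internal representation of `Json.obj`.

```lean
import Lean.Data.Json
open Lean

/-- Run metadata shared by both JSON shapes (field types are assumptions). -/
structure RunStats where
  tracesRun : Nat
  maxTraces : Nat
  elapsedMs : Nat
  seed : Nat
  depth : Nat

/-- The single place where the metadata keys are spelled out. -/
def RunStats.toFields (s : RunStats) : List (String × Json) := [
  ("traces_run", toJson s.tracesRun),
  ("max_traces", toJson s.maxTraces),
  ("elapsed_ms", toJson s.elapsedMs),
  ("seed", toJson s.seed),
  ("depth", toJson s.depth)
]

/-- Nested shape: the result sits under a "result" key. -/
def nestedJson (resultFields : List (String × Json)) (s : RunStats) : Json :=
  Json.mkObj (("result", Json.mkObj resultFields) :: s.toFields)

/-- Flat shape for display: result fields and metadata side by side. -/
def flatJson (resultFields : List (String × Json)) (s : RunStats) : Json :=
  Json.mkObj (resultFields ++ s.toFields)

#eval flatJson [("result", toJson "cancelled")]
  { tracesRun := 3, maxTraces := 1000, elapsedMs := 12, seed := 42, depth := 7 }
```

If both shapes are really needed, they then differ only in how the result fields are combined with `toFields`.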

zqy1018 · 2026-04-30T10:53:30Z

+structure CompiledCommandSpec where
+  exportedName : String
+  supportsParallelConfig : Bool := false
+
+structure CompilationKey where
+  sourceFile : String
+  exportedName : String
+  commandId : String
+  deriving BEq, Hashable, Inhabited


Can you add some comments to the fields of these two newly introduced structures? It's unclear what they mean just by looking at their names.

zqy1018 · 2026-04-30T12:31:08Z

+checking both explicit cancellation and whether this compilation is still current. -/
+def runProcessWithStatusCallback (sourceFile : String) (command : CompiledCommandSpec) (commandId : String)
+    (cfg : IO.Process.SpawnArgs)
+    (instanceId : Nat) (_statusPrefix : String) (cancelToken : IO.CancelToken)


_statusPrefix seems not used anywhere, can we remove it?

zqy1018 · 2026-04-30T13:55:48Z

+    `({})
+
+/-- Prepend `name` with `mod.name`. -/
+private def mkIdentWithModName' (mod : Module) (name : Name) : Ident :=


What's the meaning of the trailing ' in mkIdentWithModName'?

zqy1018 · 2026-05-01T06:15:19Z

+  /-- Whether this progress entry is for `#simulate` rather than `#model_check`. -/
+  isSimulation : Bool := false
+  /-- Number of traces completed so far (simulation only). -/
+  tracesRun : Nat := 0
+  /-- Configured maximum trace budget (simulation only). -/
+  maxTraces : Nat := 0
+  /-- Depth reached in the current/last trace (simulation only). -/
+  simulationDepth : Nat := 0


The current Progress structure feels somewhat monolithic, especially with simulation-specific fields being added directly to it.

Would it make sense to model this as an inductive type instead, with separate constructors for model checking and simulation (each carrying their own payload)? Or were there specific reasons for not structuring it this way?
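
A rough sketch of what such a split could look like. The type names `ModelCheckProgress`, `SimulationProgress`, `ProgressKind`, and `ProgressSketch` are hypothetical, only a few of the existing fields are shown, and all of them are assumed to be `Nat`.

```lean
/-- Fields that only make sense for `#model_check`. -/
structure ModelCheckProgress where
  diameter : Nat := 0
  statesFound : Nat := 0
  distinctStates : Nat := 0
  queue : Nat := 0

/-- Fields that only make sense for `#simulate`. -/
structure SimulationProgress where
  tracesRun : Nat := 0
  maxTraces : Nat := 0
  depth : Nat := 0

/-- Which tool the progress entry belongs to, with its own payload. -/
inductive ProgressKind where
  | modelCheck (p : ModelCheckProgress)
  | simulation (p : SimulationProgress)

/-- Shared fields stay at the top level; tool-specific ones live in `kind`. -/
structure ProgressSketch where
  elapsedMs : Nat := 0
  kind : ProgressKind

def exampleProgress : ProgressSketch :=
  { elapsedMs := 1200
    kind := .simulation { tracesRun := 10, maxTraces := 1000, depth := 7 } }
```

With this layout the `isSimulation` flag disappears, and a simulation entry cannot carry a meaningless `queue` value.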

zqy1018 · 2026-05-01T06:16:15Z

+        {if p.isSimulation then statRow "Traces Run:" (toString p.tracesRun) else statRow "Diameter:" (toString p.diameter)}
+        {if p.isSimulation then statRow "Max Traces:" (toString p.maxTraces) else statRow "States Found:" (toString p.statesFound)}
+        {if p.isSimulation then statRow "Depth:" (toString p.simulationDepth) else statRow "Distinct States:" (toString p.distinctStates)}
+        {if p.isSimulation then .text "" else statRow "Queue:" (toString p.queue)}
        {statRow "Elapsed time:" (formatElapsedTime p.elapsedMs)}
      </tbody>
    </table>
-    {metricsHistoryHtml p.history}
-    {actionCoverageHtml p.actionStats p.allActionLabels}
+    {if p.isSimulation then .text "" else metricsHistoryHtml p.history}
+    {if p.isSimulation then .text "" else actionCoverageHtml p.actionStats p.allActionLabels}


This seems related to the design of Progress from Veil/Core/Tools/ModelChecker/Concrete/Progress.lean.

The rendering logic here feels somewhat ad-hoc, with many branches of the form if p.isSimulation then ... else .... This may become harder to maintain as the model checking and simulation features evolve independently.

Would it make sense to instead generate the HTML based on the kind of progress (e.g., separate rendering paths), so that the two cases do not share this much conditional logic?
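
As a sketch of separate rendering paths, the rows could be produced by one pattern match per kind of progress instead of one `if` per row. `ProgressView`, `ProgressView.rows`, and `renderRows` are hypothetical names, and rows are modelled as label/value string pairs rather than the widget's HTML.

```lean
/-- Compact stand-in for a progress value with one constructor per tool. -/
inductive ProgressView where
  | modelCheck (diameter statesFound distinctStates queue : Nat)
  | simulation (tracesRun maxTraces depth : Nat)

/-- Tool-specific rows: each constructor has its own rendering path. -/
def ProgressView.rows : ProgressView → List (String × String)
  | .modelCheck d f ds q =>
    [("Diameter:", toString d), ("States Found:", toString f),
     ("Distinct States:", toString ds), ("Queue:", toString q)]
  | .simulation t m d =>
    [("Traces Run:", toString t), ("Max Traces:", toString m),
     ("Depth:", toString d)]

/-- Shared rows are appended once, after the tool-specific ones. -/
def renderRows (p : ProgressView) (elapsed : String) : List (String × String) :=
  p.rows ++ [("Elapsed time:", elapsed)]

#eval renderRows (.simulation 10 1000 7) "1.2s"
```

Each pair could then be handed to something like statRow, and sections such as the metrics history would be emitted only from the model-checking branch.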

zqy1018 · 2026-05-01T06:41:00Z

Thanks again for the contribution! This is very helpful, especially the extensive test cases.

Also, apologies for the delay on my side; it took me a while to go through everything carefully.

I've left quite a number of inline comments. To clarify the intent:

- Some of them are concrete issues that should probably be addressed before merging
- Some are suggestions for improvement or refactoring
- Others are broader design observations that this PR happened to surface, and do not necessarily need to be resolved here

In particular, for the more structural/design-related comments, I'm totally fine with handling them in follow-up changes if that makes things easier. We can decide together what should be in scope for this PR.

There are also a few higher-level points that I didn't put as inline comments:

Compilation / temp folder behavior

The change to assign distinct names to temp folders makes sense in principle, to avoid conflicts across compilations of the same module. However, we've run into a couple of practical issues with this approach:

- The compiled binaries can be quite large (~200MB+), so repeated compilations without cleanup may lead to significant disk usage
- We also observe that compilation in a fresh temp folder sometimes fails on the first attempt, with the following error message:
```
uncaught exception: Failed to prune ProofWidgets cloud release: no such file or directory (error code: 2)
  file: .lake/packages/proofwidgets/.lake/build/lib
error: mathlib: failed to fetch cache
```
This currently makes it difficult to reliably run simulation and model checking in compilation mode (at least on my machine). Curious whether you've seen similar behavior on your side.

Given these issues, it might make sense to temporarily avoid introducing random suffixes for temp folders, and revisit this once the underlying problems are resolved.

Simulation config / elaboration

Regarding the changes in Elaborators.lean: it seems that for simulation config, each field can come from three sources (explicit arguments, options, or defaults), and the current logic is trying to distinguish these cases.

I'm slightly concerned that the current approach may be somewhat hardcoded, or may rely on less common Lean mechanisms, which makes it a bit unclear how robust or maintainable it will be in the long run.

One possible direction could be to rely more directly on option resolution when defining defaults (e.g., via a small elaboration helper), so that the behavior is more uniform.
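
The three-source precedence described above can be sketched as a single resolution function. This is not the prototype mentioned in the review: `resolveField`, `PartialConfig`, `SimConfig`, and `resolve` are hypothetical names, and how the explicit arguments and options are actually read during elaboration is left out.

```lean
/-- Precedence for one field: explicit argument, then option, then default. -/
def resolveField {α : Type} (explicit fromOption : Option α) (default : α) : α :=
  (explicit <|> fromOption).getD default

/-- Fully resolved simulation config (field names follow the usage example). -/
structure SimConfig where
  seed : Nat
  maxTraces : Nat
  maxSteps : Nat
  deriving Repr

/-- A config source in which every field may be absent. -/
structure PartialConfig where
  seed : Option Nat := none
  maxTraces : Option Nat := none
  maxSteps : Option Nat := none

/-- Resolve every field with the same rule. -/
def resolve (explicit options : PartialConfig) (defaults : SimConfig) : SimConfig where
  seed := resolveField explicit.seed options.seed defaults.seed
  maxTraces := resolveField explicit.maxTraces options.maxTraces defaults.maxTraces
  maxSteps := resolveField explicit.maxSteps options.maxSteps defaults.maxSteps

-- `maxSteps` comes from the explicit argument, `maxTraces` from the options,
-- and `seed` falls back to the default.
#eval resolve { maxSteps := some 50 }
  { maxSteps := some 20, maxTraces := some 1000 }
  { seed := 0, maxTraces := 100, maxSteps := 10 }
```

Because every field goes through `resolveField`, an explicitly written value that happens to equal the default is still treated as explicit.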

I also tried a small prototype along these lines; happy to share if useful, but this is definitely out of scope for this PR.

Test cases (compilation mode)

Thanks again for adding such extensive test cases! This is really valuable.

One small note: in our existing test suite, we currently do not include model checking in compilation mode. The main reason is the compilation workflow and binary size issues mentioned above.

Given that, would it be possible to temporarily switch those compilation-mode test cases to interpreted mode? We can revisit adding compilation-mode tests once the underlying issues are addressed.

On `Elaborators.lean`

I also noticed that Elaborators.lean has grown quite a bit in this PR. Since it now handles both model checking and simulation, the file is becoming fairly large and complex.

We may want to refactor this file after merging, to improve structure and enable better reuse across the two functionalities.

Happy to discuss any of the above, especially regarding what should be in scope for this PR. No rush at all — feel free to take your time going through the comments.

Thanks again for the great contribution!

rnbguy · 2026-05-05T11:11:07Z

hey @zqy1018 ! Many thanks for all the detailed comments. I will respond to all of them this week.

rnbguy added 2 commits March 16, 2026 04:04

add #simulate command for random-walk state exploration

434f5e1

lazy trace recording with replay-on-violation

f15aca5

rnbguy force-pushed the feat/simulate branch 2 times, most recently from 211e20b to f822ebe, March 16, 2026 05:10

rnbguy added 2 commits March 16, 2026 06:24

try/catch error handling with seed reporting in simulate loop

df4f8e5

share helpers between #model_check and #simulate, add widget support

543cce4

rnbguy force-pushed the feat/simulate branch from f822ebe to 543cce4, March 16, 2026 05:37

restore docstrings and inline comments removed during refactor

1e9c481

rnbguy force-pushed the feat/simulate branch from d8c6b28 to 1e9c481, March 16, 2026 05:47

rnbguy added 5 commits March 16, 2026 18:37

uncomment violationIsError option in MutexViolation example

2f12dd7

simplify assertion failure handling to single-pass in simulateOnceLoop

c066bee

remove totalSteps tracking from simulate pipeline

53cbc95

remove unused displayResultWidget, clean up formatting

e2783dd

add SharedCounter example where #simulate outperforms #model_check

542db5c

rnbguy added 13 commits April 12, 2026 05:22

Merge branch 'veil-2.0-preview' into feat/simulate

b5c3bef

fix: build

2c1598e

fix: qualify compilation status constructors

8201073

feat: align command architecture with #model_check

a8e9419

test: cover parity regressions

ed7f810

refactor: share executed path semantics

534c9f2

refactor: share pure and runtime trace loops

2fc5d47

feat: use runtime runner for command execution

96d39f0

refactor: clean result rendering semantics

7ad5f60

refactor: split simulation into modular files

12adb4c

feat: support assumptions checks

55e15e4

feat: add simulation-native results and progress

7f58696

refactor: add theorem-level soundness bridges

6075426

rnbguy added 16 commits April 15, 2026 01:07

refactor: drop unused path helpers

d81587c

refactor: remove unused simulate names

dd97553

chore: remove new proof warnings

d09d7d2

fix: make trace limits part of core results

dcf6735

fix: align display trace-limit counts

41f2602

test: add simulate violation mode regression

e9ad7d3

fix(model-checker): isolate compiled command instances

797f9f6

fix(simulate): keep default compilation running

7140a15

fix(simulate): tighten handoff cancellation and parity

c27a938

fix(examples): replace SharedCounter with lease race examples

ed23a67

fix(examples): add simulate-friendly reliable broadcast

3f9605f

fix(simulate): preserve explicit default config values

9971ba1

fix(simulate): show chosen seed in output

6da6208

fix(simulate): short-circuit empty initial states

48fdf21

test(simulate): make interpreted mode explicit

312d8e2

chore: reduce noises

17228a1

rnbguy marked this pull request as ready for review April 16, 2026 16:26

fix(simulate): align violation soundness with runtime semantics

fd86b56

fix(simulate): respect config bounds and omitted theory

6dd0370

zqy1018 reviewed May 1, 2026

fix(simulate): use dedicated simulation result

ec620dd

rnbguy marked this pull request as draft May 7, 2026 14:47

rnbguy added 4 commits May 7, 2026 18:43

refactor(model-checker): share invariant violation helper

0ebceb9

refactor(simulate): restrict constrained system once

18746f9

refactor(model-checker): move state constraint helper

ad7cc3f

refactor(model-checker): reuse state constraint helper

ebfbe4b