feat: add #simulate command #6

Draft
rnbguy wants to merge 61 commits into verse-lab:veil-2.0-preview from rnbguy:feat/simulate

rnbguy commented Mar 16, 2026

Adds random-walk state exploration to Veil: #simulate runs random traces, checking the invariants at each step.

It finds shallow invariant violations faster than #model_check (exhaustive BFS), but it is not complete: if no violation is found, that does not mean the invariants hold.
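As a rough illustration of the random-walk idea, here is a minimal, self-contained Lean sketch. ToySystem, pick?, walk, and simulate are hypothetical names for this example, not the PR's code. Each trace starts from a random initial state, takes random successor steps up to a depth bound, and stops at the first state that violates the invariant. Because only sampled paths are explored, a none result is not a proof of safety.

```lean
/-- Toy transition system (hypothetical; far simpler than Veil's
`EnumerableTransitionSystem`). -/
structure ToySystem (σ : Type) where
  init : List σ
  next : σ → List σ
  inv  : σ → Bool

/-- Pick a uniformly random element of `xs`, threading the generator. -/
def pick? {α : Type} (xs : List α) (g : StdGen) : Option α × StdGen :=
  if xs.isEmpty then (none, g)
  else
    let (i, g') := randNat g 0 (xs.length - 1)
    (xs[i]?, g')

/-- Walk at most `fuel` random steps from `s`. Returns the trace (newest state
first) if a state violating `inv` is reached; a deadlock or an exhausted
depth bound ends the walk with `none`. -/
def walk {σ : Type} (sys : ToySystem σ) (s : σ) (trace : List σ) (g : StdGen) :
    Nat → Option (List σ) × StdGen
  | 0 => (if sys.inv s then none else some (s :: trace), g)
  | fuel + 1 =>
    if !sys.inv s then (some (s :: trace), g)
    else
      match pick? (sys.next s) g with
      | (none, g') => (none, g')
      | (some s', g') => walk sys s' (s :: trace) g' fuel

/-- Run up to `maxTraces` independent random walks from random initial states. -/
def simulate {σ : Type} (sys : ToySystem σ) (seed maxTraces maxSteps : Nat) :
    Option (List σ) :=
  go maxTraces (mkStdGen seed)
where
  go : Nat → StdGen → Option (List σ)
    | 0, _ => none
    | k + 1, g =>
      match pick? sys.init g with
      | (none, _) => none
      | (some s₀, g') =>
        match walk sys s₀ [] g' maxSteps with
        | (some cex, _) => some cex
        | (none, g'') => go k g''

-- Every step adds 1 or 2, so any walk of 5 or more steps breaks `n < 5`.
def counter : ToySystem Nat :=
  { init := [0], next := fun n => [n + 1, n + 2], inv := fun n => decide (n < 5) }

#eval (simulate counter 42 10 10).isSome  -- true
```

The pseudo-random numbers come from Lean core's mkStdGen and randNat; threading the generator explicitly keeps a run reproducible for a fixed seed.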

Usage

-- basic
#simulate {}

-- with type/theory instantiation
#simulate { node := Fin 4 } { nextNode := fun n => n + 1 }

-- with config
#simulate {} (seed := 42, maxTraces := 1000, maxSteps := 50)
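The three options in the last usage line can be pictured as a plain Lean structure. SimulateConfig and stepBudget are hypothetical, and the defaults below are assumptions rather than #simulate's actual values; the point is that maxTraces * maxSteps bounds the number of transitions a run explores.

```lean
/-- Hypothetical record of the options from the usage example.
Default values are assumptions, not necessarily #simulate's real defaults. -/
structure SimulateConfig where
  seed      : Nat := 0     -- PRNG seed; with a deterministic PRNG, same seed, same traces
  maxTraces : Nat := 1000  -- number of independent random walks
  maxSteps  : Nat := 50    -- depth bound for each walk
  deriving Repr

/-- Worst-case number of transitions explored: every trace runs to the depth bound. -/
def SimulateConfig.stepBudget (c : SimulateConfig) : Nat :=
  c.maxTraces * c.maxSteps

#eval ({ seed := 42, maxTraces := 1000, maxSteps := 50 } : SimulateConfig).stepBudget  -- 50000
```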

Benchmark

Search time only (Lean loading overhead subtracted) on my machine:

| Example | #model_check interpreted | #simulate |
|---|---|---|
| DieHard | 1.4s | 111ms |
| RiverCrossing | 2.4s | 5ms |
| BuggyCircularBuffer | 0.9s | 1ms |
| Traffic | 1.2s | 2ms |
| MutexViolation | 22s | 378ms |

All five have known violations.

Disclaimer

Contains LLM generated code.

dranov (Contributor) commented Mar 16, 2026

Thank you, @rnbguy! This looks good.

I'll have time to look at this more closely and merge it later in the week, after the OOPSLA deadline. (For future maintainability, I want to make sure #simulate and #model_check share as much code as possible.)

However, I'm wondering whether you're running into a bug with #model_check. We have two modes of operation for the model checker: (1) compiled and (2) interpreted.

By default, #model_check runs the model checker in interpreted mode while compilation proceeds in the background (compilation can take quite long, as you're seeing). If the interpreted mode finds a violation, it is displayed immediately; there's no waiting for compilation to finish.
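The interpreted-while-compiling behaviour can be sketched with Lean's task API (IO.asTask, IO.waitAny, IO.cancel). This is a simplified, hypothetical model: Verdict and raceCheckers are made-up names, and the sketch only captures "report whichever checker finishes first", not Veil's actual hand-off logic between the two modes.

```lean
/-- Hypothetical verdict type; Veil's real result types are richer. -/
inductive Verdict where
  | violation (msg : String)
  | noViolation
  deriving Repr

/-- Start both checkers as background tasks and report whichever finishes first.
Cancellation in Lean is cooperative: `IO.cancel` only sets a flag that the
losing task has to poll with `IO.checkCanceled`. -/
def raceCheckers (interpreted compiled : IO Verdict) : IO Verdict := do
  let tInterp ← IO.asTask interpreted
  let tCompiled ← IO.asTask compiled
  let first ← IO.waitAny [tInterp, tCompiled]
  IO.cancel tInterp
  IO.cancel tCompiled
  IO.ofExcept first

-- A fast "interpreted" run racing a slow "compiled" one.
#eval raceCheckers (pure (.violation "inv_safe")) (do IO.sleep 2000; pure .noViolation)
```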

For me, #model_check finds a violation within 1 second for every benchmark in your table. The timings you're seeing make me think that, somehow, only the compiled mode runs for you.

What do you see when you run #model_check? Is it something like this? (This shows the interpreted model checker exploring states while compilation happens in the background.)

[Screenshot: the interpreted model checker exploring states while compilation runs in the background]

rnbguy (Author) commented Mar 16, 2026

hey @dranov! Good luck with the OOPSLA deadline 🍀 I am just playing around with Veil 😄 so there is no rush.

You're correct. I was running lake lean <example>.lean from the CLI, so I am sure the timings included compilation too.

I just re-ran with #model_check interpreted {} {} and also validated the numbers in VS Code:

| Example | #model_check interpreted | #simulate |
|---|---|---|
| DieHard | 1.4s | 111ms |
| RiverCrossing | 2.4s | 5ms |
| BuggyCircularBuffer | 0.9s | 1ms |
| Traffic | 1.2s | 2ms |
| MutexViolation | 22s | 378ms |

Thanks for taking the time to point this out. 🙌🏼

rnbguy force-pushed the feat/simulate branch 2 times, most recently from 211e20b to f822ebe, on March 16, 2026 05:10
dranov (Contributor) commented Mar 25, 2026

@rnbguy Apologies for the delay. I'll let @zqy1018 handle integrating this. He developed and is in charge of the model checker in Veil.

We'd want #simulate to have a soundness proof, similar to the soundness and completeness proof of #model_check's new version, and that might require a rewrite. @zqy1018 will look into it.
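One way to make "soundness" concrete for a simulator is sketched below. ValidTrace, SoundAnswer, and none_is_sound are hypothetical definitions, not the statement proved for #model_check: a reported counterexample must be a real execution of the system whose newest state breaks the invariant. The trivial theorem shows why soundness alone is weak: a simulator that never reports anything is sound, which is exactly the incompleteness noted in the PR description.

```lean
/-- `ValidTrace init step tr`: `tr` (newest state first) starts in an initial
state and each consecutive pair of states is related by `step`. -/
inductive ValidTrace {σ : Type} (init : σ → Prop) (step : σ → σ → Prop) :
    List σ → Prop
  | single {s : σ} : init s → ValidTrace init step [s]
  | extend {s s' : σ} {rest : List σ} :
      ValidTrace init step (s :: rest) → step s s' →
      ValidTrace init step (s' :: s :: rest)

/-- A simulator's answer is sound if every reported counterexample is a genuine
execution whose newest state violates `inv`. -/
def SoundAnswer {σ : Type} (init : σ → Prop) (step : σ → σ → Prop)
    (inv : σ → Prop) (answer : Option (List σ)) : Prop :=
  ∀ tr, answer = some tr →
    ValidTrace init step tr ∧ ∃ s rest, tr = s :: rest ∧ ¬ inv s

/-- Reporting nothing is always sound (but says nothing about completeness). -/
theorem none_is_sound {σ : Type} (init : σ → Prop) (step : σ → σ → Prop)
    (inv : σ → Prop) : SoundAnswer init step inv none := by
  unfold SoundAnswer
  intro tr h
  cases h
```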

rnbguy marked this pull request as ready for review on April 16, 2026 16:26
rnbguy (Author) commented Apr 16, 2026

hey @zqy1018, the PR is ready for review.

  • #simulate now proves soundness of the reported path.
  • I reused the #model_check types for #simulate; please check whether they make sense.
  • I added compiled and interpreted modes for #simulate, just like #model_check.
  • Please check whether the #simulate config options look alright.
  • I added three #simulate-friendly examples; let me know if you have any questions about them.
  • I added multiple tests for #simulate; let me know if it's okay to keep them.
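Alongside a proof, a reported path can also be re-validated at run time by replaying it against the transition relation. This is an illustration only: replayOk is a hypothetical function, not how the PR establishes soundness, and it assumes states with a BEq instance and an enumerable successor function.

```lean
/-- Replay a trace (oldest state first) against `init` and `next`, returning
`true` only if it starts in an initial state and every transition is allowed. -/
def replayOk {σ : Type} [BEq σ] (init : List σ) (next : σ → List σ) :
    List σ → Bool
  | [] => false
  | s :: rest => init.contains s && go s rest
where
  go (prev : σ) : List σ → Bool
    | [] => true
    | s :: rest => (next prev).contains s && go s rest

#eval replayOk [0] (fun n => [n + 1, n + 2]) [0, 2, 3]  -- true
#eval replayOk [0] (fun n => [n + 1, n + 2]) [0, 4]     -- false
```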

I also re-ran the benchmarks from above:

| Example | #model_check interpreted | #simulate interpreted |
|---|---|---|
| DieHard | 1s | 0.5s |
| RiverCrossing | 1s | 0.2s |
| BuggyCircularBuffer | 0.7s | 0.2s |
| Traffic | 1.1s | 0.3s |
| MutexViolation | 18s | 1.2s |
| CheckpointLeaseFailover | timed out (1 min) | 1.3s |
| LeaseKeepaliveRace | timed out (1 min) | 2.3s |
| ReliableBroadcast | timed out (1 min) | 14.5s |

zqy1018 (Contributor) commented Apr 17, 2026

Great, thanks! I'll take a look now.

  deriving Inhabited, Repr

structure SimulateResult (ρ σ κ : Type) where
  result : ModelCheckingResult ρ σ κ Unit
Review comment (Contributor):

The use of Unit here feels a bit awkward. Would it make more sense to introduce a dedicated inductive type instead? For example:

inductive SimulationResult (ρ σ κ : Type) where
  | foundViolation (violation : ViolationKind) (viaTrace : Trace ρ σ κ)
  | cancelled

Judging by how ModelCheckingResult ρ σ κ Unit is produced in simulateOnceLoop, the noViolationFound constructor is never used.
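A self-contained version of the suggested type is sketched below. ToyViolation and the plain List σ trace are simplified, hypothetical stand-ins for Veil's ViolationKind and Trace ρ σ κ, and the ρ and κ parameters are dropped. With only the constructors the simulator actually produces, consumers can match exhaustively with no dead noViolationFound branch.

```lean
/-- Simplified stand-in (hypothetical) for Veil's `ViolationKind`. -/
inductive ToyViolation where
  | safety (invariants : List String)
  | deadlock
  deriving Repr

/-- Only the outcomes a simulation loop can produce. -/
inductive SimulationResult (σ : Type) where
  | foundViolation (violation : ToyViolation) (viaTrace : List σ)
  | cancelled
  deriving Repr

/-- Exhaustive match, no unreachable `noViolationFound` case. -/
def SimulationResult.summary {σ : Type} : SimulationResult σ → String
  | .foundViolation (.safety invs) tr => s!"invariants {invs} violated after {tr.length} states"
  | .foundViolation .deadlock tr => s!"deadlock after {tr.length} states"
  | .cancelled => "cancelled"

#eval (SimulationResult.foundViolation (.safety ["mutex"]) [1, 2, 3]).summary
```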

Reply (Author):

Yeah, it's true that I tried to reuse some types from ModelCheck. I will keep things separate in later commits; for now, I will focus on your individual review comments.


Comment on lines +19 to +23
@[inline]
def violatedInvariantNames {ρ σ : Type}
    (params : SearchParameters ρ σ) (th : ρ) (st : σ) : List Lean.Name :=
  params.invariants.filterMap fun p =>
    if !p.holdsOn th st then some p.name else none
Review comment (Contributor):

There seems to be duplication here: violatedInvariantNames overlaps with how safetyViolations is computed in checkViolationsAndMaybeTerminate from Veil/Core/Tools/ModelChecker/Concrete/Core.lean. Can you move violatedInvariantNames to Veil/Core/Tools/ModelChecker/Interface.lean and use it in checkViolationsAndMaybeTerminate?

Reply (Author):

Good catch! 0ebceb9

Comment on lines +25 to +43
@[inline]
def filterInitStatesByConstraints {ρ σ κ : Type} {th₀ : ρ}
    (sys : EnumerableTransitionSystem ρ (List ρ) σ (List σ) Int κ (List (κ × ExecutionOutcome Int σ)) th₀)
    (params : SearchParameters ρ σ) (th : ρ) : List σ :=
  if params.stateConstraints.isEmpty then sys.initStates
  else sys.initStates.filter (params.satisfiesConstraints th)

@[inline]
def filterOutcomesByConstraints {ρ σ κ : Type} {th₀ : ρ}
    (sys : EnumerableTransitionSystem ρ (List ρ) σ (List σ) Int κ (List (κ × ExecutionOutcome Int σ)) th₀)
    (params : SearchParameters ρ σ) (th : ρ) (st : σ) : List (κ × ExecutionOutcome Int σ) :=
  if params.stateConstraints.isEmpty then
    sys.tr th st
  else
    (sys.tr th st).filter fun (_, outcome) =>
      match outcome with
      | .success st' => params.satisfiesConstraints th st'
      | .assertionFailure _ st' => params.satisfiesConstraints th st'
      | .divergence => true
Review comment (Contributor):

This overlaps with the initial let sys part of findReachable in Veil/Core/Tools/ModelChecker/Concrete/Checker.lean. The state constraints should be used only once, in simulation as well as in model checking: apply them up front to compute a restricted EnumerableTransitionSystem, and never touch them again in the code that follows. findReachable already works this way. Can you do the same for simulation?
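The "restrict once" pattern can be illustrated on a simplified system. Sys and Sys.restrict are hypothetical; Veil's EnumerableTransitionSystem also carries labels and ExecutionOutcome values, which this sketch ignores. The constraints are folded into the initial states and the transition function a single time, and everything downstream just explores the returned system.

```lean
/-- Simplified stand-in (hypothetical) for a transition system. -/
structure Sys (σ : Type) where
  initStates : List σ
  tr : σ → List σ

/-- Apply the state constraints once, up front. Code that explores the
restricted system never needs to consult the constraints again. -/
def Sys.restrict {σ : Type} (sys : Sys σ) (constraints : List (σ → Bool)) : Sys σ :=
  if constraints.isEmpty then sys
  else
    let ok (s : σ) : Bool := constraints.all (fun c => c s)
    { initStates := sys.initStates.filter ok
      tr := fun s => (sys.tr s).filter ok }

def demo : Sys Nat := { initStates := [0, 9], tr := fun n => [n + 1, n + 10] }

#eval (demo.restrict [fun n => decide (n < 8)]).initStates  -- [0]
#eval (demo.restrict [fun n => decide (n < 8)]).tr 0        -- [1]
```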

rnbguy (Author) replied May 7, 2026.

Comment on lines +6 to +54
private def earlyTerminationReasonToJson (reason : EarlyTerminationReason Unit) : Json :=
  match reason with
  | .foundViolatingState _ violates => Json.mkObj [
      ("kind", "found_violating_state"),
      ("state_fingerprint", Json.null),
      ("violates", toJson violates)
    ]
  | .deadlockOccurred _ => Json.mkObj [
      ("kind", "deadlock_occurred"),
      ("state_fingerprint", Json.null)
    ]
  | .assertionFailed _ exId => Json.mkObj [
      ("kind", "assertion_failed"),
      ("state_fingerprint", Json.null),
      ("exception_id", toJson exId)
    ]
  | .reachedDepthBound depth => Json.mkObj [
      ("kind", "reached_depth_bound"),
      ("depth", toJson depth)
    ]
  | .reachedTraceLimit maxTraces => Json.mkObj [
      ("kind", "reached_trace_limit"),
      ("max_traces", toJson maxTraces)
    ]
  | .cancelled => Json.mkObj [("kind", "cancelled")]

private def terminationReasonToJson (reason : TerminationReason Unit) : Json :=
  match reason with
  | .exploredAllReachableStates => Json.mkObj [("kind", "explored_all_reachable_states")]
  | .earlyTermination condition => Json.mkObj [
      ("kind", "early_termination"),
      ("condition", earlyTerminationReasonToJson condition)
    ]

private def resultToJson {ρ σ κ : Type} [ToJson ρ] [ToJson σ] [ToJson κ]
    (result : ModelCheckingResult ρ σ κ Unit) : Json :=
  match result with
  | .foundViolation _ violation trace => Json.mkObj
      [ ("result", "found_violation")
      , ("violation", toJson violation)
      , ("trace", toJson trace)
      , ("state_fingerprint", Json.null)
      ]
  | .noViolationFound exploredStates reason => Json.mkObj
      [ ("result", "no_violation_found")
      , ("explored_states", toJson exploredStates)
      , ("termination_reason", terminationReasonToJson reason)
      ]
  | .cancelled => Json.mkObj [("result", "cancelled")]
Review comment (Contributor):

These seem to overlap with the ToJson instances from Veil/Core/Tools/ModelChecker/Interface.lean?

Comment on lines +56 to +85
instance instToJsonSimulateResult {ρ σ κ : Type} [ToJson ρ] [ToJson σ] [ToJson κ] : ToJson (SimulateResult ρ σ κ) where
  toJson r := Json.mkObj [
    ("result", resultToJson r.result),
    ("traces_run", Lean.toJson r.tracesRun),
    ("max_traces", Lean.toJson r.maxTraces),
    ("elapsed_ms", Lean.toJson r.elapsedMs),
    ("seed", Lean.toJson r.seed),
    ("depth", Lean.toJson r.depth)
  ]

def SimulateResult.toDisplayJson {ρ σ κ : Type} [ToJson ρ] [ToJson σ] [ToJson κ]
    (r : SimulateResult ρ σ κ) : Json :=
  match resultToJson r.result with
  | Json.obj kvs =>
    Json.mkObj <| kvs.toList ++ [
      ("traces_run", Lean.toJson r.tracesRun),
      ("max_traces", Lean.toJson r.maxTraces),
      ("elapsed_ms", Lean.toJson r.elapsedMs),
      ("seed", Lean.toJson r.seed),
      ("depth", Lean.toJson r.depth)
    ]
  | other =>
    Json.mkObj [
      ("result", other),
      ("traces_run", Lean.toJson r.tracesRun),
      ("max_traces", Lean.toJson r.maxTraces),
      ("elapsed_ms", Lean.toJson r.elapsedMs),
      ("seed", Lean.toJson r.seed),
      ("depth", Lean.toJson r.depth)
    ]
Review comment (Contributor):

What is the intended distinction between instToJsonSimulateResult and SimulateResult.toDisplayJson? They look very similar. Could this duplication be eliminated?
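One way to remove the duplication is to encode the result as a field list once and derive both the nested and the flat JSON from it. Outcome and SimReport are hypothetical simplifications; only Json.mkObj, Json.str, toJson, and Json.compress from Lean's Lean.Data.Json are real APIs here.

```lean
import Lean.Data.Json
open Lean

/-- Hypothetical, simplified outcome (not Veil's `ModelCheckingResult`). -/
inductive Outcome where
  | foundViolation (invariant : String)
  | cancelled

/-- Encode the outcome as a list of fields rather than a finished object,
so callers can either nest it or inline it. -/
def Outcome.fields : Outcome → List (String × Json)
  | .foundViolation inv => [("result", Json.str "found_violation"), ("violation", toJson inv)]
  | .cancelled => [("result", Json.str "cancelled")]

structure SimReport where
  outcome   : Outcome
  tracesRun : Nat
  seed      : Nat

/-- Run metadata, written once and shared by both encodings. -/
def SimReport.metaFields (r : SimReport) : List (String × Json) :=
  [("traces_run", toJson r.tracesRun), ("seed", toJson r.seed)]

/-- Nested encoding: the outcome sits under a "result" key. -/
instance : ToJson SimReport where
  toJson r := Json.mkObj (("result", Json.mkObj r.outcome.fields) :: r.metaFields)

/-- Flat encoding for display: outcome fields are inlined next to the metadata. -/
def SimReport.toDisplayJson (r : SimReport) : Json :=
  Json.mkObj (r.outcome.fields ++ r.metaFields)

#eval (toJson (SimReport.mk .cancelled 3 42)).compress
#eval (SimReport.mk .cancelled 3 42).toDisplayJson.compress
```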

Comment on lines +22 to +30
structure CompiledCommandSpec where
  exportedName : String
  supportsParallelConfig : Bool := false

structure CompilationKey where
  sourceFile : String
  exportedName : String
  commandId : String
  deriving BEq, Hashable, Inhabited
Review comment (Contributor):

Can you add some comments to the fields of these two newly introduced structures? It's unclear what they mean just by looking at their names.

checking both explicit cancellation and whether this compilation is still current. -/
def runProcessWithStatusCallback (sourceFile : String) (command : CompiledCommandSpec) (commandId : String)
    (cfg : IO.Process.SpawnArgs)
    (instanceId : Nat) (_statusPrefix : String) (cancelToken : IO.CancelToken)
Review comment (Contributor):

_statusPrefix seems not used anywhere, can we remove it?

`({})

/-- Prepend `name` with `mod.name`. -/
private def mkIdentWithModName' (mod : Module) (name : Name) : Ident :=
Review comment (Contributor):

What's the meaning of the trailing ' in mkIdentWithModName'?

Comment on lines +85 to +92
  /-- Whether this progress entry is for `#simulate` rather than `#model_check`. -/
  isSimulation : Bool := false
  /-- Number of traces completed so far (simulation only). -/
  tracesRun : Nat := 0
  /-- Configured maximum trace budget (simulation only). -/
  maxTraces : Nat := 0
  /-- Depth reached in the current/last trace (simulation only). -/
  simulationDepth : Nat := 0
Review comment (Contributor):

The current Progress structure feels somewhat monolithic, especially with simulation-specific fields being added directly to it.

Would it make sense to model this as an inductive type instead, with separate constructors for model checking and simulation (each carrying their own payload)? Or were there specific reasons for not structuring it this way?
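A possible shape for the inductive design is sketched below. The field names are taken from the Progress fields that appear in this review; CheckProgress, SimProgress, and the constructor layout are assumptions for illustration only.

```lean
/-- Model-checking payload (hypothetical grouping of existing field names). -/
structure CheckProgress where
  diameter       : Nat := 0
  statesFound    : Nat := 0
  distinctStates : Nat := 0
  queue          : Nat := 0
  deriving Repr

/-- Simulation payload (hypothetical grouping of existing field names). -/
structure SimProgress where
  tracesRun       : Nat := 0
  maxTraces       : Nat := 0
  simulationDepth : Nat := 0
  deriving Repr

/-- One constructor per tool; shared data such as elapsed time sits beside
the tool-specific payload, so no `isSimulation` flag is needed. -/
inductive Progress where
  | modelCheck (p : CheckProgress) (elapsedMs : Nat)
  | simulation (p : SimProgress) (elapsedMs : Nat)
  deriving Repr

def Progress.elapsedMs : Progress → Nat
  | .modelCheck _ ms | .simulation _ ms => ms

#eval (Progress.simulation { tracesRun := 12, maxTraces := 1000 } 350).elapsedMs  -- 350
```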

Comment on lines +522 to +530
{if p.isSimulation then statRow "Traces Run:" (toString p.tracesRun) else statRow "Diameter:" (toString p.diameter)}
{if p.isSimulation then statRow "Max Traces:" (toString p.maxTraces) else statRow "States Found:" (toString p.statesFound)}
{if p.isSimulation then statRow "Depth:" (toString p.simulationDepth) else statRow "Distinct States:" (toString p.distinctStates)}
{if p.isSimulation then .text "" else statRow "Queue:" (toString p.queue)}
{statRow "Elapsed time:" (formatElapsedTime p.elapsedMs)}
</tbody>
</table>
{metricsHistoryHtml p.history}
{actionCoverageHtml p.actionStats p.allActionLabels}
{if p.isSimulation then .text "" else metricsHistoryHtml p.history}
{if p.isSimulation then .text "" else actionCoverageHtml p.actionStats p.allActionLabels}
Review comment (Contributor):

This seems related to the design of Progress from Veil/Core/Tools/ModelChecker/Concrete/Progress.lean.

The rendering logic here feels somewhat ad-hoc, with many branches of the form if p.isSimulation then ... else .... This may become harder to maintain as the model checking and simulation features evolve independently.

Would it make sense to instead generate the HTML based on the kind of progress (e.g., separate rendering paths), so that the two cases do not share this much conditional logic?
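Separate rendering paths could look like the following sketch. It uses a compact, hypothetical ProgressView type in the spirit of the inductive type suggested in the previous comment and returns plain label/value rows instead of HTML; statRows and showsHistoryAndCoverage are made-up names, and elapsed-time formatting is simplified compared with formatElapsedTime.

```lean
/-- Compact hypothetical progress type. -/
inductive ProgressView where
  | modelCheck (diameter statesFound distinctStates queue elapsedMs : Nat)
  | simulation (tracesRun maxTraces depth elapsedMs : Nat)

/-- Each kind of progress builds its own statistics rows, so there is no
per-row `if p.isSimulation` branching. -/
def ProgressView.statRows : ProgressView → List (String × String)
  | .modelCheck d sf ds q ms =>
    [("Diameter:", toString d), ("States Found:", toString sf),
     ("Distinct States:", toString ds), ("Queue:", toString q),
     ("Elapsed time:", s!"{ms} ms")]
  | .simulation tr mt dep ms =>
    [("Traces Run:", toString tr), ("Max Traces:", toString mt),
     ("Depth:", toString dep), ("Elapsed time:", s!"{ms} ms")]

/-- History and action-coverage panels are shown only for model checking,
matching the conditions in the diff above. -/
def ProgressView.showsHistoryAndCoverage : ProgressView → Bool
  | .modelCheck .. => true
  | .simulation .. => false

#eval (ProgressView.simulation 12 1000 7 350).statRows.length  -- 4
```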

zqy1018 (Contributor) commented May 1, 2026

Thanks again for the contribution! This is very helpful, especially the extensive test cases.

Also, apologies for the delay on my side; it took me a while to go through everything carefully.

I've left quite a number of inline comments. To clarify the intent:

  • Some of them are concrete issues that should probably be addressed before merging
  • Some are suggestions for improvement or refactoring
  • Others are broader design observations that this PR happened to surface, and do not necessarily need to be resolved here

In particular, for the more structural/design-related comments, I'm totally fine with handling them in follow-up changes if that makes things easier. We can decide together what should be in scope for this PR.


There are also a few higher-level points that I didn't put as inline comments:

Compilation / temp folder behavior

The change to assign distinct names to temp folders makes sense in principle, to avoid conflicts across compilations of the same module. However, we've run into a couple of practical issues with this approach:

  • The compiled binaries can be quite large (~200MB+), so repeated compilations without cleanup may lead to significant disk usage

  • We also observe that compilation in a fresh temp folder sometimes fails on the first attempt, with the following error message:

    uncaught exception: Failed to prune ProofWidgets cloud release: no such file or directory (error code: 2)
      file: .lake/packages/proofwidgets/.lake/build/lib
    error: mathlib: failed to fetch cache
    

    This currently makes it difficult to reliably run simulation and model checking in compilation mode (at least on my machine). Curious whether you've seen similar behavior on your side.

Given these issues, it might make sense to temporarily avoid introducing random suffixes for temp folders, and revisit this once the underlying problems are resolved.
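If the random suffixes are dropped, the folder name can still be derived deterministically from what is being compiled. buildDirFor is a hypothetical helper, and using (sourceFile, exportedName) as the key is an assumption. Repeated compilations of the same module then reuse one folder, which bounds disk usage, at the cost of letting two concurrent compilations of that module collide: the conflict the random suffixes were meant to avoid.

```lean
/-- Hypothetical helper: the temp build folder depends only on the module and
the exported entry point, so recompiling reuses (and overwrites) one folder. -/
def buildDirFor (root : System.FilePath) (sourceFile exportedName : String) :
    System.FilePath :=
  root / s!"build-{hash (sourceFile, exportedName)}"

-- Deterministic: the same inputs always map to the same folder.
#eval buildDirFor "/tmp/veil" "Examples/DieHard.lean" "main" ==
      buildDirFor "/tmp/veil" "Examples/DieHard.lean" "main"  -- true
```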


Simulation config / elaboration

Regarding the changes in Elaborators.lean: it seems that for simulation config, each field can come from three sources (explicit arguments, options, or defaults), and the current logic is trying to distinguish these cases.

I'm slightly concerned that the current approach may be somewhat hardcoded, or relies on less common Lean mechanisms, which makes it a bit unclear how robust or maintainable it will be in the long run.

One possible direction could be to rely more directly on option resolution when defining defaults (e.g., via a small elaboration helper), so that the behavior is more uniform.

I also tried a small prototype along these lines; happy to share if useful, but this is definitely out of scope for this PR.
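The "explicit argument, then option, then default" precedence can be expressed as one uniform rule per field. resolveField is a hypothetical helper, not the prototype mentioned above, and reading the option value itself (for example from the elaborator's Options) is left out.

```lean
/-- Hypothetical uniform resolution rule for one config field:
explicit argument first, then the option value, then the built-in default. -/
def resolveField {α : Type} (explicit fromOption : Option α) (dflt : α) : α :=
  (explicit <|> fromOption).getD dflt

#eval resolveField (some 42) (some 7) 50  -- 42
#eval resolveField none (some 7) 50       -- 7
#eval resolveField none none 50           -- 50
```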


Test cases (compilation mode)

Thanks again for adding such extensive test cases! This is really valuable.

One small note: in our existing test suite, we currently do not include model checking in compilation mode. The main reason is the compilation workflow and binary size issues mentioned above.

Given that, would it be possible to temporarily switch those compilation-mode test cases to interpreted mode? We can revisit adding compilation-mode tests once the underlying issues are addressed.


On Elaborators.lean

I also noticed that Elaborators.lean has grown quite a bit in this PR. Since it now handles both model checking and simulation, the file is becoming fairly large and complex.

We may want to refactor this file after merging, to improve structure and enable better reuse across the two functionalities.


Happy to discuss any of the above, especially regarding what should be in scope for this PR. No rush at all; feel free to take your time going through the comments.

Thanks again for the great contribution!

rnbguy (Author) commented May 5, 2026

hey @zqy1018! Many thanks for all the detailed comments. I will respond to all of them this week.

rnbguy marked this pull request as draft on May 7, 2026 14:47

3 participants