Commit bfc2bac

feat: simulation suite runner (npm run sim)
## ELI5

**Problem.** The engine could *create* simulation suites and track them in state, and AGENTS.md described `simulations/suites/` as a first-class resource type. But there was no `npm run` command to actually *execute* a suite. `npm run eval` exists but runs the *legacy* `/evals` endpoint (a different thing), and the naming overlap actively misled engineers into running the wrong command. To fire a simulation suite from the CLI you had to write raw curl or go to the dashboard UI (losing reproducibility).

**What this fix does.** Adds `npm run sim`. Two shapes:

```
npm run sim -- <org> --suite <name> --target <assistant-or-squad>
npm run sim -- <org> --simulations <n1>,<n2> --target <assistant>
```

Resolves local resource names → state-file UUIDs the same way `npm run call` does, POSTs `/eval/simulation/run`, polls the run status, prints a summary table (pass/fail per simulation, mean run time, structured-output evals).

**Outcome you'll notice.** Simulation suites become a normal part of the gitops workflow: author the suite as YAML, push it via `npm run push`, run it via `npm run sim`. No more dashboard clicking. Note the AGENTS.md call-out clarifying the difference between `npm run sim` (unified `/eval/simulation/*`) and `npm run eval` (legacy `/evals`); renaming `eval` to disambiguate is a separate, backwards-incompatible follow-up.

---

Engine fully tracks simulation suites in state and AGENTS.md describes simulations/suites/ as a first-class resource type, but there's no npm run command to actually execute one. npm run eval runs the legacy /evals endpoint, not the unified simulation runner. Customers go to the dashboard UI to trigger runs (losing reproducibility) or write per-customer shell wrappers.

- src/sim.ts (NEW): runSimulationSuite + runSimulationsByName helpers. Resolves local-name → UUID via state file; POSTs /eval/simulation/run; polls /eval/simulation/run/:id until completion; prints pass/fail summary per simulation with mean run time + structured-output evals. Reuses src/api.ts:vapiRequest for HTTP and the local-name → UUID resolution pattern from src/eval.ts.
- src/sim-cmd.ts (NEW): CLI entry. Args:
    npm run sim -- <org> --suite <name> --target <assistant-or-squad>
    npm run sim -- <org> --simulations <n1>,<n2> --target <assistant>
    npm run sim -- <org> --suite <name> --watch
- package.json: sim script.
- AGENTS.md: document npm run sim alongside npm run eval (call out the legacy /evals vs unified /eval/simulation/* distinction).
- tests/sim.test.ts: arg parsing, UUID resolution, status polling, summary table formatting.

Note: renaming npm run eval to disambiguate is a follow-up, since that's a backwards-incompatible script-name change. For now the AGENTS.md note calls out the distinction.

Closes improvements.md #16.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
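The commit message describes polling `/eval/simulation/run/:id` until completion, but `src/sim.ts` is not shown in this view. The following is a minimal sketch of what such a poll loop could look like, not the code from this commit. The status values, the `RunSnapshot` shape, and the names `pollUntilDone`, `PollOptions`, and `isTerminal` are all assumptions made for illustration; the HTTP call is injected as `fetchStatus` so the loop can be exercised without a network.

```typescript
// Hypothetical status values and response shape. The real payload returned by
// /eval/simulation/run/:id is not shown in this commit.
type RunStatus = "queued" | "running" | "passed" | "failed";

interface RunSnapshot {
  id: string;
  status: RunStatus;
}

interface PollOptions {
  intervalMs: number; // delay between status checks
  timeoutMs: number; // give up after this much elapsed time
  sleep?: (ms: number) => Promise<void>; // injectable for tests
  now?: () => number; // injectable clock for tests
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// A run is finished once it reaches one of the (assumed) terminal states.
function isTerminal(status: RunStatus): boolean {
  return status === "passed" || status === "failed";
}

// Calls fetchStatus repeatedly until the run is terminal or the timeout elapses.
async function pollUntilDone(
  fetchStatus: () => Promise<RunSnapshot>,
  opts: PollOptions,
): Promise<RunSnapshot> {
  const sleep = opts.sleep ?? defaultSleep;
  const now = opts.now ?? Date.now;
  const deadline = now() + opts.timeoutMs;

  for (;;) {
    const snapshot = await fetchStatus();
    if (isTerminal(snapshot.status)) return snapshot;
    if (now() >= deadline) {
      throw new Error(
        `Run ${snapshot.id} still ${snapshot.status} after ${opts.timeoutMs}ms`,
      );
    }
    await sleep(opts.intervalMs);
  }
}
```

Injecting `sleep` and `now` is one way to keep the "status polling" tests mentioned for `tests/sim.test.ts` fast and deterministic; the actual tests may take a different approach.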
1 parent cd00da7 commit bfc2bac

6 files changed

Lines changed: 604 additions & 1 deletion

AGENTS.md

Lines changed: 1 addition & 0 deletions
@@ -751,6 +751,7 @@ npm run push -- <org> --dry-run # Preview without applying an
 npm run push -- <org> --strict # Abort push if any validator returns an error
 npm run apply -- <org> # Pull then push (full sync)
 npm run validate -- <org> # Lint resources locally (fails fast on schema drift)
+npm run sim -- <org> --suite <name> --target <name> # Run a simulation suite against an assistant/squad
 
 # Testing
 npm run call -- <org> -a <assistant-name> # Call an assistant via WebSocket

improvements.md

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@ you which stack PR closes the row.**
 | 13 | `.agent/` and `.claude/handoffs/` not gitignored | `git add -A` sweeps PII handoff scratch | None | RESOLVED 2026-04-30 (Stack A) |
 | 14 | Multi-file push undocumented | Discoverability | None | RESOLVED 2026-04-30 (Stack A) |
 | 15 | Scoped push rewrites entire state file | Pre-existing drift sweeps into focused commits | #4 | Open (Stack J planned) |
-| 16 | No CLI runner for simulation suites | Engine pushes them, can't run them | None | Open (Stack E planned) |
+| 16 | No CLI runner for simulation suites | Engine pushes them, can't run them | None | RESOLVED 2026-04-30 (Stack E) |
 | 17 | State file key-order churn produces noisy diffs | Reorderings hide real changes | None | RESOLVED 2026-04-30 (Stack B) |
 | 18 | Structured-output `name` capped at 40 chars (no warning) | Push fails partway after partial application | None | RESOLVED 2026-04-30 (Stack D) |
 | 19 | No `maxTokens` floor warning for tool-using assistants | `maxTokens: 1` bricks the assistant silently | None | RESOLVED 2026-04-30 (Stack D) |

package.json

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@
     "cleanup": "tsx src/cleanup-cmd.ts",
     "eval": "tsx src/eval.ts",
     "validate": "tsx src/validate-cmd.ts",
+    "sim": "tsx src/sim-cmd.ts",
     "build": "tsc --noEmit",
     "test": "node --import tsx --test tests/*.test.ts"
   },

src/sim-cmd.ts

Lines changed: 155 additions & 0 deletions
@@ -0,0 +1,155 @@
+// CLI entry: `npm run sim -- <org> --suite <name> --target <name>`
+//
+// Distinct from `npm run eval` (legacy /evals endpoint). See AGENTS.md and
+// improvements.md #16 for the rationale.
+
+import {
+  formatSummary,
+  loadEnvFile,
+  loadStateFile,
+  resolveSelection,
+  resolveTarget,
+  runSimulation,
+} from "./sim.ts";
+
+function printUsage(): void {
+  console.error(
+    [
+      "Usage:",
+      "  npm run sim -- <org> --suite <suite-name> --target <assistant-or-squad-name>",
+      "  npm run sim -- <org> --simulations <name1>,<name2> --target <assistant-name>",
+      "",
+      "Options:",
+      "  --suite <name>          Run an entire simulation suite by local resource name",
+      "  --simulations <list>    Run one or more simulations by comma-separated local names",
+      "  --target <name>         Local assistant or squad name (resolves to UUID via state)",
+      "  --transport voice|chat  Transport (default: voice; chat is faster/cheaper)",
+      "  --iterations N          Override default iteration count",
+      "  --watch                 Tail status until completion (default: on)",
+      "",
+      "Examples:",
+      "  npm run sim -- my-org --suite booking-tests --target intake-agent",
+      "  npm run sim -- my-org --simulations happy-path,edge-case --target main-agent --transport chat",
+    ].join("\n"),
+  );
+}
+
+interface ParsedArgs {
+  env: string;
+  suite?: string;
+  simulations?: string;
+  assistant?: string;
+  squad?: string;
+  transport?: "voice" | "chat";
+  iterations?: number;
+  watch: boolean;
+}
+
+function parseArgs(): ParsedArgs {
+  const args = process.argv.slice(2);
+  const env = args[0];
+  if (!env) {
+    printUsage();
+    process.exit(1);
+  }
+  const SLUG_RE = /^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/;
+  if (!SLUG_RE.test(env)) {
+    console.error(`❌ Invalid org name: ${env}`);
+    process.exit(1);
+  }
+
+  const parsed: ParsedArgs = { env, watch: true };
+
+  for (let i = 1; i < args.length; i++) {
+    const arg = args[i];
+    if (arg === "--suite") parsed.suite = args[++i];
+    else if (arg === "--simulations") parsed.simulations = args[++i];
+    else if (arg === "--target") {
+      // We don't know yet whether target is an assistant or squad, so defer
+      // resolution to the state lookup. Try assistant first; resolveTarget()
+      // accepts either argument key, so we set the candidate in `assistant`
+      // and let `resolveTarget` fall through to `squad` if not found.
+      // For clarity, we accept --assistant / --squad as explicit alternatives.
+      parsed.assistant = args[++i];
+    } else if (arg === "--assistant") parsed.assistant = args[++i];
+    else if (arg === "--squad") parsed.squad = args[++i];
+    else if (arg === "--transport") {
+      const v = args[++i];
+      if (v === "voice" || v === "chat") parsed.transport = v;
+      else {
+        console.error(`❌ --transport must be "voice" or "chat" (got "${v}")`);
+        process.exit(1);
+      }
+    } else if (arg === "--iterations") {
+      parsed.iterations = parseInt(args[++i] ?? "", 10);
+      if (Number.isNaN(parsed.iterations)) {
+        console.error("❌ --iterations requires a number");
+        process.exit(1);
+      }
+    } else if (arg === "--no-watch") parsed.watch = false;
+    else if (arg === "--watch") parsed.watch = true;
+    else if (arg === "--help" || arg === "-h") {
+      printUsage();
+      process.exit(0);
+    }
+  }
+
+  return parsed;
+}
+
+async function main(): Promise<void> {
+  const args = parseArgs();
+  const cfg = loadEnvFile(args.env);
+  const state = loadStateFile(args.env);
+
+  // Disambiguate --target: if the bare value matches a squad name in state
+  // and not an assistant, treat it as a squad. Explicit --assistant / --squad
+  // override the heuristic.
+  let assistant = args.assistant;
+  let squad = args.squad;
+  if (assistant && !squad) {
+    const isSquad =
+      typeof state.squads[assistant] !== "undefined" &&
+      typeof state.assistants[assistant] === "undefined";
+    if (isSquad) {
+      squad = assistant;
+      assistant = undefined;
+    }
+  }
+
+  console.log(
+    "═══════════════════════════════════════════════════════════════",
+  );
+  console.log(`🧪 Vapi GitOps Sim Runner — Environment: ${args.env}`);
+  console.log(`   API: ${cfg.baseUrl}`);
+  console.log(
+    "═══════════════════════════════════════════════════════════════\n",
+  );
+
+  const selection = resolveSelection(state, {
+    suite: args.suite,
+    simulations: args.simulations,
+  });
+  const target = resolveTarget(state, { assistant, squad });
+
+  const summary = await runSimulation(cfg, selection, target, {
+    watch: args.watch,
+    iterations: args.iterations,
+    transport: args.transport,
+  });
+
+  console.log(`\n${formatSummary(summary)}\n`);
+
+  if (summary.fail > 0) {
+    console.error(
+      `❌ Simulation run failed (${summary.fail} fail / ${summary.pass} pass)`,
+    );
+    process.exit(1);
+  }
+  console.log("✅ Simulation run passed.");
+}
+
+main().catch((error) => {
+  console.error("\n❌ Sim failed:", error instanceof Error ? error.message : error);
+  process.exit(1);
+});
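
The `--target` heuristic in `main()` and the flag checks in `parseArgs()` both touch `process` and the loaded state, which makes them awkward to unit test directly. As a hedged sketch only, the same decisions could be expressed as pure functions. Everything here is hypothetical and not part of this commit: `StateLike` is a simplified stand-in for the real state file, `disambiguateTarget` and `validateArgs` are illustrative names, and the "exactly one of `--suite` / `--simulations`" rule is an assumption read off the two usage shapes. Whether `resolveSelection` and `resolveTarget` already enforce such rules is not visible in this diff.

```typescript
// Hypothetical, simplified state shape: local resource name -> stored entry.
interface StateLike {
  assistants: Record<string, unknown>;
  squads: Record<string, unknown>;
}

interface TargetArgs {
  assistant?: string;
  squad?: string;
}

interface SelectionArgs {
  suite?: string;
  simulations?: string;
}

// Own-property check, so names like "toString" never match via the prototype.
function has(record: Record<string, unknown>, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(record, key);
}

// Mirrors the heuristic in main(): a bare name becomes a squad only when it
// exists under squads and not under assistants. An explicit squad wins.
function disambiguateTarget(state: StateLike, args: TargetArgs): TargetArgs {
  const { assistant, squad } = args;
  if (assistant !== undefined && squad === undefined) {
    const isSquad = has(state.squads, assistant) && !has(state.assistants, assistant);
    if (isSquad) return { squad: assistant };
  }
  return { assistant, squad };
}

// Assumed rules: exactly one selection flag, and at least one target flag.
function validateArgs(selection: SelectionArgs, target: TargetArgs): string[] {
  const errors: string[] = [];
  const hasSuite = Boolean(selection.suite);
  const hasSimulations = Boolean(selection.simulations);
  if (hasSuite === hasSimulations) {
    errors.push("Pass exactly one of --suite or --simulations");
  }
  if (!target.assistant && !target.squad) {
    errors.push("Pass --target (or --assistant / --squad)");
  }
  return errors;
}
```

Returning a list of error strings instead of calling `process.exit` leaves the exit decision to the CLI entry point, which would let the "arg parsing" tests assert on messages without spawning a process.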
