diff --git a/README.md b/README.md
index 382c9c4..ced454d 100644
--- a/README.md
+++ b/README.md
@@ -34,6 +34,25 @@ EvalMonkey natively supports evaluating ANY LLM: **AWS Bedrock**, **Azure**, **G
 
 ---
 
+## 📊 EvalMonkey Web Dashboard
+
+Visualize all your benchmark runs, track reliability scores over time, and inspect failure traces interactively!
+
+

+ EvalMonkey Dashboard +
+ EvalMonkey Main Dashboard showing scenario trends and score histories. +

+ +

+ Benchmark Run Results + MMLU Scenario View +
+ Deep-dive into specific benchmark runs and chaos tests. +

+
+---
+
 ## ⚡️ Quick Start
 
 ### Option A — Let Claude Code or Cursor set it up for you (30 seconds)
diff --git a/assets/em-benchmark-run.png b/assets/em-benchmark-run.png
new file mode 100644
index 0000000..909c4cf
Binary files /dev/null and b/assets/em-benchmark-run.png differ
diff --git a/assets/em-mmlu-run.png b/assets/em-mmlu-run.png
new file mode 100644
index 0000000..c71e411
Binary files /dev/null and b/assets/em-mmlu-run.png differ
diff --git a/assets/evalmonkey-dashboard.png b/assets/evalmonkey-dashboard.png
new file mode 100644
index 0000000..46b85d6
Binary files /dev/null and b/assets/evalmonkey-dashboard.png differ