saagpatel · May 11, 2026
diff --git a/.codex/verify.commands b/.codex/verify.commands
@@ -0,0 +1,4 @@
+# codex-os-managed
+pnpm install
+pnpm run build
+cargo test --manifest-path src-tauri/Cargo.toml
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
@@ -0,0 +1,18 @@
+name: Test (Rust)
+on:
+  push:
+    branches: [main, 'feat/**']
+  pull_request:
+    branches: [main]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: dtolnay/rust-toolchain@stable
+      - uses: Swatinem/rust-cache@v2
+        with:
+          workspaces: src-tauri
+      - run: cd src-tauri && cargo clippy -- -D warnings
+      - run: cd src-tauri && cargo nextest run || cargo test
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,43 @@
+# ModelColosseum Codex Playbook
+
+## Communication Contract
+
+Follow the global Codex communication contract. Keep updates short, beginner-friendly, and focused on what changed, what passed, and what still needs attention.
+
+## Project Goal
+
+ModelColosseum is a local-first Tauri 2 desktop app for evaluating Ollama models through arenas, benchmarks, sparring, scorecards, and a SQLite-backed leaderboard.
+
+## First Read
+
+- `README.md`
+- `CLAUDE.md`
+- `src-tauri/Cargo.toml`
+- `.codex/verify.commands`
+
+## Core Rules
+
+- Keep all model calls local to Ollama unless the user explicitly changes the product contract.
+- Do not add telemetry, cloud sync, or remote judging.
+- Keep SQLite as the source of truth under the app data path.
+- Keep Rust responsible for Ollama communication, scoring, Elo calculations, database writes, and streaming events.
+- Frontend should stay presentational/stateful; avoid duplicating scoring or persistence rules in React.
+- Do not assume Ollama is running; run a health check first and fail gracefully.
+
+## Codex App Usage
+
+- Use Codex App Projects for repo-scoped implementation, debugging, and verification.
+- Use Worktrees for debate engine, benchmark runner, auto-judge, Elo, database migration, Ollama streaming, import/export, or Tauri capability changes.
+- Use file search before editing because behavior spans Rust engines, SQLite schema, prompt templates, Tauri commands/events, and React mode views.
+- Use app-window or browser evidence when arena, benchmark, sparring, leaderboard, settings, or export UI changes.
+- Use artifacts when benchmark results, scorecards, or comparison reports need reusable review.
+
+## Verification
+
+Use `.codex/verify.commands` as the canonical local gate. Current session note: Rust tests pass, while frontend build is blocked until `esbuild` is approved through pnpm build approval.
+
+## Done Criteria
+
+- The relevant verifier commands have been run, or the exact blocker is recorded.
+- Scoring, Elo, benchmark, and database changes have focused tests or fixture evidence.
+- UI changes have app-window or screenshot evidence when visual behavior matters.
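Review note: the Core Rules in this file assign Elo calculations to Rust. As an illustration of what that backend-owned logic could look like, here is a minimal sketch of the standard Elo update. The function names and the K-factor of 32 are assumptions made for this example; the project's actual Elo module is not shown in this diff and may use different constants or signatures.

```rust
/// Expected score of player A against player B under the standard Elo model.
fn expected_score(rating_a: f64, rating_b: f64) -> f64 {
    1.0 / (1.0 + 10f64.powf((rating_b - rating_a) / 400.0))
}

/// Returns the updated (A, B) ratings.
/// `score_a` is 1.0 for an A win, 0.5 for a draw, 0.0 for an A loss.
/// `k` is the K-factor (hypothetical value chosen by the caller).
fn update_elo(rating_a: f64, rating_b: f64, score_a: f64, k: f64) -> (f64, f64) {
    let expected_a = expected_score(rating_a, rating_b);
    let expected_b = 1.0 - expected_a;
    let new_a = rating_a + k * (score_a - expected_a);
    let new_b = rating_b + k * ((1.0 - score_a) - expected_b);
    (new_a, new_b)
}

fn main() {
    // Two equally rated models; model A wins the vote.
    let (a, b) = update_elo(1500.0, 1500.0, 1.0, 32.0);
    println!("A: {a:.1}, B: {b:.1}");
}
```

Keeping this as a pure function makes it easy to cover with the focused tests that the Done Criteria ask for, and the React side only needs to display the returned ratings.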
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -69,3 +69,64 @@ Key modules:
 - Do not use class components in React — hooks only
 - Do not store any data outside `~/.model-colosseum/` — single source of truth
 - Do not assume Ollama is running — always health check first and handle absence gracefully
+
+<!-- portfolio-context:start -->
+# Portfolio Context
+
+## What This Project Is
+
+ModelColosseum is an active local project in the /Users/d/Projects portfolio.
+
+## Current State
+
+**v1.0.0 — Feature Complete** (all phases done, audit remediation applied)
+
+- [x] **Phase 0: Foundation** — Tauri 2.0 scaffold, SQLite (13 tables, WAL), Ollama REST client, Elo module
+- [x] **Phase 1: Arena Mode** — Debate engine (freestyle/formal/socratic), vote + Elo, leaderboard, history
+- [x] **Phase 2: Benchmark** — CRUD suites/prompts, runner with TTFT/TPS metrics, manual + auto-judge scoring, blind comparison, hardware metrics, import/export
+- [x] **Phase 3: Sparring Ring** — Human vs AI debates, 3 difficulty levels, 4-phase structure, scorecards, user Elo
+- [x] **Phase 4: Polish** — 3 debate formats, topic suggestions, settings page, blind test, animations, skeleton loading, export (Markdown/CSV/JSON)
+- [x] **Audit** — Security hardening (configurable Ollama URL, query limit caps, settings key whitelist), accessibility (ARIA attributes), error handling, 67 Rust tests
+
+## Stack
+
+- Runtime: Tauri 2.x (Rust backend + webview frontend)
+- Frontend: React 19 + TypeScript 5.x strict mode
+- Build: Vite 6.x with `@tauri-apps/vite-plugin`
+- Styling: Tailwind CSS 4.x (dark theme, gold/amber accents)
+- State: Zustand 5.x
+- Routing: React Router 7.x
+- Charts: Recharts 2.x
+- Database: SQLite via `rusqlite` 0.31+ (bundled, WAL mode)
+- HTTP: `reqwest` 0.12+ (async streaming)
+- Async: `tokio` 1.x
+- System info: `sysinfo` 0.31+
+- LLM: Ollama REST API (localhost:11434)
+
+## How To Run
+
+- TypeScript strict mode. No `any` types.
+- React: Functional components with hooks only. No class components.
+- Rust: `clippy` clean. `cargo fmt` on save.
+- File naming: `snake_case.rs` for Rust, `PascalCase.tsx` for React components, `camelCase.ts` for utilities
+- Git commits: conventional commits (`feat:`, `fix:`, `refactor:`, `chore:`)
+- All Tauri commands return `Result<T, String>` — handle errors in Rust, display in frontend
+- Database writes wrapped in explicit transactions
+- No `unwrap()` in production Rust code; use the `?` operator or proper error handling
+
+## Known Risks
+
+- Do not scaffold the entire project in one session — follow the phased plan strictly
+- Do not use Tauri v1 APIs or import paths — this is Tauri 2.x (`@tauri-apps/api` v2)
+- Do not use `tauri-plugin-sql` — we use `rusqlite` directly
+- Do not use `unwrap()` in Rust production code — use `?` or proper error handling
+- Do not make any network calls except to localhost Ollama (no telemetry, no cloud)
+- Do not use class components in React — hooks only
+- Do not store any data outside `~/.model-colosseum/` — single source of truth
+- Do not assume Ollama is running — always health check first and handle absence gracefully
+
+## Next Recommended Move
+
+Use this context plus the README and supporting docs to resume the next active task, then promote the repo beyond minimum-viable by capturing a dedicated handoff, roadmap, or discovery artifact.
+
+<!-- portfolio-context:end -->
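Review note: two rules in this hunk work together, namely "always health check first" and "all Tauri commands return `Result<T, String>`". The sketch below shows one way to combine them using only the standard library. The helper name `check_ollama_reachable` is hypothetical, and a plain TCP connect is a stand-in for a real HTTP health request. The project itself lists `reqwest` for HTTP, so the actual check likely differs. The address uses `127.0.0.1` rather than `localhost` because `SocketAddr` parsing does not resolve hostnames.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical helper: reports whether anything is listening at `addr`.
/// Errors are returned as `String` to mirror the `Result<T, String>` convention.
fn check_ollama_reachable(addr: &str, timeout: Duration) -> Result<(), String> {
    // Parse without unwrap(); a malformed address becomes a readable error.
    let socket: SocketAddr = addr
        .parse()
        .map_err(|e| format!("invalid Ollama address '{addr}': {e}"))?;

    // A bounded connect attempt, so an absent server does not hang the app.
    TcpStream::connect_timeout(&socket, timeout)
        .map(|_| ())
        .map_err(|e| format!("Ollama is not reachable at {addr}: {e}"))
}

fn main() {
    // Default Ollama port from the Stack section; the timeout is an arbitrary choice.
    match check_ollama_reachable("127.0.0.1:11434", Duration::from_millis(500)) {
        Ok(()) => println!("Ollama port is open"),
        Err(msg) => eprintln!("{msg}"),
    }
}
```

A successful connect only shows that the port is open, not that Ollama itself is healthy, so a real command would follow up with an HTTP request before starting a debate or benchmark run.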
diff --git a/pnpm-workspace.yaml b/pnpm-workspace.yaml
@@ -0,0 +1,2 @@
+allowBuilds:
+  esbuild: true