Use Opus as the brain. Let cheap models do the work.
A pattern for Claude Code that cuts API costs by 60-80% on multi-step tasks. Opus stays in the orchestrator seat (planning, decisions, review). Haiku, Sonnet, Gemini, Kimi, MiniMax, and local models (Gemma) handle the mechanical execution.
Built on two dispatch paths:
- Agent tool (Claude models) — subagents with full file access
- ask.py (any provider) — text-in/text-out CLI for code generation, research, video analysis
Running Claude Code on Opus is powerful but expensive. Most turns in a session are mechanical — reading files, writing boilerplate, simple edits. You're paying Opus rates for work that Haiku could handle.
Anthropic's Advisor Tool (beta) solves this at the API level — a cheap executor model consults Opus for strategic guidance mid-generation. But it's API-only. You can't use it inside Claude Code sessions.
This repo recreates that pattern inside Claude Code using existing tools:
- Opus orchestrates from the main session
- Subagents on cheaper models do the file work
- A CLI tool (
ask.py) routes to any provider for text generation - The orchestrator reviews everything before it ships
You (Opus) ─── plan ──────────────────────────────────────── review
│ ▲
├── Agent(haiku) → edit file A ────────────────┤
├── Agent(haiku) → edit file B ────────────────┤
├── ask.py -m kimi → generate util code ──────┤
├── ask.py -m gemini --video → analyze clip ──┤
└── Agent(sonnet) → complex refactor ─────────┘
Opus tokens go to: planning, architecture, tradeoff decisions, reviewing output. Cheap model tokens go to: file reads, edits, code generation, boilerplate, research.
Copy ask.py to your project:
cp ask.py your-project/tools/ask.pyThe tool reads from macOS Keychain. Store your keys:
# Anthropic (required)
security add-generic-password -s "com.your-project.keys" -a "apiKey_anthropic" -w "sk-ant-..." -U
# Google Gemini (optional)
security add-generic-password -s "com.your-project.keys" -a "apiKey_gemini" -w "AI..." -U
# Moonshot/Kimi (optional)
security add-generic-password -s "com.your-project.keys" -a "apiKey_moonshot" -w "sk-..." -U
# MiniMax (optional)
security add-generic-password -s "com.your-project.keys" -a "apiKey_minimax" -w "..." -UOr modify the KEYCHAIN_SERVICE and KEYCHAIN_ACCOUNTS in ask.py to match your own keychain setup.
Not on macOS? Replace the get_key() function with environment variable reads:
def get_key(provider: str) -> str:
env_map = {
"anthropic": "ANTHROPIC_API_KEY",
"moonshot": "MOONSHOT_API_KEY",
"google": "GEMINI_API_KEY",
"minimax": "MINIMAX_API_KEY",
}
key = os.environ.get(env_map.get(provider, ""))
if not key:
print(f"Missing env var for {provider}", file=sys.stderr)
sys.exit(1)
return key# Install Ollama
brew install ollama
brew services start ollama
# Pull Gemma 4 (26B, ~10GB)
ollama pull gemma4
# Pull Gemma 3 (4B, ~3GB) for fast lightweight tasks
ollama pull gemma3:4bPaste the routing instructions into your project's CLAUDE.md so every Claude Code session knows the pattern. See claude-md-snippet.md for a ready-to-paste block.
Copy advisor.md to .claude/commands/advisor.md in your project. This gives you /advisor as a slash command for structured multi-task execution.
# Anthropic models
python3 tools/ask.py -m haiku "Simple question"
python3 tools/ask.py -m sonnet "Moderate complexity task"
# Kimi (Moonshot) — cheap, 128k context
python3 tools/ask.py -m kimi "Quick code generation task"
python3 tools/ask.py -m kimi-think "Complex reasoning problem"
python3 tools/ask.py -m kimi-swarm "Research task needing parallel decomposition"
# Gemini — strong coder, video analysis
python3 tools/ask.py -m gemini "Generate a React component for..."
python3 tools/ask.py -m gemini --video clip.mp4 "Review this edit for pacing"
# MiniMax — cheap, multilingual
python3 tools/ask.py -m minimax "Simple utility function"
# Local (Ollama) — free, private, offline
python3 tools/ask.py -m gemma "Draft some copy for..."
python3 tools/ask.py -m gemma-small "Reformat this JSON"
# With system prompt
python3 tools/ask.py -m kimi -s "You are a Python expert" "Write a decorator for..."
# Pipe input
cat large_file.txt | python3 tools/ask.py -m kimi --stdinInside a Claude Code session, dispatch work to cheaper models:
Agent({
description: "Add input validation to form.tsx",
model: "haiku",
prompt: "Add email validation to the signup form at src/components/form.tsx. Use zod schema validation. The form currently has name and email fields with no validation."
})
/advisor Add a password reset flow to the auth system
Opus breaks it down, dispatches each task to the cheapest capable model, handles escalations, reviews everything.
| Complexity | Model | Path | Why |
|---|---|---|---|
| Complex (multi-file, refactors) | sonnet |
Agent tool | Strong coder, file access |
| Complex (second opinion) | gemini |
ask.py | Different perspective |
| Simple (single file, clear spec) | haiku |
Agent tool | Cheap, file access |
| Simple (single function, util) | kimi / minimax |
ask.py | Cheaper than Haiku |
| Simple (no cost needed) | gemma |
ask.py (local) | Free |
| Task | Model | Why |
|---|---|---|
| Multi-step research | kimi-swarm |
100 parallel sub-agents |
| Deep analysis | kimi-think |
Extended reasoning |
| Long documents | kimi |
128k context |
| Video analysis | gemini --video |
Native video understanding |
| Drafts, copy | gemma |
Free, local |
| Bulk generation | gemma |
No API cost at scale |
Kimi K2.5 has four operating modes, all on the same model:
| Mode | Flag | What it does |
|---|---|---|
| Instant | kimi |
Standard fast inference |
| Thinking | kimi-think |
Extended reasoning (like Claude's thinking) |
| Agent | kimi-agent |
Single agentic loop with tool use |
| Swarm | kimi-swarm |
Spawns up to 100 parallel sub-agents, 1500 tool calls |
Agent Swarm is automatic — you give it a complex task and the model decomposes and parallelizes it internally. No framework needed.
Gemini natively accepts video files. No frame extraction, no local vision model — it sees the actual video with temporal understanding.
# Scene descriptions
python3 tools/ask.py -m gemini --video footage.mp4 \
"Describe each scene chronologically, noting camera movement, subjects, and mood"
# Edit review
python3 tools/ask.py -m gemini --video rough_cut.mp4 \
"Review this edit for pacing. Note cuts that feel too fast or too slow."
# Shot list
python3 tools/ask.py -m gemini --video clip.mov \
"Create a timestamped shot list with scene descriptions and shot types"
# Social content review
python3 tools/ask.py -m gemini --video reel.mp4 \
"Is this suitable for Instagram? Check visual quality, pacing, hook in first 3s"The video is uploaded to Gemini's Files API, analyzed, then deleted. Works with .mp4, .mov, .mkv, .avi, .webm.
If you're building your own agents via the API (not Claude Code), you can use Anthropic's actual Advisor Tool (beta). This is a server-side tool where the executor model consults Opus mid-generation — all within a single API request.
import anthropic
client = anthropic.Anthropic()
response = client.beta.messages.create(
model="claude-sonnet-4-6",
max_tokens=8192,
betas=["advisor-tool-2026-03-01"],
tools=[
{
"type": "advisor_20260301",
"name": "advisor",
"model": "claude-opus-4-6",
"max_uses": 2,
}
],
messages=[
{"role": "user", "content": "Build a concurrent worker pool in Go with graceful shutdown."}
],
)The executor (Sonnet) decides when to consult the advisor (Opus). The advisor sees the full transcript and returns strategic guidance. One API request, no extra round trips.
| Executor | Advisor |
|---|---|
| Haiku 4.5 | Opus 4.6 |
| Sonnet 4.6 | Opus 4.6 |
| Opus 4.6 | Opus 4.6 |
Rough example — a 10-task coding session:
| Approach | Token distribution | Relative cost |
|---|---|---|
| All Opus | 100% Opus | 1x |
| Advisor pattern | 20% Opus (plan + review) + 60% Haiku + 20% Sonnet | ~0.25x |
| With local models | 20% Opus + 40% Haiku + 20% Sonnet + 20% Gemma (free) | ~0.20x |
The exact savings depend on your workload, but the principle holds: most turns in a coding session are mechanical. Pay for intelligence only when you need it.
- macOS (for Keychain — see setup for env var alternative)
- Python 3.10+
- Claude Code CLI
- API keys for providers you want to use
- Ollama (optional, for local models)
| File | Purpose |
|---|---|
ask.py |
Multi-provider LLM CLI |
advisor.md |
Slash command for /advisor |
claude-md-snippet.md |
Ready-to-paste CLAUDE.md routing instructions |
README.md |
This file |
MIT