Skip to content

feat: plan mode — /plan toggle, submit_plan tool, and plan_model for dual-model execution #2788

@aheritier

Description

@aheritier

Problem

Agentic workflows that span many tool calls (write files, run commands, call GitHub APIs) give users no control point between "I described what I want" and "the agent already did it."

The only safeguard today is an instruction asking agents to present a plan and wait — a soft control that works most of the time but has no runtime enforcement, and cannot be toggled at runtime without modifying config files.

Two concrete gaps:

  1. No user-controlled mode switch. There is no way to type /plan in the TUI to enter a careful, plan-then-approve mode for a risky task — and then /plan off to return to fluid execution for a simple one.
  2. No cost-optimised planning. Planning requires deep reasoning (costly model, used once). Execution is mechanical (cheaper model, used N times). There is no way to route the planning turn to a different model than the execution turns.

Prior Art

Every mature agentic system has converged on this pattern:

System Plan format User toggle Dual-model?
Cursor Structured file list with diff preview Ask vs Agent mode; /multitask slash command; per-subagent model setting (May 2026) Yes — subagent model configurable separately from parent
GitHub Copilot Workspace JSON plan_action function call Phase gate: Task → Plan → Code; natural-language modify Yes — o1-preview for planning, gpt-4o for execution
OpenAI Operator JSON schema Structured approval gate in UI Yes — reasoning model for plan, fast model for execution
Devin JSON step tree Checkpoint-based pauses; user steers mid-execution Implicit (planning phase uses extended thinking budget)
Claude extended thinking <thinking> block + text response Application-layer convention (no built-in gate) Single model, separate token budget for reasoning
LangGraph Free-form or JSON HumanNode blocks graph traversal Framework-dependent

Key pattern across all: plan generation is separated from execution, the plan is shown to the user, and execution is gated on approval.

Cursor model-routing (verified, May 2026 changelog):

"Added the ability to control Explore subagent behavior from settings: choose a specific model for Explore subagents to run on, inherit the same model as the parent agent, or disable Explore subagents altogether."
"Added support for general model names for subagent configuration."
"/multitask is now available in the editor for running async subagents."

Anthropic extended thinking API (confirmed from SDK):

response = client.messages.create(
    model="claude-sonnet-4-5",
    thinking={"type": "enabled", "budget_tokens": 1600},
    ...
)
# Response contains separate `thinking` block and `text` block.
# The approval gate is the application's responsibility — no built-in pause.

Proposed Solution

Three composable additions:

1. submit_plan built-in tool

Agent calls submit_plan(title, steps, rationale) before any side-effecting action. The runtime renders the plan in the TUI, pauses, and waits for explicit user approval.

{
  "title": "Implement configurable retry policy",
  "steps": [
    {
      "description": "Add RetryConfig struct to pkg/config/latest/types.go",
      "tool": "edit_file",
      "target": "pkg/config/latest/types.go",
      "expected_outcome": "RetryConfig embedded in ToolsetConfig"
    },
    {
      "description": "task build / task test / task lint",
      "tool": "shell",
      "target": "task build && task test && task lint"
    }
  ],
  "rationale": "Follows existing sessions config pattern; backwards compat — defaults to no retry."
}

Tool output on approval:

{ "approved": true, "plan_id": "uuid" }

On rejection with user note:

{ "approved": false, "reason": "Don't touch dispatcher.go yet, design first" }

TUI rendering (proposed):

╭─ Plan: "Implement configurable retry policy" ──────────────────────╮
│  1. Edit pkg/config/latest/types.go → add RetryConfig struct       │
│  2. Edit pkg/runtime/dispatcher.go  → read RetryConfig per toolset │
│  3. Update agent-schema.json                                        │
│  4. task build / task test / task lint                              │
│                                                                     │
│  Rationale: follows sessions config pattern; backwards compat.      │
│                                                                     │
│  [Approve ↵]   [Reject esc]   [Modify: type changes ...]           │
╰─────────────────────────────────────────────────────────────────────╯

2. /plan TUI command (session-level toggle)

/plan           → enter plan mode for this session
/plan off       → exit plan mode; return to normal execution
/plan status    → show current mode

When plan mode is active, the runtime sets an in-memory flag that enforces plan_mode: strict for the session — no config file change needed. Users decide per-task whether they want a careful review cycle or fluid execution.

This follows the same pattern as Cursor's /multitask command: a slash command that toggles a session-level behaviour without touching config.

3. plan_model agent config field (dual-model execution)

agents:
  coder:
    model: sonnet       # execution model — fast, cheap, used every step
    plan_model: opus    # planning model — deep reasoning, used once per task
    plan_mode: soft     # off | soft | strict

When plan_model is set, the runtime switches to it for the submit_plan turn only, then reverts to model for all execution turns.

Why this matters:

  • Planning is a reasoning task. Opus/o1-class models do it significantly better.
  • Execution turns are mechanical. Sonnet is sufficient and ~5x cheaper.
  • OpenAI's Operator uses exactly this pattern (o1-preview for planning, gpt-4o for execution).
  • Cursor now exposes the same knob for subagents (May 2026).
  • docker-agent already has this pattern: Toolset.Model for per-toolset routing and model: frontmatter on skills. plan_model on AgentConfig is a natural extension.

plan_mode config values

Value Behaviour
off submit_plan tool available but not enforced. Model calls it voluntarily. (default)
soft Runtime warns in TUI if a side-effecting tool is called without an active approved plan_id. Does not block.
strict Runtime blocks write_file, edit_file, shell, create_directory, and MCP mutating tools until a plan is approved. Enforced at pre_tool_use.

/plan in the TUI activates strict mode for the session regardless of config.


Affected Components

Component Change
pkg/tools/builtin/plan/ New submit_plan tool; session-scoped plan_id state
pkg/tui/ /plan command registration; plan confirmation widget (Approve / Reject / Modify)
pkg/runtime/ plan_mode enforcement at pre_tool_use; plan_model routing for planning turn
pkg/config/latest/types.go PlanMode string + PlanModel string on AgentConfig
agent-schema.json plan_mode and plan_model fields
examples/plan-mode.yaml Working example

Phasing

Phase Scope Effort
1 submit_plan tool: renders plan as formatted message block; no enforcement; model calls voluntarily effort:small
2 /plan TUI toggle; proper confirmation widget (Approve/Reject/Modify); plan_id in session state effort:medium
3 plan_mode: strict enforcement at pre_tool_use; plan_model routing for planning turn effort:large

Phase 1 alone is already a meaningful improvement over instruction-only plan-first.


Alternatives Considered

  • permissions: ask: list — exists today; forces per-call confirmation (N prompts instead of 1 plan). High friction for multi-step tasks.
  • Hook-based enforcementuser_prompt_submit sets a token; pre_tool_use checks it. Stateful shell scripts, fragile, no UX.
  • Instruction-only — current approach; works with capable models but degrades under long context and cannot be toggled.
  • Skill-based plan template — a /plan-first skill providing a structured format. Still a soft control; useful as a complement, not a replacement.

submit_plan + /plan toggle + plan_model is the only combination that gives good UX (one approval per plan, toggleable per-task), optional enforcement (strict mode), and cost-optimised dual-model execution.

Metadata

Metadata

Assignees

Labels

area/agentFor work that has to do with the general agent loop/agentic features of the apparea/toolsFor features/issues/fixes related to the usage of built-in and MCP toolsarea/tuiFor features/issues/fixes related to the TUIeffort:largeCross-cutting concern, complex design, broad codebase knowledge requiredpriority:mediumNormal priority, standard sprint workstatus/needs-designRequires architectural discussion or design review

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions