Problem
afx spawn currently launches the builder with whatever model and effort the local agent CLI defaults to. There's no way for codev to express that a SPIR cycle needs deeper reasoning than an AIR cycle, or that a specific issue warrants more thinking than its protocol's default. The HarnessProvider abstraction at packages/codev/src/agent-farm/utils/harness.ts injects only the role / system prompt; effort selection is invisible to codev.
This issue introduces a stable codev-side complexity dimension that translates to each agent's native effort flag (Claude's --effort, Codex's reasoning effort, Gemini's thinking budget) without binding codev's vocabulary to any one tool.
Three-layer design
| Layer | Vocabulary | Purpose |
| --- | --- | --- |
| GitHub label (issue metadata) | `complexity/<level>` | Describes a property of the task. Pairs with `area/*` (same metadata-on-issue convention). |
| Codev internal | `complexity` | The stable enum codev passes around: CLI flag `--complexity`, config section `complexity:`, harness API parameter `complexity`. |
| Per-agent CLI flag (each harness translates) | Each tool's native term | `claude --effort high`, `codex --reasoning-effort high`, `gemini --thinking ...` (exact flags subject to verification at plan-gate). |
The translation happens inside each HarnessProvider. Codev's enum stays stable; each provider speaks its own native dialect when invoking the underlying CLI.
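A minimal sketch of that translation step follows. It uses a hypothetical, simplified slice of a provider (`ComplexityTranslation`, `nativeArgs`, `withComplexity` are illustrative names, not the real HarnessProvider API in harness.ts). Only Claude is shown, because the design calls below describe its `--effort` flag as verified. The Codex and Gemini flags in the table still need confirmation.

```typescript
// Codev's stable enum (the five levels discussed under "Level granularity").
type Complexity = "low" | "medium" | "high" | "xhigh" | "max";

// Hypothetical, simplified slice of a HarnessProvider: only the complexity translation.
interface ComplexityTranslation {
  readonly agent: string;
  // Extra CLI arguments that express `level` in this agent's native vocabulary.
  nativeArgs(level: Complexity): string[];
}

// Claude: `--effort` is described as verified and covering all five codev levels,
// so the codev level passes straight through.
const claudeTranslation: ComplexityTranslation = {
  agent: "claude",
  nativeArgs: (level) => ["--effort", level],
};

// The spawn pipeline only speaks codev's enum; the provider appends its own dialect.
function withComplexity(
  baseCommand: readonly string[],
  provider: ComplexityTranslation,
  level: Complexity,
): string[] {
  return [...baseCommand, ...provider.nativeArgs(level)];
}
```

With this shape, Codex, Gemini and OpenCode would each supply their own `nativeArgs` once their flags are confirmed. Nothing outside the provider needs to know the native spelling.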
Why complexity/* over effort/* for the label
Two reasons:
Labels describe issues, not actions. complexity/high is a fact about the work; effort/high reads as an instruction to the framework. The first matches the area/* precedent (metadata about the issue); the second blurs the line between metadata and directive.
Framework-neutral. Agents don't share a name for the concept: Claude uses "effort", Codex uses "reasoning effort", Gemini uses "thinking". "Complexity" describes the task itself, which stays the same regardless of which agent processes it.
Inside codev's code (config keys, CLI flags, harness parameter names), using complexity consistently matches the label and keeps the layered vocabulary clean.
Resolution chain
When a builder spawns, codev picks the complexity in this order (first match wins):
1. CLI flag: afx spawn 42 --complexity max
2. GitHub label: complexity/<level> on the issue
3. .codev/config.json: complexity.<protocol> override
4. PROTOCOL_DEFAULTS: built-in per-protocol default (the recommended starting point)
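The first-match-wins order can be sketched as a chain of nullish fallbacks. This assumes the CLI flag, label and config have already been parsed into a hypothetical `ComplexitySources` object. The `PROTOCOL_DEFAULTS` values below are placeholders chosen only so SPIR sits above AIR, as the problem statement implies; the real per-protocol defaults are a plan decision.

```typescript
type Complexity = "low" | "medium" | "high" | "xhigh" | "max";

// Hypothetical container for the already-parsed sources; every field is optional.
interface ComplexitySources {
  cliFlag?: Complexity;                                  // afx spawn 42 --complexity max
  issueLabel?: Complexity;                               // parsed from complexity/<level>
  configOverrides?: Partial<Record<string, Complexity>>; // complexity.<protocol> in config
}

// PLACEHOLDER values for illustration only, not the actual built-in defaults.
const PROTOCOL_DEFAULTS: Partial<Record<string, Complexity>> = {
  spir: "high",
  air: "medium",
};

// First match wins: CLI flag, then issue label, then config override, then built-in default.
// Returns undefined when nothing applies (e.g. an unknown protocol with no other input).
function resolveComplexity(protocol: string, sources: ComplexitySources): Complexity | undefined {
  return (
    sources.cliFlag ??
    sources.issueLabel ??
    sources.configOverrides?.[protocol] ??
    PROTOCOL_DEFAULTS[protocol]
  );
}
```

Using `??` rather than `||` keeps the chain correct even if a future level were represented by a falsy value.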
Level granularity
Claude exposes five levels: low, medium, high, xhigh, max. The codev enum should match them for full coverage.
But for user-facing labels, expose only three (complexity/low, complexity/medium, complexity/high). Reserve xhigh and max for the CLI flag and config override, where they are explicit power-user choices. This keeps the common case discoverable while leaving room for the rare "this design call really wants the deepest reasoning" override.
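Restricting labels to three levels can be sketched as a label parser that accepts only complexity/low, complexity/medium and complexity/high. `parseComplexityLabel` is a hypothetical name. Two behaviors are assumptions the issue does not specify: the first matching label wins, and unsupported levels such as complexity/max are ignored rather than treated as errors.

```typescript
// Only these three levels are exposed as labels; xhigh and max stay CLI/config-only.
const LABEL_LEVELS = ["low", "medium", "high"] as const;
type LabelLevel = (typeof LABEL_LEVELS)[number];

const LABEL_PREFIX = "complexity/";

// Returns the level from the first valid complexity/<level> label, or undefined.
// Labels with other prefixes (e.g. area/*) and unsupported levels are skipped.
function parseComplexityLabel(labels: readonly string[]): LabelLevel | undefined {
  for (const label of labels) {
    if (!label.startsWith(LABEL_PREFIX)) continue;
    const level = label.slice(LABEL_PREFIX.length);
    if ((LABEL_LEVELS as readonly string[]).includes(level)) {
      return level as LabelLevel;
    }
  }
  return undefined;
}
```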
For agents whose native scale is narrower (suppose Codex offers only low / medium / high), the harness clamps: xhigh and max both map to that agent's highest available setting.
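The clamping rule can be sketched generically. This assumes each agent's supported levels can be named with codev's own level names, which holds for Claude as described; the three-level Codex scale is the hypothetical from the paragraph above. `clampToScale` is an illustrative name. Rounding a request that sits below the agent's lowest level up to that lowest level is an extra assumption beyond the issue's rule.

```typescript
// Codev's enum, ordered from least to most reasoning.
const LEVELS = ["low", "medium", "high", "xhigh", "max"] as const;
type Complexity = (typeof LEVELS)[number];

// Picks the highest level the agent supports that does not exceed the request,
// so xhigh and max both land on a narrower agent's top level.
function clampToScale(level: Complexity, supported: readonly Complexity[]): Complexity {
  if (supported.length === 0) throw new Error("agent supports no complexity levels");
  const rank = (l: Complexity): number => LEVELS.indexOf(l);
  let best: Complexity | undefined;
  for (const candidate of supported) {
    if (rank(candidate) <= rank(level) && (best === undefined || rank(candidate) > rank(best))) {
      best = candidate;
    }
  }
  // Request is below everything the agent offers: fall back to its lowest level.
  if (best === undefined) {
    best = supported.reduce((a, b) => (rank(b) < rank(a) ? b : a));
  }
  return best;
}

// Hypothetical narrow scale from the example above.
const codexScale: readonly Complexity[] = ["low", "medium", "high"];
```

Clamping never emits a level the agent doesn't list, which matches the acceptance requirement not to pass unrecognised flags.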
Implementation surface
Approximately 100-150 LOC across these files:
New: packages/codev/src/agent-farm/utils/complexity.ts (the enum + per-protocol policy + label-parser)
Modified: packages/codev/src/agent-farm/utils/harness.ts (extend HarnessProvider interface with optional complexity parameter; update CLAUDE_HARNESS, CODEX_HARNESS, GEMINI_HARNESS, OPENCODE_HARNESS to translate)
Modified: afx spawn CLI command (new --complexity flag)
Modified: .codev/config.json schema docs (new complexity section)
Modified: spawn pipeline call sites that invoke harness.buildRoleInjection() and harness.buildScriptRoleInjection() to pass through the resolved complexity
New tests: complexity-policy unit tests; per-harness translation tests; label-parser tests
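The new .codev/config.json section can be sketched as a small validator over the parsed JSON, assuming the shape implied by the resolution chain: `{ "complexity": { "<protocol>": "<level>" } }`. `readComplexityConfig` is a hypothetical name. Rejecting unknown levels with an error is a design assumption, not something the issue specifies.

```typescript
const LEVELS = ["low", "medium", "high", "xhigh", "max"] as const;
type Complexity = (typeof LEVELS)[number];

// Assumed shape of the section: protocol name -> codev level.
type ComplexityConfig = Record<string, Complexity>;

function isComplexity(value: unknown): value is Complexity {
  return typeof value === "string" && (LEVELS as readonly string[]).includes(value);
}

// Returns undefined when the section is absent, so callers can keep today's behavior.
// Throws on a malformed section or an unknown level.
function readComplexityConfig(config: unknown): ComplexityConfig | undefined {
  if (typeof config !== "object" || config === null) return undefined;
  const section: unknown = (config as { complexity?: unknown }).complexity;
  if (section === undefined) return undefined;
  if (typeof section !== "object" || section === null || Array.isArray(section)) {
    throw new Error("complexity must map protocol names to levels");
  }
  const result: ComplexityConfig = {};
  for (const [protocol, level] of Object.entries(section)) {
    if (!isComplexity(level)) {
      throw new Error(`complexity.${protocol}: unknown level ${JSON.stringify(level)}`);
    }
    result[protocol] = level;
  }
  return result;
}
```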
Design calls for plan-approval
Real decisions worth pinning at the plan-gate rather than during implementation:
Exact CLI flag and effort-equivalent for each provider. Claude's --effort is verified; Codex's and Gemini's effort-equivalent flags need to be confirmed against their current CLI surfaces. Plan should run claude --help / codex --help / gemini --help and document the exact translation table.
Level mapping when agent scale is narrower than codev's. Recommended: xhigh and max both clamp to the agent's highest level. Plan should confirm this is the right policy versus erroring out (which would force the user to pick a level the agent supports).
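What happens for an issue with no complexity/* label? Recommended: fall through to the protocol default silently, with no warning and no implicit complexity/medium auto-applied. Plan should confirm this is the desired UX rather than nagging the user to label.
Tree row prefix. Tree rows already carry an [<area>] prefix, and adding [<complexity>] would crowd it. Recommended: complexity does NOT appear in any tree row prefix; the area prefix is the at-a-glance triage, and complexity matters at spawn time, not at scan time. Plan should confirm.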
Architect default. Should the architect terminal (long-running, design-heavy by nature) always launch at high regardless of any per-protocol logic? Recommended yes; the architect's role is more design-heavy on average than any single builder's work. Plan should confirm.
Cost guardrails. Once teams can spawn at xhigh / max freely, bills could grow unexpectedly. Should afx spawn warn before spawning at max, or after N high-effort spawns in a day? Lightest version: no guardrail in v1, with one added later if needed. Heaviest: an opt-in soft cap in .codev/config.json. Plan should pick the v1 floor.
Backward-compat default. Existing users who don't add the complexity section to their config should see no behavior change. Recommended: codev defaults to a no-op (don't emit any effort flag) until the user explicitly opts in via the config section or a label. This ensures the change is additive and reversible.
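The backward-compat default can be sketched as an opt-in gate in front of the resolution chain. `effortArgsForSpawn` and `SpawnInputs` are hypothetical. The `resolve` and `toNative` callbacks stand in for the resolution chain and a harness translation. The paragraph above names the config section and labels as opt-ins; also counting an explicit CLI flag as one is an assumption.

```typescript
type Complexity = "low" | "medium" | "high" | "xhigh" | "max";

interface SpawnInputs {
  cliFlag?: Complexity;
  issueLabel?: Complexity;
  // Undefined when .codev/config.json has no complexity section at all.
  configSection?: Partial<Record<string, Complexity>>;
}

// Any explicit choice counts as opting in (treating the CLI flag this way is assumed).
function hasOptedIn(inputs: SpawnInputs): boolean {
  return (
    inputs.cliFlag !== undefined ||
    inputs.issueLabel !== undefined ||
    inputs.configSection !== undefined
  );
}

function effortArgsForSpawn(
  inputs: SpawnInputs,
  resolve: (inputs: SpawnInputs) => Complexity,
  toNative: (level: Complexity) => string[],
): string[] {
  // No opt-in: emit nothing, so the agent CLI keeps its own default exactly as today.
  if (!hasOptedIn(inputs)) return [];
  return toNative(resolve(inputs));
}
```

Gating at this single point keeps the change additive: removing the config section and labels restores the current spawn command byte for byte.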
Acceptance
afx spawn 42 --protocol pir --complexity high passes --effort high to Claude (or the appropriate translation for Codex / Gemini / OpenCode).
An issue labeled complexity/high spawns at high complexity even without the CLI flag.
A repo with no .codev/config.json complexity section and no per-issue label uses the protocol default.
Each built-in harness (claude, codex, gemini, opencode) emits the correct provider-specific flag for each codev complexity level, verified by harness unit tests.
When a level isn't directly supported by an agent's CLI, the harness clamps to the nearest available level (does not crash, does not pass through unrecognised flags).
No regression for users who don't configure any complexity policy (existing spawns continue to work without effort-related flags being emitted, per the backward-compat default).
Out of scope
Model selection (Opus vs Sonnet vs Haiku). That's a separate axis from complexity. Models stay in shell.builder / shell.architect config as today. A high-complexity spawn on Sonnet is meaningful; mixing the two axes into one knob would be wrong.
Per-phase complexity adjustment. Effort locks at process spawn; switching mid-session isn't possible with the current Claude Code CLI. Complexity is calibrated once to the protocol's hardest phase.
A complexity/critical or similar additional label level. Stick with three labels (low/medium/high) for discoverability; xhigh / max are reachable via the CLI flag for explicit power-user choices.
Related
packages/codev/src/agent-farm/utils/harness.ts — where the harness providers live (the load-bearing extension point).
area/* label discipline — the precedent this issue follows for the complexity/* label family.
Memory feedback_respect_harness_abstraction.md — the principle of routing builder-specific behavior through HarnessProvider, which this issue extends rather than bypasses.