codev: per-task complexity / per-protocol effort with cross-agent translation (complexity/* labels + harness flag emission)

## Problem

`afx spawn` currently launches the builder with whatever model and effort the local agent CLI defaults to. There's no way for codev to express that a SPIR cycle needs deeper reasoning than an AIR cycle, or that *this specific issue* warrants more thinking than the protocol default. The `HarnessProvider` abstraction at `packages/codev/src/agent-farm/utils/harness.ts` injects only the role / system prompt; effort selection is invisible to codev.

This issue introduces a stable codev-side **complexity** dimension that translates to each agent's native effort flag (Claude's `--effort`, Codex's reasoning effort, Gemini's thinking budget) without binding codev's vocabulary to any one tool.

## Three-layer design

| Layer | Vocabulary | Purpose |
|---|---|---|
| **GitHub label** (issue metadata) | `complexity/<level>` | Describes a property of the task. Pairs with `area/*` (same metadata-on-issue convention). |
| **Codev internal** (config, CLI flag, harness interface) | `complexity` | The stable enum codev passes around. CLI flag `--complexity`, config section `complexity:`, harness API parameter `complexity`. |
| **Per-agent CLI flag** (each harness translates) | each tool's native term | `claude --effort high`, `codex --reasoning-effort high`, `gemini --thinking ...` (exact flags subject to verification at plan-gate). |

The translation happens inside each `HarnessProvider`. Codev's enum stays stable; each provider speaks its own native dialect when invoking the underlying CLI.

### Why `complexity/*` over `effort/*` for the label

Two reasons:

1. **Labels describe issues, not actions.** `complexity/high` is a fact about the work. `effort/high` reads as an instruction to the framework. The first matches the `area/*` precedent (metadata about the issue); the second blurs the line between metadata and directive.
2. **Framework-neutral.** No agent calls the abstract concept "effort" by universal agreement. Claude uses "effort", Codex uses "reasoning effort", Gemini uses "thinking". "Complexity" describes the task itself, which is constant regardless of which agent processes it.

Inside codev's code (config keys, CLI flags, harness parameter names), using `complexity` consistently matches the label and keeps the layered vocabulary clean.

## Resolution chain

When a builder spawns, codev picks the complexity in this order (first match wins):

```
1. CLI flag                  afx spawn 42 --complexity max
2. GitHub label              complexity/<level> on the issue
3. .codev/config.json        complexity.<protocol> override
4. PROTOCOL_DEFAULTS         built-in per-protocol default
```

Per-protocol built-in defaults (recommended starting point):

```ts
const PROTOCOL_DEFAULTS: Record<string, Complexity> = {
  spir:       'high',
  aspir:      'high',
  pir:        'high',
  research:   'high',
  bugfix:     'medium',
  maintain:   'medium',
  experiment: 'medium',
  air:        'low',
};
```

Reasoning:

- SPIR / ASPIR / PIR involve real design calls (plan and review phases) and benefit from depth.
- RESEARCH is multi-agent investigation by nature.
- BUGFIX / MAINTAIN / EXPERIMENT are typically more constrained but can have subtle diagnostic work.
- AIR is by definition well-spec'd and mechanical; deeper reasoning is wasted budget.

## Level granularity

Claude exposes five levels: `low, medium, high, xhigh, max`. The codev enum should match for full coverage:

```ts
export type Complexity = 'low' | 'medium' | 'high' | 'xhigh' | 'max';
```

But for **user-facing labels**, expose only three (`complexity/low`, `complexity/medium`, `complexity/high`). Reserve `xhigh` and `max` for the CLI flag and config override, where they're explicit power-user choices. This keeps the common case discoverable while leaving room for the rare "this design call really wants the deepest reasoning" override.

For agents whose native scale is narrower (suppose Codex offers only `low / medium / high`), the harness clamps: `xhigh` and `max` both map to that agent's highest available setting.

## Implementation surface

Approximately 100-150 LOC across these files:

- **New**: `packages/codev/src/agent-farm/utils/complexity.ts` (the enum + per-protocol policy + label-parser)
- **Modified**: `packages/codev/src/agent-farm/utils/harness.ts` (extend `HarnessProvider` interface with optional `complexity` parameter; update `CLAUDE_HARNESS`, `CODEX_HARNESS`, `GEMINI_HARNESS`, `OPENCODE_HARNESS` to translate)
- **Modified**: `afx spawn` CLI command (new `--complexity` flag)
- **Modified**: `.codev/config.json` schema docs (new `complexity` section)
- **Modified**: spawn pipeline call sites that invoke `harness.buildRoleInjection()` and `harness.buildScriptRoleInjection()` to pass through the resolved complexity
- **New tests**: complexity-policy unit tests; per-harness translation tests; label-parser tests

## Design calls for plan-approval

Real decisions worth pinning at the plan-gate rather than during implementation:

1. **Exact CLI flag and effort-equivalent for each provider.** Claude's `--effort` is verified; Codex's and Gemini's effort-equivalent flags need to be confirmed against their current CLI surfaces. Plan should run `claude --help` / `codex --help` / `gemini --help` and document the exact translation table.
2. **Level mapping when agent scale is narrower than codev's.** Recommended: `xhigh` and `max` both clamp to the agent's highest level. Plan should confirm this is the right policy versus erroring out (which would force the user to pick a level the agent supports).
3. **What happens for an issue with no `complexity/*` label?** Recommended: fall through to the protocol default silently. No warning, no implicit `complexity/medium` auto-applied. Plan should confirm this is the desired UX rather than nagging the user to label.
4. **Should the label appear in the row prefix anywhere?** The Builders tree's row prefix from #952 currently shows `[<area>]`. Adding `[<complexity>]` would crowd it. Recommended: complexity does NOT appear in any tree row prefix; the area prefix is the at-a-glance triage, and complexity matters at spawn time, not at scan time. Plan should confirm.
5. **Architect default**. Should the architect terminal (long-running, design-heavy by nature) always launch at `high` regardless of any per-protocol logic? Recommended yes; the architect's role is more design-heavy on average than any single builder's work. Plan should confirm.
6. **Cost guardrails**. Once teams can spawn at `xhigh` / `max` freely, the bills could surprise. Should `afx spawn` warn before spawning at `max`, or after N high-effort spawns in a day? Lightest version: no guardrail in v1; can add later. Heaviest: an opt-in soft cap in `.codev/config.json`. Plan should pick the v1 floor.
7. **Backward-compat default**. Existing users who don't add the complexity section to their config should see no behavior change. Recommended: codev defaults to a no-op (don't emit any effort flag) until the user explicitly opts in via the config section or a label. This ensures the change is additive and reversible.

## Acceptance

- [ ] `afx spawn 42 --protocol pir --complexity high` passes `--effort high` to Claude (or the appropriate translation for Codex / Gemini / OpenCode).
- [ ] An issue labeled `complexity/high` spawns at high complexity even without the CLI flag.
- [ ] A repo with no `.codev/config.json` `complexity` section and no per-issue label uses the protocol default.
- [ ] Each built-in harness (`claude`, `codex`, `gemini`, `opencode`) emits the correct provider-specific flag for each codev complexity level, verified by harness unit tests.
- [ ] When a level isn't directly supported by an agent's CLI, the harness clamps to the nearest available level (does not crash, does not pass through unrecognised flags).
- [ ] No regression for users who don't configure any complexity policy (existing spawns continue to work without effort-related flags being emitted, per the backward-compat default).

## Out of scope

- **Model selection** (Opus vs Sonnet vs Haiku). That's a separate axis from complexity. Models stay in `shell.builder` / `shell.architect` config as today. A high-complexity spawn on Sonnet is meaningful; mixing the two axes into one knob would be wrong.
- **Per-phase complexity adjustment**. Effort locks at process spawn; switching mid-session isn't possible with the current Claude Code CLI. Complexity is calibrated once to the protocol's hardest phase.
- **A `complexity:critical` or similar fifth label level**. Stick with three labels (low/medium/high) for discoverability. xhigh / max are reachable via the CLI flag for explicit power-user choices.

## Related

- `packages/codev/src/agent-farm/utils/harness.ts` — where the harness providers live (the load-bearing extension point).
- `area/*` label discipline — the precedent this issue follows for the `complexity/*` label family.
- Memory `feedback_respect_harness_abstraction.md` — the principle of routing builder-specific behavior through `HarnessProvider`, which this issue extends rather than bypasses.

Layer	Vocabulary	Purpose
GitHub label (issue metadata)	`complexity/<level>`	Describes a property of the task. Pairs with `area/*` (same metadata-on-issue convention).
Codev internal (config, CLI flag, harness interface)	`complexity`	The stable enum codev passes around. CLI flag `--complexity`, config section `complexity:`, harness API parameter `complexity`.
Per-agent CLI flag (each harness translates)	each tool's native term	`claude --effort high`, `codex --reasoning-effort high`, `gemini --thinking ...` (exact flags subject to verification at plan-gate).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

codev: per-task complexity / per-protocol effort with cross-agent translation (complexity/* labels + harness flag emission) #998

Problem

Three-layer design

Why `complexity/` over `effort/` for the label

Resolution chain

Level granularity

Implementation surface

Design calls for plan-approval

Acceptance

Out of scope

Related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

codev: per-task complexity / per-protocol effort with cross-agent translation (complexity/* labels + harness flag emission) #998

Description

Problem

Three-layer design

Why complexity/* over effort/* for the label

Resolution chain

Level granularity

Implementation surface

Design calls for plan-approval

Acceptance

Out of scope

Related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Why `complexity/` over `effort/` for the label