nnemirovsky · nnemirovsky · May 18, 2026 · May 18, 2026 · May 18, 2026 · May 18, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -191,7 +191,7 @@ Auto-failover on 429/401 is primary; `pool rotate` is an override.
 
 - **Single chokepoint (I2):** every `binding.Credential` / `OAuthIndex.Has` / `extractInjectableSecret` / persist consumer on the HTTP/HTTPS OAuth path routes through `PoolResolver.ResolveActive` (`resolveInjectionTarget` for pass-1 header + pass-2 swap; `resolveOAuthResponseAttribution` for response/persist). `idx.Has` is always called with the resolved member, never the pool. Plain credentials pass through unchanged; SSH/mail are non-OAuth, out of scope.
 - **QUIC scope:** the HTTP/1.x/HTTP/2 MITM addon implements the full feature set (R1, R3, Phase 2). The HTTP/3/QUIC path (`QUICProxy.buildPhantomPairs`, binding-header injection in `quic.go`) is a request-side buffered swap with **no response-side OAuth interception**, but IS pool-aware on the request side: `QUICProxy.resolvePoolTarget` (via `NewQUICProxy`'s `poolResolver`) selects the active member's real secret and routes through `buildPooledOAuthPhantomPairs` so the access phantom is the same pool-stable JWT (R3 holds over QUIC). QUIC does **not** do R1 attribution or Phase 2 failover — the injected member is whatever the HTTP path / `pool rotate` last made active, and a QUIC-only 429/401 or refresh is not acted on. Deployments needing R1/auto-failover must route the pooled upstream over HTTP/HTTPS.
-- **Active-member selection:** healthy or expired-cooldown members first, by configured position; if all are in cooldown, the soonest-recovering is returned with a WARNING (degrade, never hard-fail). Recovery is lazy (evaluated in `ResolveActive`, no scheduler).
+- **Active-member selection (sticky, no "main" account):** `ResolveActive` is sticky. There is no position-0 "main": once selection settles on a member sluice keeps returning it while it is healthy, even if a lower-position member recovers from cooldown. A new active member is chosen only when the current one cools. The chosen member is the next eligible one by position **starting after the current member and wrapping** (advance forward, never snap back to position 0). The sticky current-active pointer lives on the shared swap-surviving `PoolHealth` under the same mutex as the cooldown map, so it survives `NewPoolResolverShared` regeneration and atomic pointer swaps (CRITICAL-1) and a stale resolver generation cannot clobber it, exactly like cooldowns. This kills the live flap where a 60s `RateLimitCooldown` lapse re-selected an upstream-exhausted account and respammed `cred_failover` plus a Telegram notice every cooldown window. If every member is cooling, the existing degrade applies (operator-parked-but-healthy `ManualRotateReason` first, else soonest-recovering, with a WARNING, never hard-fail) and the sticky pointer is left untouched so a recovery advances forward rather than snapping to position 0. `sluice pool rotate` still works: it parks the active member, so the next `ResolveActive` advances to the next member and then stays there (no snap-back). Recovery is lazy (evaluated in `ResolveActive`, no scheduler). A selectable position-priority-vs-sticky strategy mode is a possible follow-up and is out of scope here.
 - **R1 refresh-token attribution / fail-closed:** when pass-2 swaps `SLUICE_PHANTOM:<pool>.refresh`, sluice records `realRefreshToken -> member` in a short-TTL map; on the token-endpoint response it recovers the member by that real refresh token and persists to it (`persistAddonOAuthTokens(member, ...)`, singleflight `"persist:"+member`). The join key is the real **refresh** token — never the access token, connection, or `OAuthIndex.Match` (two pooled members share `auth.openai.com`'s token URL and collide). Unrecoverable -> WARNING + skip the write (rotating refresh tokens are single-use; a mis-attributed write bricks both accounts). **Plain-credential disambiguation:** a plain OAuth credential sharing a pool's token URL also tags its injected refresh token `realRefreshToken -> <plain name>` (plain path in `buildPhantomPairs`/`buildOAuthPhantomPairs`'s `onRefreshInject`, incl. split-host expansion); on response a recovered non-member (`PoolForMember == ""`) is attributed 1:1 to that plain credential, NOT fail-closed as pooled. The pooled fail-closed path applies only when recovery fails or resolves to an actual member; `poolForResponse` gates the same on an independent `flowInjected` tag (set post-swap only if a pool phantom was present) before cooling a member.
 - **R3 pool-stable phantom JWT:** Codex access tokens are JWTs; the per-real-token `resignJWT` would emit a different phantom after each cross-member refresh, breaking "agent never notices". `poolStablePhantomAccess` (`internal/proxy/oauth_response.go`) builds the phantom JWT from a deterministic synthetic payload keyed on the **pool name** (`sub: sluice-pool:<pool>`, `iss: sluice-phantom`, fixed far-future `exp`, no `iat`), HMAC-SHA256 with the fixed key — byte-identical across switches, structurally valid. Pool name is JSON-marshaled (never concatenated) so quotes/control chars can't inject claims. Static-form fallback (`SLUICE_PHANTOM:<pool>.access`) only on the unreachable `json.Marshal` failure. Refresh phantom stays static `SLUICE_PHANTOM:<pool>.refresh`.
 

diff --git a/docs/plans/20260518-sticky-failover.md b/docs/plans/20260518-sticky-failover.md
@@ -0,0 +1,103 @@
+# Sticky Pool Failover (no "main" account)
+
+## Problem
+
+Live on knuth: `pool openai_pool failed over openai_oauth -> openai_oauth_2 (429)` repeats
+every few minutes (20+ messages). Hermes keeps working. Root cause is a flap:
+
+- `openai_oauth` is position 0 (the de-facto "main"). Its OpenAI quota is exhausted, so it
+  429s on every request.
+- On 429 sluice fails over to `openai_oauth_2` and parks `openai_oauth` for only
+  `vault.RateLimitCooldown` = 60s.
+- `PoolResolver.ResolveActive` picks the **first member in position order** that is healthy
+  or whose cooldown expired. 60s later `openai_oauth`'s cooldown lapses, so it is re-selected
+  even though `openai_oauth_2` was serving fine.
+- `openai_oauth` is still quota-exhausted upstream -> 429 -> failover again -> another
+  identical Telegram notice. Repeats on every request gap > 60s.
+
+The real OpenAI Codex quota window is hours/weekly, so re-probing the exhausted account
+every 60s and snapping back to it is wrong.
+
+## Desired behavior (user decision)
+
+Sticky failover: there is no "main" account. Stay on whichever member is currently active
+until **that** member exhausts, then advance to the next member. A lower-position member
+recovering from cooldown must NOT cause a switch back to it. A new failover (and its single
+Telegram notice) happens only on a genuine exhaustion transition, not on cooldown lapse.
+
+Mode toggle (position-priority vs sticky as selectable strategies) is explicitly out of
+scope for this change; sticky becomes the behavior. Note it as a possible follow-up.
+
+## Design
+
+`ResolveActive` is the single source of truth: `internal/proxy/pool_failover.go` calls
+`pr.ResolveActive(pool)` to compute the new active ("to") after cooling the old member, so
+making selection sticky fixes the flap, the failover events, and the notification spam in
+one place.
+
+1. Add a per-pool current-active pointer to the swap-surviving shared `PoolHealth`
+   (`internal/vault/pool.go`), guarded by the same mutex as the cooldown map, so it
+   survives `NewPoolResolverShared` regeneration and atomic swaps (same lifecycle as
+   cooldowns; carry it the way cooldowns are carried).
+2. Sticky `ResolveActive(pool)`:
+   - Let `cur` = shared current-active for the pool, if set and still a member of this
+     generation.
+   - If `cur` is set and healthy (no active cooldown) -> return `cur` (sticky hold).
+   - Else pick the next eligible member by position **starting after `cur`'s position and
+     wrapping** (so we advance forward, never snap back to position 0); skip members in
+     active cooldown; treat `ManualRotateReason` parks with the existing
+     better-degrade-target semantics. Set shared current-active to the picked member and
+     return it.
+   - If every member is cooling: keep the existing degrade behavior (operator-parked-but-
+     healthy first, else soonest-recovering) and do NOT move the sticky pointer, so when a
+     member recovers the next call advances to a healthy one rather than the absolute
+     position-0 member.
+3. Operator `sluice pool rotate` semantics unchanged at the surface: it still parks the
+   current active so the next `ResolveActive` advances; with sticky that advance lands on
+   the next member and stays there (no snap-back), which is the intended rotate behavior.
+4. Keep all existing concurrency invariants: sticky pointer reads/writes under the same
+   `PoolHealth.mu`; never lost across atomic pointer swap; a stale resolver generation must
+   not clobber it (mirror the cooldown CRITICAL-1 handling).
+
+## Out of scope
+
+- Strategy/mode toggle across CLI/REST/Telegram + schema (`credential_pools.strategy`) —
+  follow-up only; note the synergy, do not build.
+- Honoring upstream `Retry-After` / changing `RateLimitCooldown` constant — sticky makes
+  the short cooldown harmless (we no longer re-probe the cooled member until forced), so a
+  cooldown-duration change is unnecessary for this fix.
+
+## Testing strategy
+
+- Unit (vault): sticky hold — active member stays selected across many `ResolveActive`
+  calls while a lower-position member is healthy.
+- Unit (vault): flap regression — fail member A (cooldown), `ResolveActive` returns B; A's
+  cooldown expires; `ResolveActive` still returns B (no snap-back). B then cools ->
+  advance to next, wrapping.
+- Unit (vault): all-cooling degrade unchanged; operator-parked degrade-target preserved.
+- Unit (vault): sticky pointer survives `NewPoolResolverShared` regeneration + swap, and a
+  stale generation cannot clobber it (extend the existing CRITICAL-1 style test).
+- Unit (proxy): the failover path emits exactly ONE `cred_failover` + one notice per real
+  exhaustion transition, and emits NOTHING when a non-active member's cooldown merely
+  lapses (the spam regression, fail-before/pass-after).
+- Full `go test ./...`, `-race` on `internal/vault` and `internal/proxy`, gofumpt,
+  golangci-lint, `go vet ./...` and `-tags=e2e ./e2e/`.
+
+## Steps
+
+### Task 1: Sticky selection in vault.PoolResolver
+- [x] Add swap-surviving per-pool current-active to shared `PoolHealth` (same mutex)
+- [x] Rewrite `ResolveActive` to the sticky algorithm above; preserve degrade + parked semantics
+- [x] Preserve CRITICAL-1 invariants (no loss/clobber across swap; stale generation safe)
+- [x] Unit tests: sticky hold, flap regression (no snap-back), advance+wrap, degrade unchanged, swap-survival
+- [x] `go test ./internal/vault/ -race`, gofumpt, vet
+
+### Task 2: Failover path + notification spam regression
+- [x] Confirm `pool_failover.go` from->to now changes only on real exhaustion (sticky source of truth); adjust only if it bypasses `ResolveActive`
+- [x] Test: one `cred_failover`+notice per real transition; zero events when a non-active member's cooldown lapses (fail-before/pass-after)
+- [x] `go test ./internal/proxy/ -race`, gofumpt, vet
+
+### Task 3: Docs + final validation
+- [x] Update CLAUDE.md credential-pools section to describe sticky selection (replace the position-priority wording) and note the mode-toggle follow-up
+- [x] `gofumpt -l` clean; `golangci-lint run ./...` 0 issues; full `go test ./...`; `go vet ./...`; `go vet -tags=e2e ./e2e/`
+- [x] Independently verify committed HEAD builds and tests pass (do not trust subagent green)
diff --git a/internal/proxy/pool_failover_test.go b/internal/proxy/pool_failover_test.go
@@ -193,11 +193,23 @@ func TestFailoverCooldownTTLAndLazyRecovery(t *testing.T) {
 		t.Fatalf("auth-fail cooldown TTL = %v, want ~%v", gotTTL, vault.AuthFailCooldown)
 	}
 
-	// Lazy recovery: force the cooldown to the past; ResolveActive must
-	// treat memA as eligible again with no background scheduler involved.
+	// Lazy recovery: force the cooldown to the past. memA becomes ELIGIBLE
+	// again with no background scheduler involved (CooldownUntil reports it
+	// is no longer cooling). Selection is STICKY, so memB - which we failed
+	// over to and which is healthy - remains active: a recovered lower-
+	// position member must NOT snap back (the flap fix). memA's eligibility
+	// only matters as a future advance target if memB later exhausts.
 	pr.MarkCooldown("memA", time.Now().Add(-time.Second), "expired")
+	if _, cooling := pr.CooldownUntil("memA"); cooling {
+		t.Fatal("after expiry memA must no longer be cooling (lazy recovery, no scheduler)")
+	}
+	if active, _ := pr.ResolveActive("codex_pool"); active != "memB" {
+		t.Fatalf("after expiry active = %q, want memB (sticky: recovered memA must NOT snap back)", active)
+	}
+	// Confirm memA IS the advance target once memB exhausts (wrap forward).
+	pr.MarkCooldown("memB", time.Now().Add(vault.RateLimitCooldown), "429")
 	if active, _ := pr.ResolveActive("codex_pool"); active != "memA" {
-		t.Fatalf("after expiry active = %q, want memA (lazy recovery)", active)
+		t.Fatalf("after memB exhausts active = %q, want memA (advance forward to recovered member)", active)
 	}
 }
 
@@ -346,6 +358,95 @@ func TestFailoverAuditEvent(t *testing.T) {
 	}
 }
 
+// TestStickyFailoverNoSpamOnNonActiveCooldownLapse is the notification-spam
+// regression for the live knuth flap. memA (position 0) is upstream-
+// exhausted. Sequence:
+//
+//  1. memA 429 -> exactly ONE cred_failover (memA->memB) + one notice.
+//  2. memA's short cooldown lapses. Under the OLD position-priority
+//     ResolveActive, the next request's injection would re-select memA
+//     (lower position, no longer cooling), hit the still-exhausted upstream,
+//     429 again, and emit ANOTHER identical failover + Telegram notice -
+//     repeating every cooldown window (the 20+ message spam).
+//  3. With sticky selection memB stays active after memA recovers, so the
+//     subsequent request resolves to memB, succeeds (200), and produces NO
+//     additional failover event or notice.
+//
+// Fail-before: step 3 would record a 2nd cred_failover (the flap).
+// Pass-after: exactly one cred_failover total, one notice total.
+func TestStickyFailoverNoSpamOnNonActiveCooldownLapse(t *testing.T) {
+	dir := t.TempDir()
+	logPath := filepath.Join(dir, "audit.log")
+	logger, err := audit.NewFileLogger(logPath)
+	if err != nil {
+		t.Fatalf("NewFileLogger: %v", err)
+	}
+	t.Cleanup(func() { _ = logger.Close() })
+
+	addon, _, prPtr := setupPoolAddon(t, "memA", "memB")
+	addon.auditLog = logger
+	client := setupAddonConn(addon, "auth.example.com:443")
+	pr := prPtr.Load()
+
+	var notices int
+	addon.SetOnFailover(func(FailoverEvent) { notices++ })
+
+	if got, _ := pr.ResolveActive("codex_pool"); got != "memA" {
+		t.Fatalf("pre = %q, want memA", got)
+	}
+
+	// 1. memA exhausts -> one real failover to memB.
+	f1 := newPoolRespFlow(client, 429, []byte(`{"error":"rate_limited"}`))
+	addon.flowInjected.Tag(f1.Id, "memA")
+	addon.Response(f1)
+	if got, _ := pr.ResolveActive("codex_pool"); got != "memB" {
+		t.Fatalf("after 429 active = %q, want memB", got)
+	}
+
+	// 2. memA's short cooldown lapses (still upstream-exhausted in reality).
+	pr.MarkCooldown("memA", time.Now().Add(-time.Second), "expired")
+
+	// The next request's injection resolves the pool. STICKY: it must stay
+	// on memB, NOT snap back to the recovered-but-still-exhausted memA. This
+	// is the single source of truth that kills the flap.
+	if got, _ := pr.ResolveActive("codex_pool"); got != "memB" {
+		t.Fatalf("FLAP: after memA cooldown lapse, injection resolves %q, want memB", got)
+	}
+
+	// 3. The subsequent legitimate request therefore hits memB and succeeds;
+	// a 200 is a documented no-op (no failover, no notice).
+	f2 := newPoolRespFlow(client, 200, []byte(`{"ok":true}`))
+	addon.flowInjected.Tag(f2.Id, "memB")
+	addon.Response(f2)
+
+	if err := logger.Close(); err != nil {
+		t.Fatalf("logger close: %v", err)
+	}
+	data, err := os.ReadFile(logPath)
+	if err != nil {
+		t.Fatalf("read audit log: %v", err)
+	}
+	failovers := 0
+	for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") {
+		if line == "" {
+			continue
+		}
+		var evt audit.Event
+		if uerr := json.Unmarshal([]byte(line), &evt); uerr != nil {
+			t.Fatalf("unmarshal %q: %v", line, uerr)
+		}
+		if evt.Action == "cred_failover" || evt.Action == "pool_exhausted" {
+			failovers++
+		}
+	}
+	if failovers != 1 {
+		t.Fatalf("got %d failover audit events, want exactly 1 (sticky must not flap on non-active cooldown lapse)\n%s", failovers, data)
+	}
+	if notices != 1 {
+		t.Fatalf("got %d failover notices, want exactly 1 (no Telegram spam on cooldown lapse)", notices)
+	}
+}
+
 // TestPoolForResponseResolvesActiveMember sanity-checks the destination ->
 // pool reverse mapping used by handlePoolFailover.
 func TestPoolForResponseResolvesActiveMember(t *testing.T) {