docs/freebuff-waiting-room.md
3 additions & 18 deletions
@@ -8,22 +8,14 @@ The waiting room is the admission control layer for **free-mode** requests again
2. **Gate on per-deployment health and hours** — a single fleet probe per tick (`getFleetHealth` in `web/src/server/free-session/fireworks-health.ts`) hits the Fireworks metrics endpoint and classifies each dedicated deployment as `healthy | degraded | unhealthy`. Only models whose deployment is `healthy` and currently available admit that tick; GLM 5.1 is available during 9am ET-5pm PT on weekdays, while MiniMax M2.7 is serverless and always available.
3. **One instance per account** — prevent a single user from running N concurrent freebuff CLIs to get N× throughput.
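
As a rough illustration of the health-and-hours gate in item 2, the sketch below combines a deployment's health with an availability window. Only the `healthy | degraded | unhealthy` states and the "9am ET-5pm PT on weekdays" window come from the text above. The names `zonedParts`, `isWithinHours`, and `canAdmit`, the IANA zone names, and the choice to evaluate the weekday in Eastern time are assumptions made for illustration and are not taken from the codebase.

```typescript
// Hypothetical sketch; names and structure are illustrative, not the real implementation.
type DeploymentHealth = 'healthy' | 'degraded' | 'unhealthy'

// Read the weekday and hour (0-23) of `date` as seen in the given IANA time zone.
function zonedParts(date: Date, timeZone: string): { weekday: string; hour: number } {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    weekday: 'short',
    hour: 'numeric',
    hour12: false,
  }).formatToParts(date)
  const get = (type: string): string => {
    const part = parts.find((p) => p.type === type)
    return part ? part.value : ''
  }
  // Some engines render midnight as "24" when hour12 is false; normalize to 0.
  return { weekday: get('weekday'), hour: Number(get('hour')) % 24 }
}

// Assumed reading of the window: open once it is 9:00 in New York and until
// it is 17:00 in Los Angeles, Monday through Friday (weekday taken in ET).
function isWithinHours(now: Date): boolean {
  const et = zonedParts(now, 'America/New_York')
  const pt = zonedParts(now, 'America/Los_Angeles')
  const isWeekday = et.weekday !== 'Sat' && et.weekday !== 'Sun'
  return isWeekday && et.hour >= 9 && pt.hour < 17
}

// A model admits this tick only if it is healthy and currently available.
// `alwaysAvailable` stands in for models with no hours restriction.
function canAdmit(health: DeploymentHealth, alwaysAvailable: boolean, now: Date): boolean {
  if (health !== 'healthy') return false
  return alwaysAvailable || isWithinHours(now)
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to exercise at the window boundaries.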

-Users who cannot be admitted immediately are placed in the queue for their chosen model and given an estimated wait time. Admitted users get a fixed-length session (default 1h) bound to the model they were admitted on; chat completions use that model for the life of the session.
+Users who cannot be admitted immediately are placed in the queue for their chosen model and given an estimated wait time. With the current high instant-admit capacities, most users go straight from model selection to an active session; the queue only appears when a model is actually saturated. Admitted users get a fixed-length session (default 1h) bound to the model they were admitted on; chat completions use that model for the life of the session.

-The entire system is gated by the env flag `FREEBUFF_WAITING_ROOM_ENABLED`. When `false`, the gate is a no-op and the admission ticker does not start; free-mode traffic flows through unchanged.
-
-## Kill Switch
+## Configuration

```bash
-# Disable entirely (both the gate on chat/completions and the admission loop)
-FREEBUFF_WAITING_ROOM_ENABLED=false
-
-# Other knob (only read when enabled)
FREEBUFF_SESSION_LENGTH_MS=3600000 # 1 hour
```
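
For readers wiring this knob into code, here is a minimal sketch of reading `FREEBUFF_SESSION_LENGTH_MS` with the documented 1 hour default. The helper name `resolveSessionLengthMs` and the fallback behavior on invalid values are assumptions; the actual server may parse or validate the variable differently.

```typescript
// Hypothetical helper: the real code may read and validate this variable differently.
const DEFAULT_SESSION_LENGTH_MS = 60 * 60 * 1000 // 1 hour, the documented default

function resolveSessionLengthMs(env: Record<string, string | undefined>): number {
  const raw = env['FREEBUFF_SESSION_LENGTH_MS']
  if (raw === undefined || raw.trim() === '') return DEFAULT_SESSION_LENGTH_MS
  const parsed = Number(raw)
  // Assumed policy: fall back to the default on non-numeric,
  // non-integer, or non-positive values rather than failing.
  if (!Number.isInteger(parsed) || parsed <= 0) return DEFAULT_SESSION_LENGTH_MS
  return parsed
}
```

In a Node process this would typically be called as `resolveSessionLengthMs(process.env)`; taking the environment as a parameter keeps the helper testable.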

-Flipping the flag is safe at runtime: existing rows stay in the DB and will be admitted / expired correctly whenever the flag is flipped back on.
-
## Architecture
```mermaid
@@ -186,9 +178,6 @@ Before any of those state transitions, the handler requires a resolved allowlist
Response shapes:

```jsonc
-// Waiting room disabled — CLI should treat this as "always admitted"
-{ "status": "disabled" }
-
// In queue
{
  "status": "queued",
@@ -272,9 +261,7 @@ For free-mode requests (`codebuff_metadata.cost_mode === 'free'`), `_post.ts` ca
| 409 | `session_superseded` | Claimed `instance_id` does not match stored one — another CLI took over. |
| 410 | `session_expired` | `expires_at + grace < now()` (past the hard cutoff). Client should POST /session to re-queue. |

-Successful results carry one of three reasons: `disabled` (gate is off), `active` (`expires_at > now()`, `remainingMs` provided), or `draining` (`expires_at <= now() < expires_at + grace`, `gracePeriodRemainingMs` provided). The CLI should treat `draining` as "let any in-flight agent run finish, but block new user prompts" — see [Drain / Grace Window](#drain--grace-window) below. The corresponding wire status from `getSessionState` is `ended`.
-
-When the waiting room is disabled, the gate returns `{ ok: true, reason: 'disabled' }` without touching the DB.
+Successful results carry one of two reasons: `active` (`expires_at > now()`, `remainingMs` provided), or `draining` (`expires_at <= now() < expires_at + grace`, `gracePeriodRemainingMs` provided). The CLI should treat `draining` as "let any in-flight agent run finish, but block new user prompts" — see [Drain / Grace Window](#drain--grace-window) below. The corresponding wire status from `getSessionState` is `ended`.
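
The timing conditions above, together with the 409 and 410 rows in the table, can be sketched as a single pure function. This is an illustration under stated assumptions, not the gate's actual code: `checkSession`, `StoredSession`, the millisecond fields, the error object shape, and the order of the checks are all hypothetical. Only the reason names, `remainingMs`, `gracePeriodRemainingMs`, and the inequalities come from the text.

```typescript
// Hypothetical sketch of the gate's checks; names and shapes are assumed.
type GateResult =
  | { ok: true; reason: 'active'; remainingMs: number }
  | { ok: true; reason: 'draining'; gracePeriodRemainingMs: number }
  | { ok: false; status: 409; code: 'session_superseded' }
  | { ok: false; status: 410; code: 'session_expired' }

interface StoredSession {
  instanceId: string
  expiresAtMs: number // expires_at as epoch milliseconds
}

function checkSession(
  session: StoredSession,
  claimedInstanceId: string,
  graceMs: number,
  nowMs: number,
): GateResult {
  // Another CLI took over: the claimed instance no longer matches the stored one.
  if (claimedInstanceId !== session.instanceId) {
    return { ok: false, status: 409, code: 'session_superseded' }
  }
  // active: expires_at > now
  if (session.expiresAtMs > nowMs) {
    return { ok: true, reason: 'active', remainingMs: session.expiresAtMs - nowMs }
  }
  // draining: expires_at <= now < expires_at + grace
  const hardCutoffMs = session.expiresAtMs + graceMs
  if (nowMs < hardCutoffMs) {
    return { ok: true, reason: 'draining', gracePeriodRemainingMs: hardCutoffMs - nowMs }
  }
  // Past the hard cutoff. The exact boundary (now === cutoff) is treated as
  // expired here, which is a choice this sketch makes on its own.
  return { ok: false, status: 410, code: 'session_expired' }
}
```

Modeling the result as a discriminated union means a caller that handles `active` must also decide what to do with `draining`, which matches the CLI guidance to let in-flight runs finish while blocking new prompts.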

## Drain / Grace Window

@@ -314,8 +301,6 @@ The CLI:
8. **Handles chat-gate errors:** the same statuses are reachable via the gate's 409/410/428/429 for fast in-flight feedback, and the CLI calls the matching `markFreebuff*` helper to flip local state without waiting for the next poll.
9. **On clean exit**, calls `DELETE /api/v1/freebuff/session` so the next user can be admitted sooner.

-The `disabled` response means the server has the waiting room turned off. CLI treats it identically to `active` with infinite remaining time — no countdown, and chat requests can omit `freebuff_instance_id` entirely.
-
## Multi-pod Behavior

- **`/api/v1/freebuff/session` routes** are stateless per pod; all state lives in Postgres. Any pod can serve any request.