Caller-controlled model and max_tokens enable unlimited OpenAI cost abuse

Severity : Medium
CVSS     : 6.5 (AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H)
Endpoint : https://vectorbase.dev/api/v1/chat


Summary
-------
The `/api/v1/chat` endpoint reads `model` and `max_tokens` straight from the request body and forwards them to OpenAI with no server-side allowlist or cap. Any API key holder can therefore select an arbitrarily expensive OpenAI model (e.g., `o1-preview`, `gpt-4`) and an unrestricted token count, and the service operator is billed for the full cost. The only rate limit lives in process memory, so it does not hold in multi-instance deployments.

Steps / PoC
-----------
1. Send a chat request with an expensive model override:

```sh
curl -s -X POST https://vectorbase.dev/api/v1/chat \
  -H "Authorization: Bearer <vb_sk_...>" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Write a 10,000-word essay on AI."}],
    "model": "gpt-4",
    "max_tokens": 50000
  }'
```

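Expected: the server calls OpenAI with `model: "gpt-4"` and `max_tokens: 50000` instead of the defaults (`gpt-4o-mini`, 1000). On token count alone that allows up to ~50x the intended output per request, before accounting for the higher per-token price of the larger model. OpenAI enforces its own per-model token limits, so the practical ceiling is the largest value the chosen model accepts.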

2. Root cause in `src/app/api/v1/chat/route.ts`:

```typescript
const {
  messages,
  session_id,
  stream = true,
  model = 'gpt-4o-mini',
  temperature = 0.7,
  max_tokens = 1000
} = body

// model and max_tokens are passed straight to the OpenAI call: no allowlist, no cap
const response = await openai.chat.completions.create({
  model,
  messages: systemMessages,
  stream,
  temperature,
  max_tokens,
})
```

3. Compounding factor: the rate limiter is in-process and therefore per-instance (a `Map<string, RateLimitEntry>` in module scope). On Vercel or any other serverless or multi-container deployment, each instance keeps an independent store, so the 60 req/min limit applies per instance rather than per key. Requests that land on different instances are counted separately, which makes the cost abuse effectively unbounded.
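
The per-instance behaviour in step 3 can be illustrated with a small simulation. This is a sketch under assumptions: the report only states that the store is a module-scope `Map<string, RateLimitEntry>`, so the entry fields, the fixed-window logic, and the names `createInstanceLimiter` and `countAccepted` are hypothetical. In a real deployment the caller does not choose which instance serves a request; the round-robin spread below stands in for concurrent traffic being distributed by the platform.

```typescript
// Hypothetical reconstruction of a fixed-window, in-process rate limiter.
// Field names and window logic are assumptions, not the project's code.
interface RateLimitEntry {
  count: number
  windowStart: number // epoch milliseconds
}

const LIMIT_PER_MINUTE = 60
const WINDOW_MS = 60_000

// Each call creates an independent store, mirroring what happens when the
// module is loaded separately in each container or serverless instance.
function createInstanceLimiter(): (apiKey: string, now: number) => boolean {
  const store = new Map<string, RateLimitEntry>()
  return (apiKey, now) => {
    const entry = store.get(apiKey)
    if (!entry || now - entry.windowStart >= WINDOW_MS) {
      store.set(apiKey, { count: 1, windowStart: now })
      return true
    }
    if (entry.count >= LIMIT_PER_MINUTE) return false
    entry.count += 1
    return true
  }
}

// Send `total` requests from one key, spread round-robin across `instances`
// independent limiters, all inside the same one-minute window (now = 0).
function countAccepted(instances: number, total: number): number {
  const limiters = Array.from({ length: instances }, () => createInstanceLimiter())
  let accepted = 0
  for (let i = 0; i < total; i++) {
    const allow = limiters[i % instances]
    if (allow && allow('vb_sk_example', 0)) accepted++
  }
  return accepted
}

console.log(countAccepted(1, 600)) // 60
console.log(countAccepted(5, 600)) // 300
```

With one store the key is held to 60 requests per minute; with five independent stores the same key gets 300, and the effective limit grows with the instance count.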

Impact
------
Any API key holder can exhaust the operator's OpenAI budget by requesting expensive models and high token counts. Repeated at scale, this forces the operator to either absorb OpenAI overage charges or shut the service down.
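
A rough ceiling on the exposure can be worked out from the numbers in this report (60 req/min, default 1000 tokens, PoC value 50000). The helper below is a hypothetical sketch, not part of the codebase. It counts requested completion tokens only and assumes no particular per-token price, so any price difference between models multiplies the result further.

```typescript
// Upper bound on completion tokens one key could request per hour.
// Assumes every request is accepted and generates the full max_tokens,
// which real traffic will rarely reach; treat the result as a ceiling.
function maxCompletionTokensPerHour(
  requestsPerMinute: number,
  maxTokensPerRequest: number,
  instances = 1, // number of independent rate-limit stores
): number {
  return requestsPerMinute * 60 * maxTokensPerRequest * instances
}

const intendedPerHour = maxCompletionTokensPerHour(60, 1000)  // report defaults
const abusedPerHour = maxCompletionTokensPerHour(60, 50000)   // PoC value, one instance

console.log(intendedPerHour)                 // 3600000
console.log(abusedPerHour)                   // 180000000
console.log(abusedPerHour / intendedPerHour) // 50
```

Passing a larger `instances` value scales the ceiling linearly, which is how the per-instance rate limiter compounds the issue.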

Fix
---
1. Validate `model` against an explicit allowlist (e.g., `['gpt-4o-mini', 'gpt-4o']`) and reject any value that is not on it.
2. Cap `max_tokens` server-side at the plan limit; clamp caller-supplied values above the cap instead of passing them through.
3. Replace the in-process `Map` rate limiter with a shared store (Redis, Upstash, Supabase) so that limits hold across instances.
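
Fixes 1 and 2 could look roughly like the following. This is a minimal sketch, not the project's implementation: `ALLOWED_MODELS`, `PLAN_MAX_TOKENS`, and `sanitizeChatParams` are hypothetical names, and the cap of 4000 is a placeholder, since the real plan limit is a product decision. The defaults mirror those in `route.ts`.

```typescript
// Hypothetical names throughout; none of these exist in the codebase.
const ALLOWED_MODELS = ['gpt-4o-mini', 'gpt-4o'] as const
type AllowedModel = (typeof ALLOWED_MODELS)[number]

const DEFAULT_MODEL: AllowedModel = 'gpt-4o-mini'
const DEFAULT_MAX_TOKENS = 1000
const PLAN_MAX_TOKENS = 4000 // assumed plan cap, placeholder value

type SanitizeResult =
  | { ok: true; model: AllowedModel; max_tokens: number }
  | { ok: false; error: string }

function isAllowedModel(value: unknown): value is AllowedModel {
  return typeof value === 'string' && (ALLOWED_MODELS as readonly string[]).includes(value)
}

// Validate caller-supplied values before they reach the OpenAI call.
function sanitizeChatParams(body: { model?: unknown; max_tokens?: unknown }): SanitizeResult {
  // Fix 1: unknown models are rejected, not silently replaced.
  const model = body.model === undefined ? DEFAULT_MODEL : body.model
  if (!isAllowedModel(model)) {
    return { ok: false, error: 'unsupported model' }
  }
  // Fix 2: max_tokens must be a positive integer and is clamped to the cap.
  const requested = body.max_tokens === undefined ? DEFAULT_MAX_TOKENS : body.max_tokens
  if (typeof requested !== 'number' || !Number.isInteger(requested) || requested < 1) {
    return { ok: false, error: 'max_tokens must be a positive integer' }
  }
  return { ok: true, model, max_tokens: Math.min(requested, PLAN_MAX_TOKENS) }
}

// The PoC payload is rejected because of its model.
console.log(sanitizeChatParams({ model: 'gpt-4', max_tokens: 50000 }))
// An allowed model with an oversized max_tokens is clamped to the cap.
console.log(sanitizeChatParams({ model: 'gpt-4o', max_tokens: 50000 }))
```

In the route handler, the result would be checked before calling `openai.chat.completions.create`, returning an error response when `ok` is false and otherwise passing the sanitized `model` and `max_tokens` through. Fix 3 is not shown because it depends on the shared store chosen.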
