
Caller-controlled model and max_tokens enables unlimited OpenAI cost abuse #2

@noobx123


Severity: Medium
CVSS: 6.5 (AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H)
Endpoint: https://vectorbase.dev/api/v1/chat
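The stated score can be checked against the vector. The sketch below recomputes the CVSS v3.1 base score using the specification's metric weights and its Roundup function. It handles only Scope: Unchanged (S:U), which is what this vector uses, and is an illustrative cross-check rather than a replacement for the official FIRST calculator.

```typescript
// Minimal CVSS v3.1 base-score calculator, restricted to Scope: Unchanged (S:U).
const AV_W: Record<string, number> = { N: 0.85, A: 0.62, L: 0.55, P: 0.2 };
const AC_W: Record<string, number> = { L: 0.77, H: 0.44 };
const PR_W: Record<string, number> = { N: 0.85, L: 0.62, H: 0.27 }; // S:U weights
const UI_W: Record<string, number> = { N: 0.85, R: 0.62 };
const CIA_W: Record<string, number> = { H: 0.56, L: 0.22, N: 0 };

// Roundup as defined in CVSS v3.1 Appendix A (avoids floating-point artifacts).
function roundUp(value: number): number {
  const intInput = Math.round(value * 100000);
  if (intInput % 10000 === 0) return intInput / 100000;
  return (Math.floor(intInput / 10000) + 1) / 10;
}

function baseScoreScopeUnchanged(vector: string): number {
  // "AV:N/AC:L/..." -> { AV: "N", AC: "L", ... }
  const m: Record<string, string> = {};
  for (const part of vector.split("/")) {
    const [key, value] = part.split(":");
    m[key] = value;
  }
  if (m.S !== "U") throw new Error("this sketch only handles S:U");

  const iss = 1 - (1 - CIA_W[m.C]) * (1 - CIA_W[m.I]) * (1 - CIA_W[m.A]);
  const impact = 6.42 * iss;
  const exploitability = 8.22 * AV_W[m.AV] * AC_W[m.AC] * PR_W[m.PR] * UI_W[m.UI];
  if (impact <= 0) return 0;
  return roundUp(Math.min(impact + exploitability, 10));
}

console.log(baseScoreScopeUnchanged("AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H")); // 6.5
```

For this vector the availability impact (A:H) is the only impact metric: Impact = 6.42 × 0.56 ≈ 3.60 and Exploitability ≈ 2.84, and their sum rounds up to 6.5.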


Summary

The /api/v1/chat endpoint reads model and max_tokens directly from the request body and passes them to OpenAI with no server-side allowlist or cap. Any API key holder can request arbitrarily expensive OpenAI models (e.g., o1-preview, gpt-4) and token counts limited only by OpenAI's per-model maximums, and the service operator is billed for the full cost. The only rate limit is kept in process memory, so it does not hold in multi-instance deployments.

Steps / PoC

  1. Send a chat request with an expensive model override:
curl -s -X POST https://vectorbase.dev/api/v1/chat \
  -H "Authorization: Bearer <vb_sk_...>" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Write a 10,000-word essay on AI."}],
    "model": "gpt-4",
    "max_tokens": 50000
  }'

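Expected: the server forwards model: "gpt-4" and max_tokens: 50000 to OpenAI instead of the defaults (gpt-4o-mini, 1000). The output-token ceiling alone is 50x higher, and gpt-4 is priced far above gpt-4o-mini per token, so the worst-case cost per request rises by well over 50x. OpenAI still enforces each model's own context and output limits (gpt-4 has an 8,192-token context window, so this exact max_tokens value may be rejected upstream), but the server applies no cap of its own.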

  2. Root cause in src/app/api/v1/chat/route.ts:
const {
  messages,
  session_id,
  stream = true,
  model = 'gpt-4o-mini',
  temperature = 0.7,
  max_tokens = 1000
} = body

// model and max_tokens passed directly to OpenAI call — no allowlist, no cap
const response = await openai.chat.completions.create({
  model,
  messages: systemMessages,
  stream,
  temperature,
  max_tokens,
})
  3. Compound factor: the in-process rate limiter is per-instance (a Map<string, RateLimitEntry> in module scope). On Vercel or any other serverless or multi-container deployment, each instance has an independent store, so the 60 req/min limit is easily bypassed by spreading requests across instances, which makes cost abuse effectively unlimited.
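The compound factor can be illustrated with a small simulation. The sketch assumes a fixed-window limiter keyed by API key and stored in a module-scope Map; the RateLimitEntry fields, the example key, and round-robin routing are hypothetical, since the report does not show the limiter's code. Each call to createInstanceLimiter stands in for one instance loading the module with its own empty Map.

```typescript
// HYPOTHETICAL entry shape; the real RateLimitEntry in route.ts may differ.
interface RateLimitEntry {
  count: number;
  windowStart: number; // ms timestamp when the current window began
}

const LIMIT_PER_MINUTE = 60;
const WINDOW_MS = 60_000;

// One call = one instance: a fresh module-scope Map, invisible to other instances.
function createInstanceLimiter() {
  const store = new Map<string, RateLimitEntry>();
  return function allow(apiKey: string, now: number): boolean {
    const entry = store.get(apiKey);
    if (!entry || now - entry.windowStart >= WINDOW_MS) {
      store.set(apiKey, { count: 1, windowStart: now });
      return true;
    }
    if (entry.count >= LIMIT_PER_MINUTE) return false;
    entry.count += 1;
    return true;
  };
}

// ASSUMPTION: requests from one key are spread round-robin across instances,
// all within the same one-minute window.
function acceptedRequests(instances: number, attempts: number): number {
  const limiters = Array.from({ length: instances }, () => createInstanceLimiter());
  let accepted = 0;
  for (let i = 0; i < attempts; i++) {
    const allow = limiters[i % instances];
    if (allow("vb_sk_example", 0)) accepted++;
  }
  return accepted;
}

console.log(acceptedRequests(1, 1000)); // 60
console.log(acceptedRequests(5, 1000)); // 300
```

With one instance the key is held to 60 accepted requests in the window; with five independent instances the same key gets 300. How many instances a client actually reaches depends on the platform's routing and autoscaling, but the effective limit grows with that count.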

Impact

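Any API key holder can drain the operator's OpenAI budget by requesting expensive models and large token counts. Repeated at scale, this either runs up OpenAI charges far beyond the intended per-request cost or, once the account's usage limit or prepaid credit is exhausted, takes the chat endpoint down for all users.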
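A rough worst-case estimate makes the impact concrete. The sketch below multiplies max_tokens by a per-token output price. The prices are assumed illustrative values (USD per 1M output tokens) that should be checked against OpenAI's current pricing page, and input-token cost is ignored. The override uses max_tokens: 4000 so it stays within gpt-4's 8,192-token context window.

```typescript
// ASSUMED output prices in USD per 1M tokens (illustrative; verify against
// OpenAI's current pricing page). Input-token cost is ignored.
const ASSUMED_OUTPUT_USD_PER_MILLION: Record<string, number> = {
  "gpt-4o-mini": 0.6,
  "gpt-4": 60,
};

// Worst case: the model generates every token that max_tokens allows.
function worstCaseOutputUsd(model: string, maxTokens: number): number {
  const price = ASSUMED_OUTPUT_USD_PER_MILLION[model];
  if (price === undefined) throw new Error(`no assumed price for ${model}`);
  return (maxTokens * price) / 1_000_000;
}

const baseline = worstCaseOutputUsd("gpt-4o-mini", 1000); // route.ts defaults
const abusive = worstCaseOutputUsd("gpt-4", 4000); // caller-chosen values
console.log({ baseline, abusive, ratio: abusive / baseline });
```

Under these assumed prices the override raises the worst-case output cost of a single request from $0.0006 to $0.24, about 400x, before any rate-limit bypass multiplies the number of requests.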

Fix

  1. Validate model against an explicit allowlist (e.g., ['gpt-4o-mini', 'gpt-4o']); reject unknown values.
  2. Cap max_tokens server-side at the plan limit; ignore caller-supplied values above the cap.
  3. Replace the in-process Map rate limiter with a shared store (Redis, Upstash, Supabase) so limits hold across instances.
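The three fixes can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: sanitizeChatParams, SharedCounterStore, InMemoryCounterStore, and allowRequest are hypothetical names, PLAN_MAX_TOKENS = 1000 is an assumed cap, and the in-memory store is only a stand-in so the example runs on its own. A production store would be genuinely shared, for example Redis using INCR and EXPIRE.

```typescript
// Fixes 1 and 2: allowlist the model and cap max_tokens on the server.
const ALLOWED_MODELS = new Set<string>(["gpt-4o-mini", "gpt-4o"]);
const DEFAULT_MODEL = "gpt-4o-mini";
const DEFAULT_MAX_TOKENS = 1000;
const PLAN_MAX_TOKENS = 1000; // ASSUMED plan cap; substitute the real per-plan limit

interface ChatParams {
  model: string;
  max_tokens: number;
}

// HYPOTHETICAL helper: run it on the parsed body before building the OpenAI request.
function sanitizeChatParams(body: { model?: unknown; max_tokens?: unknown }): ChatParams {
  const model = body.model ?? DEFAULT_MODEL;
  if (typeof model !== "string" || !ALLOWED_MODELS.has(model)) {
    throw new Error(`model not allowed: ${String(model)}`);
  }
  const requested = body.max_tokens ?? DEFAULT_MAX_TOKENS;
  if (typeof requested !== "number" || !Number.isInteger(requested) || requested < 1) {
    throw new Error("max_tokens must be a positive integer");
  }
  // Clamp: caller-supplied values above the cap are reduced to the cap.
  return { model, max_tokens: Math.min(requested, PLAN_MAX_TOKENS) };
}

// Fix 3: rate limiting backed by a counter store that all instances share.
interface SharedCounterStore {
  // Atomically increments `key` and returns the new count; a real backend
  // (e.g. Redis INCR followed by EXPIRE) would also set a TTL on the key.
  incr(key: string, ttlSeconds: number): Promise<number>;
}

// Stand-in so the sketch runs alone. It ignores ttlSeconds (no expiry), which is
// tolerable here only because each one-minute window gets its own key.
class InMemoryCounterStore implements SharedCounterStore {
  private counts = new Map<string, number>();
  async incr(key: string, _ttlSeconds: number): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}

const LIMIT_PER_MINUTE = 60;

async function allowRequest(
  store: SharedCounterStore,
  apiKey: string,
  nowMs: number = Date.now(),
): Promise<boolean> {
  const windowKey = `rl:${apiKey}:${Math.floor(nowMs / 60_000)}`; // fixed one-minute window
  const count = await store.incr(windowKey, 60);
  return count <= LIMIT_PER_MINUTE;
}
```

Oversized max_tokens values are clamped to the cap here, which is one reading of fix 2; rejecting them with a 400 response is an equally reasonable choice. Because the window is encoded in the store key, every instance that shares the store sees the same count, so the 60 req/min limit holds regardless of how requests are routed.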
