Severity : Medium
CVSS : 6.5 (AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H)
Endpoint : https://vectorbase.dev/api/v1/chat
Caller-controlled model and max_tokens enable unlimited OpenAI cost abuse
Summary
The /api/v1/chat endpoint deserializes model and max_tokens directly from the request body with no server-side allowlist or cap. Any API key holder can specify arbitrarily expensive OpenAI models (e.g., o1-preview, gpt-4) and unrestricted token counts. The service operator is billed the full cost, and the existing rate limit does not hold in multi-instance deployments.
Steps / PoC
- Send a chat request with an expensive model override:
  curl -s -X POST https://vectorbase.dev/api/v1/chat \
    -H "Authorization: Bearer <vb_sk_...>" \
    -H "Content-Type: application/json" \
    -d '{
      "messages": [{"role": "user", "content": "Write a 10,000-word essay on AI."}],
      "model": "gpt-4",
      "max_tokens": 50000
    }'
Expected: the server calls OpenAI with model: "gpt-4" and max_tokens: 50000 instead of the defaults (gpt-4o-mini, 1000). The token ceiling alone is 50x the default, before accounting for the higher per-token price of gpt-4.
- Root cause in src/app/api/v1/chat/route.ts:
  const {
    messages,
    session_id,
    stream = true,
    model = 'gpt-4o-mini',
    temperature = 0.7,
    max_tokens = 1000
  } = body
  // model and max_tokens passed directly to OpenAI call: no allowlist, no cap
  const response = await openai.chat.completions.create({
    model,
    messages: systemMessages,
    stream,
    temperature,
    max_tokens,
  })
- Compound factor: the in-process rate limiter is per-instance (Map<string, RateLimitEntry> in module scope). On Vercel or any serverless/multi-container deployment, each instance has an independent store, so the 60 req/min limit is easily bypassed by spreading requests across instances, making cost abuse unlimited in practice.
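The per-instance effect can be illustrated with a small simulation. This is a minimal sketch, not the application's actual limiter: the fixed-window logic, the RateLimitEntry fields, the function names, and the example key are all assumptions made for illustration. Only the 60 req/min figure and the Map<string, RateLimitEntry> shape come from the report.

```typescript
// Hypothetical fixed-window limiter; field names and logic are assumed,
// not taken from the application's source.
interface RateLimitEntry {
  count: number;
  windowStart: number; // epoch ms at which the current window began
}

const LIMIT = 60;         // requests per window, per the report
const WINDOW_MS = 60_000; // one minute

// A Map held in module scope is private to one instance. Passing the store
// in explicitly lets the simulation model several instances side by side.
function allow(
  store: Map<string, RateLimitEntry>,
  apiKey: string,
  now: number,
): boolean {
  const entry = store.get(apiKey);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    store.set(apiKey, { count: 1, windowStart: now });
    return true;
  }
  if (entry.count >= LIMIT) return false;
  entry.count += 1;
  return true;
}

// Sends `total` requests round-robin across the given stores, all inside
// one window, and counts how many are admitted.
function countAdmitted(
  stores: Map<string, RateLimitEntry>[],
  total: number,
): number {
  let admitted = 0;
  for (let i = 0; i < total; i++) {
    if (allow(stores[i % stores.length], "vb_sk_example", 0)) admitted++;
  }
  return admitted;
}

const newStore = () => new Map<string, RateLimitEntry>();

// One store shared by every request: the limit holds.
const sharedAdmitted = countAdmitted([newStore()], 300);
// Three independent stores (three instances): the limit is multiplied.
const perInstanceAdmitted = countAdmitted(
  [newStore(), newStore(), newStore()],
  300,
);

console.log(sharedAdmitted, perInstanceAdmitted); // 60 180
```

With three simulated instances the same key gets 180 requests through in one window instead of 60. In a real deployment the caller does not choose the instance, but the effective ceiling still scales with however many instances the platform spins up.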
Impact
Any API key holder can exhaust the operator's OpenAI budget by specifying expensive models and high token counts; repeated at scale this forces the operator into OpenAI overage or service shutdown.
Fix
- Validate model against an explicit allowlist (e.g., ['gpt-4o-mini', 'gpt-4o']); reject unknown values.
- Cap max_tokens server-side at the plan limit; ignore caller-supplied values above the cap.
- Replace the in-process Map rate limiter with a shared store (Redis, Upstash, Supabase) so limits hold across instances.
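The first two fixes could look roughly like the following. This is a sketch under stated assumptions: resolveChatParams, ParamResult, and MAX_TOKENS_CAP are hypothetical names, the cap of 2000 is a placeholder for whatever the plan limit actually is, and clamping is only one reading of "ignore values above the cap" (falling back to the default would be another).

```typescript
// Allowlist taken from the example in the fix list; adjust per plan.
const ALLOWED_MODELS = ["gpt-4o-mini", "gpt-4o"] as const;
type AllowedModel = (typeof ALLOWED_MODELS)[number];

const DEFAULT_MODEL: AllowedModel = "gpt-4o-mini"; // default from route.ts
const DEFAULT_MAX_TOKENS = 1000;                   // default from route.ts
const MAX_TOKENS_CAP = 2000;                       // ASSUMED plan limit

// Hypothetical result type: either sanitized parameters or a 400 error.
type ParamResult =
  | { ok: true; model: AllowedModel; max_tokens: number }
  | { ok: false; status: 400; error: string };

function isAllowedModel(value: unknown): value is AllowedModel {
  return (
    typeof value === "string" &&
    (ALLOWED_MODELS as readonly string[]).includes(value)
  );
}

// Validates caller-supplied model and max_tokens before they reach OpenAI.
function resolveChatParams(body: {
  model?: unknown;
  max_tokens?: unknown;
}): ParamResult {
  const model = body.model ?? DEFAULT_MODEL;
  if (!isAllowedModel(model)) {
    return { ok: false, status: 400, error: "unsupported model" };
  }

  const requested = body.max_tokens ?? DEFAULT_MAX_TOKENS;
  if (
    typeof requested !== "number" ||
    !Number.isInteger(requested) ||
    requested < 1
  ) {
    return {
      ok: false,
      status: 400,
      error: "max_tokens must be a positive integer",
    };
  }

  // Clamp rather than reject, so oversized requests still get a response.
  return { ok: true, model, max_tokens: Math.min(requested, MAX_TOKENS_CAP) };
}

// The PoC payload is rejected outright because gpt-4 is not on the allowlist.
console.log(resolveChatParams({ model: "gpt-4", max_tokens: 50000 }));
// An allowed model with an oversized max_tokens is clamped to the cap.
console.log(resolveChatParams({ model: "gpt-4o", max_tokens: 50000 }));
```

In the route handler, the result would be checked before the openai.chat.completions.create call: return a 400 response when ok is false, otherwise pass the sanitized model and max_tokens instead of the raw values from body.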