How to handle rate limits when you can't change the rate limit at the external model provider? #10780
Replies: 2 comments 1 reply
This is a bigger topic, but the short version is to use a queue system to throttle requests or configure
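As a rough illustration of the "queue system to throttle requests" idea, here is a minimal sliding-window sketch in Python. It is not part of Continue; the class name `TokenWindowThrottle`, its parameters, and the assumption that you can estimate the input tokens of each request up front are all hypothetical. Where such a throttle would sit (for example in a local proxy or script in front of the provider) is left open.

```python
import time
from collections import deque


class TokenWindowThrottle:
    """Sliding-window limiter: at most `limit` tokens per `window` seconds.

    Hypothetical helper for illustration; not a Continue or Anthropic API.
    """

    def __init__(self, limit, window=60.0, clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window = window
        self.clock = clock      # injectable so the logic can be tested without waiting
        self.sleep = sleep
        self.events = deque()   # (timestamp, tokens) of requests still inside the window

    def _used(self, now):
        # Drop entries that have aged out of the window, then sum what is left.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()
        return sum(tokens for _, tokens in self.events)

    def acquire(self, tokens):
        """Block until `tokens` fit into the window; return the seconds waited."""
        if tokens > self.limit:
            raise ValueError("a single request exceeds the per-window limit")
        waited = 0.0
        while True:
            now = self.clock()
            if self._used(now) + tokens <= self.limit:
                self.events.append((now, tokens))
                return waited
            # Wait until the oldest recorded request leaves the window, then re-check.
            delay = max(self.window - (now - self.events[0][0]), 0.001)
            self.sleep(delay)
            waited += delay
```

Calling `throttle.acquire(estimated_input_tokens)` before each request trades latency for staying under the limit, which matches the "slower results would be fine" requirement in the question.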
Worth being precise about which rate limit you're hitting: your error names the input-tokens-per-minute limit (10,000 for claude-sonnet-4-6). What Continue exposes today in the model configuration:
models:
  - name: Claude Sonnet 4.6
    provider: anthropic
    model: claude-sonnet-4-6
    defaultCompletionOptions:
      contextLength: 100000 # ← this directly reduces input tokens per request
      maxTokens: 8000 # output cap (doesn't affect input rate)
    requestOptions:
      timeout: 600 # seconds; gives retries time on slow paths
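To make the link between request size and the limit concrete, here is a back-of-the-envelope sketch using the 10,000 input tokens per minute figure from the error message. The function names are made up for illustration, and the model is deliberately simplified: it treats the limit as a flat per-minute budget, which may differ from how the provider actually accounts for usage.

```python
def requests_per_minute(itpm_limit, input_tokens_per_request):
    """How many requests of a given input size fit into one minute's budget."""
    if input_tokens_per_request <= 0:
        raise ValueError("input_tokens_per_request must be positive")
    return itpm_limit // input_tokens_per_request


def min_seconds_between_requests(itpm_limit, input_tokens_per_request):
    """Average spacing needed so steady traffic stays within the budget."""
    if itpm_limit <= 0:
        raise ValueError("itpm_limit must be positive")
    return 60.0 * input_tokens_per_request / itpm_limit


# With a 10,000 input-token budget per minute:
print(requests_per_minute(10_000, 4_000))           # two requests of 4,000 tokens fit
print(requests_per_minute(10_000, 12_000))          # a 12,000-token request does not fit at all
print(min_seconds_between_requests(10_000, 2_500))  # one 2,500-token request every 15 seconds
```

Under this simplified model, a chat turn that fans out into several tool calls, each resending a large context, uses up the budget after only a few requests, which is why reducing the context sent per request matters more than the output cap.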
The cost lever (separate from rate limit, but worth mentioning): Continue does support Anthropic prompt caching.
Model swap as a tactical fix: rate limits are per model tier.
What Continue doesn't have: a client-side request throttle or queue.
Rate-limit headers from Anthropic come back in every response.
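Since the rate-limit headers are available on each response, a client or proxy could use them to decide how long to pause. The following Python sketch is an assumption-laden illustration, not something Continue does: the function name `seconds_until_retry` and its `needed_tokens` parameter are hypothetical, and it assumes the headers `retry-after` (in seconds), `anthropic-ratelimit-input-tokens-remaining`, and `anthropic-ratelimit-input-tokens-reset` (an RFC 3339 timestamp), as described in Anthropic's rate-limit documentation.

```python
from datetime import datetime, timezone


def seconds_until_retry(headers, needed_tokens=1, now=None):
    """Suggest how long to wait, based on rate-limit response headers.

    `headers` is any mapping of header name to string value.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    h = {k.lower(): v for k, v in headers.items()}

    # retry-after is the most direct signal; assumed here to be a number of seconds.
    if "retry-after" in h:
        return max(0.0, float(h["retry-after"]))

    remaining = h.get("anthropic-ratelimit-input-tokens-remaining")
    reset = h.get("anthropic-ratelimit-input-tokens-reset")
    if remaining is None or reset is None:
        return 0.0  # no usable information, so do not wait

    if int(remaining) >= needed_tokens:
        return 0.0  # enough budget left for the next request

    # Otherwise wait until the reported reset time.
    now = now or datetime.now(timezone.utc)
    reset_at = datetime.fromisoformat(reset.replace("Z", "+00:00"))
    return max(0.0, (reset_at - now).total_seconds())
```

A caller would sleep for the returned number of seconds before sending the next request; combined with an estimate of the next request's input tokens, this gives the "slower but successful" behavior asked about in the question.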
I just started using Claude models with Continue and immediately hit rate limit errors like the following when asking a question in the chat that involves multiple tool calls (e.g. searching the codebase for a topic or issue).
{"type":"rate_limit_error","message":"This request would exceed your organization's rate limit of 10,000 input tokens per minute (org: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, model: claude-sonnet-4-6). For details, refer to: https://docs.claude.com/en/api/rate-limits. You can see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase."}
Is there a way to "slow down" or "reduce the speed" of Continue when it sends requests to the model provider?
I.e. I can't change the per-minute rate limit within the Claude account, but just getting "slower" results would be fine for me.
Looking at the recent issues, I saw tons of them regarding rate limits, so I assume this is a general issue for lots of users.
How do you work around such external limitations?
Is there a configuration that could be used to stay below a given rate limit and still be able to send a chat message that triggers more tool calls and automatic follow-up actions by the model?