Commit bc3d35a

AWS Bedrock / Vertex Claude reasoning docs; fix issue with autocomplete vs. codeCompletion (#1165)
@MaedahBatool I reviewed your PR #1162 but needed a bunch of changes to it, so it was easier to send like this.

This is improved docs around reasoning for Bedrock and Vertex (the Claude 4 changes).

I also bundled in a small fix for where our docs referred to `"autocomplete"` in the modelConfiguration `defaultModels` section, which has never been a valid option except under `capabilities` for models, so I corrected it to `codeCompletion`, which is the correct option to use (a customer ran into this, so I wanted to get this fix in).

Signed-off-by: Emi <emi@sourcegraph.com>
1 parent fc3520f commit bc3d35a

4 files changed

Lines changed: 63 additions & 20 deletions

docs/cody/capabilities/supported-models.mdx

Lines changed: 8 additions & 6 deletions
@@ -29,17 +29,19 @@ Cody supports a variety of cutting-edge large language models for use in chat an
 
 <Callout type="note">To use Claude 3 Sonnet models with Cody Enterprise, make sure you've upgraded your Sourcegraph instance to the latest version. </Callout>
 
-### Claude 3.7 Sonnet
+### Claude 3.7 and 4 Sonnet
 
-Claude 3.7 has two variants - Claude 3.7 Sonnet and Claude 3.7 Extended Thinking - to support deep reasoning and fast, responsive edit workflows. This means you can use Claude 3.7 in different contexts depending on whether long-form reasoning is required or for tasks where speed and performance are a priority.
+Claude 3.7 and 4 Sonnet each have two variants: the base version, suited to fast, responsive edit workflows, and the 'extended thinking' version, which supports deep reasoning. Cody supports both and lets users pick one in the model dropdown selector, so they can decide whether to use extended thinking for the task at hand.
 
-Claude 3.7 Extended Thinking is the recommended default chat model for Cloud customers. Self-hosted customers are encouraged to follow this recommendation, as Claude 3.7 outperforms 3.5 in most scenarios.
+<Callout type="note">
+Claude 4 support is available starting in Sourcegraph v6.4+ and v6.3.4167.
+</Callout>
 
-#### Claude 3.7 for GCP
+#### Claude 3.7 and 4 via Google Vertex and AWS Bedrock
 
-In addition, Sourcegraph Enterprise customers using GCP Vertex (Google Cloud Platform) for Claude models can use both these variants of Claude 3.7 to optimize extended reasoning and deeper understanding. Customers using AWS Bedrock do not have the Claude 3.7 Extended Thinking variant.
+Starting in Sourcegraph v6.4+ and v6.3.4167, Claude 3.7 Extended Thinking, as well as the Claude 4 base and extended thinking variants, is available in Sourcegraph when using Claude through either Google Vertex or AWS Bedrock.
 
-<Callout type="info">Claude 3.7 Sonnet with thinking is not supported for BYOK deployments.</Callout>
+See [Model Configuration: Reasoning models](/cody/enterprise/model-configuration#reasoning-models) for more information.
 
 ## Autocomplete
 
docs/cody/enterprise/model-config-examples.mdx

Lines changed: 3 additions & 3 deletions
@@ -104,7 +104,7 @@ In the configuration above, we:
 - Define a new provider with the ID `"anthropic-byok"` and configure it to use the Anthropic API
 - Since this provider is unknown to Sourcegraph, no Sourcegraph-supplied models are available. Therefore, we add a custom model in the `"modelOverrides"` section
 - Use the custom model configured in the previous step (`"anthropic-byok::2024-10-22::claude-3.5-sonnet"`) for `"chat"`. Requests are sent directly to the Anthropic API as set in the provider override
-- For `"fastChat"` and `"autocomplete"`, we use Sourcegraph-provided models via Cody Gateway
+- For `"fastChat"` and `"codeCompletion"`, we use Sourcegraph-provided models via Cody Gateway
 
 ## Config examples for various LLM providers
 
@@ -244,7 +244,7 @@ In the configuration above,
 - Set up a provider override for Fireworks, routing requests for this provider directly to the specified Fireworks endpoint (bypassing Cody Gateway)
 - Add two Fireworks models:
   - `"fireworks::v1::mixtral-8x7b-instruct"` with "chat" capability - used for "chat" and "fastChat"
-  - `"fireworks::v1::starcoder-16b"` with "autocomplete" capability - used for "autocomplete"
+  - `"fireworks::v1::starcoder-16b"` with "autocomplete" capability - used for "codeCompletion"
 
 </Accordion>
 
@@ -721,7 +721,7 @@ In the configuration above,
 In the configuration above,
 
 - Set up a provider override for Google Anthropic, routing requests for this provider directly to the specified endpoint (bypassing Cody Gateway)
-- Add two Anthropic models: - `"google::unknown::claude-3-5-sonnet"` with "chat" capability - used for "chat" and "fastChat" - `"google::unknown::claude-3-haiku"` with "autocomplete" capability - used for "autocomplete"
+- Add two Anthropic models: - `"google::unknown::claude-3-5-sonnet"` with "chat" capability - used for "chat" and "fastChat" - `"google::unknown::claude-3-haiku"` with "autocomplete" capability - used for "codeCompletion"
 
 </Accordion>
 
docs/cody/enterprise/model-configuration.mdx

Lines changed: 45 additions & 4 deletions
@@ -89,7 +89,7 @@ To disable all Sourcegraph-provided models and use only the models explicitly de
 
 ## Default models
 
-The `"modelConfiguration"` setting includes a `"defaultModels"` field, which allows you to specify the LLM model used for each Cody feature (`"chat"`, `"fastChat"`, and `"autocomplete"`). The values for each feature should be `modelRef`s of either Sourcegraph-provided models or models configured in the `modelOverrides` section.
+The `"modelConfiguration"` setting includes a `"defaultModels"` field, which allows you to specify the LLM model used for each Cody feature (`"chat"`, `"fastChat"`, and `"codeCompletion"`). The values for each feature should be `modelRef`s of either Sourcegraph-provided models or models configured in the `modelOverrides` section.
 
 If no default is specified or the specified model is not found, the configuration will silently fall back to a suitable alternative.
 
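Because an unrecognized `defaultModels` key can fall back silently, it may help to lint a configuration before deploying it. The sketch below is a hypothetical helper, not part of Sourcegraph: `check_default_models` and its messages are invented for illustration, and the three-part `provider::apiVersion::model` shape of a `modelRef` is an assumption inferred from the examples in these docs.

```python
# Feature keys named in the "Default models" text above.
VALID_DEFAULT_MODEL_KEYS = {"chat", "fastChat", "codeCompletion"}


def check_default_models(default_models: dict) -> list:
    """Return a list of problems found in a defaultModels mapping (hypothetical helper)."""
    problems = []
    for key, model_ref in default_models.items():
        if key not in VALID_DEFAULT_MODEL_KEYS:
            # "autocomplete" is a model capability, not a defaultModels key.
            hint = ' (did you mean "codeCompletion"?)' if key == "autocomplete" else ""
            problems.append(f'unknown defaultModels key "{key}"{hint}')
        # Assumption: a modelRef has three non-empty "::"-separated parts,
        # as in every example shown in these docs.
        parts = model_ref.split("::")
        if len(parts) != 3 or not all(parts):
            problems.append(f'"{key}": malformed modelRef "{model_ref}"')
    return problems


if __name__ == "__main__":
    config = {
        "chat": "google::v1::gemini-1.5-pro",
        "fastChat": "anthropic::2023-06-01::claude-3-haiku",
        "autocomplete": "fireworks::v1::deepseek-coder-v2-lite-base",
    }
    for problem in check_default_models(config):
        print(problem)
```

Run against the pre-fix example, this flags the `"autocomplete"` key and suggests `"codeCompletion"`.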
@@ -168,7 +168,7 @@ Example configuration:
   "defaultModels": {
     "chat": "google::v1::gemini-1.5-pro",
     "fastChat": "anthropic::2023-06-01::claude-3-haiku",
-    "autocomplete": "fireworks::v1::deepseek-coder-v2-lite-base"
+    "codeCompletion": "fireworks::v1::deepseek-coder-v2-lite-base"
   }
 }
 ```
@@ -291,7 +291,7 @@ For OpenAI reasoning models, the `reasoningEffort` field value corresponds to th
   "defaultModels": {
     "chat": "google::v1::gemini-1.5-pro",
     "fastChat": "anthropic::2023-06-01::claude-3-haiku",
-    "autocomplete": "huggingface-codellama::v1::CodeLlama-7b-hf"
+    "codeCompletion": "huggingface-codellama::v1::CodeLlama-7b-hf"
   }
 }
 ```
@@ -303,7 +303,7 @@ In the example above:
 - A custom model, `"CodeLlama-7b-hf"`, is added using the `"huggingface-codellama"` provider
 - Default models are set up as follows:
   - Sourcegraph-provided models are used for `"chat"` and `"fastChat"` (accessed via Cody Gateway)
-  - The newly configured model, `"huggingface-codellama::v1::CodeLlama-7b-hf"`, is used for `"autocomplete"` (connecting directly to Hugging Face’s OpenAI-compatible API)
+  - The newly configured model, `"huggingface-codellama::v1::CodeLlama-7b-hf"`, is used for `"codeCompletion"` (connecting directly to Hugging Face’s OpenAI-compatible API)
 
 #### Example configuration with Claude 3.7 Sonnet
 
@@ -478,3 +478,44 @@ The response includes:
   "codeCompletion": "fireworks::v1::deepseek-coder-v2-lite-base"
 }
 ```
+
+## Reasoning models
+
+<Callout type="note">
+Claude 3.7 and 4 support is available out of the box starting in Sourcegraph v6.4+ and v6.3.4167 when using Cody Gateway.
+
+This section is primarily relevant to Sourcegraph Enterprise customers using AWS Bedrock or Google Vertex.
+</Callout>
+
+Reasoning models can be added via `modelOverrides` in the site configuration by adding the `reasoning` capability to the `capabilities` list and setting the `reasoningEffort` field on the model. Both must be set for the model's reasoning functionality to be used; otherwise the base model, without reasoning / extended thinking, is used.
+
+For example, this `modelOverride` creates a `Claude Sonnet 4 with Thinking` option in the Cody model selector menu. When a user chats with Cody with that model selected, Cody uses Claude Sonnet 4's Extended Thinking support with a `low` reasoning effort:
+
+```json
+{
+  "modelRef": "bedrock::2024-10-22::claude-sonnet-4-thinking-latest",
+  "displayName": "Claude Sonnet 4 with Thinking",
+  "modelName": "claude-sonnet-4-20250514",
+  "contextWindow": {
+    "maxInputTokens": 93000,
+    "maxOutputTokens": 64000,
+    "maxUserInputTokens": 18000
+  },
+  "capabilities": [
+    "chat",
+    "reasoning"
+  ],
+  "reasoningEffort": "low",
+  "category": "accuracy",
+  "status": "stable"
+}
+```
+
+<Accordion title="Understanding reasoningEffort">
+
+The `reasoningEffort` field is only used by reasoning models (those having `reasoning` in their `capabilities` section). Supported values are `high`, `medium`, and `low`. How this value is treated depends on the specific provider:
+
+* The `anthropic` provider treats `low` effort to mean that the minimum [`thinking.budget_tokens`](https://docs.anthropic.com/en/api/messages#body-thinking) value (1024) is used. For other `reasoningEffort` values, `contextWindow.maxOutputTokens / 2` is used.
+* The `openai` provider maps the `reasoningEffort` field value to the [OpenAI `reasoning_effort`](https://platform.openai.com/docs/api-reference/chat/create#chat-create-reasoning_effort) request body value.
+
+</Accordion>
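The budget rule described for the `anthropic` provider can be sketched in a few lines. This is only an illustration of the rule as stated in the docs text, not Sourcegraph's implementation: the function names are hypothetical, and using integer division for the "divide by 2" step is an assumption.

```python
from typing import Optional

# Minimum thinking.budget_tokens value quoted in the text above.
MIN_THINKING_BUDGET_TOKENS = 1024
SUPPORTED_EFFORTS = {"low", "medium", "high"}


def uses_reasoning(model: dict) -> bool:
    """Both the "reasoning" capability and reasoningEffort must be set."""
    return "reasoning" in model.get("capabilities", []) and bool(model.get("reasoningEffort"))


def anthropic_thinking_budget(model: dict) -> Optional[int]:
    """Return the thinking budget for a model override, or None for the base model."""
    if not uses_reasoning(model):
        return None
    effort = model["reasoningEffort"]
    if effort not in SUPPORTED_EFFORTS:
        raise ValueError(f"unsupported reasoningEffort: {effort!r}")
    if effort == "low":
        return MIN_THINKING_BUDGET_TOKENS
    # Assumption: "maxOutputTokens / 2" is taken as integer division.
    return model["contextWindow"]["maxOutputTokens"] // 2


if __name__ == "__main__":
    override = {
        "contextWindow": {"maxOutputTokens": 64000},
        "capabilities": ["chat", "reasoning"],
        "reasoningEffort": "low",
    }
    print(anthropic_thinking_budget(override))
    print(anthropic_thinking_budget({**override, "reasoningEffort": "high"}))
```

With the example override above, `low` yields the 1024-token minimum, while `medium` or `high` would yield half of the 64000-token output limit.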

public/llms.txt

Lines changed: 7 additions & 7 deletions
@@ -14532,7 +14532,7 @@ To disable all Sourcegraph-provided models and use only the models explicitly de
 
 ## Default models
 
-The `"modelConfiguration"` setting includes a `"defaultModels"` field, which allows you to specify the LLM model used for each Cody feature (`"chat"`, `"fastChat"`, and `"autocomplete"`). The values for each feature should be `modelRef`s of either Sourcegraph-provided models or models configured in the `modelOverrides` section.
+The `"modelConfiguration"` setting includes a `"defaultModels"` field, which allows you to specify the LLM model used for each Cody feature (`"chat"`, `"fastChat"`, and `"codeCompletion"`). The values for each feature should be `modelRef`s of either Sourcegraph-provided models or models configured in the `modelOverrides` section.
 
 If no default is specified or the specified model is not found, the configuration will silently fall back to a suitable alternative.
 
@@ -14611,7 +14611,7 @@ Example configuration:
   "defaultModels": {
     "chat": "google::v1::gemini-1.5-pro",
     "fastChat": "anthropic::2023-06-01::claude-3-haiku",
-    "autocomplete": "fireworks::v1::deepseek-coder-v2-lite-base"
+    "codeCompletion": "fireworks::v1::deepseek-coder-v2-lite-base"
   }
 }
 ```
@@ -14725,7 +14725,7 @@ For OpenAI reasoning models, the `reasoningEffort` field value corresponds to th
   "defaultModels": {
     "chat": "google::v1::gemini-1.5-pro",
     "fastChat": "anthropic::2023-06-01::claude-3-haiku",
-    "autocomplete": "huggingface-codellama::v1::CodeLlama-7b-hf"
+    "codeCompletion": "huggingface-codellama::v1::CodeLlama-7b-hf"
   }
 }
 ```
@@ -14737,7 +14737,7 @@ In the example above:
 - A custom model, `"CodeLlama-7b-hf"`, is added using the `"huggingface-codellama"` provider
 - Default models are set up as follows:
   - Sourcegraph-provided models are used for `"chat"` and `"fastChat"` (accessed via Cody Gateway)
-  - The newly configured model, `"huggingface-codellama::v1::CodeLlama-7b-hf"`, is used for `"autocomplete"` (connecting directly to Hugging Face’s OpenAI-compatible API)
+  - The newly configured model, `"huggingface-codellama::v1::CodeLlama-7b-hf"`, is used for `"codeCompletion"` (connecting directly to Hugging Face’s OpenAI-compatible API)
 
 #### Example configuration with Claude 3.7 Sonnet
 
@@ -15162,7 +15162,7 @@ In the configuration above,
 - Set up a provider override for Fireworks, routing requests for this provider directly to the specified Fireworks endpoint (bypassing Cody Gateway)
 - Add two Fireworks models:
   - `"fireworks::v1::mixtral-8x7b-instruct"` with "chat" capability - used for "chat" and "fastChat"
-  - `"fireworks::v1::starcoder-16b"` with "autocomplete" capability - used for "autocomplete"
+  - `"fireworks::v1::starcoder-16b"` with "autocomplete" capability - used for "codeCompletion"
 
 </Accordion>
 
@@ -15327,7 +15327,7 @@ In the configuration above,
 **Note:** For Azure OpenAI, ensure that the `modelName` matches the name defined in your Azure portal configuration for the model.
 - Add four OpenAI models:
   - `"azure-openai::unknown::gpt-4o"` with chat capability - used as a default model for chat
-  - `"azure-openai::unknown::gpt-4.1-nano"` with chat, edit and autocomplete capabilities - used as a default model for fast chat and autocomplete
+  - `"azure-openai::unknown::gpt-4.1-nano"` with chat, edit and autocomplete capabilities - used as a default model for fast chat and codeCompletion
   - `"azure-openai::unknown::o3-mini"` with chat and reasoning capabilities - o-series model that supports thinking, can be used for chat (note: to enable thinking, model override should include "reasoning" capability and have "reasoningEffort" defined)
   - `"azure-openai::unknown::gpt-35-turbo-instruct-test"` with "autocomplete" capability - included as an alternative model
 - Since `"azure-openai::unknown::gpt-35-turbo-instruct-test"` is not supported on the newer OpenAI `"v1/chat/completions"` endpoint, we set `"useDeprecatedCompletionsAPI"` to `true` to route requests to the legacy `"v1/completions"` endpoint. This setting is unnecessary if you are using a model supported on the `"v1/chat/completions"` endpoint.
@@ -15597,7 +15597,7 @@ In the configuration above,
 In the configuration above,
 
 - Set up a provider override for Google Anthropic, routing requests for this provider directly to the specified endpoint (bypassing Cody Gateway)
-- Add two Anthropic models: - `"google::unknown::claude-3-5-sonnet"` with "chat" capability - used for "chat" and "fastChat" - `"google::unknown::claude-3-haiku"` with "autocomplete" capability - used for "autocomplete"
+- Add two Anthropic models: - `"google::unknown::claude-3-5-sonnet"` with "chat" capability - used for "chat" and "fastChat" - `"google::unknown::claude-3-haiku"` with "autocomplete" capability - used for "codeCompletion"
 
 </Accordion>
 