AWS Bedrock / Vertex Claude reasoning docs; fix issue with autocomplete vs. codeCompletion (#1165)
@MaedahBatool I reviewed your PR
#1162 but needed a bunch of
changes to it so it was easier to send like this
This is improved docs around reasoning for Bedrock and Vertex (the
claude 4 changes)
I also bundled in a small fix for where our docs referred to
`"autocomplete"` in the modelConfiguration `defaultModels` section -
which has never been a valid option except under `capabilities` for
models - so I corrected it to `codeCompletion` which is the correct
option to use (a customer ran into this, so wanted to get this fix in)
Signed-off-by: Emi <emi@sourcegraph.com>
docs/cody/capabilities/supported-models.mdx
Lines changed: 8 additions & 6 deletions
@@ -29,17 +29,19 @@ Cody supports a variety of cutting-edge large language models for use in chat an
 <Callout type="note">To use Claude 3 Sonnet models with Cody Enterprise, make sure you've upgraded your Sourcegraph instance to the latest version.</Callout>

-### Claude 3.7 Sonnet
+### Claude 3.7 and 4 Sonnet

-Claude 3.7 has two variants - Claude 3.7 Sonnet and Claude 3.7 Extended Thinking - to support deep reasoning and fast, responsive edit workflows. This means you can use Claude 3.7 in different contexts depending on whether long-form reasoning is required or for tasks where speed and performance are a priority.
+Claude 3.7 and 4 Sonnet have two variants: the base version and the 'extended thinking' version. Together they support deep reasoning and fast, responsive edit workflows. Cody supports both and lets the user pick one in the model dropdown selector, so the user can choose whether to use extended thinking depending on the task.

-Claude 3.7 Extended Thinking is the recommended default chat model for Cloud customers. Self-hosted customers are encouraged to follow this recommendation, as Claude 3.7 outperforms 3.5 in most scenarios.
+<Callout type="note">
+Claude 4 support is available starting in Sourcegraph v6.4+ and v6.3.4167.
+</Callout>

-#### Claude 3.7 for GCP
+#### Claude 3.7 and 4 via Google Vertex, via AWS Bedrock

-In addition, Sourcegraph Enterprise customers using GCP Vertex (Google Cloud Platform) for Claude models can use both these variants of Claude 3.7 to optimize extended reasoning and deeper understanding. Customers using AWS Bedrock do not have the Claude 3.7 Extended Thinking variant.
+Starting in Sourcegraph v6.4+ and v6.3.4167, Claude 3.7 Extended Thinking, as well as the Claude 4 base and extended thinking variants, are available in Sourcegraph when using Claude through either Google Vertex or AWS Bedrock.

-<Callout type="info">Claude 3.7 Sonnet with thinking is not supported for BYOK deployments.</Callout>
+See [Model Configuration: Reasoning models](/cody/enterprise/model-configuration#reasoning-models) for more information.
docs/cody/enterprise/model-config-examples.mdx
Lines changed: 3 additions & 3 deletions
@@ -104,7 +104,7 @@ In the configuration above, we:
 - Define a new provider with the ID `"anthropic-byok"` and configure it to use the Anthropic API
 - Since this provider is unknown to Sourcegraph, no Sourcegraph-supplied models are available. Therefore, we add a custom model in the `"modelOverrides"` section
 - Use the custom model configured in the previous step (`"anthropic-byok::2024-10-22::claude-3.5-sonnet"`) for `"chat"`. Requests are sent directly to the Anthropic API as set in the provider override
-- For `"fastChat"` and `"autocomplete"`, we use Sourcegraph-provided models via Cody Gateway
+- For `"fastChat"` and `"codeCompletion"`, we use Sourcegraph-provided models via Cody Gateway

 ## Config examples for various LLM providers
@@ -244,7 +244,7 @@ In the configuration above,
 - Set up a provider override for Fireworks, routing requests for this provider directly to the specified Fireworks endpoint (bypassing Cody Gateway)
 - Add two Fireworks models:
   - `"fireworks::v1::mixtral-8x7b-instruct"` with "chat" capability - used for "chat" and "fastChat"
-  - `"fireworks::v1::starcoder-16b"` with "autocomplete" capability - used for "autocomplete"
+  - `"fireworks::v1::starcoder-16b"` with "autocomplete" capability - used for "codeCompletion"

 </Accordion>
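As a rough illustration of the Fireworks bullets above, the sketch below builds only the `defaultModels` fragment as a Python dict and prints it as JSON. The model refs and feature keys come from the text; the helper name `fireworks_default_models` is hypothetical, and the rest of the site configuration (provider and model overrides) is deliberately omitted.

```python
import json

# Model refs taken from the bullets above.
CHAT_MODEL = "fireworks::v1::mixtral-8x7b-instruct"  # has the "chat" capability
COMPLETION_MODEL = "fireworks::v1::starcoder-16b"    # has the "autocomplete" capability


def fireworks_default_models() -> dict:
    """Hypothetical helper: returns only the defaultModels fragment."""
    return {
        "chat": CHAT_MODEL,
        "fastChat": CHAT_MODEL,
        # The feature key is "codeCompletion"; "autocomplete" is a capability name.
        "codeCompletion": COMPLETION_MODEL,
    }


if __name__ == "__main__":
    print(json.dumps({"defaultModels": fireworks_default_models()}, indent=2))
```

The point of the sketch is the naming split this commit corrects: a model is tagged with the "autocomplete" capability, but it is selected under the `codeCompletion` key.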
@@ -721,7 +721,7 @@ In the configuration above,
 In the configuration above,

 - Set up a provider override for Google Anthropic, routing requests for this provider directly to the specified endpoint (bypassing Cody Gateway)
-- Add two Anthropic models: - `"google::unknown::claude-3-5-sonnet"` with "chat" capability - used for "chat" and "fastChat" - `"google::unknown::claude-3-haiku"` with "autocomplete" capability - used for "autocomplete"
+- Add two Anthropic models: - `"google::unknown::claude-3-5-sonnet"` with "chat" capability - used for "chat" and "fastChat" - `"google::unknown::claude-3-haiku"` with "autocomplete" capability - used for "codeCompletion"
docs/cody/enterprise/model-configuration.mdx
Lines changed: 45 additions & 4 deletions
@@ -89,7 +89,7 @@ To disable all Sourcegraph-provided models and use only the models explicitly de
 ## Default models

-The `"modelConfiguration"` setting includes a `"defaultModels"` field, which allows you to specify the LLM model used for each Cody feature (`"chat"`, `"fastChat"`, and `"autocomplete"`). The values for each feature should be `modelRef`s of either Sourcegraph-provided models or models configured in the `modelOverrides` section.
+The `"modelConfiguration"` setting includes a `"defaultModels"` field, which allows you to specify the LLM model used for each Cody feature (`"chat"`, `"fastChat"`, and `"codeCompletion"`). The values for each feature should be `modelRef`s of either Sourcegraph-provided models or models configured in the `modelOverrides` section.

 If no default is specified or the specified model is not found, the configuration will silently fall back to a suitable alternative.
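Because the fallback described above is silent, a small local check may help catch a misspelled feature key before the configuration is saved. The following is a minimal sketch, not how Sourcegraph validates the setting: the function name `check_default_models` and the message wording are hypothetical, and only the three feature keys are taken from the text.

```python
from typing import Dict, List

# Feature keys accepted under "defaultModels", per the paragraph above.
VALID_DEFAULT_MODEL_KEYS = {"chat", "fastChat", "codeCompletion"}


def check_default_models(default_models: Dict[str, str]) -> List[str]:
    """Hypothetical lint: return one message per unrecognized feature key."""
    problems = []
    for key in sorted(default_models):
        if key in VALID_DEFAULT_MODEL_KEYS:
            continue
        # "autocomplete" is a capability name, not a defaultModels key.
        hint = ' (did you mean "codeCompletion"?)' if key == "autocomplete" else ""
        problems.append(f'unknown defaultModels key "{key}"{hint}')
    return problems


if __name__ == "__main__":
    config = {"autocomplete": "huggingface-codellama::v1::CodeLlama-7b-hf"}
    for message in check_default_models(config):
        print(message)
```

Run against a configuration that uses `"autocomplete"` as a key, the check reports it and suggests `"codeCompletion"`, which is the mistake this commit removes from the docs.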
 - A custom model, `"CodeLlama-7b-hf"`, is added using the `"huggingface-codellama"` provider
 - Default models are set up as follows:
   - Sourcegraph-provided models are used for `"chat"` and `"fastChat"` (accessed via Cody Gateway)
-  - The newly configured model, `"huggingface-codellama::v1::CodeLlama-7b-hf"`, is used for `"autocomplete"` (connecting directly to Hugging Face’s OpenAI-compatible API)
+  - The newly configured model, `"huggingface-codellama::v1::CodeLlama-7b-hf"`, is used for `"codeCompletion"` (connecting directly to Hugging Face’s OpenAI-compatible API)
+Claude 3.7 and 4 support is available out of the box starting in Sourcegraph v6.4+ and v6.3.4167 when using Cody Gateway.
+
+This section is primarily relevant to Sourcegraph Enterprise customers using AWS Bedrock or Google Vertex.
+</Callout>
+
+Reasoning models can be added via `modelOverrides` in the site configuration by adding the `reasoning` capability to the `capabilities` list and setting the `reasoningEffort` field on the model. Both must be set for the model's reasoning functionality to be used; otherwise, the base model without reasoning / extended thinking is used.
+
+For example, this `modelOverride` would create a `Claude Sonnet 4 with Thinking` option in the Cody model selector menu. When a user chats with Cody with that model selected, it would use Claude Sonnet 4's Extended Thinking support with a `low` reasoning effort:
+The `reasoningEffort` field is only used by reasoning models (those having `reasoning` in their `capabilities` section). Supported values are `high`, `medium`, and `low`. How this value is treated depends on the specific provider:
+
+* The `anthropic` provider treats `low` effort as meaning that the minimum [`thinking.budget_tokens`](https://docs.anthropic.com/en/api/messages#body-thinking) value (1024) is used. For other `reasoningEffort` values, `contextWindow.maxOutputTokens / 2` is used.
+* The `openai` provider maps the `reasoningEffort` field value to the [OpenAI `reasoning_effort`](https://platform.openai.com/docs/api-reference/chat/create#chat-create-reasoning_effort) request body value.
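The Anthropic rule above can be written out as a short function. This is a sketch of the rule as the docs describe it, not Sourcegraph's implementation: the function name `anthropic_thinking_budget` is hypothetical, and rounding down with integer division is an assumption, since the text only says `contextWindow.maxOutputTokens / 2`.

```python
MIN_THINKING_BUDGET_TOKENS = 1024  # minimum thinking.budget_tokens cited above
SUPPORTED_EFFORTS = ("low", "medium", "high")


def anthropic_thinking_budget(reasoning_effort: str, max_output_tokens: int) -> int:
    """Hypothetical helper: thinking budget for a given reasoningEffort."""
    if reasoning_effort not in SUPPORTED_EFFORTS:
        raise ValueError(f"unsupported reasoningEffort: {reasoning_effort!r}")
    if reasoning_effort == "low":
        return MIN_THINKING_BUDGET_TOKENS
    # Assumption: integer division; the docs give "maxOutputTokens / 2".
    return max_output_tokens // 2


if __name__ == "__main__":
    for effort in SUPPORTED_EFFORTS:
        print(effort, anthropic_thinking_budget(effort, 8000))
```

Under this reading, `medium` and `high` produce the same budget for the `anthropic` provider, and only `low` differs.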
public/llms.txt
Lines changed: 7 additions & 7 deletions
@@ -14532,7 +14532,7 @@ To disable all Sourcegraph-provided models and use only the models explicitly de
 ## Default models

-The `"modelConfiguration"` setting includes a `"defaultModels"` field, which allows you to specify the LLM model used for each Cody feature (`"chat"`, `"fastChat"`, and `"autocomplete"`). The values for each feature should be `modelRef`s of either Sourcegraph-provided models or models configured in the `modelOverrides` section.
+The `"modelConfiguration"` setting includes a `"defaultModels"` field, which allows you to specify the LLM model used for each Cody feature (`"chat"`, `"fastChat"`, and `"codeCompletion"`). The values for each feature should be `modelRef`s of either Sourcegraph-provided models or models configured in the `modelOverrides` section.

 If no default is specified or the specified model is not found, the configuration will silently fall back to a suitable alternative.
 - A custom model, `"CodeLlama-7b-hf"`, is added using the `"huggingface-codellama"` provider
 - Default models are set up as follows:
   - Sourcegraph-provided models are used for `"chat"` and `"fastChat"` (accessed via Cody Gateway)
-  - The newly configured model, `"huggingface-codellama::v1::CodeLlama-7b-hf"`, is used for `"autocomplete"` (connecting directly to Hugging Face’s OpenAI-compatible API)
+  - The newly configured model, `"huggingface-codellama::v1::CodeLlama-7b-hf"`, is used for `"codeCompletion"` (connecting directly to Hugging Face’s OpenAI-compatible API)

 #### Example configuration with Claude 3.7 Sonnet
@@ -15162,7 +15162,7 @@ In the configuration above,
 - Set up a provider override for Fireworks, routing requests for this provider directly to the specified Fireworks endpoint (bypassing Cody Gateway)
 - Add two Fireworks models:
   - `"fireworks::v1::mixtral-8x7b-instruct"` with "chat" capability - used for "chat" and "fastChat"
-  - `"fireworks::v1::starcoder-16b"` with "autocomplete" capability - used for "autocomplete"
+  - `"fireworks::v1::starcoder-16b"` with "autocomplete" capability - used for "codeCompletion"

 </Accordion>
@@ -15327,7 +15327,7 @@ In the configuration above,
 **Note:** For Azure OpenAI, ensure that the `modelName` matches the name defined in your Azure portal configuration for the model.
 - Add four OpenAI models:
   - `"azure-openai::unknown::gpt-4o"` with chat capability - used as a default model for chat
-  - `"azure-openai::unknown::gpt-4.1-nano"` with chat, edit and autocomplete capabilities - used as a default model for fast chat and autocomplete
+  - `"azure-openai::unknown::gpt-4.1-nano"` with chat, edit and autocomplete capabilities - used as a default model for fast chat and codeCompletion
   - `"azure-openai::unknown::o3-mini"` with chat and reasoning capabilities - o-series model that supports thinking, can be used for chat (note: to enable thinking, model override should include "reasoning" capability and have "reasoningEffort" defined)
   - `"azure-openai::unknown::gpt-35-turbo-instruct-test"` with "autocomplete" capability - included as an alternative model
 - Since `"azure-openai::unknown::gpt-35-turbo-instruct-test"` is not supported on the newer OpenAI `"v1/chat/completions"` endpoint, we set `"useDeprecatedCompletionsAPI"` to `true` to route requests to the legacy `"v1/completions"` endpoint. This setting is unnecessary if you are using a model supported on the `"v1/chat/completions"` endpoint.
@@ -15597,7 +15597,7 @@ In the configuration above,
 In the configuration above,

 - Set up a provider override for Google Anthropic, routing requests for this provider directly to the specified endpoint (bypassing Cody Gateway)
-- Add two Anthropic models: - `"google::unknown::claude-3-5-sonnet"` with "chat" capability - used for "chat" and "fastChat" - `"google::unknown::claude-3-haiku"` with "autocomplete" capability - used for "autocomplete"
+- Add two Anthropic models: - `"google::unknown::claude-3-5-sonnet"` with "chat" capability - used for "chat" and "fastChat" - `"google::unknown::claude-3-haiku"` with "autocomplete" capability - used for "codeCompletion"