Before submitting your bug report
Relevant environment info
- OS: Ubuntu 22.04
- Continue version: 0.8.66
- IDE version: VS Code 1.96.4
- Model: any, but in the example mostly TechxGenus/Mistral-Large-Instruct-2407-GPTQ
- config.json:
{
"models": [
{
"title": "Mistral Large vLLM",
"model": "TechxGenus/Mistral-Large-Instruct-2407-GPTQ",
"provider": "vllm",
"apiBase": "http://<open-webui-server>:8080/v1"
},
{
"title": "Mistral Large webui",
"model": "TechxGenus/Mistral-Large-Instruct-2407-GPTQ",
"provider": "openai",
"apiBase": "https://<open-webui-server>/api",
"apiKey": "sk-<apikey>",
"useLegacyCompletionsEndpoint": false
},
{
"title": "Mistral Large webui localhost",
"model": "TechxGenus/Mistral-Large-Instruct-2407-GPTQ",
"provider": "openai",
"apiBase": "http://localhost:3000/api",
"apiKey": "sk-<local apikey>",
"useLegacyCompletionsEndpoint": false
}
],
"contextProviders": [
{
"name": "code",
"params": {}
},
{
"name": "docs",
"params": {}
},
{
"name": "diff",
"params": {}
},
{
"name": "terminal",
"params": {}
},
{
"name": "problems",
"params": {}
},
{
"name": "folder",
"params": {}
},
{
"name": "codebase",
"params": {}
}
],
"slashCommands": [
{
"name": "share",
"description": "Export the current chat session to markdown"
},
{
"name": "cmd",
"description": "Generate a shell command"
},
{
"name": "commit",
"description": "Generate a git commit message"
}
]
}
Description
I have Open-WebUI running with three model backends: Ollama, vLLM, and Llama.cpp. All four services run in separate Docker containers, and the "Mistral Large" model runs in vLLM. In the Continue config, I have an API call directly to vLLM (this works), a call to an Open-WebUI instance running behind an nginx server, and a call to a locally hosted Open-WebUI to test without nginx.
The problem is that with the above config, I get errors when trying to chat with the Open-WebUI "models" in the VS Code Continue plugin window. The errors differ depending on whether the request goes through the nginx server or to localhost.
Localhost (VS Code Dev Tools window):
[Extension Host] Error handling webview message: {
"msg": {
"messageId": "0e06bbbb-f108-4b16-af64-6cbbdc813f20",
"messageType": "llm/streamChat",
"data": {
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "test"
}
]
},
{
"role": "assistant",
"content": ""
}
],
"title": "Mistral Large webui local",
"completionOptions": {}
}
}
}
Error: Request timed out.
The timeout takes a while to trigger; during that time, the vLLM engine/model stays responsive to other requests.
nginx-server (VS Code Dev Tools window):
[Extension Host] Error handling webview message: {
"msg": {
"messageId": "450d58ba-f9ae-4b5a-b62a-116e57b6694d",
"messageType": "llm/streamChat",
"data": {
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "test"
}
]
},
{
"role": "assistant",
"content": ""
}
],
"title": "Mistral Large webui",
"completionOptions": {}
}
}
}
Error: Connection error.
notificationsAlerts.ts:42 The request failed with "FetchError": request to https://<open-webui-server>/api/chat/completions failed, reason: Parse Error: Invalid header value char. If you're having trouble setting up Continue, please see the troubleshooting guide for help.
This error appears almost immediately. So at least the second request tells me that the correct API endpoint (/api/chat/completions) is used. Unfortunately, I don't really know what to do with this error message.
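One way to narrow down "Invalid header value char" is to scan the raw response head for bytes that a strict HTTP parser would reject. The sketch below rests on an assumption this report does not confirm: that the error comes from a header sent back by nginx or Open-WebUI, not from a header Continue sends. The function name `find_invalid_header_chars` is hypothetical, and the allowed byte ranges only approximate the HTTP field-value rules (tab, space, visible ASCII, and bytes 0x80-0xFF).

```python
from typing import List, Tuple


def find_invalid_header_chars(raw_head: bytes) -> List[Tuple[str, int, int]]:
    """Return (header_name, offset_in_value, byte) for each suspicious byte.

    raw_head is the raw response head: status line, header lines, blank line.
    This is a diagnostic approximation, not a full HTTP parser.
    """
    problems: List[Tuple[str, int, int]] = []
    lines = raw_head.split(b"\r\n")
    for line in lines[1:]:  # skip the status line
        if not line:
            break  # a blank line ends the header section
        name, sep, value = line.partition(b":")
        if not sep:
            continue  # not a "Name: value" line; ignored in this sketch
        value = value.strip(b" \t")
        for offset, byte in enumerate(value):
            # Accept tab, space and visible ASCII, and bytes 0x80-0xFF.
            if byte == 0x09 or 0x20 <= byte <= 0x7E or byte >= 0x80:
                continue
            problems.append((name.decode("latin-1"), offset, byte))
    return problems


# Hypothetical sample: a control character hidden inside a header value.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/json\r\n"
    b"X-Example: abc\x01def\r\n"
    b"\r\n"
)
report = find_invalid_header_chars(sample)
```

For the sample above, `report` contains a single entry pointing at the `X-Example` header. Running the same check on the real response head from the nginx route would show whether a response header is the culprit or whether the problem lies elsewhere.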
The weird thing is that I can connect to both APIs with a simple bash script (nginx-server example here):
curl -v -X POST --location 'https://<open-webui-server>/api/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-<apikey>' \
-d '{
"model": "TechxGenus/Mistral-Large-Instruct-2407-GPTQ",
"messages": [
{
"role": "user",
"content": "Write 50 words."
}
]
}'
This produces an answer. I get the same problem with Ollama models and with Llama.cpp models. I can use Continue with the backend APIs directly, and I can curl every model through Open-WebUI (and the backends), but Continue can't use the Open-WebUI API. Because my models/backends change and I don't want to expose the backends directly to users who want to use the API, it would be really great to use the unified Open-WebUI API calls.
PS: I tried adding header information to the model config, but it didn't change the outcome:
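The curl call above sends a non-streaming request, while the failing messages have the type `llm/streamChat`, so a closer reproduction would probably set `"stream": true`. The sketch below is a minimal standard-library version of such a request. It assumes that Open-WebUI answers with OpenAI-style server-sent events (`data: {...}` lines ending in `data: [DONE]`); the helper names `build_chat_request`, `parse_sse_line`, and `stream_chat` are hypothetical and are not part of Continue or Open-WebUI.

```python
import json
import urllib.request
from typing import Iterator, Optional


def build_chat_request(api_base: str, api_key: str, model: str, prompt: str,
                       stream: bool = True) -> urllib.request.Request:
    """Build a POST to <api_base>/chat/completions, mirroring the curl call."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return urllib.request.Request(
        url=api_base.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def parse_sse_line(line: bytes) -> Optional[str]:
    """Return the text delta from one 'data: {...}' line, or None.

    Assumes the OpenAI-style chunk layout: choices[0].delta.content.
    """
    text = line.decode("utf-8").strip()
    if not text.startswith("data:"):
        return None  # blank separator lines, comments, other SSE fields
    payload = text[len("data:"):].strip()
    if payload == "[DONE]":
        return None  # end-of-stream marker
    chunk = json.loads(payload)
    choices = chunk.get("choices") or []
    if not choices:
        return None
    return (choices[0].get("delta") or {}).get("content")


def stream_chat(request: urllib.request.Request,
                timeout: float = 60.0) -> Iterator[str]:
    """Send the request and yield text deltas as they arrive."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        for line in response:
            delta = parse_sse_line(line)
            if delta:
                yield delta


# Example usage (needs a reachable server, so it is left commented out):
# req = build_chat_request("http://localhost:3000/api", "sk-<local apikey>",
#                          "TechxGenus/Mistral-Large-Instruct-2407-GPTQ",
#                          "Write 50 words.")
# for piece in stream_chat(req):
#     print(piece, end="", flush=True)
```

If the streaming request hangs against localhost or fails against the nginx route while the non-streaming curl call succeeds, that would point to streaming (for example, proxy buffering) as the difference and not to the endpoint or the API key.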
"requestOptions": {
"headers": {
"Authorization":"Bearer sk-5111770f0a6a4fb08c3d7fd1673f4309",
"Content-Type": "application/json"
}
}
PPS: I found #3620 and #1380, but both use the ollama endpoint, which I can't use with the vLLM/Llama.cpp models.
To reproduce
No response
Log output