
Commit b186b73

feat: removing vllm backend (#781)
* first pass at removing vllm
* adding an externally managed vllm for tests
* removing vllm backend references from docs
* adding back some comments
* adding some logic to use a custom venv for vllm tests
* adding some vllm params to reduce gpu footprint
* defaulting to granite 4 micro
* adding a skip warmup flag (making it general)
1 parent b8c29e6 commit b186b73

18 files changed

Lines changed: 1342 additions & 3794 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@ print(user.age) # 31 — always an int, guaranteed by the schema
  - **Structured output** — `@generative` turns typed functions into LLM calls; Pydantic schemas are enforced at generation time
  - **Requirements & repair** — attach natural-language requirements to any call; Mellea validates and retries automatically
  - **Sampling strategies** — run a generation multiple times and pick the best result; swap between rejection sampling, majority voting, and more with one parameter change
- - **Multiple backends** — Ollama, OpenAI, vLLM, HuggingFace, WatsonX, LiteLLM, Bedrock
+ - **Multiple backends** — Ollama, OpenAI, HuggingFace, WatsonX, LiteLLM, Bedrock
  - **Legacy integration** — easily drop Mellea into existing codebases with `mify`
  - **MCP compatible** — expose any generative program as an MCP tool

docs/docs/community/contributing-guide.md

Lines changed: 0 additions & 1 deletion
@@ -215,7 +215,6 @@ Tests use a four-tier granularity system. Every test belongs to exactly one tier
  | `openai` | OpenAI API or compatible | API calls (may use Ollama `/v1`) |
  | `watsonx` | Watsonx API | API calls, requires credentials |
  | `huggingface` | HuggingFace transformers | Local, GPU required |
- | `vllm` | vLLM | Local, GPU required |
  | `litellm` | LiteLLM (wraps other backends) | Depends on underlying backend |
  | `bedrock` | AWS Bedrock | API calls, requires credentials |

docs/docs/docs.json

Lines changed: 0 additions & 1 deletion
@@ -87,7 +87,6 @@
  "pages": [
    "integrations/ollama",
    "integrations/huggingface",
-   "integrations/vllm",
    "integrations/openai",
    "integrations/vertex-ai",
    "integrations/bedrock",

docs/docs/guide/glossary.md

Lines changed: 1 addition & 2 deletions
@@ -60,8 +60,7 @@ See: [Generative Functions](./generative-functions)
  ## Backend

  A backend is an inference engine that Mellea uses to run LLM calls. Examples:
- `OllamaModelBackend`, `OpenAIBackend`, `LocalHFBackend`, `LocalVLLMBackend`,
- `WatsonxAIBackend`. Backends are configured via `MelleaSession` or
+ `OllamaModelBackend`, `OpenAIBackend`, `LocalHFBackend`, `WatsonxAIBackend`. Backends are configured via `MelleaSession` or
  `start_session()`.

  See: [Backends and Configuration](./backends-and-configuration)

docs/docs/how-to/use-images-and-vision.md

Lines changed: 0 additions & 1 deletion
@@ -114,7 +114,6 @@ To remove images from context on the next turn, pass `images=[]` explicitly.
  | `OpenAIBackend` | ✅ | Use with `gpt-4o`, or a local vision model via OpenAI-compatible endpoint |
  | `LiteLLMBackend` | ✅ | Depends on the underlying provider |
  | `LocalHFBackend` | Partial | Model-dependent; experimental |
- | `LocalVLLMBackend` | Partial | Model-dependent |
  | `WatsonxAIBackend` | ❌ | Not currently supported |

  > **Full example (Ollama):** [`docs/examples/image_text_models/vision_ollama_chat.py`](https://github.com/generative-computing/mellea/blob/main/docs/examples/image_text_models/vision_ollama_chat.py)

docs/docs/index.mdx

Lines changed: 0 additions & 3 deletions
@@ -97,9 +97,6 @@ Mellea is backend-agnostic. The same program runs on any inference engine.
  <Card title="HuggingFace" icon="microchip" href="/integrations/huggingface">
    Local inference with Transformers — aLoRA and constrained decoding.
  </Card>
- <Card title="vLLM" icon="microchip" href="/integrations/vllm">
-   High-throughput batched local inference on Linux + CUDA.
- </Card>
  <Card title="LiteLLM / Vertex AI" icon="cloud" href="/integrations/vertex-ai">
    Google Vertex AI, Anthropic, and 100+ providers via LiteLLM.
  </Card>

docs/docs/integrations/vllm.md

Lines changed: 0 additions & 89 deletions
This file was deleted.
