
Commit b186b73

feat: removing vllm backend (#781)
* first pass at removing vllm
* adding an externally managed vllm for tests
* removing vllm backend references from docs
* adding back some comments
* adding some logic to use a custom venv for vllm tests
* adding some vllm params to reduce gpu footprint
* defaulting to granite 4 micro
* adding a skip warmup flag (making it general)
1 parent b8c29e6 commit b186b73

18 files changed

Lines changed: 1342 additions & 3794 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@ print(user.age) # 31 — always an int, guaranteed by the schema
  - **Structured output** — `@generative` turns typed functions into LLM calls; Pydantic schemas are enforced at generation time
  - **Requirements & repair** — attach natural-language requirements to any call; Mellea validates and retries automatically
  - **Sampling strategies** — run a generation multiple times and pick the best result; swap between rejection sampling, majority voting, and more with one parameter change
- - **Multiple backends** — Ollama, OpenAI, vLLM, HuggingFace, WatsonX, LiteLLM, Bedrock
+ - **Multiple backends** — Ollama, OpenAI, HuggingFace, WatsonX, LiteLLM, Bedrock
  - **Legacy integration** — easily drop Mellea into existing codebases with `mify`
  - **MCP compatible** — expose any generative program as an MCP tool

docs/docs/community/contributing-guide.md

Lines changed: 0 additions & 1 deletion
@@ -215,7 +215,6 @@ Tests use a four-tier granularity system. Every test belongs to exactly one tier
  | `openai` | OpenAI API or compatible | API calls (may use Ollama `/v1`) |
  | `watsonx` | Watsonx API | API calls, requires credentials |
  | `huggingface` | HuggingFace transformers | Local, GPU required |
- | `vllm` | vLLM | Local, GPU required |
  | `litellm` | LiteLLM (wraps other backends) | Depends on underlying backend |
  | `bedrock` | AWS Bedrock | API calls, requires credentials |

docs/docs/docs.json

Lines changed: 0 additions & 1 deletion
@@ -87,7 +87,6 @@
  "pages": [
    "integrations/ollama",
    "integrations/huggingface",
-   "integrations/vllm",
    "integrations/openai",
    "integrations/vertex-ai",
    "integrations/bedrock",

docs/docs/guide/glossary.md

Lines changed: 1 addition & 2 deletions
@@ -60,8 +60,7 @@ See: [Generative Functions](./generative-functions)
  ## Backend

  A backend is an inference engine that Mellea uses to run LLM calls. Examples:
- `OllamaModelBackend`, `OpenAIBackend`, `LocalHFBackend`, `LocalVLLMBackend`,
- `WatsonxAIBackend`. Backends are configured via `MelleaSession` or
+ `OllamaModelBackend`, `OpenAIBackend`, `LocalHFBackend`, `WatsonxAIBackend`. Backends are configured via `MelleaSession` or
  `start_session()`.

  See: [Backends and Configuration](./backends-and-configuration)

docs/docs/how-to/use-images-and-vision.md

Lines changed: 0 additions & 1 deletion
@@ -114,7 +114,6 @@ To remove images from context on the next turn, pass `images=[]` explicitly.
  | `OpenAIBackend` | ✅ | Use with `gpt-4o`, or a local vision model via OpenAI-compatible endpoint |
  | `LiteLLMBackend` | ✅ | Depends on the underlying provider |
  | `LocalHFBackend` | Partial | Model-dependent; experimental |
- | `LocalVLLMBackend` | Partial | Model-dependent |
  | `WatsonxAIBackend` | ❌ | Not currently supported |

  > **Full example (Ollama):** [`docs/examples/image_text_models/vision_ollama_chat.py`](https://github.com/generative-computing/mellea/blob/main/docs/examples/image_text_models/vision_ollama_chat.py)

docs/docs/index.mdx

Lines changed: 0 additions & 3 deletions
@@ -97,9 +97,6 @@ Mellea is backend-agnostic. The same program runs on any inference engine.
  <Card title="HuggingFace" icon="microchip" href="/integrations/huggingface">
    Local inference with Transformers — aLoRA and constrained decoding.
  </Card>
- <Card title="vLLM" icon="microchip" href="/integrations/vllm">
-   High-throughput batched local inference on Linux + CUDA.
- </Card>
  <Card title="LiteLLM / Vertex AI" icon="cloud" href="/integrations/vertex-ai">
    Google Vertex AI, Anthropic, and 100+ providers via LiteLLM.
  </Card>

docs/docs/integrations/vllm.md

Lines changed: 0 additions & 89 deletions
This file was deleted.
