This branch adds support for IBM Watson X AI with Granite models as an alternative to Ollama for running LocalGPT.
LocalGPT now supports two LLM backends:
- Ollama (default): Run models locally using Ollama
- Watson X: Use IBM's Granite models hosted on Watson X AI
- Added `WatsonXClient` class in `rag_system/utils/watsonx_client.py` that provides an Ollama-compatible interface for Watson X
- Updated `factory.py` and `main.py` to support backend switching via an environment variable
- Added `ibm-watsonx-ai` SDK dependency to `requirements.txt`
- Configuration now supports both backends through environment variables
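As an illustration of the backend switching added to `factory.py`, selection via the `LLM_BACKEND` environment variable can be sketched like this (the class names here are stand-ins, not the real implementation):

```python
import os

class OllamaClient:
    """Stand-in for the real local Ollama client."""
    name = "ollama"

class WatsonXClient:
    """Stand-in for the real Watson X client."""
    name = "watsonx"

_BACKENDS = {"ollama": OllamaClient, "watsonx": WatsonXClient}

def get_llm_client(backend=None):
    """Pick a client class based on LLM_BACKEND (default: ollama)."""
    backend = (backend or os.environ.get("LLM_BACKEND", "ollama")).lower()
    try:
        return _BACKENDS[backend]()
    except KeyError:
        raise ValueError(f"Unknown LLM_BACKEND: {backend!r}") from None
```

Unknown backend names fail fast with a `ValueError` rather than silently falling back, which makes misconfigured deployments easier to spot.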
To use Watson X with Granite models, you need:
- IBM Cloud account with Watson X access
- Watson X API key
- Watson X project ID
- Go to IBM Cloud
- Navigate to Watson X AI service
- Create or select a project
- Get your API key from IBM Cloud IAM
- Copy your project ID from the Watson X project settings
Create a .env file or set these environment variables:
```bash
# Choose LLM backend (default: ollama)
LLM_BACKEND=watsonx

# Watson X Configuration
WATSONX_API_KEY=your_api_key_here
WATSONX_PROJECT_ID=your_project_id_here
WATSONX_URL=https://us-south.ml.cloud.ibm.com

# Model Configuration
WATSONX_GENERATION_MODEL=ibm/granite-13b-chat-v2
WATSONX_ENRICHMENT_MODEL=ibm/granite-8b-japanese
```

Watson X offers several Granite models:

- `ibm/granite-13b-chat-v2` - General-purpose chat model
- `ibm/granite-13b-instruct-v2` - Instruction-following model
- `ibm/granite-20b-multilingual` - Multilingual support
- `ibm/granite-8b-japanese` - Lightweight Japanese model
- `ibm/granite-3b-code-instruct` - Code generation model
For a full list of available models, visit the Watson X documentation.
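Reading these settings in Python might look like the following sketch; the variable names match the ones documented above, while the helper itself is hypothetical:

```python
import os

def load_watsonx_config():
    """Collect Watson X settings from the environment.

    Hypothetical helper: defaults mirror the sample configuration
    above and are illustrative, not authoritative.
    """
    cfg = {
        "api_key": os.environ.get("WATSONX_API_KEY"),
        "project_id": os.environ.get("WATSONX_PROJECT_ID"),
        "url": os.environ.get("WATSONX_URL", "https://us-south.ml.cloud.ibm.com"),
        "generation_model": os.environ.get(
            "WATSONX_GENERATION_MODEL", "ibm/granite-13b-chat-v2"
        ),
    }
    missing = [k for k in ("api_key", "project_id") if not cfg[k]]
    if missing:
        raise RuntimeError(f"Missing required Watson X settings: {missing}")
    return cfg
```

Failing early on missing credentials gives a clearer error than letting the SDK reject the first API call.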
- Install the Watson X SDK:

```bash
pip install "ibm-watsonx-ai>=1.3.39"
```

(Quote the version specifier so the shell does not interpret `>=` as a redirect.) Or install all dependencies:

```bash
pip install -r rag_system/requirements.txt
```

Once configured, simply set the environment variable and run as normal:
```bash
export LLM_BACKEND=watsonx
python -m rag_system.main api
```

Or in Python:
```python
import os
os.environ['LLM_BACKEND'] = 'watsonx'

from rag_system.factory import get_agent

# Get agent with Watson X backend
agent = get_agent(mode="default")

# Use as normal
result = agent.run("What is artificial intelligence?")
print(result)
```

You can easily switch between Ollama and Watson X:
```bash
# Use Ollama (local)
export LLM_BACKEND=ollama
python -m rag_system.main api

# Use Watson X (cloud)
export LLM_BACKEND=watsonx
python -m rag_system.main api
```

The Watson X client supports all the key features used by LocalGPT:
- ✅ Text generation / completion
- ✅ Async generation
- ✅ Streaming responses
- ✅ Embeddings (if using Watson X embedding models)
- ✅ Custom generation parameters (temperature, max_tokens, top_p, top_k)
- ⚠️ Image/multimodal support (limited; depends on model availability)
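For the custom generation parameters listed above, it can help to validate values before sending a request. A minimal sketch with a hypothetical helper; the bounds are common conventions, not documented Watson X limits:

```python
def build_generation_params(temperature=0.7, max_tokens=1024,
                            top_p=1.0, top_k=50):
    """Collect and sanity-check generation parameters.

    Hypothetical helper: the real WatsonXClient may name or bound
    these parameters differently.
    """
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in [0, 2]")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    if max_tokens < 1 or top_k < 1:
        raise ValueError("max_tokens and top_k must be positive")
    return {"temperature": temperature, "max_tokens": max_tokens,
            "top_p": top_p, "top_k": top_k}
```

The resulting dict could then be forwarded to the client (e.g. `client.generate_completion(..., **params)`), assuming keyword arguments pass through as they do with `OllamaClient`.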
The `WatsonXClient` provides the same interface as `OllamaClient`:
```python
from rag_system.utils.watsonx_client import WatsonXClient

client = WatsonXClient(
    api_key="your_api_key",
    project_id="your_project_id"
)

# Generate completion
response = client.generate_completion(
    model="ibm/granite-13b-chat-v2",
    prompt="Explain quantum computing"
)
print(response['response'])

# Stream completion
for chunk in client.stream_completion(
    model="ibm/granite-13b-chat-v2",
    prompt="Write a story about AI"
):
    print(chunk, end='', flush=True)
```

- **Embedding Models**: Watson X uses different embedding models than Ollama. Make sure to configure embedding models appropriately in `main.py` if needed.
- **Multimodal Support**: Image support varies by model availability in Watson X. Not all Granite models support multimodal inputs.
- **Streaming**: Streaming support depends on the Watson X SDK version and may fall back to returning the full response at once.
- **Rate Limits**: Watson X has API rate limits that may differ from local Ollama usage. Monitor your usage accordingly.
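The streaming caveat above can be handled defensively on the caller's side. A minimal sketch using the `stream_completion` / `generate_completion` names from the usage example; the exception type raised when streaming is unavailable is an assumption:

```python
def stream_or_full(client, model, prompt):
    """Yield chunks from streaming generation, falling back to one
    full response if streaming is unavailable.

    Sketch only: the real client may signal a missing streaming
    capability with a different exception than NotImplementedError.
    """
    try:
        yield from client.stream_completion(model=model, prompt=prompt)
    except NotImplementedError:
        yield client.generate_completion(model=model, prompt=prompt)["response"]
```

Callers then iterate the same way regardless of whether the backend streamed or returned a single chunk.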
If you see authentication errors:
- Verify your API key is correct
- Check that your project ID matches an existing Watson X project
- Ensure your IBM Cloud account has Watson X access
If you get model not found errors:
- Verify the model ID is correct (e.g., `ibm/granite-13b-chat-v2`)
- Check that the model is available in your Watson X instance
- Some models may require additional permissions
If you experience connection issues:
- Check your internet connection
- Verify the Watson X URL is correct for your region
- Check IBM Cloud status page for service outages
Unlike local Ollama, Watson X is a cloud service with usage-based pricing:
- Token-based pricing for generation
- Consider your query volume
- Monitor usage through IBM Cloud dashboard
To switch back to local Ollama:
```bash
unset LLM_BACKEND  # or set LLM_BACKEND=ollama
python -m rag_system.main api
```

For Watson X-specific issues, consult the IBM Watson X documentation and support resources.

For LocalGPT issues, open an issue on the LocalGPT repository.
If you find issues with the Watson X integration or want to add features:
- Create an issue describing the problem/feature
- Submit a pull request with your changes
- Ensure all tests pass
This integration follows the same license as LocalGPT (MIT License).