248 changes: 248 additions & 0 deletions LSF-vLLM/README.md
IBM LSF for vLLM Persistent Inference Service
==========================================

Overview
--------
This repository shows how to run a long-running vLLM inference service under IBM LSF,
validate it through a standard OpenAI-compatible API, access it from a Jupyter notebook,
and reuse the same service from a downstream batch job.

What this implementation demonstrates
-------------------------------------
- IBM LSF launching and managing a persistent inference runtime as a service job
- vLLM exposing an OpenAI-compatible endpoint
- endpoint discovery through a small registry file written by the service job
- interactive validation using curl and Jupyter
- downstream reuse through a separate IBM LSF batch job

Repository layout
-----------------
- scripts/start_vllm_lsf.sh
Starts the vLLM container, waits for readiness, writes the registry file, and keeps the
service attached to the IBM LSF job lifecycle.
- scripts/resolve_endpoint.py
Reads the registry file for a given IBM LSF job ID and prints the resolved base URL.
- scripts/batch_client.py
Reads a prompt corpus and sends requests to the registered vLLM service.
- notebook/LSF_vLLM_Client.ipynb
Jupyter notebook for interactive validation against the IBM LSF-managed runtime.
- corpus/prompts.txt
Sample prompt corpus for downstream batch validation.

Prerequisites
-------------
- IBM LSF installed and operational
- podman installed
- python3 installed
- curl installed
- network access from the execution host to pull the vLLM image and model
- a single-node IBM LSF setup is sufficient for this implementation

Note:
Replace "your-host" with the hostname or IP address of the system where the vLLM service is running.

The examples below assume you are running as the same user for all steps.

Step 1: Create the working directories
--------------------------------------

```bash
mkdir -p ~/lsf_vllm_poc/{logs,registry,cache,corpus,results,notebook}
```

```bash
cp corpus/prompts.txt ~/lsf_vllm_poc/corpus/prompts.txt
```

```bash
cp scripts/start_vllm_lsf.sh ~/lsf_vllm_poc/
cp scripts/resolve_endpoint.py ~/lsf_vllm_poc/
cp scripts/batch_client.py ~/lsf_vllm_poc/
```

```bash
chmod +x ~/lsf_vllm_poc/start_vllm_lsf.sh
chmod +x ~/lsf_vllm_poc/resolve_endpoint.py
chmod +x ~/lsf_vllm_poc/batch_client.py
```

Step 2: Review the service script defaults
------------------------------------------

The service script ships with the following defaults; review or edit them in
`~/lsf_vllm_poc/start_vllm_lsf.sh` before submitting:

```bash
MODEL=Qwen/Qwen3-0.6B
PORT=8001
API_KEY=local-vllm-key
```
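These defaults can be overridden at submission time if the script uses the usual shell fallback pattern (an assumption about `start_vllm_lsf.sh`, not a guarantee); `bsub` propagates the submission environment to the job, so exported values take precedence. A minimal sketch of that pattern:

```shell
# Hypothetical sketch of the default handling assumed in start_vllm_lsf.sh:
# a variable already set in the environment wins over the fallback after ':-'.
MODEL="${MODEL:-Qwen/Qwen3-0.6B}"
PORT="${PORT:-8001}"
API_KEY="${API_KEY:-local-vllm-key}"
echo "${MODEL} ${PORT} ${API_KEY}"
```

With nothing exported, the `echo` prints the three defaults shown above.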

Step 3: Submit the persistent service job
-----------------------------------------

```bash
JOBID=$(
  bsub -J vllm_service -q normal -n 1 -R 'rusage[mem=12GB]' \
    -oo ~/lsf_vllm_poc/logs/vllm.%J.out \
    -eo ~/lsf_vllm_poc/logs/vllm.%J.err \
    ~/lsf_vllm_poc/start_vllm_lsf.sh \
  | awk '{print $2}' | tr -d '<>'
)

echo "Submitted service JOBID=$JOBID"
```
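The command substitution above extracts the job ID from the standard `bsub` acknowledgement line. You can check the parsing locally against a sample of that output format (the job ID `12345` is illustrative; no LSF needed):

```shell
# Sample of the usual bsub acknowledgement line
sample='Job <12345> is submitted to queue <normal>.'
# Same pipeline as the submission command: second field, angle brackets stripped
JOBID=$(echo "$sample" | awk '{print $2}' | tr -d '<>')
echo "$JOBID"
```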

Step 4: Monitor the service startup
-----------------------------------

```bash
bjobs
bjobs -l ${JOBID}
bpeek ${JOBID}
```

```bash
podman ps -a | grep vllm-job-${JOBID}
podman logs -f vllm-job-${JOBID}
```

Step 5: Wait for the registry file
----------------------------------

```bash
# Wait up to ~5 minutes for the service job to write its registry entry
for _ in $(seq 1 150); do
  [[ -f ~/lsf_vllm_poc/registry/${JOBID}.json ]] && break
  sleep 2
done

cat ~/lsf_vllm_poc/registry/${JOBID}.json
```
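The registry file is plain JSON written by the service job. Based on the fields the client scripts read (`host`, `port`, `model`, `api_key`), an entry looks roughly like this; the values shown are illustrative:

```json
{
  "host": "your-host",
  "port": 8001,
  "model": "Qwen/Qwen3-0.6B",
  "api_key": "local-vllm-key"
}
```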

Step 6: Resolve the endpoint
----------------------------

```bash
python3 ~/lsf_vllm_poc/resolve_endpoint.py ${JOBID}
```

```bash
ENDPOINT=$(python3 ~/lsf_vllm_poc/resolve_endpoint.py ${JOBID})
echo "${ENDPOINT}"
```
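`resolve_endpoint.py` is not reproduced in this change, but given the registry fields above, its core logic is roughly the following sketch (the `resolve` helper name is mine, not necessarily the script's):

```python
import json
import os
import sys

def resolve(registry_path):
    """Return the OpenAI-compatible base URL recorded in a registry file."""
    with open(registry_path) as f:
        reg = json.load(f)
    return f"http://{reg['host']}:{reg['port']}/v1"

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: resolve_endpoint.py <JOBID>
    jobid = sys.argv[1]
    print(resolve(os.path.expanduser(f"~/lsf_vllm_poc/registry/{jobid}.json")))
```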

Step 7: Validate the service with curl
--------------------------------------

```bash
curl -sS "${ENDPOINT}/models" -H "Authorization: Bearer local-vllm-key"
```

```bash
curl -sS "${ENDPOINT}/chat/completions" -H "Content-Type: application/json" -H "Authorization: Bearer local-vllm-key" -d '{
"model": "Qwen/Qwen3-0.6B",
"messages": [
{"role": "user", "content": "Explain the top 5 deserts in the world."}
],
"temperature": 0,
"max_tokens": 120
}'
```

Step 8: Validate the service from Jupyter
-----------------------------------------

```bash
python3 -m venv ~/lsf_vllm_poc/notebook/.venv
source ~/lsf_vllm_poc/notebook/.venv/bin/activate
pip install --upgrade pip
pip install notebook jupyterlab requests openai ipykernel
python -m ipykernel install --user --name lsf-vllm --display-name "Python (lsf-vllm)"
```

Start the notebook server on the execution host:

```bash
jupyter notebook --no-browser --ip=0.0.0.0 --port 8888 --allow-root
```

From your workstation, open an SSH tunnel to the notebook server (replace `user@your-host`):

```bash
ssh -L 8888:127.0.0.1:8888 user@your-host
```

Then browse to the notebook UI:

```
http://127.0.0.1:8888
```

Inside the notebook, point the client at the resolved service endpoint, for example:

```
http://127.0.0.1:8001/v1
```

Open `LSF_vLLM_Client.ipynb`, select the "Python (lsf-vllm)" kernel, and run the cells top to bottom.

Step 9: Validate downstream batch reuse
---------------------------------------

```bash
python3 ~/lsf_vllm_poc/batch_client.py ${JOBID} ~/lsf_vllm_poc/corpus/prompts.txt
```

```bash
cat ~/lsf_vllm_poc/results/batch_${JOBID}.jsonl
```
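Each line of the results file is a standalone JSON object with `prompt` and `response` fields, as written by `batch_client.py`. A small helper to read them back (the `iter_results` name is mine):

```python
import json

def iter_results(path):
    """Yield (prompt, response) pairs from a batch results .jsonl file."""
    with open(path) as f:
        for line in f:
            if line.strip():
                rec = json.loads(line)
                yield rec["prompt"], rec["response"]
```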

```bash
BATCH_JOBID=$(
  bsub -J vllm_batch -q normal -n 1 -R 'rusage[mem=1GB]' \
    -oo ~/lsf_vllm_poc/logs/batch.%J.out \
    -eo ~/lsf_vllm_poc/logs/batch.%J.err \
    "python3 ~/lsf_vllm_poc/batch_client.py ${JOBID} ~/lsf_vllm_poc/corpus/prompts.txt" \
  | awk '{print $2}' | tr -d '<>'
)

echo "Submitted batch JOBID=$BATCH_JOBID"
```

```bash
bjobs
bpeek ${BATCH_JOBID}
cat ~/lsf_vllm_poc/results/batch_${JOBID}.jsonl
```
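The last success criterion (a response for every prompt in the corpus) can be checked mechanically; a minimal sketch, where the `missing_prompts` helper is mine and not part of the repository:

```python
import json

def missing_prompts(prompts_file, results_file):
    """Return corpus prompts that have no matching response in the results file."""
    with open(prompts_file) as f:
        wanted = {line.strip() for line in f if line.strip()}
    with open(results_file) as f:
        answered = {json.loads(line)["prompt"] for line in f if line.strip()}
    return sorted(wanted - answered)
```

An empty return value means every prompt in the corpus received a response.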

Cleanup
-------

```bash
bkill ${BATCH_JOBID}
bkill ${JOBID}
```

Troubleshooting
---------------

```bash
bpeek ${JOBID}
podman ps -a
podman logs vllm-job-${JOBID}
```

```bash
curl -sS http://127.0.0.1:8001/v1/models -H "Authorization: Bearer local-vllm-key"
```

Success criteria
----------------
1. the service job is in the RUN state under IBM LSF
2. the registry file exists
3. /v1/models works
4. /v1/chat/completions works
5. the Jupyter notebook can call the service successfully
6. the batch client works locally
7. the IBM LSF batch job completes successfully
8. the result file contains responses for the full prompt corpus

Step 10: Validate using Open WebUI (Linux)
------------------------------------------

```bash
podman run -d -p 3000:8080 -e OPENAI_API_BASE_URL=http://your-host:8001/v1 -e OPENAI_API_KEY=local-vllm-key -e WEBUI_SECRET_KEY=my-openwebui-secret -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main
```

```bash
podman ps
podman logs -f open-webui
```

Open http://localhost:3000 in your browser, then configure the connection:

- Settings → Connections → OpenAI
- Base URL: http://your-host:8001/v1
- API Key: local-vllm-key

Model:
Qwen/Qwen3-0.6B

Test with a prompt such as:
Say one short line about LSF-managed model serving.
95 changes: 95 additions & 0 deletions LSF-vLLM/scripts/LSF_vLLM_Client.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# IBM LSF + vLLM Notebook Validation\n",
    "Run cells top to bottom. Update values if your endpoint differs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "base_url = 'http://127.0.0.1:8001/v1'\n",
    "api_key = 'local-vllm-key'\n",
    "model = 'Qwen/Qwen3-0.6B'\n",
    "print(base_url)\n",
    "print(model)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "resp = requests.get(\n",
    "    f'{base_url}/models',\n",
    "    headers={'Authorization': f'Bearer {api_key}'},\n",
    "    timeout=60,\n",
    ")\n",
    "print(resp.status_code)\n",
    "print(resp.json())\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "payload = {\n",
    "    'model': model,\n",
    "    'messages': [{'role': 'user', 'content': 'Explain the top 5 deserts in the world.'}],\n",
    "    'temperature': 0,\n",
    "    'max_tokens': 120,\n",
    "    'chat_template_kwargs': {'enable_thinking': False},\n",
    "}\n",
    "resp = requests.post(\n",
    "    f'{base_url}/chat/completions',\n",
    "    headers={'Authorization': f'Bearer {api_key}', 'Content-Type': 'application/json'},\n",
    "    json=payload,\n",
    "    timeout=120,\n",
    ")\n",
    "print(resp.status_code)\n",
    "data = resp.json()\n",
    "print(data['choices'][0]['message']['content'])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from openai import OpenAI\n",
    "client = OpenAI(base_url=base_url, api_key=api_key)\n",
    "resp = client.chat.completions.create(\n",
    "    model=model,\n",
    "    messages=[{'role': 'user', 'content': 'Explain the top 5 deserts in the world.'}],\n",
    "    temperature=0,\n",
    "    max_tokens=120,\n",
    ")\n",
    "print(resp.choices[0].message.content)\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python (lsf-vllm)",
   "language": "python",
   "name": "lsf-vllm"
  },
  "language_info": {
   "name": "python",
   "version": "3.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
53 changes: 53 additions & 0 deletions LSF-vLLM/scripts/batch_client.py
#!/usr/bin/env python3
import json
import os
import sys
import urllib.request

if len(sys.argv) not in (2, 3):
    print("Usage: batch_client.py <JOBID> [PROMPTS_FILE]", file=sys.stderr)
    sys.exit(1)

jobid = sys.argv[1]
base = os.path.expanduser("~/lsf_vllm_poc")
prompts_file = sys.argv[2] if len(sys.argv) == 3 else os.path.join(base, "corpus", "prompts.txt")

# Endpoint details come from the registry entry written by the service job.
with open(os.path.join(base, "registry", f"{jobid}.json")) as f:
    reg = json.load(f)

url = f"http://{reg['host']}:{reg['port']}/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {reg['api_key']}",
}

os.makedirs(os.path.join(base, "results"), exist_ok=True)
outp = os.path.join(base, "results", f"batch_{jobid}.jsonl")

with open(prompts_file) as fin, open(outp, "w") as fout:
    for line in fin:
        prompt = line.strip()
        if not prompt:
            continue

        payload = {
            "model": reg["model"],
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0,
            "max_tokens": 96,
            "chat_template_kwargs": {"enable_thinking": False},
        }

        req = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers=headers,
            method="POST",
        )

        with urllib.request.urlopen(req, timeout=300) as resp:
            data = json.load(resp)

        # One JSON object per line: the prompt and the model's response text.
        text = data["choices"][0]["message"]["content"]
        fout.write(json.dumps({"prompt": prompt, "response": text}) + "\n")

print(outp)