diff --git a/LSF-vLLM/README.md b/LSF-vLLM/README.md new file mode 100644 index 0000000..fc2e3ae --- /dev/null +++ b/LSF-vLLM/README.md @@ -0,0 +1,248 @@ +IBM LSF for vLLM Persistent Inference Service +========================================== + +Overview +-------- +This repository shows how to run a long-running vLLM inference service under IBM LSF, +validate it through a standard OpenAI-compatible API, access it from a Jupyter notebook, +and reuse the same service from a downstream batch job. + +What this implementation demonstrates +------------------------------------- +- IBM LSF launching and managing a persistent inference runtime as a service job +- vLLM exposing an OpenAI-compatible endpoint +- endpoint discovery through a small registry file written by the service job +- interactive validation using curl and Jupyter +- downstream reuse through a separate IBM LSF batch job + +Repository layout +----------------- +- scripts/start_vllm_lsf.sh + Starts the vLLM container, waits for readiness, writes the registry file, and keeps the + service attached to the IBM LSF job lifecycle. +- scripts/resolve_endpoint.py + Reads the registry file for a given IBM LSF job ID and prints the resolved base URL. +- scripts/batch_client.py + Reads a prompt corpus and sends requests to the registered vLLM service. +- notebook/LSF_vLLM_Client.ipynb + Jupyter notebook for interactive validation against the IBM LSF-managed runtime. +- corpus/prompts.txt + Sample prompt corpus for downstream batch validation. + +Prerequisites +------------- +- IBM LSF installed and operational +- podman installed +- python3 installed +- curl installed +- network access from the execution host to pull the vLLM image and model +- a single-node IBM LSF setup is sufficient for this implementation + +Note: +Replace "your-host" with the hostname or IP address of the system where the vLLM service is running. + +The examples below assume you are running as the same user for all steps. 
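The registry file mentioned above is a small JSON document written by the service job once vLLM is ready. A sample entry is sketched below; the `jobid` and `host` values are illustrative, while the field names match what `start_vllm_lsf.sh` actually writes:

```json
{
  "jobid": "12345",
  "service_name": "qwen-chat",
  "model": "Qwen/Qwen3-0.6B",
  "host": "your-host",
  "port": 8001,
  "api_key": "local-vllm-key",
  "status": "ready"
}
```

Downstream clients (the notebook, `resolve_endpoint.py`, and `batch_client.py`) only need the `host`, `port`, and `api_key` fields to reach the service.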
+ +Step 1: Create the working directories +-------------------------------------- + +```bash +mkdir -p ~/lsf_vllm_poc/{logs,registry,cache,corpus,results,notebook} +``` + +```bash +cp corpus/prompts.txt ~/lsf_vllm_poc/corpus/prompts.txt +``` + +```bash +cp scripts/start_vllm_lsf.sh ~/lsf_vllm_poc/ +cp scripts/resolve_endpoint.py ~/lsf_vllm_poc/ +cp scripts/batch_client.py ~/lsf_vllm_poc/ +``` + +```bash +chmod +x ~/lsf_vllm_poc/start_vllm_lsf.sh +chmod +x ~/lsf_vllm_poc/resolve_endpoint.py +chmod +x ~/lsf_vllm_poc/batch_client.py +``` + +Step 2: Review the service script defaults +------------------------------------------ + +```bash +MODEL=Qwen/Qwen3-0.6B PORT=8001 API_KEY=local-vllm-key +``` + +Step 3: Submit the persistent service job +----------------------------------------- + +```bash +JOBID=$( + bsub -J vllm_service -q normal -n 1 -R 'rusage[mem=12GB]' -oo ~/lsf_vllm_poc/logs/vllm.%J.out -eo ~/lsf_vllm_poc/logs/vllm.%J.err ~/lsf_vllm_poc/start_vllm_lsf.sh | awk '{print $2}' | tr -d '<>' +) + +echo "Submitted service JOBID=$JOBID" +``` + +Step 4: Monitor the service startup +----------------------------------- + +```bash +bjobs +bjobs -l ${JOBID} +bpeek ${JOBID} +``` + +```bash +podman ps -a | grep vllm-job-${JOBID} +podman logs -f vllm-job-${JOBID} +``` + +Step 5: Wait for the registry file +---------------------------------- + +```bash +until [[ -f ~/lsf_vllm_poc/registry/${JOBID}.json ]]; do + sleep 2 +done + +cat ~/lsf_vllm_poc/registry/${JOBID}.json +``` + +Step 6: Resolve the endpoint +---------------------------- + +```bash +python3 ~/lsf_vllm_poc/resolve_endpoint.py ${JOBID} +``` + +```bash +ENDPOINT=$(python3 ~/lsf_vllm_poc/resolve_endpoint.py ${JOBID}) +echo "${ENDPOINT}" +``` + +Step 7: Validate the service with curl +-------------------------------------- + +```bash +curl -sS "${ENDPOINT}/models" -H "Authorization: Bearer local-vllm-key" +``` + +```bash +curl -sS "${ENDPOINT}/chat/completions" -H "Content-Type: application/json" -H "Authorization: 
Bearer local-vllm-key" -d '{ + "model": "Qwen/Qwen3-0.6B", + "messages": [ + {"role": "user", "content": "Explain the top 5 deserts in the world."} + ], + "temperature": 0, + "max_tokens": 120 + }' +``` + +Step 8: Validate the service from Jupyter +----------------------------------------- + +```bash +python3 -m venv ~/lsf_vllm_poc/notebook/.venv +source ~/lsf_vllm_poc/notebook/.venv/bin/activate +pip install --upgrade pip +pip install notebook jupyterlab requests openai ipykernel +python -m ipykernel install --user --name lsf-vllm --display-name "Python (lsf-vllm)" +``` + +```bash +jupyter notebook --no-browser --ip=0.0.0.0 --port 8888 --allow-root +``` + +```bash +ssh -L 8888:127.0.0.1:8888 user@your-host +``` + +``` +http://127.0.0.1:8888 +``` + +``` +http://127.0.0.1:8001/v1 +``` + +Step 9: Validate downstream batch reuse +--------------------------------------- + +```bash +python3 ~/lsf_vllm_poc/batch_client.py ${JOBID} ~/lsf_vllm_poc/corpus/prompts.txt +``` + +```bash +cat ~/lsf_vllm_poc/results/batch_${JOBID}.jsonl +``` + +```bash +BATCH_JOBID=$( + bsub -J vllm_batch -q normal -n 1 -R 'rusage[mem=1GB]' -oo ~/lsf_vllm_poc/logs/batch.%J.out -eo ~/lsf_vllm_poc/logs/batch.%J.err "python3 ~/lsf_vllm_poc/batch_client.py ${JOBID} ~/lsf_vllm_poc/corpus/prompts.txt" | awk '{print $2}' | tr -d '<>' +) + +echo "Submitted batch JOBID=$BATCH_JOBID" +``` + +```bash +bjobs +bpeek ${BATCH_JOBID} +cat ~/lsf_vllm_poc/results/batch_${JOBID}.jsonl +``` + +Cleanup +------- + +```bash +bkill ${BATCH_JOBID} +bkill ${JOBID} +``` + +Troubleshooting +--------------- + +```bash +bpeek ${JOBID} +podman ps -a +podman logs vllm-job-${JOBID} +``` + +```bash +curl -sS http://127.0.0.1:8001/v1/models -H "Authorization: Bearer local-vllm-key" +``` + +Success criteria +---------------- +1. the service job is RUN under IBM LSF +2. the registry file exists +3. /v1/models works +4. /v1/chat/completions works +5. the Jupyter notebook can call the service successfully +6. 
the batch client works locally +7. the IBM LSF batch job completes successfully +8. the result file contains responses for the full prompt corpus + +Step 10: Validate using Open WebUI (Linux) +------------------------------------------ + +```bash +podman run -d -p 3000:8080 -e OPENAI_API_BASE_URL=http://your-host:8001/v1 -e OPENAI_API_KEY=local-vllm-key -e WEBUI_SECRET_KEY=my-openwebui-secret -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main +``` + +```bash +podman ps +podman logs -f open-webui +``` + +http://localhost:3000 + +- Settings → Connections → OpenAI +- Base URL: http://your-host:8001/v1 +- API Key: local-vllm-key + +Model: +Qwen/Qwen3-0.6B + +Test: +Say one short line about LSF-managed model serving. diff --git a/LSF-vLLM/scripts/LSF_vLLM_Client.ipynb b/LSF-vLLM/scripts/LSF_vLLM_Client.ipynb new file mode 100644 index 0000000..21decf4 --- /dev/null +++ b/LSF-vLLM/scripts/LSF_vLLM_Client.ipynb @@ -0,0 +1,95 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# IBM LSF + vLLM Notebook Validation\\n", + "Run cells top to bottom. Update values if your endpoint differs." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import requests\\n", + "base_url = 'http://127.0.0.1:8001/v1'\\n", + "api_key = 'local-vllm-key'\\n", + "model = 'Qwen/Qwen3-0.6B'\\n", + "print(base_url)\\n", + "print(model)\\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "resp = requests.get(\\n", + " f'{base_url}/models',\\n", + " headers={'Authorization': f'Bearer {api_key}'},\\n", + " timeout=60,\\n", + ")\\n", + "print(resp.status_code)\\n", + "print(resp.json())\\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "payload = {\\n", + " 'model': model,\\n", + " 'messages': [{'role': 'user', 'content': 'Explain the top 5 deserts in the world.'}],\\n", + " 'temperature': 0,\\n", + " 'max_tokens': 120,\\n", + " 'chat_template_kwargs': {'enable_thinking': False},\\n", + "}\\n", + "resp = requests.post(\\n", + " f'{base_url}/chat/completions',\\n", + " headers={'Authorization': f'Bearer {api_key}', 'Content-Type': 'application/json'},\\n", + " json=payload,\\n", + " timeout=120,\\n", + ")\\n", + "print(resp.status_code)\\n", + "data = resp.json()\\n", + "print(data['choices'][0]['message']['content'])\\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from openai import OpenAI\\n", + "client = OpenAI(base_url=base_url, api_key=api_key)\\n", + "resp = client.chat.completions.create(\\n", + " model=model,\\n", + " messages=[{'role': 'user', 'content': 'Explain the top 5 deserts in the world.'}],\\n", + " temperature=0,\\n", + " max_tokens=120,\\n", + ")\\n", + "print(resp.choices[0].message.content)\\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python (lsf-vllm)", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.9" + } + }, 
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
\ No newline at end of file
diff --git a/LSF-vLLM/scripts/batch_client.py b/LSF-vLLM/scripts/batch_client.py
new file mode 100644
index 0000000..c386829
--- /dev/null
+++ b/LSF-vLLM/scripts/batch_client.py
@@ -0,0 +1,53 @@
+#!/usr/bin/env python3
+import json
+import os
+import sys
+import urllib.request
+
+if len(sys.argv) not in (2, 3):
+    print("Usage: batch_client.py <JOBID> [PROMPTS_FILE]", file=sys.stderr)
+    sys.exit(1)
+
+jobid = sys.argv[1]
+base = os.path.expanduser("~/lsf_vllm_poc")
+prompts_file = sys.argv[2] if len(sys.argv) == 3 else os.path.join(base, "corpus", "prompts.txt")
+
+with open(os.path.join(base, "registry", f"{jobid}.json")) as f:
+    reg = json.load(f)
+
+url = f"http://{reg['host']}:{reg['port']}/v1/chat/completions"
+headers = {
+    "Content-Type": "application/json",
+    "Authorization": f"Bearer {reg['api_key']}",
+}
+
+outp = os.path.join(base, "results", f"batch_{jobid}.jsonl")
+
+with open(prompts_file) as fin, open(outp, "w") as fout:
+    for line in fin:
+        prompt = line.strip()
+        if not prompt:
+            continue
+
+        payload = {
+            "model": reg["model"],
+            "messages": [{"role": "user", "content": prompt}],
+            "temperature": 0,
+            "max_tokens": 96,
+            "chat_template_kwargs": {"enable_thinking": False},
+        }
+
+        req = urllib.request.Request(
+            url,
+            data=json.dumps(payload).encode("utf-8"),
+            headers=headers,
+            method="POST",
+        )
+
+        with urllib.request.urlopen(req, timeout=300) as resp:
+            data = json.load(resp)
+
+        text = data["choices"][0]["message"]["content"]
+        fout.write(json.dumps({"prompt": prompt, "response": text}) + "\n")
+
+print(outp)
diff --git a/LSF-vLLM/scripts/prompts.txt b/LSF-vLLM/scripts/prompts.txt
new file mode 100644
index 0000000..1ce667a
--- /dev/null
+++ b/LSF-vLLM/scripts/prompts.txt
@@ -0,0 +1,3 @@
+Why do rivers flow towards the sea?
+What is the difference between a lion and a tiger?
+How do fish survive in water without breathing air like humans?
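For reference, `batch_client.py` above writes one JSON object per prompt to `results/batch_<JOBID>.jsonl`, each with `prompt` and `response` keys. A minimal sketch of reading such a results file back (the sample line here is illustrative, not real model output):

```python
import json

# batch_client.py emits one JSON object per line with "prompt" and
# "response" keys; this sample line stands in for a real results file.
sample_jsonl = '{"prompt": "Why do rivers flow towards the sea?", "response": "..."}\n'

for line in sample_jsonl.splitlines():
    record = json.loads(line)  # one record per JSONL line
    print(record["prompt"])
```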
diff --git a/LSF-vLLM/scripts/resolve_endpoint.py b/LSF-vLLM/scripts/resolve_endpoint.py
new file mode 100644
index 0000000..cfb0f2e
--- /dev/null
+++ b/LSF-vLLM/scripts/resolve_endpoint.py
@@ -0,0 +1,16 @@
+#!/usr/bin/env python3
+import json
+import os
+import sys
+
+if len(sys.argv) != 2:
+    print("Usage: resolve_endpoint.py <JOBID>", file=sys.stderr)
+    sys.exit(1)
+
+jobid = sys.argv[1]
+reg_path = os.path.expanduser(f"~/lsf_vllm_poc/registry/{jobid}.json")
+
+with open(reg_path) as f:
+    reg = json.load(f)
+
+print(f"http://{reg['host']}:{reg['port']}/v1")
diff --git a/LSF-vLLM/scripts/start_vllm_lsf.sh b/LSF-vLLM/scripts/start_vllm_lsf.sh
new file mode 100644
index 0000000..d77b5b0
--- /dev/null
+++ b/LSF-vLLM/scripts/start_vllm_lsf.sh
@@ -0,0 +1,99 @@
+#!/usr/bin/env bash
+set -Eeuo pipefail
+
+MODEL="${MODEL:-Qwen/Qwen3-0.6B}"
+PORT="${PORT:-8001}"
+API_KEY="${API_KEY:-local-vllm-key}"
+IMAGE="${IMAGE:-docker.io/vllm/vllm-openai-cpu:latest-x86_64}"
+
+BASE="${HOME}/lsf_vllm_poc"
+CACHE_DIR="${BASE}/cache"
+REG_DIR="${BASE}/registry"
+LOG_DIR="${BASE}/logs"
+CONTAINER_NAME="vllm-job-${LSB_JOBID:-manual}"
+
+mkdir -p "${CACHE_DIR}" "${REG_DIR}" "${LOG_DIR}"
+
+echo "Starting vLLM service wrapper..."
+echo "MODEL=${MODEL}" +echo "PORT=${PORT}" +echo "JOBID=${LSB_JOBID:-manual}" +echo "CONTAINER_NAME=${CONTAINER_NAME}" + +cleanup() { + echo "Cleaning up container ${CONTAINER_NAME}" + podman stop "${CONTAINER_NAME}" >/dev/null 2>&1 || true + podman rm -f "${CONTAINER_NAME}" >/dev/null 2>&1 || true +} +trap cleanup EXIT + +podman rm -f "${CONTAINER_NAME}" >/dev/null 2>&1 || true + +HF_ENV=() +if [[ -n "${HF_TOKEN:-}" ]]; then + HF_ENV=(-e "HF_TOKEN=${HF_TOKEN}") +fi + +podman run -d \ + --name "${CONTAINER_NAME}" \ + -p "${PORT}:8000" \ + -v "${CACHE_DIR}:/root/.cache/huggingface:Z" \ + -e HF_HOME=/root/.cache/huggingface \ + -e VLLM_CPU_KVCACHE_SPACE=6 \ + -e VLLM_CPU_NUM_OF_RESERVED_CPU=1 \ + --security-opt seccomp=unconfined \ + --cap-add SYS_NICE \ + --shm-size=4g \ + "${HF_ENV[@]}" \ + "${IMAGE}" \ + "${MODEL}" \ + --dtype=bfloat16 \ + --max-model-len 32768 \ + --api-key "${API_KEY}" >/dev/null + +HOST_FQDN="$(hostname -f)" +export MODEL PORT API_KEY HOST_FQDN + +echo "Waiting for service readiness..." +READY=0 +for i in $(seq 1 240); do + if curl -fsS "http://127.0.0.1:${PORT}/v1/models" \ + -H "Authorization: Bearer ${API_KEY}" >/dev/null 2>&1; then + READY=1 + break + fi + if (( i % 10 == 0 )); then + echo "Still waiting... iteration=${i}" + fi + sleep 5 +done + +if [[ "${READY}" != "1" ]]; then + echo "vLLM did not become ready in time" + podman logs "${CONTAINER_NAME}" || true + exit 1 +fi + +echo "Service is ready. Writing registry file..." 
+ +python3 - <<'PY' +import json, os +jobid = os.environ.get("LSB_JOBID", "manual") +doc = { + "jobid": jobid, + "service_name": "qwen-chat", + "model": os.environ["MODEL"], + "host": os.environ["HOST_FQDN"], + "port": int(os.environ["PORT"]), + "api_key": os.environ["API_KEY"], + "status": "ready", +} +path = os.path.expanduser(f"~/lsf_vllm_poc/registry/{jobid}.json") +with open(path, "w") as f: + json.dump(doc, f, indent=2) +print(f"registry written: {path}") +print(f"endpoint: http://{doc['host']}:{doc['port']}/v1") +PY + +echo "Streaming container logs..." +podman logs -f "${CONTAINER_NAME}"
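The readiness loop in `start_vllm_lsf.sh` shells out to curl; the same check can be sketched in Python for use from the notebook or a batch client. The function name and defaults below are illustrative (they mirror the script's `PORT` and `API_KEY` defaults), not part of the repository:

```python
import urllib.request

# Minimal readiness probe mirroring the curl loop in start_vllm_lsf.sh.
# Defaults mirror the script's PORT/API_KEY defaults; adjust as needed.
def is_ready(port=8001, api_key="local-vllm-key", host="127.0.0.1"):
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeout, or HTTP error: service not ready yet.
        return False
```

A client can call this in a sleep loop, exactly as the shell script does, before sending any chat completions.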