Commit fe62db6

feat(standalone): self-contained Docker image + ghcr.io publishing
Phase 2 of the standalone-container plan. Anyone can now pull and run the API without setting up a local development environment with sibling Python repos.

Three pieces:

1. Dockerfile.standalone: clones sibling Python repos (cobrakbase, ModelSEEDpy, KBUtilLib, cb_annotation_ontology_api) and data repos (ModelSEEDDatabase dev branch, ModelSEEDTemplates) from GitHub at build time. Result: ~1.5-2 GB self-contained image. Defaults to local-filesystem storage so no PATRIC account is needed. ANL-only endpoints (workspace, RAST jobs, RAST genome) cleanly 503 unless their env vars are configured. Existing production Dockerfile is untouched; production deploy mechanic on poplar continues to work as-is.

2. .github/workflows/publish-image.yml: builds + pushes to ghcr.io/modelseed/modelseed-api on push to main + on git tags. Tag scheme: :latest, :main-<sha>, :vX.Y.Z, :X.Y. Auth via the built-in GITHUB_TOKEN; no separate registry account needed. Includes a smoke job that pulls the published image, hits /api/health + /api/biochem/stats + /api/biochem/search?query=glucose, and asserts that /api/rast/jobs returns 503 (standalone defaults correctly leave RAST DB unconfigured).

3. docs/STANDALONE.md: full walkthrough for researchers running the image locally: quick-start, persisting data, what endpoints work without auth, end-to-end FASTA-to-model recipe, MCP server config for Claude Desktop, opt-in env vars for the production-only endpoints.

README also gets a short "Run standalone" section near the top with the one-liner. .dockerignore extended to cover docs/, .github/, .claude/, .env*, IDE artifacts, OS noise.

Next: once the first build lands and we verify the image works, the bulk reconstruction endpoint (Chris's PRD) is purely additive on top of this image.
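The tag scheme described in the message can be sketched as a small pure function. This is only an illustrative model of the stated scheme, not the workflow's implementation: the function name `image_tags`, the 7-character short SHA, and the handling of non-matching refs are assumptions, and the exact strings produced by docker/metadata-action (for example, whether the leading `v` is kept) may differ.

```python
import re
from typing import List

# Matches refs such as refs/tags/v1.2.3 (assumed semver-style release tags).
SEMVER_TAG = re.compile(r"^refs/tags/v(\d+)\.(\d+)\.(\d+)$")


def image_tags(ref: str, sha: str) -> List[str]:
    """Return the image tags the commit message describes for a git ref.

    Hypothetical helper: models ':latest', ':main-<sha>', ':vX.Y.Z', ':X.Y'.
    """
    if ref == "refs/heads/main":
        # Pushes to main publish :latest plus a SHA-qualified tag.
        return ["latest", f"main-{sha[:7]}"]
    match = SEMVER_TAG.match(ref)
    if match:
        major, minor, patch = match.groups()
        # Release tags publish the full version, the major.minor alias, and :latest.
        return [f"v{major}.{minor}.{patch}", f"{major}.{minor}", "latest"]
    # Any other ref (feature branches, non-semver tags) publishes nothing here.
    return []
```

Under this model, a consumer who wants a stable image pins the full version tag, while `:latest` moves on every main push and every release.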
1 parent b801ec1 commit fe62db6

5 files changed

Lines changed: 487 additions & 2 deletions

.dockerignore

Lines changed: 43 additions & 2 deletions
@@ -1,15 +1,56 @@
+# Keep the docker build context lean: only files the Dockerfiles
+# actually COPY need to be sent to the daemon. Tests, docs, git
+# history, and dev artifacts can be excluded.
+
+# Version control + git metadata
 .git
+.gitignore
+.gitattributes
+
+# Python build / install artifacts
 .venv
 venv
 __pycache__
 *.py[cod]
 *.egg-info
 dist
 build
+
+# Test artifacts (tests run in CI, not inside the image)
+tests/
 .pytest_cache
 .coverage
 htmlcov
-.DS_Store
+
+# Docs (served by FastAPI from code, not from the markdown files)
+docs/
+
+# CI + tooling config
+.github
 .claude
+noxfile.py
+
+# Editor / IDE
+.vscode
+.idea
+*.swp
+*.swo
+*~
+
+# Local secrets / config (never bake into the image)
 .env
-tests
+.env.*
+!.env.example
+
+# Local Docker iteration
+docker-compose.yml
+docker-compose.*.yml
+
+# OS noise
+.DS_Store
+Thumbs.db
+
+# Logs and scratch
+*.log
+scratch
+tmp
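
The `.env.*` / `!.env.example` pair relies on ordering: later rules override earlier ones, and a leading `!` re-includes a path. A minimal sketch of that "last matching rule wins" behaviour follows. It is a simplification, not Docker's matcher: `is_excluded` and `RULES` are hypothetical names, and Python's `fnmatch` lets `*` cross `/`, which Docker's pattern matching does not.

```python
from fnmatch import fnmatchcase
from typing import List

# A subset of the rules from the .dockerignore above, in file order.
RULES = [".env", ".env.*", "!.env.example", "tests/", "docs/"]


def is_excluded(path: str, rules: List[str]) -> bool:
    """Simplified .dockerignore check: the last matching rule decides."""
    excluded = False
    parts = path.split("/")
    # A rule applies if it matches the path itself or any parent directory.
    prefixes = ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    for raw in rules:
        rule = raw.strip()
        if not rule or rule.startswith("#"):
            continue  # skip blank lines and comments
        negate = rule.startswith("!")
        if negate:
            rule = rule[1:]
        rule = rule.rstrip("/")
        if any(fnmatchcase(prefix, rule) for prefix in prefixes):
            excluded = not negate
    return excluded
```

With these rules, `.env.local` stays out of the build context while `.env.example` is sent, because the negated rule comes after the wildcard rule.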

.github/workflows/publish-image.yml

Lines changed: 152 additions & 0 deletions
@@ -0,0 +1,152 @@
+# Build the standalone modelseed-api Docker image and publish to
+# ghcr.io/modelseed/modelseed-api.
+#
+# Tagging:
+#   - push to main: ghcr.io/.../modelseed-api:latest + :main-<sha>
+#   - git tag v*:   ghcr.io/.../modelseed-api:vX.Y.Z + :latest
+#
+# Image is built from Dockerfile.standalone (clones sibling repos from
+# GitHub during build, includes bundled data). Production deployments on
+# poplar continue to use the regular Dockerfile + docker-compose; this
+# workflow is purely for the public/standalone artifact.
+#
+# Requires: nothing special. GITHUB_TOKEN is auto-provisioned and has
+# packages:write on the org-scoped registry (ghcr.io/modelseed/...).
+
+name: Publish standalone image
+
+on:
+  push:
+    branches: [main]
+    tags:
+      - 'v*'
+    # Path filters are not evaluated for tag pushes, so v* tags always build.
+    paths:
+      # Only rebuild when something that actually affects the image changes.
+      - 'src/**'
+      - 'data/**'
+      - 'pyproject.toml'
+      - 'Dockerfile.standalone'
+      - '.dockerignore'
+      - '.github/workflows/publish-image.yml'
+  workflow_dispatch:
+    inputs:
+      smoke_test:
+        description: 'Run smoke test after build (slower, but verifies image actually runs)'
+        type: boolean
+        default: true
+
+permissions:
+  contents: read
+  packages: write
+
+env:
+  REGISTRY: ghcr.io
+  IMAGE_NAME: ${{ github.repository }}
+
+jobs:
+  build-and-push:
+    runs-on: ubuntu-latest
+    timeout-minutes: 60
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Log in to ghcr.io
+        uses: docker/login-action@v3
+        with:
+          registry: ${{ env.REGISTRY }}
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Compute image tags + labels
+        id: meta
+        uses: docker/metadata-action@v5
+        with:
+          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
+          tags: |
+            type=ref,event=branch
+            type=semver,pattern={{version}}
+            type=semver,pattern={{major}}.{{minor}}
+            type=sha,prefix=main-,enable=${{ github.ref == format('refs/heads/{0}', 'main') }}
+            type=raw,value=latest,enable=${{ github.ref == format('refs/heads/{0}', 'main') || startsWith(github.ref, 'refs/tags/v') }}
+
+      - name: Build + push
+        id: build
+        uses: docker/build-push-action@v5
+        with:
+          context: .
+          file: ./Dockerfile.standalone
+          push: true
+          tags: ${{ steps.meta.outputs.tags }}
+          labels: ${{ steps.meta.outputs.labels }}
+          # GitHub Actions cache so subsequent builds reuse the heavy layers
+          # (system deps + cloned sibling repos) when those didn't change.
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
+
+  smoke:
+    name: Smoke-test the published image
+    needs: build-and-push
+    runs-on: ubuntu-latest
+    timeout-minutes: 10
+    if: ${{ github.event_name != 'workflow_dispatch' || inputs.smoke_test }}
+    steps:
+      - name: Pull image
+        run: docker pull ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
+
+      - name: Start container
+        run: |
+          docker run -d --name smoke -p 8000:8000 \
+            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
+          # Give uvicorn a moment to bind + biochem init to finish.
+          # biochem init reads ~45k compounds + 56k reactions; takes ~10s.
+          for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15; do
+            if curl -sf http://localhost:8000/api/health > /dev/null; then
+              echo "Healthy after ${i}s"
+              break
+            fi
+            sleep 1
+          done
+
+      - name: Hit /api/health
+        run: |
+          resp=$(curl -sf http://localhost:8000/api/health)
+          echo "Response: $resp"
+          echo "$resp" | grep -q '"status":"ok"'
+
+      - name: Hit /api/biochem/stats (no auth, exercises bundled data)
+        run: |
+          resp=$(curl -sf http://localhost:8000/api/biochem/stats)
+          echo "Response: $resp"
+          echo "$resp" | grep -q 'total_compounds'
+
+      - name: Hit /api/biochem/search (real query against bundled DB)
+        run: |
+          resp=$(curl -sf 'http://localhost:8000/api/biochem/search?query=glucose&type=compounds&limit=5')
+          echo "Response: $resp" | head -c 500
+          echo "$resp" | grep -q 'cpd00027'
+
+      - name: Confirm ANL-only endpoints return 503 (standalone-friendly defaults)
+        run: |
+          # No PATRIC token + no RAST DB configured = these should 503 cleanly,
+          # not crash or return 500.
+          code=$(curl -s -o /dev/null -w '%{http_code}' \
+            -H 'Authorization: bogus-token' \
+            http://localhost:8000/api/rast/jobs)
+          if [ "$code" != "503" ]; then
+            echo "FAIL: /api/rast/jobs returned $code, expected 503"
+            exit 1
+          fi
+          echo "OK: /api/rast/jobs returns 503 as expected (no MODELSEED_RAST_DB_HOST)"
+
+      - name: Show container logs (always, for diagnostics)
+        if: always()
+        run: docker logs smoke | tail -100
+
+      - name: Stop container
+        if: always()
+        run: docker rm -f smoke
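
The readiness loop in the smoke job falls through silently when the container never becomes healthy, leaving the next step to surface the failure. A possible alternative, sketched here in Python under stated assumptions, returns an explicit result the caller can act on. The names `http_status` and `wait_until_healthy` are hypothetical; only the standard library is used.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str, timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise, but still carry a status code (e.g. 503).
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: server not reachable yet.
        return None


def wait_until_healthy(
    probe: Callable[[], bool],
    attempts: int = 15,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call probe up to `attempts` times; return the attempt that succeeded.

    Returns None when every attempt fails, so the caller can fail fast.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        sleep(delay)
    return None
```

A caller could pass `lambda: http_status("http://localhost:8000/api/health") == 200` as the probe and exit non-zero on `None`; the same `http_status` helper would also cover the 503 assertion on `/api/rast/jobs`. Injecting `sleep` keeps the loop testable without real delays.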

Dockerfile.standalone

Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
+# ModelSEED API: self-contained image for standalone use
+#
+# Unlike the production Dockerfile (which COPYs sibling repos from the
+# build host), this image clones all dependency repos from GitHub during
+# the build. Result: a single self-contained image that anyone can pull
+# from ghcr.io and run without setting up a local development environment.
+#
+# Build (CI): GitHub Actions builds + pushes to ghcr.io/modelseed/modelseed-api
+# Build (local, for testing):
+#   docker build -f Dockerfile.standalone -t modelseed-api:standalone .
+# Run:
+#   docker run -p 8000:8000 ghcr.io/modelseed/modelseed-api:latest
+#   # then hit http://localhost:8000/demo/
+#
+# The image includes ModelSEEDDatabase and ModelSEEDTemplates baked in;
+# total size ~1.5-2 GB. Larger than typical web-app images but eliminates
+# the need for separate data downloads.
+#
+# Defaults to local-storage mode (no PATRIC account needed). All ANL-
+# specific endpoints (workspace, RAST jobs, RAST genome) cleanly return
+# 503 unless their env vars are configured. See docs/STANDALONE.md.
+
+FROM python:3.11-slim
+
+# System deps: GLPK for cobra's linear solver, git for cloning, gcc/g++
+# for Python wheel builds that need compilation.
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    glpk-utils \
+    libglpk-dev \
+    libexpat1 \
+    gcc \
+    g++ \
+    git \
+    && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /deps
+
+# Clone dependency repos. Using `--depth 1` keeps the image smaller by
+# skipping git history. Pin to specific commits later if reproducible
+# builds become important; HEAD of each branch is fine for now.
+RUN git clone --depth 1 --branch master https://github.com/Fxe/cobrakbase.git && \
+    git clone --depth 1 --branch main https://github.com/cshenry/ModelSEEDpy.git && \
+    git clone --depth 1 --branch main https://github.com/cshenry/KBUtilLib.git && \
+    git clone --depth 1 --branch main https://github.com/kbaseapps/cb_annotation_ontology_api.git && \
+    git clone --depth 1 --branch dev https://github.com/ModelSEED/ModelSEEDDatabase.git && \
+    git clone --depth 1 --branch main https://github.com/ModelSEED/ModelSEEDTemplates.git
+
+# Install dependency packages in the order they expect.
+# cobrakbase first (no deps on others), then ModelSEEDpy, then KBUtilLib.
+RUN pip install --no-cache-dir -e /deps/cobrakbase && \
+    pip install --no-cache-dir -e /deps/ModelSEEDpy && \
+    pip install --no-cache-dir -e /deps/KBUtilLib
+
+WORKDIR /app
+
+# Copy modelseed-api source from build context. With CI, the context is
+# this repo's checkout; locally, run from the repo root.
+COPY src/ /app/src/
+COPY data/ /app/data/
+COPY pyproject.toml /app/
+
+# Install modelseed-api with the modeling+celery extras.
+RUN pip install --no-cache-dir -e ".[modeling,celery]"
+
+# numpy/scikit-learn occasionally end up at mismatched ABI versions
+# after the editable installs; force-reinstall to a coherent pair.
+# Then pre-download the ~25MB genome classifier files so the first
+# model build is fast.
+RUN pip install --no-cache-dir --force-reinstall numpy scikit-learn && \
+    python -c "from modelseedpy.helpers import get_classifier; get_classifier('knn_ACNP_RAST_filter_01_17_2023')"
+
+# Default configuration.
+ENV MODELSEED_MODELSEED_DB_PATH=/deps/ModelSEEDDatabase \
+    MODELSEED_TEMPLATES_PATH=/deps/ModelSEEDTemplates/templates/v7.0 \
+    MODELSEED_CB_ANNOTATION_ONTOLOGY_API_PATH=/deps/cb_annotation_ontology_api \
+    MODELSEED_JOB_STORE_DIR=/tmp/modelseed-jobs \
+    MODELSEED_HOST=0.0.0.0 \
+    MODELSEED_PORT=8000 \
+    MODELSEED_STORAGE_BACKEND=local \
+    MODELSEED_LOCAL_DATA_DIR=/data/modelseed
+# Local-storage by default: no PATRIC account needed, models persist
+# under MODELSEED_LOCAL_DATA_DIR. Users who want PATRIC-workspace mode
+# can override MODELSEED_STORAGE_BACKEND=workspace at run time.
+
+# WORKAROUND: cobrakbase.KBaseAPI() reads a token from ~/.kbase/token
+# even when not connecting to KBase. Required by MSReconstructionUtils
+# init. Dummy value is fine.
+ENV KB_AUTH_TOKEN=unused
+RUN mkdir -p /root/.kbase && echo "unused" > /root/.kbase/token
+
+# Make sure the default local-data dir exists so first-run writes work
+# without the user pre-creating + bind-mounting it.
+RUN mkdir -p /data/modelseed && chmod 777 /data/modelseed
+VOLUME /data/modelseed
+
+EXPOSE 8000
+
+WORKDIR /app/src
+CMD ["python", "-m", "uvicorn", "modelseed_api.main:app", "--host", "0.0.0.0", "--port", "8000"]
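
The `ENV` block above sets defaults that a user can override at run time (for example `-e MODELSEED_STORAGE_BACKEND=workspace`). The application's actual settings loader is not part of this diff, so the following is only a sketch of how such defaults-plus-overrides resolution might look; `StandaloneSettings`, `DEFAULTS`, and `load_settings` are hypothetical names, and only a subset of the variables is modeled.

```python
from dataclasses import dataclass
from typing import Mapping

# Defaults copied from the ENV block of Dockerfile.standalone (subset).
DEFAULTS = {
    "MODELSEED_STORAGE_BACKEND": "local",
    "MODELSEED_LOCAL_DATA_DIR": "/data/modelseed",
    "MODELSEED_HOST": "0.0.0.0",
    "MODELSEED_PORT": "8000",
}


@dataclass(frozen=True)
class StandaloneSettings:
    storage_backend: str
    local_data_dir: str
    host: str
    port: int


def load_settings(env: Mapping[str, str]) -> StandaloneSettings:
    """Resolve settings: values in env win, otherwise the image defaults apply."""
    merged = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    return StandaloneSettings(
        storage_backend=merged["MODELSEED_STORAGE_BACKEND"],
        local_data_dir=merged["MODELSEED_LOCAL_DATA_DIR"],
        host=merged["MODELSEED_HOST"],
        port=int(merged["MODELSEED_PORT"]),  # env values are always strings
    )
```

Passing `os.environ` gives the run-time view inside the container, while passing a plain dict keeps the function easy to test.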

README.md

Lines changed: 26 additions & 0 deletions
@@ -22,6 +22,32 @@ The production API is served at **https://modelseed.org/PMS/**.
 For deployment, restart procedures, on-call runbook, and infrastructure details, see the private operations repo: [`ModelSEED/modelseed-api-ops`](https://github.com/ModelSEED/modelseed-api-ops). Access is invite-only: ask Chris Henry or Jose Faria.
 
 
+## Run standalone (your own machine, no ANL setup)
+
+The repo ships a self-contained Docker image on GitHub Container Registry. Bundles all dependencies + biochemistry data; no PATRIC account or RAST account needed for the local-mode workflow.
+
+```bash
+docker run -p 8000:8000 ghcr.io/modelseed/modelseed-api:latest
+```
+
+Then open `http://localhost:8000/demo/` or hit the API directly:
+
+```bash
+curl http://localhost:8000/api/health
+curl 'http://localhost:8000/api/biochem/search?query=glucose&type=compounds&limit=5'
+```
+
+To persist models across container restarts, bind-mount a host directory:
+
+```bash
+docker run -p 8000:8000 \
+  -v ~/.modelseed:/data/modelseed \
+  ghcr.io/modelseed/modelseed-api:latest
+```
+
+The standalone image defaults to local-filesystem storage. Endpoints that need ANL infrastructure (`/api/workspace/*`, `/api/rast/*`) return a clean 503 unless their env vars are configured. See [`docs/STANDALONE.md`](docs/STANDALONE.md) for the full walkthrough including FASTA-to-model recipes and MCP server setup for Claude Desktop.
+
+
 ## API Endpoints
 
 Most endpoints require a PATRIC token in the `Authorization` header (not needed in local mode). Biochemistry endpoints are always public.
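
The "clean 503 unless configured" behaviour described in the README addition can be illustrated with a small gate function. The real endpoint code is not shown in this commit, so this is a hypothetical sketch: `rast_jobs_response` and the response bodies are invented for illustration, and only the `MODELSEED_RAST_DB_HOST` variable name comes from the workflow's smoke test.

```python
from typing import Any, Dict, Mapping, Tuple


def rast_jobs_response(env: Mapping[str, str]) -> Tuple[int, Dict[str, Any]]:
    """Return (status, body) for a RAST jobs request under the given environment.

    Hypothetical gate: with no RAST DB host configured, answer 503
    (service unavailable) instead of failing later with a 500.
    """
    if not env.get("MODELSEED_RAST_DB_HOST"):
        return 503, {"detail": "RAST database is not configured"}
    # A configured deployment would query the RAST DB here; stubbed out.
    return 200, {"jobs": []}
```

Checking configuration before touching the backend is what lets the smoke test distinguish "deliberately unavailable" (503) from "crashed" (500).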
