Support nvidia dynamo by Bihan · Pull Request #3868 · dstackai/dstack

Bihan · 2026-05-08T17:52:46Z

Service Configuration example

type: service
name: dynamo-pd


env:
  - HF_TOKEN
  - MODEL_ID=meta-llama/Llama-3.2-3B-Instruct

retry:
  on_events: [error]
  duration: 1h

replicas: 
  - name: router
    count: 1
    docker: true
    router:
      type: dynamo
    commands:
      # DIND ships docker but not pip — set up a venv for ai-dynamo.
      - apt-get update
      - apt-get install -y python3-dev python3-venv
      - python3 -m venv ~/dyn-venv
      - source ~/dyn-venv/bin/activate
      - pip install -U pip
      - pip install --pre "ai-dynamo[sglang]"
      # Pull Dynamo and start the supporting compose stack (NATS, etcd, ...).
      - git clone https://github.com/ai-dynamo/dynamo.git
      - docker compose -f dynamo/deploy/docker-compose.yml up -d
      # Run the Dynamo frontend (was the commented dev command).
      - |
        python3 -m dynamo.frontend \
          --http-host 0.0.0.0 --http-port 8000 \
          --discovery-backend etcd --router-mode kv \
          --kv-cache-block-size 64
    resources:
      cpu: 4

  # ── prefill worker ─────────────────────────────────────────────────────
  # dstackai/base + Python 3.12 + NVCC (needed for CUDA kernels in sglang).
  - name: prefill
    count: 1..2
    scaling:
      metric: rps
      target: 4 
    python: "3.12"
    nvcc: true
    commands:
      # dstack injected DSTACK_ROUTER_INTERNAL_IP after the router replica
      # was provisioned. Compose the etcd/NATS endpoints from it.
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      - export DYN_SYSTEM_HOST="0.0.0.0"
      - export DYN_SYSTEM_PORT="8000"
      # Wait until the router's etcd and NATS ports are actually accepting
      # connections — having the IP isn't the same as having the services up.
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
           && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install --pre "ai-dynamo[sglang]"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode prefill --disaggregation-transfer-backend nixl
    resources:
      gpu: L4

  # ── decode worker ──────────────────────────────────────────────────────
  - name: decode
    count: 1
    python: "3.12"
    nvcc: true
    commands:
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      - export DYN_SYSTEM_HOST="0.0.0.0"
      - export DYN_SYSTEM_PORT="8000"
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
           && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install --pre "ai-dynamo[sglang]"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode decode --disaggregation-transfer-backend nixl
    resources:
      gpu: L4

port: 8000
model: meta-llama/Llama-3.2-3B-Instruct

probes:
 - type: http
   url: /health
   interval: 15s

Bihan Rana added 2 commits May 8, 2026 22:31

Resolve Merge Conflict

9dcc361

Support NVIDIA dynamo

dcd72e3

Bihan requested review from jvstme and r4victor May 8, 2026 17:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support nvidia dynamo#3868

Support nvidia dynamo#3868
Bihan wants to merge 2 commits intodstackai:masterfrom
Bihan:support_nvidia_dynamo

Bihan commented May 8, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Bihan commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Bihan commented May 8, 2026 •

edited

Loading