Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion charts/model-csi-driver/templates/configmap.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,10 @@ data:
config.yaml: |-
service_name: {{ .Values.config.serviceName }}
root_dir: {{ .Values.config.rootDir }}
dynamic_csi_endpoint: {{ .Values.config.dynamicCsiEndpoint }}
csi_endpoint: unix:///csi/csi.sock
metrics_addr: {{ .Values.config.metricsAddr }}
{{- with .Values.config.pullConfig }}
pull_config:
{{- toYaml . | nindent 6 }}
{{- end }}
{{- end }}
6 changes: 5 additions & 1 deletion charts/model-csi-driver/values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,10 @@ config:
# Root working directory for model storage and metadata,
# must be writable and have enough disk space
rootDir: /var/lib/model-csi
# Deprecated compatibility socket for the legacy shared dynamic CSI server.
dynamicCsiEndpoint: unix:///var/run/model-csi/csi.sock
# Bind address for Prometheus metrics. POD_IP is provided by the daemonset.
metricsAddr: tcp://127.0.0.1:5244
registryAuths:
# registry.example.com:
# auth: dXNlcm5hbWU6cGFzc3dvcmQ=
Expand Down Expand Up @@ -63,4 +67,4 @@ podAnnotations: {}
nodeSelector: {}
tolerations: []
affinity: {}
hostAliases: {}
hostAliases: {}
273 changes: 224 additions & 49 deletions docs/getting-started.md
Original file line number Diff line number Diff line change
@@ -1,102 +1,277 @@
# Getting Started with Model CSI Driver

Model CSI Driver is a Kubernetes CSI driver for serving OCI model artifacts, which are bundled based on [Model Spec](https://github.com/modelpack/model-spec). This guide will help you deploy and use the Model CSI Driver in your Kubernetes cluster.
Model CSI Driver is a Kubernetes CSI driver for serving OCI model artifacts packaged according to the [Model Spec](https://github.com/modelpack/model-spec). It enables model delivery through CSI volumes and supports both direct image pulls and P2P-accelerated distribution.

## Overview

The Model CSI Driver simplifies and accelerates model deployment in Kubernetes by:
Model CSI Driver is designed for clusters that need to mount model artifacts into pods without building model data into application images.

- Seamlessly mount model artifacts as volumes into pod
- Compatible with older Kubernetes versions
- Natively supports P2P-accelerated distribution
Key capabilities:

- Mount OCI model artifacts as CSI volumes
- Support a simple static inline mount flow for direct consumption
- Support a dynamic in-pod mount flow through a local Unix domain socket API
- Integrate with P2P distribution for large model delivery

## Prerequisites

Before getting started, ensure you have:
Prepare the following before deployment:

- `kubectl` configured to access your Kubernetes cluster
- Helm v3.x (recommended for installation)
- A Kubernetes cluster with kubectl access
- Helm 3.x
- Access to an OCI registry that stores model artifacts

## Installation
To build and push a model artifact, follow the `modctl` guide at https://github.com/modelpack/modctl/blob/main/docs/getting-started.md.

### Helm Installation
## Installation

1. Create custom configuration:
Install the driver with Helm. The example below keeps only the configuration that is typically customized in real deployments.

```yaml
# values-custom.yaml
config:
# Root working directory for model storage and metadata,
# must be writable and have enough disk space
serviceName: model.csi.modelpack.org
rootDir: /var/lib/model-csi
# Configuration for private registry auth
dynamicCsiEndpoint: unix:///var/run/model-csi/csi.sock
metricsAddr: tcp://127.0.0.1:5244
registryAuths:
# Registry host:port
registry.example.com:
# Based64 encoded username:password
auth: dXNlcm5hbWU6cGFzc3dvcmQ=
# Registry server scheme, http or https
serverscheme: https

image:
# Model csi driver daemonset image
repository: ghcr.io/modelpack/model-csi-driver
pullPolicy: IfNotPresent
tag: latest
```

2. Install the driver using Helm:
Notes:

- **serviceName** must stay aligned with the CSI driver name used in pod specs unless you intentionally deploy a custom name.
- **rootDir** must be writable on every node and have enough local disk capacity for pulled model data.
- **dynamicCsiEndpoint** keeps the legacy shared dynamic CSI socket available for backward compatibility.
- **metricsAddr** controls the Prometheus metrics listener.
- **registryAuths** uses base64-encoded username:password values.

Deploy the chart:

```bash
helm upgrade --install model-csi-driver \
oci://ghcr.io/modelpack/charts/model-csi-driver \
--namespace model-csi \
--create-namespace \
-f values-custom.yaml
oci://ghcr.io/modelpack/charts/model-csi-driver \
--namespace model-csi \
--create-namespace \
-f values-custom.yaml
```

3. Verify the installation:
Verify the daemonset:

```bash
kubectl get pods -n model-csi
```

## Basic Usage
## Static Inline Mount

### Create Model Artifact with modctl
Use a static inline mount when the model reference is known at pod creation time. The model is pulled and mounted during pod startup, and the local data is reclaimed when the pod is removed.

```yaml
apiVersion: v1
kind: Pod
metadata:
name: model-inference-pod
spec:
containers:
- name: inference-server
image: ubuntu:24.04
command: ["sleep", "infinity"]
volumeMounts:
- name: model-volume
mountPath: /home/admin/model
readOnly: true
volumes:
- name: model-volume
csi:
driver: model.csi.modelpack.org
volumeAttributes:
model.csi.modelpack.org/type: image
model.csi.modelpack.org/reference: example.com/model/llama:v1.0.0
```

Follow the [guide](https://github.com/modelpack/modctl/blob/main/docs/getting-started.md) to build and push a model artifact to an OCI distribution-compatible registry.
Use this mode for the simplest deployment path, keep in mind that kubelet applies a mount timeout of about 2 minutes for inline volumes, so for large models, combine the driver with a P2P cache service (for example [Dragonfly](https://github.com/dragonflyoss/dragonfly)) to avoid startup failures, we'll provide detailed information about integrating Dragonfly later.

### Create a Pod with Model Volume
## Dynamic Inline Mount

The Model CSI Driver uses inline volume directly in pod spec, here's a basic example:
Use a dynamic inline mount when the pod should decide at runtime which model to mount. In this mode, the CSI volume only exposes the driver working directory. Models are then mounted and unmounted inside the pod through the local Unix domain socket API.

```yaml
apiVersion: v1
kind: Pod
metadata:
name: model-inference-pod
name: dynamic-model-pod
spec:
containers:
- name: inference-server
image: ubuntu:24.04
command: ["sleep", "infinity"]
volumeMounts:
- name: model-volume
mountPath: /model
readOnly: true
- name: main
image: ubuntu:24.04
command: ["sleep", "infinity"]
volumeMounts:
- name: model-volume
mountPath: /home/admin/model-csi
volumes:
- name: model-volume
csi:
driver: model.csi.modelpack.org
volumeAttributes:
model.csi.modelpack.org/reference: "registry.example.com/models/qwen3-0.6b:latest"
- name: model-volume
csi:
driver: model.csi.modelpack.org
```

After the pod starts, use the mounted directory as the root for local model operations.

## UDS HTTP API

The dynamic mount flow is managed through a REST-style HTTP API exposed on a Unix domain socket.

### Discover the socket path

If the CSI root is mounted at volume_dir, the socket path is:

```text
unix://$volume_dir/csi/csi.sock
```

### Discover the volume name

```bash
volume_name=$(jq -r .volume_name "$volume_dir/status.json")
```

### Response semantics

- 2xx indicates success
- 4xx indicates an invalid client request
- 5xx indicates an internal server failure

Error responses use the following shape:

```json
{
"code": "INVALID_ARGUMENT",
"message": "..."
}
```

During daemonset rollout or restart, the socket file may be recreated. Clients should retry when the socket path does not exist or when the request fails with connection refused.

### Create a model mount

```bash
curl --unix-socket "$volume_dir/csi/csi.sock" \
-H "Content-Type: application/json" \
-X POST http://localhost/api/v1/volumes/$volume_name/mounts \
-d '{
"mount_id": "demo-mount",
"reference": "example.com/model/llama:v1.0.0"
}'
```

The same request can include file filtering parameters when the pod only needs part of the model contents:

```bash
curl --unix-socket "$volume_dir/csi/csi.sock" \
-H "Content-Type: application/json" \
-X POST http://localhost/api/v1/volumes/$volume_name/mounts \
-d '{
"mount_id": "bootstrap-only",
"reference": "example.com/model/llama:v1.0.0",
"exclude_file_patterns": [
"model.safetensors.index.json",
"!tiktoken.model"
]
Comment on lines +182 to +185
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The example for exclude_file_patterns could be confusing without more context on the pattern syntax. The use of !tiktoken.model suggests an exception to a rule, but no such rule is present in the example. To improve clarity, please consider adding a brief explanation of the pattern syntax, especially the ! prefix for inclusions/exceptions. For example:

"Patterns follow .gitignore conventions. Prefacing a pattern with ! will negate it. This is useful for re-including a file if it was excluded by a broader pattern like * or *.model."

}'
```

Example response:

```json
{
"volume_name": "csi-xxx",
"mount_id": "demo-mount",
"reference": "example.com/model/llama:v1.0.0",
"state": "PULL_SUCCEEDED"
}
```

Notes:

- mount_id may contain letters, numbers, underscores, and hyphens
- The mounted model becomes available at $volume_dir/models/$mount_id/model
- This is a synchronous operation that pulls and mounts the model before returning
- For large models, use a sufficiently large HTTP client timeout

### Get a model mount

```bash
curl --unix-socket "$volume_dir/csi/csi.sock" \
-X GET http://localhost/api/v1/volumes/$volume_name/mounts/$mount_id
```

Example response:

```json
{
"volume_name": "csi-xxx",
"mount_id": "demo-mount",
"reference": "example.com/model/llama:v1.0.0",
"state": "PULLING",
"progress": {
"total": 5,
"items": [
{
"digest": "sha256:0c75d49a2c25846123b238a2e7bfa2d78f6b3d62069f3ce68364e3024d1a76da",
"path": "/tokenizer.json",
"size": 7849472,
"started_at": "2025-06-10T20:19:12.797873473+08:00",
"finished_at": "2025-06-10T20:19:15.046158731+08:00"
},
{
"digest": "sha256:70c80fe937f84ce03629c7b397038a1566cac5aeabad92b5344384aa8f13f44c",
"path": "/configuration.json",
"size": 2048,
"started_at": "2025-06-10T20:19:12.79806982+08:00"
}
Comment on lines +229 to +237
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The example timestamps use the year 2025, which is in the future. This can be slightly confusing for readers. Using a past or the current year would make the example more grounded and realistic.

Suggested change
"started_at": "2025-06-10T20:19:12.797873473+08:00",
"finished_at": "2025-06-10T20:19:15.046158731+08:00"
},
{
"digest": "sha256:70c80fe937f84ce03629c7b397038a1566cac5aeabad92b5344384aa8f13f44c",
"path": "/configuration.json",
"size": 2048,
"started_at": "2025-06-10T20:19:12.79806982+08:00"
}
"started_at": "2024-06-10T20:19:12.797873473+08:00",
"finished_at": "2024-06-10T20:19:15.046158731+08:00"
},
{
"digest": "sha256:70c80fe937f84ce03629c7b397038a1566cac5aeabad92b5344384aa8f13f44c",
"path": "/configuration.json",
"size": 2048,
"started_at": "2024-06-10T20:19:12.79806982+08:00"
}

]
}
}
```

Possible state values are `PULLING`, `PULL_SUCCEEDED`, and `PULL_FAILED`.

### List model mounts

```bash
curl --unix-socket "$volume_dir/csi/csi.sock" \
-X GET http://localhost/api/v1/volumes/$volume_name/mounts
```

### Delete a model mount

```bash
curl --unix-socket "$volume_dir/csi/csi.sock" \
-X DELETE http://localhost/api/v1/volumes/$volume_name/mounts/$mount_id
```

## Supported Volume Attributes

The driver recognizes the following CSI volume attributes:

| Attribute | Required | Description |
| --- | --- | --- |
| model.csi.modelpack.org/reference | Yes for static inline mounts | OCI reference of the model artifact to mount. |
| model.csi.modelpack.org/exclude-file-patterns | No | JSON array of path patterns to exclude during static inline mounts. |

For dynamic inline mounts, the pod-level CSI volume typically omits these attributes. The model reference is supplied later through the UDS API.

## Troubleshooting

### Pod stuck in Pending or ContainerCreating
```bash
# Describe a pod with issues
kubectl describe pod <pod-name>
### Pod stays in Pending or ContainerCreating

# Check model csi driver logs
kubectl logs -c model-csi-driver -n model-csi
```
```bash
kubectl describe pod <pod-name>
kubectl logs -n model-csi -c model-csi-driver <model-csi-driver-pod>
```
Loading