2 changes: 2 additions & 0 deletions README.md
@@ -133,9 +133,11 @@ API and command-line option may change frequently.***
## Performance

If you want to improve performance or reduce VRAM/RAM usage, please refer to [performance guide](./docs/performance.md).
For runtime and parameter backend placement, see the [backend selection guide](./docs/backend.md).

## More Guides

- [Backend selection](./docs/backend.md)
- [SD1.x/SD2.x/SDXL](./docs/sd.md)
- [SD3/SD3.5](./docs/sd3.md)
- [FLUX.1-dev/FLUX.1-schnell](./docs/flux.md)
122 changes: 122 additions & 0 deletions docs/backend.md
@@ -0,0 +1,122 @@
# Backend selection

`stable-diffusion.cpp` has two backend assignments:

- `--backend` selects the runtime backend used to execute model graphs.
- `--params-backend` selects the backend used to allocate model parameters.

If `--params-backend` is not set, each module's parameters are allocated on that module's runtime backend.

## Syntax

A backend assignment can be a single backend name:

```shell
sd-cli -m model.safetensors -p "a cat" --backend cpu
```

This applies to every module that does not have a more specific assignment.

Assignments can also target individual modules:

```shell
sd-cli -m model.safetensors -p "a cat" --backend te=cpu,vae=cuda0,diffusion=vulkan0
```

The same syntax is used for parameter placement:

```shell
sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend te=cpu,vae=cpu
```

Module names are case-insensitive. Hyphens and underscores in module names are ignored, so `clip_vision`, `clip-vision`, and `clipvision` are equivalent.
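
The normalization rule above can be sketched as follows. This is a minimal illustration, not the code from this PR; `normalize_module_name` is a hypothetical helper name.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not taken from the PR): lowercases a module name and
// drops '-' and '_' so that different spellings compare equal.
std::string normalize_module_name(const std::string& name) {
    std::string out;
    out.reserve(name.size());
    for (unsigned char c : name) {
        if (c == '-' || c == '_') {
            continue;  // separators are ignored
        }
        out.push_back(static_cast<char>(std::tolower(c)));
    }
    return out;
}
```

With this sketch, `clip_vision`, `clip-vision`, and `ClipVision` all normalize to `clipvision`.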

`all=`, `default=`, and `*=` can be used to set the default backend inside a mixed assignment:

```shell
sd-cli -m model.safetensors -p "a cat" --backend all=cuda0,te=cpu
```
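
One way to read the syntax above is as a comma-separated list whose items are either a bare backend name or a `module=backend` pair. The sketch below parses that shape under simplifying assumptions: `BackendAssignment`, `parse_backend_assignment`, and `backend_for` are hypothetical names, keys are assumed to be already lowercase, and alias handling and whitespace trimming are left out. The actual parser in this PR may differ.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical result type: a default backend plus per-module overrides.
struct BackendAssignment {
    std::string default_backend;                    // empty means "not specified"
    std::map<std::string, std::string> per_module;  // module name -> backend name
};

// Sketch of a parser for strings such as "cpu" or "all=cuda0,te=cpu".
BackendAssignment parse_backend_assignment(const std::string& spec) {
    BackendAssignment result;
    std::istringstream stream(spec);
    std::string item;
    while (std::getline(stream, item, ',')) {
        if (item.empty()) {
            continue;
        }
        const std::string::size_type eq = item.find('=');
        if (eq == std::string::npos) {
            result.default_backend = item;  // bare backend name
            continue;
        }
        const std::string key   = item.substr(0, eq);
        const std::string value = item.substr(eq + 1);
        if (key == "all" || key == "default" || key == "*") {
            result.default_backend = value;  // default inside a mixed assignment
        } else {
            result.per_module[key] = value;
        }
    }
    return result;
}

// Looks up a module, falling back to the default when it has no override.
std::string backend_for(const BackendAssignment& a, const std::string& module) {
    const auto it = a.per_module.find(module);
    return it != a.per_module.end() ? it->second : a.default_backend;
}
```

For `all=cuda0,te=cpu`, this sketch yields `cpu` for `te` and `cuda0` for every other module.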

## Modules

| Module | Purpose | Accepted names |
| --- | --- | --- |
| `diffusion` | UNet, DiT, MMDiT, Flux, Wan, Qwen Image, and other diffusion models | `diffusion`, `model`, `unet`, `dit` |
| `te` | Text encoders and conditioners | `te`, `clip`, `text`, `textencoder`, `textencoders`, `conditioner`, `cond`, `llm`, `t5`, `t5xxl` |
| `clip_vision` | CLIP vision encoder | `clip_vision`, `clipvision`, `clip-vision`, `vision` |
| `vae` | VAE and TAE | `vae`, `firststage`, `autoencoder`, `tae` |
| `controlnet` | ControlNet | `controlnet`, `control` |
| `photomaker` | PhotoMaker ID encoder and PhotoMaker LoRA | `photomaker`, `photomakerid`, `pmid`, `photo` |
| `upscaler` | ESRGAN upscaler | `upscaler`, `esrgan`, `hires` |

`te` is the preferred module name for text encoders. `clip` is kept as an accepted alias because many existing commands and model names use CLIP terminology.
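
The alias table above amounts to a lookup from accepted name to canonical module. A minimal sketch, assuming the input has already been lowercased and stripped of hyphens and underscores; `canonical_module` is a hypothetical name, and returning an empty string for unknown names is an assumption of this example rather than documented behavior.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical lookup built from the table above. Input is expected to be
// normalized already (lowercase, no '-' or '_'). Unknown names yield "".
std::string canonical_module(const std::string& normalized_name) {
    static const std::unordered_map<std::string, std::string> aliases = {
        {"diffusion", "diffusion"}, {"model", "diffusion"}, {"unet", "diffusion"}, {"dit", "diffusion"},
        {"te", "te"}, {"clip", "te"}, {"text", "te"}, {"textencoder", "te"}, {"textencoders", "te"},
        {"conditioner", "te"}, {"cond", "te"}, {"llm", "te"}, {"t5", "te"}, {"t5xxl", "te"},
        {"clipvision", "clip_vision"}, {"vision", "clip_vision"},
        {"vae", "vae"}, {"firststage", "vae"}, {"autoencoder", "vae"}, {"tae", "vae"},
        {"controlnet", "controlnet"}, {"control", "controlnet"},
        {"photomaker", "photomaker"}, {"photomakerid", "photomaker"}, {"pmid", "photomaker"}, {"photo", "photomaker"},
        {"upscaler", "upscaler"}, {"esrgan", "upscaler"}, {"hires", "upscaler"},
    };
    const auto it = aliases.find(normalized_name);
    return it != aliases.end() ? it->second : std::string();
}
```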

## Backend names

Backend names are resolved against the GGML backend device list. Matching is case-insensitive and accepts either an exact device name or a unique prefix of one. Common values include:

- `cpu`
- `cuda0`
- `vulkan0`
- `metal`

The special values `auto`, `default`, and an empty backend name select the default backend. The default preference is GPU, then integrated GPU, then CPU.

The special value `gpu` selects the first GPU backend, falling back to the first integrated GPU backend.
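
The resolution rules in this section can be sketched against a plain list of device names. This is an illustration under assumptions, not the PR's implementation: `Device`, `DeviceKind`, and `resolve_backend` are hypothetical, the device list is passed in rather than queried from GGML, the special values are checked before name matching, and an empty string stands for "no match".

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real GGML device list.
enum class DeviceKind { Gpu, IntegratedGpu, Cpu };

struct Device {
    std::string name;
    DeviceKind  kind;
};

static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Returns the name of the first device of the given kind, or "" if none.
static std::string first_of_kind(const std::vector<Device>& devices, DeviceKind kind) {
    for (const Device& d : devices) {
        if (d.kind == kind) {
            return d.name;
        }
    }
    return "";
}

// Returns the resolved device name, or "" when nothing (or more than one
// prefix candidate) matches.
std::string resolve_backend(const std::vector<Device>& devices, const std::string& requested) {
    const std::string want = to_lower(requested);
    if (want.empty() || want == "auto" || want == "default") {
        // Default preference: GPU, then integrated GPU, then CPU.
        const DeviceKind order[] = {DeviceKind::Gpu, DeviceKind::IntegratedGpu, DeviceKind::Cpu};
        for (DeviceKind kind : order) {
            const std::string name = first_of_kind(devices, kind);
            if (!name.empty()) {
                return name;
            }
        }
        return "";
    }
    if (want == "gpu") {
        const std::string gpu = first_of_kind(devices, DeviceKind::Gpu);
        if (!gpu.empty()) {
            return gpu;
        }
        return first_of_kind(devices, DeviceKind::IntegratedGpu);
    }
    std::string prefix_match;
    int         prefix_hits = 0;
    for (const Device& d : devices) {
        const std::string have = to_lower(d.name);
        if (have == want) {
            return d.name;  // an exact match always wins
        }
        if (have.compare(0, want.size(), want) == 0) {
            prefix_match = d.name;
            ++prefix_hits;
        }
    }
    return prefix_hits == 1 ? prefix_match : std::string();
}
```

In this sketch an ambiguous prefix (for example `cuda` when both `CUDA0` and `CUDA1` exist) resolves to nothing; how the real code reports that case is not stated here.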

## Runtime backend vs. parameter backend

The runtime backend controls where graph execution runs. The parameter backend controls where model weights are allocated.

For example:

```shell
sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend cpu
```

This runs all modules on `cuda0`, but stores parameters in CPU RAM. During execution, parameters are moved to the runtime backend as needed.

Per-module assignments can be mixed:

```shell
sd-cli -m model.safetensors -p "a cat" --backend diffusion=cuda0,te=cpu,vae=cpu --params-backend diffusion=cuda0,te=cpu,vae=cpu
```

This keeps text encoding and VAE execution on CPU while the diffusion model runs on GPU.

## Backend sharing and lifetime

Backends are managed by `SDBackendManager`.

Within one manager, backend instances are cached by resolved backend device name. If multiple modules request the same backend, they share the same `ggml_backend_t`.

For example:

```shell
--backend te=cpu,vae=cpu
```

uses one shared CPU backend for both `te` and `vae` runtime execution.

Runtime and parameter assignments also share the same backend cache. If `--backend diffusion=cuda0` and `--params-backend diffusion=cuda0` resolve to the same device, both use the same backend instance.

`SDBackendManager` owns the backend instances and frees them when the context or upscaler is destroyed. Model runners receive non-owning runtime and parameter backend pointers and do not free them.
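
The sharing and ownership model described above can be illustrated with a small cache keyed by resolved device name. This is a sketch only: `FakeBackend` and `BackendCache` are hypothetical stand-ins for `ggml_backend_t` and `SDBackendManager`, and no real GGML calls are made.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a real backend handle (ggml_backend_t in the PR).
struct FakeBackend {
    std::string device_name;
};

// Hypothetical stand-in for the manager: it owns every instance and hands out
// non-owning pointers, so callers must not free what they receive.
class BackendCache {
public:
    FakeBackend* get(const std::string& resolved_name) {
        auto it = cache_.find(resolved_name);
        if (it == cache_.end()) {
            // First request for this device: create and remember the instance.
            it = cache_.emplace(resolved_name,
                                std::unique_ptr<FakeBackend>(new FakeBackend{resolved_name}))
                     .first;
        }
        return it->second.get();
    }

    std::size_t size() const { return cache_.size(); }

private:
    // Instances are released when the cache itself is destroyed.
    std::map<std::string, std::unique_ptr<FakeBackend>> cache_;
};
```

Requesting `CPU` twice (for example once for `te` and once for `vae`) returns the same pointer, and the instance lives until the cache is destroyed.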

## Compatibility flags

The older CPU placement flags are still supported:

- `--clip-on-cpu`
- `--vae-on-cpu`
- `--control-net-cpu`
- `--offload-to-cpu`

`--clip-on-cpu`, `--vae-on-cpu`, and `--control-net-cpu` affect runtime backend assignment only when `--backend` is not set. They map to `te=cpu`, `vae=cpu`, and `controlnet=cpu`, respectively.

`--offload-to-cpu` affects parameter backend assignment only when `--params-backend` is not set. It is equivalent to:

```shell
--params-backend cpu
```

Explicit `--backend` and `--params-backend` assignments are preferred for new commands.
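
The precedence between the legacy flags and the explicit options can be sketched as below. `LegacyFlags`, `effective_backend`, and `effective_params_backend` are hypothetical names, and building an assignment string is only one way to model the mapping; the PR may apply the flags differently.

```cpp
#include <string>

// Hypothetical bundle of the legacy CPU placement flags.
struct LegacyFlags {
    bool clip_on_cpu     = false;
    bool vae_on_cpu      = false;
    bool control_net_cpu = false;
    bool offload_to_cpu  = false;
};

// Runtime assignment: an explicit --backend value wins; otherwise the legacy
// flags are translated into per-module entries.
std::string effective_backend(const std::string& backend_arg, const LegacyFlags& flags) {
    if (!backend_arg.empty()) {
        return backend_arg;
    }
    std::string spec;
    auto append = [&spec](const char* item) {
        if (!spec.empty()) {
            spec += ',';
        }
        spec += item;
    };
    if (flags.clip_on_cpu) {
        append("te=cpu");
    }
    if (flags.vae_on_cpu) {
        append("vae=cpu");
    }
    if (flags.control_net_cpu) {
        append("controlnet=cpu");
    }
    return spec;
}

// Parameter assignment: an explicit --params-backend value wins; otherwise
// --offload-to-cpu behaves like "--params-backend cpu".
std::string effective_params_backend(const std::string& params_arg, const LegacyFlags& flags) {
    if (!params_arg.empty()) {
        return params_arg;
    }
    return flags.offload_to_cpu ? "cpu" : "";
}
```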
4 changes: 3 additions & 1 deletion examples/cli/main.cpp
@@ -749,7 +749,9 @@ int main(int argc, const char* argv[]) {
ctx_params.offload_params_to_cpu,
ctx_params.diffusion_conv_direct,
ctx_params.n_threads,
gen_params.upscale_tile_size));
gen_params.upscale_tile_size,
ctx_params.backend.c_str(),
ctx_params.params_backend.c_str()));

if (upscaler_ctx == nullptr) {
LOG_ERROR("new_upscaler_ctx failed");
12 changes: 12 additions & 0 deletions examples/common/common.cpp
@@ -380,6 +380,14 @@ ArgOptions SDContextParams::get_options() {
"--upscale-model",
"path to esrgan model.",
&esrgan_path},
{"",
"--backend",
"runtime backend assignment, e.g. cpu or clip=cpu,vae=cuda0,diffusion=vulkan0",
&backend},
{"",
"--params-backend",
"parameter backend assignment, e.g. cpu or diffusion=cpu,clip=cpu",
&params_backend},
};

options.int_options = {
@@ -676,6 +684,8 @@ std::string SDContextParams::to_string() const {
<< " sampler_rng_type: " << sd_rng_type_name(sampler_rng_type) << ",\n"
<< " offload_params_to_cpu: " << (offload_params_to_cpu ? "true" : "false") << ",\n"
<< " max_vram: " << max_vram << ",\n"
<< " backend: \"" << backend << "\",\n"
<< " params_backend: \"" << params_backend << "\",\n"
<< " enable_mmap: " << (enable_mmap ? "true" : "false") << ",\n"
<< " control_net_cpu: " << (control_net_cpu ? "true" : "false") << ",\n"
<< " clip_on_cpu: " << (clip_on_cpu ? "true" : "false") << ",\n"
Expand Down Expand Up @@ -751,6 +761,8 @@ sd_ctx_params_t SDContextParams::to_sd_ctx_params_t(bool vae_decode_only, bool f
chroma_t5_mask_pad,
qwen_image_zero_cond_t,
max_vram,
backend.c_str(),
params_backend.c_str(),
};
return sd_ctx_params;
}
18 changes: 10 additions & 8 deletions examples/common/common.h
@@ -110,14 +110,16 @@ struct SDContextParams {
rng_type_t sampler_rng_type = RNG_TYPE_COUNT;
bool offload_params_to_cpu = false;
float max_vram = 0.f;
bool enable_mmap = false;
bool control_net_cpu = false;
bool clip_on_cpu = false;
bool vae_on_cpu = false;
bool flash_attn = false;
bool diffusion_flash_attn = false;
bool diffusion_conv_direct = false;
bool vae_conv_direct = false;
std::string backend;
std::string params_backend;
bool enable_mmap = false;
bool control_net_cpu = false;
bool clip_on_cpu = false;
bool vae_on_cpu = false;
bool flash_attn = false;
bool diffusion_flash_attn = false;
bool diffusion_conv_direct = false;
bool vae_conv_direct = false;

bool circular = false;
bool circular_x = false;
6 changes: 5 additions & 1 deletion include/stable-diffusion.h
@@ -206,6 +206,8 @@ typedef struct {
int chroma_t5_mask_pad;
bool qwen_image_zero_cond_t;
float max_vram; // GiB budget for graph-cut segmented param offload (0 = disabled, -1 = auto free VRAM minus 1 GiB)
const char* backend;
const char* params_backend;
} sd_ctx_params_t;

typedef struct {
@@ -427,7 +429,9 @@ SD_API upscaler_ctx_t* new_upscaler_ctx(const char* esrgan_path,
bool offload_params_to_cpu,
bool direct,
int n_threads,
int tile_size);
int tile_size,
const char* backend,
const char* params_backend);
SD_API void free_upscaler_ctx(upscaler_ctx_t* upscaler_ctx);

SD_API sd_image_t upscale(upscaler_ctx_t* upscaler_ctx,
4 changes: 2 additions & 2 deletions src/anima.hpp
@@ -526,10 +526,10 @@ namespace Anima {
AnimaNet net;

AnimaRunner(ggml_backend_t backend,
bool offload_params_to_cpu,
ggml_backend_t params_backend,
const String2TensorStorage& tensor_storage_map = {},
const std::string prefix = "model.diffusion_model")
: GGMLRunner(backend, offload_params_to_cpu) {
: GGMLRunner(backend, params_backend) {
int64_t num_layers = 0;
std::string layer_tag = prefix + ".net.blocks.";
for (const auto& kv : tensor_storage_map) {
4 changes: 2 additions & 2 deletions src/auto_encoder_kl.hpp
@@ -664,13 +664,13 @@ struct AutoEncoderKL : public VAE {
AutoEncoderKLModel ae;

AutoEncoderKL(ggml_backend_t backend,
bool offload_params_to_cpu,
ggml_backend_t params_backend,
const String2TensorStorage& tensor_storage_map,
const std::string prefix,
bool decode_only = false,
bool use_video_decoder = false,
SDVersion version = VERSION_SD1)
: decode_only(decode_only), VAE(version, backend, offload_params_to_cpu) {
: decode_only(decode_only), VAE(version, backend, params_backend) {
if (sd_version_is_sd1(version) || sd_version_is_sd2(version)) {
scale_factor = 0.18215f;
shift_factor = 0.f;
4 changes: 2 additions & 2 deletions src/clip.hpp
@@ -469,13 +469,13 @@ struct CLIPTextModelRunner : public GGMLRunner {
std::vector<float> attention_mask_vec;

CLIPTextModelRunner(ggml_backend_t backend,
bool offload_params_to_cpu,
ggml_backend_t params_backend,
const String2TensorStorage& tensor_storage_map,
const std::string prefix,
CLIPVersion version = OPENAI_CLIP_VIT_L_14,
bool with_final_ln = true,
bool force_clip_f32 = false)
: GGMLRunner(backend, offload_params_to_cpu) {
: GGMLRunner(backend, params_backend) {
bool proj_in = false;
for (const auto& [name, tensor_storage] : tensor_storage_map) {
if (!starts_with(name, prefix)) {