Commit 3633072

leejet and stduhpf authored
feat: add module backend assignment support (#1500)
Co-authored-by: Stéphane du Hamel <stephduh@live.fr>
1 parent 0c1ca17 commit 3633072

37 files changed

Lines changed: 1234 additions & 760 deletions

README.md

Lines changed: 2 additions & 0 deletions
@@ -133,9 +133,11 @@ API and command-line option may change frequently.***
 ## Performance

 If you want to improve performance or reduce VRAM/RAM usage, please refer to [performance guide](./docs/performance.md).
+For runtime and parameter backend placement, see the [backend selection guide](./docs/backend.md).

 ## More Guides

+- [Backend selection](./docs/backend.md)
 - [SD1.x/SD2.x/SDXL](./docs/sd.md)
 - [SD3/SD3.5](./docs/sd3.md)
 - [FLUX.1-dev/FLUX.1-schnell](./docs/flux.md)

docs/backend.md

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
+# Backend selection
+
+`stable-diffusion.cpp` has two backend assignments:
+
+- `--backend` selects the runtime backend used to execute model graphs.
+- `--params-backend` selects the backend used to allocate model parameters.
+
+If `--params-backend` is not set, each module's parameters are allocated on that module's runtime backend.
+
+## Syntax
+
+A backend assignment can be a single backend name:
+
+```shell
+sd-cli -m model.safetensors -p "a cat" --backend cpu
+```
+
+This applies to every module that does not have a more specific assignment.
+
+Assignments can also target individual modules:
+
+```shell
+sd-cli -m model.safetensors -p "a cat" --backend te=cpu,vae=cuda0,diffusion=vulkan0
+```
+
+The same syntax is used for parameter placement:
+
+```shell
+sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend te=cpu,vae=cpu
+```
+
+Module names are case-insensitive. Hyphens and underscores in module names are ignored, so `clip_vision`, `clip-vision`, and `clipvision` are equivalent.
+
+`all=`, `default=`, and `*=` can be used to set the default backend inside a mixed assignment:
+
+```shell
+sd-cli -m model.safetensors -p "a cat" --backend all=cuda0,te=cpu
+```
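
The assignment syntax described above can be sketched as a small parser. This is only an illustration under stated assumptions, not the project's implementation: `BackendAssignment`, `parse_assignment`, and `backend_for` are hypothetical names, module names are assumed to be normalized already, and whitespace trimming and alias handling are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical container for one parsed --backend or --params-backend value.
struct BackendAssignment {
    std::string fallback;                           // backend for modules without their own entry
    std::map<std::string, std::string> per_module;  // module name -> backend name
};

// Parse "cpu" or "all=cuda0,te=cpu". Assumed grammar: comma-separated items,
// each either a bare backend name or module=backend.
BackendAssignment parse_assignment(const std::string& spec) {
    BackendAssignment result;
    std::stringstream stream(spec);
    std::string item;
    while (std::getline(stream, item, ',')) {
        if (item.empty()) {
            continue;
        }
        const std::size_t eq = item.find('=');
        if (eq == std::string::npos) {
            result.fallback = item;  // a bare name applies to every module
            continue;
        }
        const std::string key   = item.substr(0, eq);
        const std::string value = item.substr(eq + 1);
        if (key == "all" || key == "default" || key == "*") {
            result.fallback = value;
        } else {
            result.per_module[key] = value;
        }
    }
    return result;
}

// Look up the backend for a module, falling back to the default entry.
std::string backend_for(const BackendAssignment& assignment, const std::string& module) {
    const std::map<std::string, std::string>::const_iterator it = assignment.per_module.find(module);
    return it != assignment.per_module.end() ? it->second : assignment.fallback;
}

// Self-check mirroring the mixed-assignment command above.
void check_parse_examples() {
    const BackendAssignment mixed = parse_assignment("all=cuda0,te=cpu");
    assert(backend_for(mixed, "te") == "cpu");
    assert(backend_for(mixed, "vae") == "cuda0");
}
```

In this sketch an empty result from `backend_for` means that neither a module entry nor a default was given, which a caller could treat as "use the default backend".
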
+
+## Modules
+
+| Module | Purpose | Accepted names |
+| --- | --- | --- |
+| `diffusion` | UNet, DiT, MMDiT, Flux, Wan, Qwen Image, and other diffusion models | `diffusion`, `model`, `unet`, `dit` |
+| `te` | Text encoders and conditioners | `te`, `clip`, `text`, `textencoder`, `textencoders`, `conditioner`, `cond`, `llm`, `t5`, `t5xxl` |
+| `clip_vision` | CLIP vision encoder | `clip_vision`, `clipvision`, `clip-vision`, `vision` |
+| `vae` | VAE and TAE | `vae`, `firststage`, `autoencoder`, `tae` |
+| `controlnet` | ControlNet | `controlnet`, `control` |
+| `photomaker` | PhotoMaker ID encoder and PhotoMaker LoRA | `photomaker`, `photomakerid`, `pmid`, `photo` |
+| `upscaler` | ESRGAN upscaler | `upscaler`, `esrgan`, `hires` |
+
+`te` is the preferred module name for text encoders. `clip` is kept as an accepted alias because many existing commands and model names use CLIP terminology.
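
One way to picture the alias table together with the normalization rule (case-insensitive, hyphens and underscores ignored) is a lookup keyed by the normalized name. The sketch below is an assumption about how such a lookup could work; `normalize_module_name` and `canonical_module` are hypothetical helpers, and only the alias strings come from the table.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Lower-case the name and drop '-' and '_' so that spelling variants compare equal.
std::string normalize_module_name(const std::string& name) {
    std::string out;
    out.reserve(name.size());
    for (const unsigned char c : name) {
        if (c == '-' || c == '_') {
            continue;
        }
        out.push_back(static_cast<char>(std::tolower(c)));
    }
    return out;
}

// Map any accepted name to its module; returns an empty string for unknown names.
std::string canonical_module(const std::string& name) {
    static const std::map<std::string, std::string> aliases = {
        {"diffusion", "diffusion"}, {"model", "diffusion"}, {"unet", "diffusion"}, {"dit", "diffusion"},
        {"te", "te"}, {"clip", "te"}, {"text", "te"}, {"textencoder", "te"}, {"textencoders", "te"},
        {"conditioner", "te"}, {"cond", "te"}, {"llm", "te"}, {"t5", "te"}, {"t5xxl", "te"},
        {"clipvision", "clip_vision"}, {"vision", "clip_vision"},
        {"vae", "vae"}, {"firststage", "vae"}, {"autoencoder", "vae"}, {"tae", "vae"},
        {"controlnet", "controlnet"}, {"control", "controlnet"},
        {"photomaker", "photomaker"}, {"photomakerid", "photomaker"}, {"pmid", "photomaker"},
        {"photo", "photomaker"},
        {"upscaler", "upscaler"}, {"esrgan", "upscaler"}, {"hires", "upscaler"},
    };
    const std::map<std::string, std::string>::const_iterator it = aliases.find(normalize_module_name(name));
    return it == aliases.end() ? std::string() : it->second;
}

// Self-check: different spellings of the same module resolve identically.
void check_alias_examples() {
    assert(canonical_module("clip_vision") == "clip_vision");
    assert(canonical_module("Clip-Vision") == "clip_vision");
    assert(canonical_module("CLIP") == "te");
}
```

Keying the table by the normalized form means `clip_vision` and `clip-vision` need no separate entries.
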
+
+## Backend names
+
+Backend names are resolved against the GGML backend device list. Matching is case-insensitive and accepts exact names or unique prefixes, so common values include names such as:
+
+- `cpu`
+- `cuda0`
+- `vulkan0`
+- `metal`
+
+The special values `auto`, `default`, and an empty backend name select the default backend. The default preference is GPU, then integrated GPU, then CPU.
+
+The special value `gpu` selects the first GPU backend, falling back to the first integrated GPU backend.
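
The "exact name or unique prefix" rule can be illustrated with a short matcher over a list of device names. This is a sketch under assumptions: `resolve_backend` is a hypothetical helper, the device names in the self-check are made up for illustration rather than read from GGML, and the special values (`auto`, `default`, `gpu`, empty) are deliberately not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// ASCII lower-casing used for case-insensitive comparison.
static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Return the matching device name, or an empty string when there is no match
// or the prefix is ambiguous. An exact match always wins over a prefix match.
std::string resolve_backend(const std::string& requested, const std::vector<std::string>& devices) {
    const std::string want = to_lower(requested);
    std::string prefix_match;
    int prefix_count = 0;
    for (const std::string& device : devices) {
        const std::string name = to_lower(device);
        if (name == want) {
            return device;
        }
        if (name.compare(0, want.size(), want) == 0) {
            if (prefix_count == 0) {
                prefix_match = device;
            }
            ++prefix_count;
        }
    }
    return prefix_count == 1 ? prefix_match : std::string();
}

// Self-check with an illustrative (assumed) device list.
void check_resolve_examples() {
    const std::vector<std::string> devices = {"CPU", "CUDA0", "CUDA1", "Vulkan0"};
    assert(resolve_backend("cpu", devices) == "CPU");
    assert(resolve_backend("vulkan", devices) == "Vulkan0");  // unique prefix
    assert(resolve_backend("cuda", devices).empty());         // ambiguous prefix
}
```

With two CUDA devices present, `cuda` alone is rejected as ambiguous in this sketch, while `cuda0` still matches exactly.
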
+
+## Runtime backend vs. parameter backend
+
+The runtime backend controls where graph execution runs. The parameter backend controls where model weights are allocated.
+
+For example:
+
+```shell
+sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend cpu
+```
+
+This runs all modules on `cuda0`, but stores parameters in CPU RAM. During execution, parameters are moved to the runtime backend as needed.
+
+Per-module assignments can be mixed:
+
+```shell
+sd-cli -m model.safetensors -p "a cat" --backend diffusion=cuda0,te=cpu,vae=cpu --params-backend diffusion=cuda0,te=cpu,vae=cpu
+```
+
+This keeps text encoding and VAE execution on CPU while the diffusion model runs on GPU.
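
The relationship between the two assignments, including the rule that parameters follow the runtime backend when no parameter backend is given, might be modelled as follows. `Assignment`, `Placement`, `lookup`, and `place_module` are hypothetical names, and storing the default under the key `"*"` is an assumption made for this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Assumed representation: module name -> backend name, with "*" holding the default.
typedef std::map<std::string, std::string> Assignment;

struct Placement {
    std::string runtime;  // where the module's graphs execute
    std::string params;   // where the module's weights are allocated
};

// Module-specific entry first, then the default entry, else an empty string.
std::string lookup(const Assignment& assignment, const std::string& module) {
    Assignment::const_iterator it = assignment.find(module);
    if (it != assignment.end()) {
        return it->second;
    }
    it = assignment.find("*");
    return it != assignment.end() ? it->second : std::string();
}

// Parameters fall back to the module's runtime backend when not assigned.
Placement place_module(const Assignment& runtime, const Assignment& params, const std::string& module) {
    Placement placement;
    placement.runtime = lookup(runtime, module);
    placement.params  = lookup(params, module);
    if (placement.params.empty()) {
        placement.params = placement.runtime;
    }
    return placement;
}

// Self-check for "--backend cuda0 --params-backend cpu".
void check_placement_examples() {
    Assignment runtime;
    runtime["*"] = "cuda0";
    Assignment params;
    params["*"] = "cpu";
    const Placement diffusion = place_module(runtime, params, "diffusion");
    assert(diffusion.runtime == "cuda0");
    assert(diffusion.params == "cpu");
}
```

An empty `runtime` string in this sketch stands for "no explicit choice", which the text above describes as selecting the default backend.
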
+
+## Backend sharing and lifetime
+
+Backends are managed by `SDBackendManager`.
+
+Within one manager, backend instances are cached by resolved backend device name. If multiple modules request the same backend, they share the same `ggml_backend_t`.
+
+For example:
+
+```shell
+--backend te=cpu,vae=cpu
+```
+
+uses one shared CPU backend for both `te` and `vae` runtime execution.
+
+Runtime and parameter assignments also share the same backend cache. If `--backend diffusion=cuda0` and `--params-backend diffusion=cuda0` resolve to the same device, both use the same backend instance.
+
+`SDBackendManager` owns the backend instances and frees them when the context or upscaler is destroyed. Model runners receive non-owning runtime and parameter backend pointers and do not free them.
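
The sharing and ownership rules can be sketched with a name-keyed cache that hands out non-owning pointers. This does not reproduce `SDBackendManager`: `FakeBackend` is a stand-in for `ggml_backend_t`, `BackendCache` is a hypothetical class, and the device name is assumed to be resolved before lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Stand-in for a real backend handle; only records which device it represents.
struct FakeBackend {
    std::string device;
};

// Owns one backend per resolved device name and frees them all on destruction.
class BackendCache {
public:
    // Returns a non-owning pointer; callers must not delete it.
    FakeBackend* get(const std::string& device) {
        std::unique_ptr<FakeBackend>& slot = cache_[device];
        if (!slot) {
            slot.reset(new FakeBackend{device});  // first request creates the instance
        }
        return slot.get();
    }

    std::size_t size() const { return cache_.size(); }

private:
    std::map<std::string, std::unique_ptr<FakeBackend>> cache_;
};

// Self-check: two modules asking for "cpu" get the same instance.
void check_cache_examples() {
    BackendCache cache;
    FakeBackend* te_backend  = cache.get("cpu");
    FakeBackend* vae_backend = cache.get("cpu");
    assert(te_backend == vae_backend);
    assert(cache.size() == 1);
}
```

Because runtime and parameter requests would go through the same cache in this sketch, `diffusion=cuda0` on both flags yields one instance, and everything is released when the cache goes out of scope.
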
+
+## Compatibility flags
+
+The older CPU placement flags are still supported:
+
+- `--clip-on-cpu`
+- `--vae-on-cpu`
+- `--control-net-cpu`
+- `--offload-to-cpu`
+
+`--clip-on-cpu`, `--vae-on-cpu`, and `--control-net-cpu` affect runtime backend assignment only when `--backend` is not set. They map to `te=cpu`, `vae=cpu`, and `controlnet=cpu`.
+
+`--offload-to-cpu` affects parameter backend assignment only when `--params-backend` is not set. It is equivalent to:
+
+```shell
+--params-backend cpu
+```
+
+Explicit `--backend` and `--params-backend` assignments are preferred for new commands.
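
The precedence between the explicit options and the older flags could be expressed roughly as below. This is one reading of the rules above, not the actual code: `LegacyFlags`, `effective_backend`, and `effective_params_backend` are hypothetical, and an empty string is assumed to mean that the option was not set.

```cpp
#include <cassert>
#include <string>

// Hypothetical bundle of the older boolean placement flags.
struct LegacyFlags {
    bool clip_on_cpu     = false;
    bool vae_on_cpu      = false;
    bool control_net_cpu = false;
    bool offload_to_cpu  = false;
};

// An explicit --backend wins; otherwise the legacy flags are translated
// into a per-module assignment string.
std::string effective_backend(const std::string& backend, const LegacyFlags& flags) {
    if (!backend.empty()) {
        return backend;
    }
    std::string out;
    auto add = [&out](const char* item) {
        if (!out.empty()) {
            out += ',';
        }
        out += item;
    };
    if (flags.clip_on_cpu) add("te=cpu");
    if (flags.vae_on_cpu) add("vae=cpu");
    if (flags.control_net_cpu) add("controlnet=cpu");
    return out;
}

// An explicit --params-backend wins; otherwise --offload-to-cpu maps to "cpu".
std::string effective_params_backend(const std::string& params_backend, const LegacyFlags& flags) {
    if (!params_backend.empty()) {
        return params_backend;
    }
    return flags.offload_to_cpu ? "cpu" : "";
}

// Self-check: legacy flags apply only when the explicit option is empty.
void check_legacy_examples() {
    LegacyFlags flags;
    flags.clip_on_cpu = true;
    flags.vae_on_cpu  = true;
    assert(effective_backend("", flags) == "te=cpu,vae=cpu");
    assert(effective_backend("cuda0", flags) == "cuda0");
}
```
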

examples/cli/main.cpp

Lines changed: 3 additions & 1 deletion
@@ -749,7 +749,9 @@ int main(int argc, const char* argv[]) {
                 ctx_params.offload_params_to_cpu,
                 ctx_params.diffusion_conv_direct,
                 ctx_params.n_threads,
-                gen_params.upscale_tile_size));
+                gen_params.upscale_tile_size,
+                ctx_params.backend.c_str(),
+                ctx_params.params_backend.c_str()));

         if (upscaler_ctx == nullptr) {
             LOG_ERROR("new_upscaler_ctx failed");

examples/common/common.cpp

Lines changed: 12 additions & 0 deletions
@@ -380,6 +380,14 @@ ArgOptions SDContextParams::get_options() {
          "--upscale-model",
          "path to esrgan model.",
          &esrgan_path},
+        {"",
+         "--backend",
+         "runtime backend assignment, e.g. cpu or clip=cpu,vae=cuda0,diffusion=vulkan0",
+         &backend},
+        {"",
+         "--params-backend",
+         "parameter backend assignment, e.g. cpu or diffusion=cpu,clip=cpu",
+         &params_backend},
     };

     options.int_options = {
@@ -676,6 +684,8 @@ std::string SDContextParams::to_string() const {
         << " sampler_rng_type: " << sd_rng_type_name(sampler_rng_type) << ",\n"
         << " offload_params_to_cpu: " << (offload_params_to_cpu ? "true" : "false") << ",\n"
         << " max_vram: " << max_vram << ",\n"
+        << " backend: \"" << backend << "\",\n"
+        << " params_backend: \"" << params_backend << "\",\n"
         << " enable_mmap: " << (enable_mmap ? "true" : "false") << ",\n"
         << " control_net_cpu: " << (control_net_cpu ? "true" : "false") << ",\n"
         << " clip_on_cpu: " << (clip_on_cpu ? "true" : "false") << ",\n"
@@ -751,6 +761,8 @@ sd_ctx_params_t SDContextParams::to_sd_ctx_params_t(bool vae_decode_only, bool f
         chroma_t5_mask_pad,
         qwen_image_zero_cond_t,
         max_vram,
+        backend.c_str(),
+        params_backend.c_str(),
     };
     return sd_ctx_params;
 }

examples/common/common.h

Lines changed: 10 additions & 8 deletions
@@ -110,14 +110,16 @@ struct SDContextParams {
     rng_type_t sampler_rng_type = RNG_TYPE_COUNT;
     bool offload_params_to_cpu = false;
     float max_vram = 0.f;
-    bool enable_mmap = false;
-    bool control_net_cpu = false;
-    bool clip_on_cpu = false;
-    bool vae_on_cpu = false;
-    bool flash_attn = false;
-    bool diffusion_flash_attn = false;
-    bool diffusion_conv_direct = false;
-    bool vae_conv_direct = false;
+    std::string backend;
+    std::string params_backend;
+    bool enable_mmap = false;
+    bool control_net_cpu = false;
+    bool clip_on_cpu = false;
+    bool vae_on_cpu = false;
+    bool flash_attn = false;
+    bool diffusion_flash_attn = false;
+    bool diffusion_conv_direct = false;
+    bool vae_conv_direct = false;

     bool circular = false;
     bool circular_x = false;

include/stable-diffusion.h

Lines changed: 5 additions & 1 deletion
@@ -206,6 +206,8 @@ typedef struct {
     int chroma_t5_mask_pad;
     bool qwen_image_zero_cond_t;
     float max_vram; // GiB budget for graph-cut segmented param offload (0 = disabled, -1 = auto free VRAM minus 1 GiB)
+    const char* backend;
+    const char* params_backend;
 } sd_ctx_params_t;

 typedef struct {
@@ -427,7 +429,9 @@ SD_API upscaler_ctx_t* new_upscaler_ctx(const char* esrgan_path,
                                         bool offload_params_to_cpu,
                                         bool direct,
                                         int n_threads,
-                                        int tile_size);
+                                        int tile_size,
+                                        const char* backend,
+                                        const char* params_backend);
 SD_API void free_upscaler_ctx(upscaler_ctx_t* upscaler_ctx);

 SD_API sd_image_t upscale(upscaler_ctx_t* upscaler_ctx,

src/anima.hpp

Lines changed: 2 additions & 2 deletions
@@ -526,10 +526,10 @@ namespace Anima {
         AnimaNet net;

         AnimaRunner(ggml_backend_t backend,
-                    bool offload_params_to_cpu,
+                    ggml_backend_t params_backend,
                     const String2TensorStorage& tensor_storage_map = {},
                     const std::string prefix = "model.diffusion_model")
-            : GGMLRunner(backend, offload_params_to_cpu) {
+            : GGMLRunner(backend, params_backend) {
             int64_t num_layers = 0;
             std::string layer_tag = prefix + ".net.blocks.";
             for (const auto& kv : tensor_storage_map) {

src/auto_encoder_kl.hpp

Lines changed: 2 additions & 2 deletions
@@ -664,13 +664,13 @@ struct AutoEncoderKL : public VAE {
     AutoEncoderKLModel ae;

     AutoEncoderKL(ggml_backend_t backend,
-                  bool offload_params_to_cpu,
+                  ggml_backend_t params_backend,
                   const String2TensorStorage& tensor_storage_map,
                   const std::string prefix,
                   bool decode_only = false,
                   bool use_video_decoder = false,
                   SDVersion version = VERSION_SD1)
-        : decode_only(decode_only), VAE(version, backend, offload_params_to_cpu) {
+        : decode_only(decode_only), VAE(version, backend, params_backend) {
         if (sd_version_is_sd1(version) || sd_version_is_sd2(version)) {
             scale_factor = 0.18215f;
             shift_factor = 0.f;

src/clip.hpp

Lines changed: 2 additions & 2 deletions
@@ -469,13 +469,13 @@ struct CLIPTextModelRunner : public GGMLRunner {
     std::vector<float> attention_mask_vec;

     CLIPTextModelRunner(ggml_backend_t backend,
-                        bool offload_params_to_cpu,
+                        ggml_backend_t params_backend,
                         const String2TensorStorage& tensor_storage_map,
                         const std::string prefix,
                         CLIPVersion version = OPENAI_CLIP_VIT_L_14,
                         bool with_final_ln = true,
                         bool force_clip_f32 = false)
-        : GGMLRunner(backend, offload_params_to_cpu) {
+        : GGMLRunner(backend, params_backend) {
         bool proj_in = false;
         for (const auto& [name, tensor_storage] : tensor_storage_map) {
             if (!starts_with(name, prefix)) {
