dual use text encoder/kv fast edits #13853
-
Hi @kwal559, this is a solid workflow. The part that actually makes it work on one 24GB card is loading only the text encoder first, using it as an LLM for prompt expansion and then for embeddings, and dumping it before the transformer and VAE come in. That matches how diffusers already thinks about offload order anyway, you’re just doing it harder. Since Klein’s text encoder is Qwen3, calling generate and then encode_prompt on the same weights isn’t a hack. That’s the model.

The KV transformer plus the resized base image as reference is what keeps the character consistent across the grid. You pay for enhancement once, lock the look at 1024, then batch cheap edits at 256.

One thing I’d try is batching encode_prompt with the full variant list instead of looping one prompt at a time. That should help a lot once you crank variations up to 50 or 100. Also, your system prompt asks for the final line in quotes, but the parser mostly looks for thinking tags. If those don’t show up, you might get extra junk in the embed. Pulling the last quoted string with a simple regex fallback would probably make that step more reliable.

Did you try Flux2KleinPipeline instead of Flux2Pipeline here? It might be a cleaner fit, since you’re already on distilled 4-step settings and passing prompt_embeds plus image directly.
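To make the regex fallback concrete, here is a minimal sketch. It assumes the model wraps its reasoning in `<think>...</think>` tags and that the final prompt is the last quoted string in the output; `extract_final_prompt` is a hypothetical helper name, not something from the posted script.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think>.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
# Matches straight or curly double-quoted strings on a single line.
QUOTED = re.compile(r'"([^"\n]+)"|“([^”\n]+)”')

def extract_final_prompt(raw: str) -> str:
    """Return the enhanced prompt from raw LLM output (hypothetical helper)."""
    # Drop complete thinking blocks.
    text = THINK_BLOCK.sub("", raw)
    # If only a closing tag is left (the opening tag came from the chat
    # template), keep what follows it.
    if "</think>" in text:
        text = text.rsplit("</think>", 1)[1]
    # Preferred path: the last quoted string, as the system prompt requests.
    matches = QUOTED.findall(text)
    if matches:
        straight, curly = matches[-1]
        return (straight or curly).strip()
    # Fallback: the last non-empty line.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return lines[-1] if lines else ""
```

The result would then go to encode_prompt in place of the raw decoded text, so leftover reasoning never reaches the embeds.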
-
I shared that 'harder' method because it runs about 3 times faster than the recommended CPU offload. I also show how to enhance a prompt in natural language with the default text encoder, with reasoning mode enabled, and then have it encode to embeds by telling diffusers what not to load. I didn't know that these LLM/encoders, and the prompt helpers that sometimes accompany them, are the same default models we download from HF, just with a custom prompt. Here are some benchmarks for BF16 Klein on an RTX 4090. The point is that 'harder' ways are also improved ways, and I figure the people who read these are likely enthusiasts, the type who look to benefit the hobby. 5 images, CPU offload enabled: 37.61s
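For anyone who wants to reproduce a comparison like this, a minimal timing sketch follows. The names `timed`, `results`, and `sync` are hypothetical. On a GPU you would pass `torch.cuda.synchronize` as `sync` so that queued kernels are counted in the measurement.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results, sync=None):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    if sync is not None:
        sync()  # let earlier GPU work finish before starting the clock
    start = time.perf_counter()
    try:
        yield
    finally:
        if sync is not None:
            sync()  # wait for the work launched inside the block
        results[label] = time.perf_counter() - start
```

Usage would look like `with timed("cpu offload", results, sync=torch.cuda.synchronize): ...` around each generation call, with the same prompts, seed, and image count for both setups.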
-
This script explores FLUX.2 Klein 9B-KV. Pass a prompt directly to the text encoder to enhance it, allow it to think, and capture its final output. Then we create variation prompts and feed them back to the text encoder for embeds. One component, dual use = saved memory. Quantize it if you want to save time. We include the initial image and pile on the variation prompts, allowing batch generation for speed. You receive a grid of consistent characters in different poses/image challenges. If you want to see magic, load up an SVDQ or similar small transformer and set the image count to 100. On an RTX 4090, 100 pics (128x128) generate in less than 10 seconds. Each image is unique and the character remains.
import gc, random, time
import psutil, torch, diffusers
from PIL import Image

def flush():
    # Free cached GPU memory, then report what is still in use
    gc.collect(); torch.cuda.empty_cache()
    print(f"🧹✂️ {torch.cuda.memory_reserved()/1024**3:.1f}GB")
    # "24 - free" assumes a 24GB card; virtual_memory()[3] is the used RAM in bytes
    print(f"VRAM: {24 - torch.cuda.mem_get_info()[0]/1024**3:.2f}GB | RAM: {psutil.virtual_memory()[3]/1024**3:.1f}GB")

model_id, kv_tran = "black-forest-labs/FLUX.2-klein-9B", "black-forest-labs/FLUX.2-klein-9b-kv"

def enhance_and_embed(user_concept, num_prompts=20):
    time_1 = time.time()
    print("🧠 Text Encode + Enhance")
    # Passing None for transformer, VAE and scheduler skips loading them,
    # so only the text encoder side of the pipeline reaches the GPU
    pipe = diffusers.DiffusionPipeline.from_pretrained(model_id, transformer=None, vae=None, scheduler=None, torch_dtype=torch.bfloat16).to("cuda")

def generate_images(init_embeddings, prompt_embeddings, num_prompts=20):
    print("\n🚀 Loading Image Generation Models...")
    vae = diffusers.AutoencoderKLFlux2.from_pretrained("black-forest-labs/FLUX.2-small-decoder", torch_dtype=torch.bfloat16)
    transformer = diffusers.AutoModel.from_pretrained(kv_tran, subfolder="transformer", torch_dtype=torch.bfloat16)
    # Store the transformer weights in FP8 and upcast layer by layer at compute time
    transformer.enable_layerwise_casting(storage_dtype=torch.float8_e4m3fn)

# EXECUTE PIPELINE
if __name__ == "__main__":
    USER_CONCEPT = "Portrait of a ghoul"
    NUM_VARIATIONS = 20
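The suggestion in the replies to encode the variation prompts in batches rather than one at a time can be sketched as below. This is only an illustration: `encode_in_batches` and `encode_fn` are hypothetical names, and `encode_fn` stands in for a wrapper around the pipeline's encode_prompt, assuming that call accepts a list of prompts and that you return one embedding per prompt from the wrapper.

```python
from typing import Callable, List, Sequence, TypeVar

E = TypeVar("E")

def encode_in_batches(
    prompts: Sequence[str],
    encode_fn: Callable[[List[str]], Sequence[E]],
    batch_size: int = 16,
) -> List[E]:
    """Encode prompts chunk by chunk, keeping the original order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    embeddings: List[E] = []
    for start in range(0, len(prompts), batch_size):
        chunk = list(prompts[start:start + batch_size])
        # One encoder call per chunk instead of one call per prompt
        embeddings.extend(encode_fn(chunk))
    return embeddings
```

Chunking keeps peak memory bounded when the variation count goes to 50 or 100, while still cutting the number of encoder calls compared with a per-prompt loop.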