Bit-interleaved Q1_0 8x32 repack kernels for x86 AVX2 #29

Open
pl752 wants to merge 4 commits into PrismML-Eng:prism from pl752:perf/q1_0_8x32_repack_AVX2

Conversation


pl752 commented May 2, 2026

Continuation of #21 and #10

Been a hot minute

Decided to drop nrc==2 (might revisit if plain AVX and SSSE3 are needed), as it is mostly useful in specific situations for ARM_DOTPROD, and to focus on optimized gemv and gemm instead.

Also, I have finally moved to native Linux from WSL2, so benchmarks are now run with -fa 1 -mmp 0 -r 5 -t 6 instead of -t 10, as SMT threads no longer help performance significantly but do increase memory pressure. So benchmark baselines have shifted again.
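
For reference, a typical run looks like this (model path assumed; same flags as above):

```
llama-bench -m model.gguf -fa 1 -mmp 0 -r 5 -t 6
```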

| flow | run | dot | repack | delta |
| --- | --- | --- | --- | --- |
| AVX2 | pp512 | 139.80 t/s | 190.98 t/s | +36.61% |
| AVX2 | tg128 | 91.70 t/s | 115.17 t/s | +25.59% |
| AVX512* | pp512 | 145.09 t/s | 219.96 t/s | +51.60% |
| AVX512* | tg128 | 93.34 t/s | 120.47 t/s | +29.07% |

* - register file increase only, no special kernel

AVX512 is in theory usable, but I couldn't yet implement a kernel that doesn't regress Zen 4 AVX512 performance, so the code currently relies on the AVX2 path.

Perplexity

```
chunk             PPL               ln(PPL(Q)/PPL(base))          KL Divergence              Δp RMS            Same top p
   1      13.9558 ±    3.1805      -0.00009 ±    0.00239       0.00021 ±    0.00003     0.396 ±  0.056 %    99.608 ±  0.392 %
   2      20.2053 ±    3.4389       0.01465 ±    0.01152       0.00022 ±    0.00002     0.386 ±  0.034 %    99.412 ±  0.339 %
   3      20.8472 ±    2.7882       0.00892 ±    0.00770       0.00022 ±    0.00001     0.365 ±  0.026 %    99.085 ±  0.344 %
   4      21.1986 ±    2.3887       0.00633 ±    0.00579       0.00022 ±    0.00001     0.377 ±  0.026 %    99.216 ±  0.276 %
   5      21.0772 ±    2.1025       0.00518 ±    0.00466       0.00023 ±    0.00001     0.365 ±  0.022 %    99.216 ±  0.247 %

====== Perplexity statistics ======
Mean PPL(Q)                   :  21.077184 ±   2.102473
Mean PPL(base)                :  20.968387 ±   2.074795
Cor(ln(PPL(Q)), ln(PPL(base))):  99.89%
Mean ln(PPL(Q)/PPL(base))     :   0.005175 ±   0.004663
Mean PPL(Q)/PPL(base)         :   1.005189 ±   0.004688
Mean PPL(Q)-PPL(base)         :   0.108796 ±   0.100463

====== KL divergence statistics ======
Mean    KLD:   0.000226 ±   0.000011
Maximum KLD:   0.006768
99.9%   KLD:   0.005245
99.0%   KLD:   0.001404
95.0%   KLD:   0.000682
90.0%   KLD:   0.000481
Median  KLD:   0.000135
10.0%   KLD:   0.000002
 5.0%   KLD:   0.000000
 1.0%   KLD:  -0.000010
 0.1%   KLD:  -0.000033
Minimum KLD:  -0.000039

====== Token probability statistics ======
Mean    Δp:  0.020 ± 0.010 %
Maximum Δp:  3.536%
99.9%   Δp:  2.703%
99.0%   Δp:  1.293%
95.0%   Δp:  0.595%
90.0%   Δp:  0.300%
75.0%   Δp:  0.065%
Median  Δp:  0.000%
25.0%   Δp: -0.041%
10.0%   Δp: -0.277%
 5.0%   Δp: -0.472%
 1.0%   Δp: -1.087%
 0.1%   Δp: -1.576%
Minimum Δp: -1.698%
RMS Δp    :  0.365 ± 0.022 %
Same top p: 99.216 ± 0.247 %
```

For some reason the model identifies its type as Q2_0.

Benchmarks for various numbers of threads, repack AVX512:
| model | size | params | backend | threads | fa | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 1.7B Q2_0 (HUH!? Y?) | 231.13 MiB | 1.72 B | CPU | 4 | 1 | 0 | pp512 | 167.58 ± 2.59 |
| qwen3 1.7B Q2_0 | 231.13 MiB | 1.72 B | CPU | 4 | 1 | 0 | tg128 | 94.55 ± 0.14 |
| qwen3 1.7B Q2_0 | 231.13 MiB | 1.72 B | CPU | 6 | 1 | 0 | pp512 | 219.96 ± 0.17 |
| qwen3 1.7B Q2_0 | 231.13 MiB | 1.72 B | CPU | 6 | 1 | 0 | tg128 | 120.47 ± 0.16 |
| qwen3 1.7B Q2_0 | 231.13 MiB | 1.72 B | CPU | 8 | 1 | 0 | pp512 | 200.69 ± 0.23 |
| qwen3 1.7B Q2_0 | 231.13 MiB | 1.72 B | CPU | 8 | 1 | 0 | tg128 | 120.49 ± 0.08 |
| qwen3 1.7B Q2_0 | 231.13 MiB | 1.72 B | CPU | 10 | 1 | 0 | pp512 | 197.99 ± 1.67 |
| qwen3 1.7B Q2_0 | 231.13 MiB | 1.72 B | CPU | 10 | 1 | 0 | tg128 | 116.79 ± 1.11 |
| qwen3 1.7B Q2_0 | 231.13 MiB | 1.72 B | CPU | 12 | 1 | 0 | pp512 | 210.22 ± 0.35 |
| qwen3 1.7B Q2_0 | 231.13 MiB | 1.72 B | CPU | 12 | 1 | 0 | tg128 | 121.91 ± 0.16 |

github-actions bot added the ggml label May 2, 2026
pl752 marked this pull request as ready for review May 2, 2026 11:09
retroheim pushed a commit to retroheim/prism-ml-llama.cpp that referenced this pull request May 3, 2026
…ng#29

Codex post-commit review found:
1. TURBO_D was QK_TURBO3 (now 32) — broke turbo4 C array sizes
2. SET_ROWS kernel turbo3-specific but instantiated for turbo4
3. Tail block drop for non-128 head dims

Fixed PrismML-Eng#3 (TURBO_D). Mintplex-Labs#1 and Mintplex-Labs#2 don't affect turbo3+dk128 path.

Co-Authored-By: tturney@psyguard.ai
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
retroheim pushed a commit to retroheim/prism-ml-llama.cpp that referenced this pull request May 3, 2026
…ling (Issue PrismML-Eng#29)

Three bugs from the block-size-32 refactor:

1. kernel_set_rows_turbo hardcoded turbo3 packing for turbo4 — split into
   separate kernel_set_rows_turbo3 and kernel_set_rows_turbo4 kernels.
   turbo4 now correctly does 3-bit PolarQuant + QJL residual correction.

2. Integer division in n_groups = nk0 / blocks_per_group silently dropped
   tail blocks for non-128-aligned head dims (e.g. dk=192). Added ceiling
   division with tail-group bounds checking in turbo3, and GGML_ASSERT in
   WHT dispatch to catch non-128-aligned tensors.

3. TURBO_D constant was semantically coupled to QK_TURBO4 — replaced with
   TURBO_ROT_DIM (= QK_TURBO3_GROUP) and added static_assert that
   QK_TURBO4 == QK_TURBO3_GROUP to guard against future drift.

Closes PrismML-Eng#29

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
retroheim pushed a commit to retroheim/prism-ml-llama.cpp that referenced this pull request May 3, 2026
…-cache

fix: turbo4 SET_ROWS, tail-block truncation, constant coupling, stack overflow (Issue PrismML-Eng#29)

twoxfh commented May 4, 2026

Are there instructions on how to test? I was trying to test the PR on a machine that has AVX2 and SSE3 with the Bonsai 1.7b gguf and did not notice a difference in pp or tg vs the current llama.cpp implementation. Likely I am doing something wrong.


pl752 commented May 4, 2026

@twoxfh Hello, which build flags do you use, and what does llama-bench (or whichever other executable you used) say with the -v flag?

Does the log have lines like this?

```
load_tensors:   CPU_REPACK model buffer size =   189.00 MiB
```

or

```
repack: repack tensor blk.0.attn_q.weight with q1_0_8x32
```

or

```
llama_memory_breakdown_print: |   - CPU_REPACK         |                  189 =   189 +       0 +       0                |
```

If it is llama-server, what does it say about enabled features?
(like this)

```
system_info: n_threads = 6 (n_threads_batch = 6) / 12 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
```


pl752 commented May 4, 2026

@twoxfh If Bonsai is the new ternary one, then it won't work right now, as optimized kernels are not implemented for it yet (Q2_0 in my case is reported due to some kind of issue).


twoxfh commented May 4, 2026

@pl752 I am using the Bonsai 1.7b 1-bit. Ah, I see you have AVX512 and my CPU does not support it. My build parameters are simple and without any Intel drivers: -DGGML_CURL=OFF -DGGML_CUDA=OFF -DGGML_AVX512=ON. I did install BLAS, but it cut my tg in half, so I removed it.

```
system_info: n_threads = 2 (n_threads_batch = 2) / 14 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX-VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
```


pl752 commented May 4, 2026

@twoxfh Weird, because the kernel should not require AVX512, only AVX2.


pl752 commented May 4, 2026

@twoxfh Which command was used for launching the model, and what happens if -v is appended?


twoxfh commented May 4, 2026

> @twoxfh Which command was used for launching the model, and what happens if -v is appended?

@pl752 With -v it definitely repacks to q1_0_8x32, but it's the same speed for me. I am using the following command:

```
numactl -C 0-2 ./llama-server -m Bonsai-1.7b.gguf -c 4000 --numa distribute -fa on --mmap -jinja -r 5 -t 2 -v
```


twoxfh commented May 4, 2026

@pl752 I am getting `done_getting_tensors tensor 'token_embed.weight' (q1_0) (and 114 others) cannot be used with preferred buffer type CPU_REPACK, using CPU instead`. So it tries, then fails.

using the gguf from https://huggingface.co/prism-ml/Bonsai-1.7B-gguf/tree/main


pl752 commented May 4, 2026

@twoxfh Does llama-bench (e.g. bin/llama-bench -m Bonsai-1.7B.gguf -fa 1 -mmp 0 -r 5 -t 6) produce higher speeds? I tested with llama-cli just now and for some reason I can't reproduce the high speeds anymore, even though it worked perfectly just a few days ago. nvm, I just trolled myself: I hadn't plugged my laptop into wall power, and llama-bench somehow just didn't lose performance.


pl752 commented May 4, 2026

@twoxfh However, there is indeed something wrong here: I just found out that when I use llama-cli there is no difference between the dot implementation and repack (except maybe slightly for preprocessing), so it might be me doing something wrong too, or this thing is just hypersensitive to RAM bandwidth. Let me check it further...


pl752 commented May 4, 2026

@twoxfh Now I am a little bit frustrated: rebuilding it cleanly somehow solved the issue for me, even though it was clearly repacked before too. AVX2 works fine as well.


pl752 commented May 4, 2026

@twoxfh I can see some kind of Intel with e-cores; is this why numactl is used?


pl752 commented May 4, 2026

@twoxfh 'token_embed.weight' (q1_0) (and 114 others) is fine, as repack isn't used for embeddings (they are just a retrieval-by-token-id plus a dequant op, not a mat_mul) or for the various norm tensors.


pl752 commented May 4, 2026

@twoxfh I think I might need to find somebody else to figure out whether this optimization is useful only on systems like mine, or whether there is something subtly off.


twoxfh commented May 4, 2026

> @twoxfh I can see some kind of Intel with e-cores; is this why numactl is used?

@pl752 Exactly, I try to ensure I get performance cores since I typically only use a couple. llama-bench gives me pp512 70.96 (your branch) vs 61.51 (llama.cpp) and tg128 42.71 (your branch) vs 38.97 (llama.cpp). It appears to be more of a divide than I thought, but not as much as yours. That might be due to mine being a mobile processor vs desktop?


pl752 commented May 4, 2026

@twoxfh So there actually IS some difference. My current theory: if your laptop uses an older memory type like DDR4-3200, its bandwidth can be maxed out. My CPU is a mobile Zen 4 Ryzen with 6 cores and LPDDR5-6400. There is a real known issue that the outer accumulators in my new kernels don't fit into the 16 ymm registers and spill to memory, which makes the bandwidth problem just bad enough to kill most of the boost (plus the fact that yours is a heterogeneous CPU with multiple clusters, where the e-cores have significantly reduced L1 data cache and NO L3 cache (the L2 is shared per core cluster instead)). This is also indicated by the fact that enabling AVX512 helps performance in my case, even though the only difference is that 32 ymm registers become available (32 zmm used in 256-bit rather than 512-bit mode). The purpose of a special gemm/gemv kernel is not only to save cycles by reusing register contents across multiple rows/columns, but also to make the memory access order more convenient and thus reduce overhead; my kernels are definitely suboptimal in that sense.
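
To make the register-pressure point concrete, here is a minimal standalone sketch (not this PR's kernel; the 2x4 accumulator tile and the int8 dot-product scheme are assumptions for illustration) of an AVX2 GEMM inner loop where the accumulator tile alone pins 8 of the 16 architectural ymm registers:

```c
#include <immintrin.h>
#include <stdint.h>

// Hypothetical 2 row-groups x 4 columns inner loop; k must be a multiple of 32.
// 8 accumulators + 2 LHS loads + 1 RHS load + the madd temporaries already
// crowd the 16 ymm registers AVX2 exposes, so a real kernel that also keeps
// scales and bit-expansion temporaries in flight starts spilling to the stack.
static void gemm_tile_sketch(const uint8_t *a, const int8_t *b,
                             int32_t *out, int k) {
    __m256i acc[2][4];
    for (int r = 0; r < 2; ++r)
        for (int c = 0; c < 4; ++c)
            acc[r][c] = _mm256_setzero_si256();

    const __m256i ones = _mm256_set1_epi16(1);
    for (int kk = 0; kk < k; kk += 32) {
        __m256i a0 = _mm256_loadu_si256((const __m256i *)(a + 0*k + kk));
        __m256i a1 = _mm256_loadu_si256((const __m256i *)(a + 1*k + kk));
        for (int c = 0; c < 4; ++c) {
            __m256i bc = _mm256_loadu_si256((const __m256i *)(b + c*k + kk));
            // u8 x s8 -> 16-bit pair sums, then widen to 32-bit and accumulate
            acc[0][c] = _mm256_add_epi32(acc[0][c],
                _mm256_madd_epi16(_mm256_maddubs_epi16(a0, bc), ones));
            acc[1][c] = _mm256_add_epi32(acc[1][c],
                _mm256_madd_epi16(_mm256_maddubs_epi16(a1, bc), ones));
        }
    }
    for (int r = 0; r < 2; ++r)
        for (int c = 0; c < 4; ++c)
            _mm256_storeu_si256((__m256i *)(out + (r*4 + c)*8), acc[r][c]);
}
```

With AVX512's 32 architectural registers the same tile stays fully resident even at 256-bit width, which matches the register-file-only speedup in the tables above.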


pl752 commented May 4, 2026

@twoxfh Also, on my system there is a DIRECT 1:1 inverse correlation between tg speed and model size (same number of parameters, different quant), so memory is indeed the limiting factor for tg, though the spilling only occurs for gemm; gemv is fine in that sense. (gemv is matrix*vector for number of RHS rows < 4, repeated nrows times; gemm is matrix*matrix with rows % 4 == 0 in this case.)
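
In dispatch terms that split looks roughly like this (hypothetical sketch, not this PR's code; names are made up):

```c
// GEMM consumes right-hand-side columns four at a time (batched prompt
// processing); GEMV covers the leftover / single-column case, which is
// what token generation hits.
static void gemv_q1_0_8x32(int col) { (void)col; /* matrix*vector kernel   */ }
static void gemm_q1_0_8x32(int col) { (void)col; /* 4-column matrix*matrix */ }

static void mul_mat_dispatch(int n_rhs_cols) {
    int c = 0;
    for (; c + 4 <= n_rhs_cols; c += 4) gemm_q1_0_8x32(c);
    for (; c < n_rhs_cols; ++c)         gemv_q1_0_8x32(c);
}
```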


twoxfh commented May 4, 2026

@pl752 That makes a lot of sense; also, mine is an Intel Core Ultra 7 165U with DDR5 5600 MT/s vs your 6400 MT/s. I saw your jump of 25% and got jealous :). Really appreciate all the effort you're putting into the kernels.


pl752 commented May 4, 2026

@twoxfh Hmm, only 2 threads are set for the server; that usually means the memory isn't getting maxed out. Windows 11 / WSL, by any chance?


twoxfh commented May 4, 2026

> @twoxfh Hmm, only 2 threads are set for the server; that usually means the memory isn't getting maxed out. Windows 11 / WSL, by any chance?

@pl752 I turned it up to 6 cores; that's about the sweet spot before losing performance by adding an e-core. A lot of the time it runs a little toasty and I cut it back to 2. For the benchmarks I ran at 6 cores. As soon as I go above that, performance tanks.


pl752 commented May 4, 2026

Performance tanks due to the aforementioned shenanigans with the CPU arch; that's expected. Do you use Windows with/without WSL2, or Linux? Also, can you check somehow (for example via htop with CPU clock display turned on) what happens to CPU clock speeds when using repack and no repack (it can be toggled off with the -nr flag)? This CPU has 2+10 cores (vs my 6) and 15 W of sustained TDP with a 57 W peak (mine is advertised as 35-54 W depending on cooling and VRM capabilities, and I also use a very beefy cooling pad to minimize performance variance due to thermals), so clocks can play a role: denser computation can cause higher power draw, forcing clock speeds to drop to fit the TDP limit. (There are also the PL1 and PL2 temporary boost power limits, which last only a few seconds to a few minutes.)


pl752 commented May 4, 2026

@twoxfh Why I am asking about Windows 11 and WSL: it has a security feature called "Core isolation and memory integrity" which essentially wraps the whole OS in a thin VM. That gives some additional hardening, but sometimes OBLITERATES the performance of memory-intensive programs like LLMs or video games. Also, if VT-x is enabled and allowed in bcdboot, Win 11 is notorious for trying to wrap most processes in a sandbox, which can cause some overhead too. And WSL2 is essentially a well-optimized VM.


twoxfh commented May 4, 2026

> @twoxfh Why I am asking about Windows 11 and WSL: it has a security feature called "Core isolation and memory integrity" which essentially wraps the whole OS in a thin VM. That gives some additional hardening, but sometimes OBLITERATES the performance of memory-intensive programs like LLMs or video games. Also, if VT-x is enabled and allowed in bcdboot, Win 11 is notorious for trying to wrap most processes in a sandbox, which can cause some overhead too. And WSL2 is essentially a well-optimized VM.

@pl752 Ah, I am on Windows with WSL and Docker Desktop. I am running llama-server in a Debian Docker container. CPU utilization with or without repack is roughly the same from my spot checks; it boosts to 3.5 GHz then settles to 2.8 GHz after about 10 seconds. I have about 10 GB of RAM free.


pl752 commented May 4, 2026

@twoxfh That is the most likely reason for the weirdness then (a container inside Linux inside a VM inside Windows, plus power throttling).


pl752 commented May 4, 2026

@twoxfh Also some AMD insanity: my CPU holds 4.5 GHz for tg and drops to 4.2 GHz for pp with repack, -t 6 and AVX512, and around 100 MHz less on AVX2 (note: runs are relatively short and the cooling pad is set to deafening mode).


khosravipasha commented May 4, 2026

Thanks, this is cool, seems there is more juice on cpu side :)

> For some reason the model identifies its type as Q2_0.

That's kinda odd, is the ggml type id changed when you do the repacking? There are a few different enums for each type; one of them might have been mixed up. I will need to take a closer look. This is only during llama-bench, right?


Copilot AI left a comment


Pull request overview

This PR extends ggml’s CPU repack/matmul pipeline to support bit-interleaved Q1_0 (8x32) repacked kernels on x86 (AVX2-focused), including the required repack format, quantization path, and kernel dispatch selection.

Changes:

- Add a new repacked block layout for Q1_0 (block_q1_0x8) and implement repack from native block_q1_0 (see the sketch after this list).
- Introduce Q8_0 quantization for the 4x32 layout and add Q1_0 gemv/gemm kernels for the new repack type (generic + x86 AVX2/AVX512 builds).
- Extend repack dispatch to select the Q1_0 repack/kernels on AVX2-capable systems when dimensions are compatible.
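
For orientation, a hypothetical sketch of what such a repacked block can look like (field names and sizes are assumptions derived from the 8x32 naming and the pattern of existing ggml repack types such as block_q4_0x8, not taken from the diff):

```c
#include <stdint.h>

typedef uint16_t ggml_half;       // fp16 storage type, as in ggml.h

#define QK1_0 32                  // assumed Q1_0 block size

typedef struct {
    ggml_half d[8];               // one scale per interleaved source row
    uint8_t   qs[8 * QK1_0 / 8];  // 8 rows x 32 x 1 bit = 32 bytes, bit-interleaved
} block_q1_0x8;
```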

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| ggml/src/ggml-cpu/repack.h | Adds Q1_0 support to templated repack block machinery and declares new Q1_0/Q8_0 kernel entrypoints. |
| ggml/src/ggml-cpu/repack.cpp | Implements generic Q8_0 4x32 quantization, generic Q1_0 gemv/gemm, Q1_0 repack routine, and dispatch selection for the new repack type. |
| ggml/src/ggml-cpu/arch/x86/repack.cpp | Adds x86 implementation for Q8_0 4x32 quantization and AVX2/AVX512F Q1_0 gemv/gemm kernels. |
| ggml/src/ggml-cpu/arch-fallback.h | Wires new entrypoints into the existing "rename _generic when no native impl exists" mechanism for relevant architectures/build modes. |


Comment thread ggml/src/ggml-cpu/repack.cpp
Comment thread ggml/src/ggml-cpu/arch/x86/repack.cpp Outdated

pl752 commented May 5, 2026

Some adjustments for the Copilot suggestions and to avoid problems with some compiler versions; no performance changes expected.


pl752 commented May 5, 2026

I have tried to alleviate the register pressure issue in gemm and achieved slight improvements; tg is not affected.

| flow | run | dot | repack | delta |
| --- | --- | --- | --- | --- |
| AVX2 | pp512 | 190.98 t/s | 213.79 t/s | +11.94% |
| AVX512 | pp512 | 219.96 t/s | 228.14 t/s | +3.71% |

Might be interesting for @twoxfh
I have another repacking approach in mind, will experiment with it later.
