Open
1426 commits
3d50088
perf(medium): copyMediumTree inline OS-native destination concat — de…
Snider May 23, 2026
34a4f1e
perf(chat): split normaliseRole into inline-able canonical fast path …
Snider May 23, 2026
2e3446b
perf(backend): W8-A2 reinterpret inference.Message → metal.ChatMessag…
Snider May 23, 2026
8670a01
perf(distill): pool DistillationBatchLoss vocab-sized scratch — -100%…
Snider May 23, 2026
432cc54
perf(chat): templateName canonical fast path before Trim/Lower — Arch…
Snider May 23, 2026
47ef15a
perf(dataset): firstNonEmpty variadic → fixed two-arg
Snider May 23, 2026
d1f9392
merge: cladius-lane-Wave10-W10AL (medium.go inline-concat — WalkMediu…
Snider May 23, 2026
2ddf53d
merge: cladius-lane-Wave10-W10AM (distill EmitProbe pool + temp strin…
Snider May 23, 2026
2d4b759
merge: cladius-lane-Wave10-W10AJ (chat normaliseRole/templateName inl…
Snider May 23, 2026
7c19712
perf(backend): drop redundant clone in toRootAdapterInfo — Typical -8…
Snider May 23, 2026
0ce0ab8
bench(backend): add ToRootMetrics_LoRA case demonstrating 0-alloc aft…
Snider May 23, 2026
72fa221
perf(dataset): MessagesToSample assistantIdx raw-compare fast path
Snider May 23, 2026
45e9fdf
perf(dataset): jsonRecord.toSample pointer receiver — eliminate ~256 …
Snider May 23, 2026
050db93
merge: cladius-lane-Wave10-W10AN (backend.go — toRootAdapterInfo -84%…
Snider May 23, 2026
fa7089a
merge: cladius-lane-Wave10-W10AK (dataset/jsonl.go massive wins — Loa…
Snider May 23, 2026
8c90e5b
test(gguf): add tokeniser vocab array bench
Snider May 23, 2026
6595b34
perf(chaptersmoke): slug — single buffer prefix+body, in-place dash trim
Snider May 23, 2026
f88cedf
perf(gguf): []string fast path for tokeniser-embedded vocab arrays
Snider May 23, 2026
b5ce087
perf(chaptersmoke): bundleURI single-alloc — append slug body in place
Snider May 23, 2026
d35b532
merge: cladius-lane-Wave11-W11I (gguf []string array specialisation —…
Snider May 23, 2026
cd2bb8b
perf(chaptersmoke): W9-Y sentinel errors — lift 9 constant-message *Errs
Snider May 23, 2026
7b75bb7
perf(chaptersmoke): pre-size countingStore.unique to bundle block count
Snider May 23, 2026
5f8c580
perf(blockcache): pre-compose hash header buffer — collapse 4 length-…
Snider May 23, 2026
23b95ef
perf(chaptersmoke): lift Capture labels literal to package var
Snider May 23, 2026
ec445b1
bench(chaptersmoke): add benchmarks for slug / bundleURI / countingStore
Snider May 23, 2026
10b903f
perf(dataset_stream): ownership-flip flush + lazy first-add + shared …
Snider May 23, 2026
2592e9e
merge: cladius-lane-Wave10-W10AO (chaptersmoke — Slug_Empty -58%, Bun…
Snider May 23, 2026
7874421
perf(tokenizer): DecodeOne single-id fast path — IDToken_PlainToken 9…
Snider May 23, 2026
8f42683
perf(probe): batch payload-pointer scratch across Recorder.Events — 1…
Snider May 23, 2026
9cc4a75
chore(external): bump go-inference → W11-L Generator interface (enabl…
Snider May 23, 2026
b23e0a0
merge: cladius-lane-Wave11-W11J (dataset_stream datasetPacker sibling…
Snider May 23, 2026
61889fa
merge: cladius-lane-Wave11-W11K (tokenizer DecodeOne(int32) — IDToken…
Snider May 23, 2026
7fdc42d
perf(pack): fast-path empty ApplyOptions + direct byte-buffer IssueSu…
Snider May 23, 2026
7cc6970
docs(probe): document CloneEvent per-payload allocation shape vs Even…
Snider May 23, 2026
6e9e901
perf(inference_contract): toInferenceMemoryPlan cached quantization l…
Snider May 23, 2026
e777574
test(model): add parseConfigProbe parse-only benches
Snider May 23, 2026
1c11e9b
feat(bundle): add SaveCompact for cold-storage newlineless JSON
Snider May 23, 2026
c23b869
merge: cladius-lane-Wave11-W11O (Bundle SaveCompact — Typical -91.4% …
Snider May 23, 2026
4f03722
perf(bundle): FileHash streaming for large files
Snider May 23, 2026
712e1f9
perf(fast_eval): modelDecodeGenerate via pooled struct (W11-M)
Snider May 23, 2026
76d30fb
merge: cladius-lane-Wave11-W11M (modelDecodeGenerate pooled struct — …
Snider May 23, 2026
c433718
merge: cladius-lane-Wave11-W11P (Bundle FileHash size-conditional str…
Snider May 23, 2026
767f525
perf(model): modelConfigProbe hand-rolled UnmarshalJSON walker
Snider May 23, 2026
d4e7a39
perf(blockcache): pre-render aligned prefix_tokens labels in New — -4…
Snider May 23, 2026
d869b05
merge: cladius-lane-Wave11-W11N (modelConfigProbe hand-rolled Unmarsh…
Snider May 23, 2026
0b5ab8d
perf(memvid/cli): expand bench coverage for golden-path
Snider May 23, 2026
2505ae0
merge: cladius-lane-Wave10-W10AQ (inference_contract toInferenceMemor…
Snider May 23, 2026
0fb366a
test(metal): tokeniser bench coverage for W11-S re-pass
Snider May 23, 2026
98f0ae8
bench(metal): add sampler coverage — MinP/Temperature/SuppressedGreed…
Snider May 23, 2026
35d4b1a
perf(metal): tokeniser indexIn via core.Index (stdlib Rabin-Karp)
Snider May 23, 2026
bfde381
perf(metal): applyRepeatPenalty pooled scratch + sort-dedup (W11-R)
Snider May 23, 2026
79713f1
perf(metal): suppressTokenLogits pooled scratch + sort-dedup (W11-R)
Snider May 23, 2026
ced32c9
perf(memvid/cli): drop dead reassignment in Resolve
Snider May 23, 2026
14da312
merge: cladius-lane-Wave10-W10AP (blockcache+probe+pack — ApplyOption…
Snider May 23, 2026
f81b615
perf(memvid/cli): stack-buffer Put tag keys for ≤16-tag fast path
Snider May 23, 2026
64ffe6a
perf(metal): hostUnsuppressedGreedyToken slices.BinarySearch + pooled…
Snider May 23, 2026
b1cf9a8
perf(memvid/cli): exact-size Put args slice (drop worst-case over-alloc)
Snider May 23, 2026
79ede98
perf(metal): cache_quantized reuse shape buffer across Update
Snider May 23, 2026
c9e945a
perf(metal): cache_paged_metal swap W10-G pool for W11-A stack-buffer…
Snider May 23, 2026
8f55cea
perf(memvid/cli): drop duplicate ready() call in view()
Snider May 23, 2026
54d7204
perf(memvid/filestore): bit-exact binary round-trip parity + Resolve/…
Snider May 23, 2026
8b7f797
perf(metal): hostUnsuppressedGreedyToken zero-copy float32 view (W11-R)
Snider May 23, 2026
caddd1d
perf(metal): cache_quantized Size/Dim over Shape() in hot paths
Snider May 23, 2026
afa941d
perf(metal): tokeniser Decode single-pass Builder with inline ▁→space
Snider May 23, 2026
fd34298
merge: cladius-lane-Wave11-W11Q (pkg/memvid cli shim — Put -128-256 B…
Snider May 23, 2026
0765ee6
perf(metal): decodeGPT2Bytes pre-sized buf + AsString + inline UTF-8
Snider May 23, 2026
ac539ad
perf(metal): scalar-inline TopP/MinP/MinPSampler bridge ops (W11-R)
Snider May 23, 2026
6b5d4a8
perf(metal): normalizeSentencePieceSegment single-pass + zero-alloc p…
Snider May 23, 2026
4092898
perf(metal): cache_paged_metal pool scorePages slice in SDPAPaged
Snider May 23, 2026
baa5bb1
merge: cladius-lane-Wave11-W11R (sample.go golden-path — hostUnsuppre…
Snider May 23, 2026
8c7895a
perf(metal): cache PagedKVCache storageKVPair slice-free conversion
Snider May 23, 2026
83b6516
perf(metal): cache_paged_metal swap Zeros for Zeros4 in page-grow
Snider May 23, 2026
90450ed
perf(metal): cache cacheTail NumDims fast-path skips Shape() allocs
Snider May 23, 2026
4f39843
perf(metal): cache_quantized NumDims gate in Update entry
Snider May 23, 2026
2faa1d6
perf(metal): metal_kernel Apply inline-C inputVec collapse — N+2→1 cg…
Snider May 23, 2026
1b7d841
merge: cladius-lane-Wave11-W11U (QuantizedKVCache_Q8Q8 -69% allocs / …
Snider May 23, 2026
b240e07
perf(metal): cache_paged_metal embed shape scratch as [4]int32 fields
Snider May 23, 2026
2605bf2
perf(metal): prompt_cache Slice4 + ShapeInto on golden-path restore
Snider May 23, 2026
3f19df6
perf(metal): bpeMerge direct min-heap bypasses container/heap interfa…
Snider May 23, 2026
ef43932
perf(metal): generate.go use zero-copy view in inspectAttentionCache …
Snider May 23, 2026
349ae62
perf(metal): prompt_cache drop Shape() heap allocs in validators
Snider May 23, 2026
80782a3
perf(metal): encodeSentencePieceSegment / encodeGPT2Segment via split…
Snider May 23, 2026
45d97ff
perf(metal): kv_snapshot.go use zero-copy views in inspectKVCacheRang…
Snider May 23, 2026
edf41a9
perf(metal): cache_paged_metal short-circuit empty GO_MLX_PAGED_KV_PA…
Snider May 23, 2026
171b38c
perf(metal): cachedBPETokens defer-free RLock unwind
Snider May 23, 2026
1fbb9d2
merge: cladius-lane-Wave11-W11T (PagedKV W10-G→W11-A substrate evolut…
Snider May 23, 2026
e4f5d2a
perf(metal): Zeros4 rank-4 scalar-pass + fixed-cache restore swap
Snider May 23, 2026
3658c93
merge: cladius-lane-Wave11-W11S (tokenizer full pass — indexIn -93%, …
Snider May 23, 2026
ad109ff
perf(metal): metal_kernel add ApplyOne fast path — single cgo crossin…
Snider May 23, 2026
7a148cb
perf(metal): migrate single-output kernel callers to ApplyOne — 1 all…
Snider May 23, 2026
1f71606
test(metal): bench_test.go realistic benches for cache snapshot fan-o…
Snider May 23, 2026
a5c82d0
merge: cladius-lane-Wave11-W11X (zero-copy view sweep — inspectAttent…
Snider May 23, 2026
08e464c
perf(metal): prompt_cache appendRestore — skip per-cache slice literal
Snider May 23, 2026
b4782bd
perf(metal): prompt_cache lazy snapshotOffsets on failure path only
Snider May 23, 2026
6f8d4e2
merge: cladius-lane-Wave11-W11W (prompt_cache.go residual — copyCache…
Snider May 23, 2026
e1ba72d
perf(metal): metal_kernel add DispatchOne — collapse cfg+apply into s…
Snider May 23, 2026
1b52543
perf(metal): migrate single-output kernel callers to DispatchOne — dr…
Snider May 23, 2026
5f0bb21
merge: cladius-lane-Wave11-W11V (metal_kernel.go 3 new substrate prim…
Snider May 23, 2026
943bd3d
feat(metal): add Reshape1 + Reshape2 scalar-pass primitives (W11-AC)
Snider May 23, 2026
f4ad004
feat(metal): add Slice1 / Slice2 / SliceUpdateInplace2 scalar-pass pr…
Snider May 23, 2026
6748516
feat(metal): Slice4WithStream / SliceUpdateInplace4WithStream
Snider May 23, 2026
d01ab7d
test(metal): fast bench gaps — nativePagedSingleToken + singleTokenCa…
Snider May 23, 2026
55abf39
perf(metal): fast nativePagedSingleTokenAttention page-handle pool (W…
Snider May 23, 2026
e466423
perf(metal): packQ4Cached use Reshape1 / Reshape2 / Slice2 (W11-AC)
Snider May 23, 2026
7780a24
fix(metal): pinned_array uintptr_t payload — drop unsafe.Pointer(id) …
Snider May 23, 2026
82025de
feat(metal): materialiseFloat32ViewFast skips Materialize for contigu…
Snider May 23, 2026
3a97899
perf(metal): cache use stream-passing Slice4 variant
Snider May 23, 2026
adcf9c8
perf(metal): unpackQ4 use Reshape1 + Slice1 (W11-AC)
Snider May 23, 2026
166110d
perf(metal): fast SDPA mode-string cache (W11-Y)
Snider May 23, 2026
de86a65
merge: cladius-lane-Wave11-W11AD (Slice4WithStream + SliceUpdateInpla…
Snider May 23, 2026
9474ee8
perf(metal): probe.go use materialiseFloat32ViewFast in summarizeProb…
Snider May 23, 2026
c6d190a
perf(metal): maxAll use Reshape1 (W11-AC)
Snider May 23, 2026
9e2a84b
perf(metal): pinned_array buffer pool — KVShape 3->2 allocs, 120->56 …
Snider May 23, 2026
619e534
test(metal): fast singleTokenCausalMask bench coverage (W11-Y)
Snider May 23, 2026
c8f1642
merge: cladius-lane-Wave11-W11AC (Reshape1/Reshape2 + Slice1/Slice2/S…
Snider May 23, 2026
aef1070
merge: cladius-lane-Wave11-W11AF (pinned_array.go vet cleared via run…
Snider May 23, 2026
cafab9f
merge: cladius-lane-Wave11-W11Y (fast.go nativePagedSingleToken sync.…
Snider May 23, 2026
baf6352
docs(metal): materialiseFloat32ViewFast contract — caller must pre-Ev…
Snider May 23, 2026
b0b0222
perf(metal): generate.go use materialiseFloat32ViewFast in inspectAtt…
Snider May 23, 2026
7d32941
perf(metal): kv_snapshot.go use materialiseFloat32ViewFast in inspect…
Snider May 23, 2026
4b8fa13
merge: cladius-lane-Wave11-W11AE (materialiseFloat32ViewFast skips Ma…
Snider May 23, 2026
a0cb57d
feat(state): add retained state container workflow
Snider May 24, 2026
0400b17
feat(state): wake from embedded kv payload
Snider May 24, 2026
90a4e28
feat(state): borrow kv region payloads
Snider May 24, 2026
d4ee520
feat(state): compact book runs across wake boundary
Snider May 24, 2026
715233c
feat(state): generate folded summaries for compact state
Snider May 24, 2026
40e4af8
feat(state): add Lemma new-session default
Snider May 24, 2026
f4e3d26
fix(chat): align gemma prompts with native template
Snider May 24, 2026
1f3a623
fix(chat): match gemma4 generation template
Snider May 24, 2026
019a927
perf(kv): stream partial state prefix restore
Snider May 24, 2026
67a4c48
fix(bench): share retained prompt contract
Snider May 24, 2026
696ec78
perf(state): avoid restore source ref copies
Snider May 24, 2026
000b9ae
perf(metal): skip duplicate fixed cache restore copy
Snider May 24, 2026
06a236d
fix(metal): avoid retained stop double close
Snider May 24, 2026
dda653e
fix(chat): align gemma4 assistant continuations
Snider May 24, 2026
a6a5946
test(bench): report retained output quality issues
Snider May 24, 2026
ed5a177
test(bench): align comparator prompt modes
Snider May 24, 2026
a71d225
test(bench): flag repeated table-cell output loops
Snider May 24, 2026
6bd906b
test(bench): record aligned llama direct anchor
Snider May 24, 2026
1892aa4
test(bench): align gemma4 stop diagnostics
Snider May 24, 2026
57fd45e
test(bench): share gemma4 comparator stops
Snider May 24, 2026
31d0475
docs(goal): reject noisy direct turn fixture
Snider May 24, 2026
b897083
test(bench): add clean state ramp fixture
Snider May 24, 2026
e2e1985
test(bench): verify gemma4 prompt contract
Snider May 24, 2026
60588b4
test(bench): add context fixture mode
Snider May 24, 2026
fb0edd7
fix(state): remove fixed-turn compaction
Snider May 24, 2026
423a036
fix(state): restore overflow compact trigger
Snider May 24, 2026
e5702e9
fix(state): suppress gemma4 eos list
Snider May 24, 2026
2665475
fix(state): restore overflow-only compaction threshold
Snider May 24, 2026
ccc24f1
fix(state): keep compaction overflow-only
Snider May 24, 2026
a2c724f
docs(goal): record request-context llama parity gap
Snider May 24, 2026
fab0d07
fix(state): keep overflow folding unforced
Snider May 24, 2026
12bd014
fix(metal): move shared Gemma4 KV without cloning
Snider May 24, 2026
00f85f8
fix(metal): keep fixed Gemma4 state for retained lanes
Snider May 24, 2026
74d811b
fix(api): remove 65k context defaults
Snider May 24, 2026
73e32d4
feat(api): report Gemma4 cache topology
Snider May 24, 2026
ced1eb9
fix(api): bound Gemma4 local caches by default
Snider May 24, 2026
5e3e982
fix(api): keep Gemma4 fast lane paged by default
Snider May 24, 2026
ec57514
fix(cli): drain cancelled profile streams
Snider May 24, 2026
0827657
fix(cli): stop synthesising fixed Gemma4 state budgets
Snider May 24, 2026
84368e1
docs(goal): record paged state ramp comparator row
Snider May 24, 2026
2e9fb95
fix(bench): stop using runtime workspace env
Snider May 24, 2026
8f6e077
fix(metal): skip unit temperature sampler node
Snider May 24, 2026
5789ed4
bench(metal): capture compiled sampler diagnostic
Snider May 24, 2026
7d69a55
fix(metal): retire paged full kv materialise path
Snider May 24, 2026
c420c89
fix(api): preserve explicit context through memory planning
Snider May 24, 2026
9b21c17
fix(cli): use Lemma new-session default prompt
Snider May 24, 2026
74b9eba
fix(metal): keep fast lane off fixed cache
Snider May 24, 2026
2c09927
fix(cmd): keep fixed cache out of fast lane
Snider May 24, 2026
645a992
perf(metal): fast-path contiguous pinned state arrays
Snider May 24, 2026
c2b5479
fix(cmd): block fixed cache profile paths
Snider May 24, 2026
3e3a5f1
fix(metal): ignore ambient fixed cache gates
Snider May 24, 2026
ee34eea
test(api): lock state kv production invariants
Snider May 24, 2026
32f7574
feat(cmd): record state wake memory deltas
Snider May 24, 2026
5c4dbf9
fix(cmd): remove context gate selector
Snider May 24, 2026
b54aa6f
fix(metal): honour async decode prefetch runtime gate
Snider May 24, 2026
2cda2a6
fix(cmd): guard against 65k kv boundary
Snider May 24, 2026
27c6f37
perf(metal): expose decode prefetch phase
Snider May 24, 2026
cbd10e8
test(cmd): remove artificial context cutoff guard
Snider May 24, 2026
667977e
perf(metal): prefetch dirty paged kv state
Snider May 24, 2026
da7df18
test(cmd): block archived context threshold lane
Snider May 24, 2026
955188c
docs(goal): reject prepared sampler prefetch path
Snider May 24, 2026
58723b6
perf(metal): split async prefetch trace buckets
Snider May 24, 2026
d5ec666
fix(cli): keep unarmed compaction out of state ramp limit
Snider May 24, 2026
c647c27
perf(metal): detach logits at token eval boundary
Snider May 24, 2026
74d27ce
bench(goal): retire 70k retained defaults
Snider May 24, 2026
dc27998
docs(goal): record seeded 100k retained evidence
Snider May 24, 2026
c8da2de
fix(cli): stop touching fixed-cache profile gates
Snider May 24, 2026
02bb3fd
perf(trace): expose paged concat decode events
Snider May 24, 2026
40b050d
test(cli): guard archived context threshold
Snider May 24, 2026
261b56c
docs(goal): record rejected decode gates
Snider May 24, 2026
79389a3
test(cli): drop archived context cutoff
Snider May 24, 2026
ddc21b3
perf(metal): keep local paged windows compact
Snider May 24, 2026
fde259d
perf(metal): keep q4 last logits on graph path
Snider May 25, 2026
39eb6d7
test(metal): benchmark paged attention fast path
Snider May 25, 2026
0079ed7
test(metal): bound native paged attention diagnostic
Snider May 25, 2026
55a7a4b
fix(metal): preserve shared paged kv reuse
Snider May 25, 2026
8d5a513
fix(metal): skip no-op last logits slice
Snider May 25, 2026
ff470e5
docs(goal): record rejected decode probes
Snider May 25, 2026
d8c7062
fix(metal): avoid local paged window recompaction
Snider May 25, 2026
032587b
perf(metal): compile top-k top-p sampler
Snider May 25, 2026
b0f393e
test(metal): pin local rope precompute probe
Snider May 25, 2026
24d4538
fix(metal): keep trace prefetch production shaped
Snider May 25, 2026
1f3a7b0
fix(metal): avoid empty sdpa handles
Snider May 25, 2026
cf44876
docs(goal): record rejected decode probes
Snider May 25, 2026
a4e5363
docs(goal): record rejected native sampler probe
Snider May 25, 2026
a50f4b8
docs(goal): reject sampled token lookahead
Snider May 25, 2026
a6ab570
test(metal): guard sampled token prefetch parity
Snider May 25, 2026
56b6d5e
test(metal): guard retained sampled token prefetch
Snider May 25, 2026
d4f8a44
perf(metal): drop concat parent slice allocation
Snider May 25, 2026
a354f9d
perf(metal): pool eval output handles
Snider May 25, 2026
214502e
perf(metal): use fixed-rank reshape in decode loop
Snider May 25, 2026
2114d4c
perf(metal): reuse raw shape in last-token logits
Snider May 25, 2026
5c1efbe
perf(metal): use scalar reshape for token inputs
Snider May 25, 2026
15443de
docs(goal): record scalar reshape smoke
Snider May 25, 2026
5ea5aad
docs(goal): record full-output request-context row
Snider May 25, 2026
5c53104
perf(metal): fuse sampler suppression into top-k top-p
Snider May 25, 2026
31149ef
docs(goal): refresh llama cpp request anchor
Snider May 25, 2026
63204f6
perf(metal): promote wider paged kv pages
Snider May 25, 2026
e95a795
docs(goal): record default page geometry row
Snider May 25, 2026
4f8c00f
docs(goal): reject wider paged kv probe
Snider May 25, 2026
3c6ff60
test(metal): measure flat last-token logits
Snider May 25, 2026
27e28a6
test(metal): record scalar token sync probe
Snider May 25, 2026
01127d4
test(metal): measure sample eval boundary
Snider May 25, 2026
cd184d5
test(metal): measure mixed kv attention dtype
Snider May 25, 2026
053eda0
test(metal): pin production prefetch benchmark shape
Snider May 25, 2026
0a118eb
perf(metal): collapse compiled callone boundary
Snider May 25, 2026
e2fae30
docs(goal): record callone retained workflow proof
Snider May 25, 2026
260f041
perf(metal): collapse two-array concat boundary
Snider May 25, 2026
d9bd011
perf(metal): tighten paged dirty state marking
Snider May 25, 2026
e84e3a8
bench(metal): cover categorical sampler key path
Snider May 25, 2026
ca0aad3
perf(metal): create decode token inputs directly
Snider May 25, 2026
ac64e4d
docs(runtime): clarify benchmark handover state
Snider May 25, 2026
138dec1
docs(todo): add current handover checkpoint
Snider May 25, 2026
40f36cf
fix(cmd): align driver profile defaults
Snider May 25, 2026
8ab56ac
perf(cmd): pre-size fast lane restore gates
Snider May 25, 2026
e54d1d9
bench(cmd): cover runtime gate reporting
Snider May 25, 2026
1c4dba0
perf(cmd): reduce runtime gate report allocation
Snider May 25, 2026
89de7f8
perf(cmd): stream state-ramp turn prompts
Snider May 25, 2026
0454846
perf(cmd): scan Gemma visible output once
Snider May 25, 2026
01a30a0
perf(metal): preallocate token phase traces
Snider May 25, 2026
f98c5e9
fix(cmd): keep trace text opt-in
Snider May 25, 2026
6383f55
fix(cmd): default state ramp to long-form turns
Snider May 25, 2026
463a072
docs(goal): record current binary smoke
Snider May 25, 2026
6c5b1cd
perf(metal): share native paged scratch
Snider May 25, 2026
df5c8ac
docs(repo): refresh handover state
Snider May 25, 2026
a332176
chore(cmd): pin native paged gate contract
Snider May 25, 2026
d254e67
perf(metal): stack prefill cache state eval
Snider May 25, 2026
eddc5cb
test(metal): bound paged cache benchmark memory
Snider May 25, 2026
494a3a3
perf(metal): stack gemma4 gate split slices
Snider May 25, 2026
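Several commit titles above (for example `applyRepeatPenalty pooled scratch + sort-dedup` and `suppressTokenLogits pooled scratch + sort-dedup`) name a pattern without showing it. The sketch below illustrates that general pattern only: copy the token history into a pooled scratch slice, sort it, deduplicate in place, then touch each logit once. It is not code from this repository. The names `scratchPool` and `penaliseOnce` are hypothetical, and the penalty rule (divide positive logits, multiply negative ones) is one common formulation that the actual change may not use.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// scratchPool is a hypothetical pool of reusable []int32 buffers.
// A pointer to the slice is stored so that Put does not allocate
// when the value is boxed into an interface.
var scratchPool = sync.Pool{
	New: func() any {
		buf := make([]int32, 0, 64)
		return &buf
	},
}

// penaliseOnce applies a repetition penalty to every distinct token ID
// in history exactly once, even if the ID appears many times.
// Hypothetical helper; not the repository's applyRepeatPenalty.
func penaliseOnce(logits []float32, history []int32, penalty float32) {
	bufPtr := scratchPool.Get().(*[]int32)
	ids := append((*bufPtr)[:0], history...) // reuse pooled capacity

	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })

	// In-place dedup of the sorted slice: keep an ID only if it differs
	// from the last one kept.
	n := 0
	for _, id := range ids {
		if n == 0 || id != ids[n-1] {
			ids[n] = id
			n++
		}
	}
	ids = ids[:n]

	for _, id := range ids {
		if id < 0 || int(id) >= len(logits) {
			continue // ignore IDs outside the vocabulary
		}
		if logits[id] > 0 {
			logits[id] /= penalty
		} else {
			logits[id] *= penalty
		}
	}

	// Return the (possibly grown) buffer to the pool with length reset.
	*bufPtr = ids[:0]
	scratchPool.Put(bufPtr)
}

func main() {
	logits := []float32{2, -2, 4, 1}
	penaliseOnce(logits, []int32{2, 0, 2, 1, 2}, 2)
	fmt.Println(logits) // [1 -4 2 1]
}
```

Sorting and deduplicating a slice avoids allocating a map on every call. Whether that is faster in practice depends on history length, which is what the benchmarks referenced in the commit titles would measure.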
6 changes: 6 additions & 0 deletions .codex/environments/environment.toml
@@ -0,0 +1,6 @@
# THIS IS AUTOGENERATED. DO NOT EDIT MANUALLY
version = 1
name = "go-mlx"

[setup]
script = ""
1 change: 1 addition & 0 deletions .gitignore
@@ -1,5 +1,6 @@
# Build artifacts
build/
bin/
*.dylib
*.so
*.a
12 changes: 12 additions & 0 deletions .gitmodules
@@ -22,3 +22,15 @@
path = external/go-io
url = https://github.com/dappcore/go-io.git
branch = dev
[submodule "external/go-ai"]
path = external/go-ai
url = https://github.com/dappcore/go-ai.git
branch = dev
[submodule "external/go-ml"]
path = external/go-ml
url = https://github.com/dappcore/go-ml.git
branch = dev
[submodule "external/go-cgo"]
path = external/go-cgo
url = https://github.com/dappcore/go-cgo.git
branch = dev
2 changes: 1 addition & 1 deletion AGENTS.md
@@ -14,7 +14,7 @@ All Go code lives under `go/`:
`nomlxlm` removes it)
- `go/cmd/violet/` and `go/pkg/daemon/` — local Violet Unix-socket sidecar
- `cpp/` — C++ side companion (CLion-side worktree)
- `lib/mlx/` — upstream MLX submodule pinned at `v0.30.1`
- `lib/mlx/` — upstream MLX submodule pinned at `v0.31.1`
- `patches/` — local patches against `lib/mlx` (manual apply only)
- `docs/`, `examples/` — markdown documentation and per-feature usage examples

7 changes: 4 additions & 3 deletions CLAUDE.md
@@ -44,17 +44,18 @@ After Mantis #1241, all Go code lives under `go/`:
```
go/ Go module root (dappco.re/go/mlx)
*.go Public root API: model, tokenizer, compute, training, eval, distill, GRPO, hf-fit, merge, gguf-quantize, kv-snapshot, lora-fuse
cmd/mlx/ CLI tool (built with `-o core-mlx`; consumers rename: lthn-mlx)
cmd/violet/ Unix-socket sidecar daemon
internal/metal/ All CGO code (mlx-c bindings)
mlxlm/ CGO-free Python subprocess backend
pkg/daemon/ Daemon implementation
pkg/memvid/ Memvid storage CLI
pkg/memvid/ Deprecated State codec compatibility shim
tests/ Integration tests
cpp/ C++ side (CLion-side companion)
docs/ Markdown documentation
examples/ Per-feature usage examples (markdown)
external/ Vendored core libraries
lib/mlx/ Upstream mlx submodule (pinned at v0.30.1)
lib/mlx/ Upstream mlx submodule (pinned at v0.31.1)
patches/ Local patches to lib/mlx (not auto-applied)
```

@@ -127,7 +128,7 @@ Architecture is detected from `config.json` (`model_type`) for safetensors and f

## Submodule Patches

`lib/mlx` is pinned at upstream tag `v0.30.1`. Local patches that we do not upstream live in `patches/` as standalone diff files (e.g. `patches/mlx-metallib-path.patch` for the `MLX_METALLIB_PATH` env-var override). Patches are not auto-applied — run them inside the submodule manually when their function is needed:
`lib/mlx` is pinned at upstream tag `v0.31.1`. Local patches that we do not upstream live in `patches/` as standalone diff files (e.g. `patches/mlx-metallib-path.patch` for the `MLX_METALLIB_PATH` env-var override). Patches are not auto-applied — run them inside the submodule manually when their function is needed:

```bash
git -C lib/mlx apply ../../patches/mlx-metallib-path.patch
6 changes: 5 additions & 1 deletion CMakeLists.txt
@@ -3,6 +3,9 @@ cmake_minimum_required(VERSION 3.24)
project(mlx)

set(CMAKE_OSX_DEPLOYMENT_TARGET "26.0" CACHE STRING "Minimum macOS version")
set(CMAKE_CXX_STANDARD 23)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS ON)

if(CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT)
set(CMAKE_INSTALL_PREFIX "${CMAKE_CURRENT_SOURCE_DIR}/dist" CACHE PATH "" FORCE)
@@ -17,7 +20,8 @@ set(CMAKE_INSTALL_RPATH "@loader_path")

include(FetchContent)

set(MLX_C_GIT_TAG "v0.4.1" CACHE STRING "")
set(MLX_C_GIT_TAG "v0.6.0" CACHE STRING "")
set(FETCHCONTENT_SOURCE_DIR_MLX "${CMAKE_CURRENT_SOURCE_DIR}/lib/mlx" CACHE PATH "Local patched MLX source")

FetchContent_Declare(
mlx-c
4,028 changes: 4,028 additions & 0 deletions GOAL.md

Large diffs are not rendered by default.

272 changes: 272 additions & 0 deletions IDEAS.md

Large diffs are not rendered by default.

70 changes: 9 additions & 61 deletions README.md
@@ -1,14 +1,14 @@
[![Go Reference](https://pkg.go.dev/badge/dappco.re/go/mlx.svg)](https://pkg.go.dev/dappco.re/go/mlx)
[![Licence: EUPL-1.2](https://img.shields.io/badge/Licence-EUPL--1.2-blue.svg)](LICENCE)
[![License: EUPL-1.2](https://img.shields.io/badge/License-EUPL--1.2-blue.svg)](LICENSE.md)
[![Go Version](https://img.shields.io/badge/Go-1.26-00ADD8?style=flat&logo=go)](go.mod)

# go-mlx

Native Apple Metal GPU inference via mlx-c CGO bindings, implementing the `inference.Backend` and `inference.TextModel` interfaces from go-inference for Apple Silicon (M1-M4). Supports Gemma 3, Gemma 4 (dense and MoE), Qwen 2/3, and Llama 3 architectures from HuggingFace safetensors directories and GGUF checkpoints, with fused Metal kernels for RMSNorm, RoPE, scaled dot-product attention, KV cache management, LoRA fine-tuning with AdamW, and batch inference. The root package also exposes an RFC-style direct model API (`mlx.LoadModel`, `model.Generate`, `model.GenerateStream`) and a non-LLM frame-compute API (`mlx.NewSession`, `Session.BeginFrame`, `Session.FinishFrame`, `PixelBuffer`, `KernelRGB565ToRGBA8`, `KernelNearestScale`, `KernelScanlineFilter`, `KernelCRTFilter`, `KernelSoftenFilter`, `KernelSharpenFilter`) for Apple GPU-accelerated image and emulator workloads. A Python subprocess backend (`mlxlm`) is provided as a CGO-free alternative. Platform-restricted: `darwin/arm64` only; a no-op stub compiles on all other platforms.
Native Apple Metal GPU inference via mlx-c CGO bindings, implementing the `inference.Backend` and `inference.TextModel` interfaces from go-inference for Apple Silicon (M1-M4). Supports Gemma 3, Gemma 4 (dense and MoE), Qwen 2/3, and Llama 3 architectures from HuggingFace safetensors directories and GGUF checkpoints, with fused Metal kernels for RMSNorm, RoPE, scaled dot-product attention, KV cache management, LoRA fine-tuning with AdamW, and batch inference. The root package also exposes an RFC-style direct model API (`mlx.LoadModel`, `model.Generate`, `model.GenerateStream`) and a non-LLM frame-compute API (`mlx.NewSession`, `PixelBuffer`, `KernelRGB565ToRGBA8`, `KernelNearestScale`) for Apple GPU-accelerated image and emulator workloads. A Python subprocess backend (`mlxlm`) is provided as a CGO-free alternative. Platform-restricted: `darwin/arm64` only; a no-op stub compiles on all other platforms.

**Module**: `dappco.re/go/mlx`
**Licence**: EUPL-1.2
**Language**: Go 1.26
**Language**: Go 1.25

## Quick Start

@@ -17,22 +17,16 @@ import (
"context"
"fmt"

"dappco.re/go/inference"
"dappco.re/go/core/inference"
_ "dappco.re/go/mlx" // registers "metal" backend via init()
)

model, err := inference.LoadModel("/Volumes/Data/lem/safetensors/gemma-3-1b/")
if err != nil {
panic(err)
}
defer model.Close()

for tok := range model.Generate(context.Background(), "Hello", inference.WithMaxTokens(256)) {
fmt.Print(tok.Text)
}
if err := model.Err(); err != nil {
panic(err)
}
```

## Root API
@@ -72,41 +66,29 @@ if err != nil {
}
defer session.Close()

src, err := session.NewPixelBuffer(mlx.PixelBufferDesc{
src, _ := session.NewPixelBuffer(mlx.PixelBufferDesc{
Width: 320,
Height: 224,
Stride: 640,
Format: mlx.PixelRGB565,
})
if err != nil {
panic(err)
}
rgba, err := session.NewPixelBuffer(mlx.PixelBufferDesc{
rgba, _ := session.NewPixelBuffer(mlx.PixelBufferDesc{
Width: 320,
Height: 224,
Stride: 1280,
Format: mlx.PixelRGBA8,
})
if err != nil {
panic(err)
}
scaled, err := session.NewPixelBuffer(mlx.PixelBufferDesc{
scaled, _ := session.NewPixelBuffer(mlx.PixelBufferDesc{
Width: 960,
Height: 672,
Stride: 3840,
Format: mlx.PixelRGBA8,
})
if err != nil {
panic(err)
}

frameBytes := make([]byte, src.Descriptor().SizeBytes())
if err := src.Upload(frameBytes); err != nil {
panic(err)
}
if err := session.BeginFrame(); err != nil {
panic(err)
}
if err := session.Run(mlx.KernelRGB565ToRGBA8, mlx.KernelArgs{
Inputs: map[string]mlx.Buffer{"src": src},
Outputs: map[string]mlx.Buffer{"dst": rgba},
@@ -119,15 +101,7 @@ if err := session.Run(mlx.KernelNearestScale, mlx.KernelArgs{
}); err != nil {
panic(err)
}
if err := session.Run(mlx.KernelScanlineFilter, mlx.KernelArgs{
Inputs: map[string]mlx.Buffer{"src": scaled},
Outputs: map[string]mlx.Buffer{"dst": scaled},
Scalars: map[string]float64{"strength": 0.3},
}); err != nil {
panic(err)
}
frameMetrics, err := session.FinishFrame()
if err != nil {
if err := session.Sync(); err != nil {
panic(err)
}

@@ -136,46 +110,20 @@ if err != nil {
panic(err)
}
_ = finalFrame
_ = frameMetrics
```
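The `Stride` values in the buffer descriptors above match tightly packed rows: 320 pixels at 2 bytes each for RGB565 gives 640, 320 pixels at 4 bytes each for RGBA8 gives 1280, and the 3x-scaled 960-pixel RGBA8 row gives 3840. A minimal sketch of that arithmetic follows, assuming rows carry no padding. The type and function names are hypothetical stand-ins and are not part of the `mlx` API.

```go
package main

import "fmt"

// pixelFormat is a hypothetical stand-in for the README's
// mlx.PixelRGB565 / mlx.PixelRGBA8 constants; it is not the library's type.
type pixelFormat int

const (
	formatRGB565 pixelFormat = iota // 16 bits per pixel
	formatRGBA8                     // 32 bits per pixel
)

// bytesPerPixel returns the storage size of one pixel,
// or 0 for a format this sketch does not know about.
func bytesPerPixel(f pixelFormat) int {
	switch f {
	case formatRGB565:
		return 2
	case formatRGBA8:
		return 4
	default:
		return 0
	}
}

// tightStride returns the byte length of one row, assuming no row padding.
func tightStride(width int, f pixelFormat) int {
	return width * bytesPerPixel(f)
}

func main() {
	fmt.Println(tightStride(320, formatRGB565)) // 640
	fmt.Println(tightStride(320, formatRGBA8))  // 1280
	fmt.Println(tightStride(960, formatRGBA8))  // 3840
}
```

If the real API allows padded rows, a stride larger than `width * bytesPerPixel` would also be valid; the descriptors in the diff simply use the tight value.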

## Research-Grade Pipeline

go-mlx is positioned as a Go-native research-grade model runner — not just inference. The root package exposes the full training and operations pipeline so harnesses can stop reaching for Python `mlx-lm`:

| Feature | Function | What it does |
|---------|----------|--------------|
| LoRA fine-tuning | `mlx.ApplyLoRA` + `mlx.NewAdamW` | Low-rank adaptation training with AdamW, mixed precision, gradient checkpointing |
| LoRA fusion | `mlx.FuseLoRAIntoModelPack(ctx, opts)` | Bake a trained LoRA adapter into the base model as a fresh safetensors pack |
| Knowledge distillation | `mlx.RunKnowledgeDistillation(ctx, runner, dataset, cfg)` | KL or soft-CE loss against a teacher's logits, with checkpoint resumption |
| GRPO | `mlx.RunGRPOReasoningTraining(ctx, runner, dataset, cfg)` | Group-relative policy optimisation with reward functions and reference KL |
| Eval | `mlx.RunModelEval(ctx, model, dataset, cfg)` | Dataset-native perplexity plus pluggable quality probes |
| Model merge | `mlx.MergeModelPacks(ctx, opts)` | Linear / SLERP / TIES / DARE merging of multiple model packs with provenance |
| GGUF quantise | `mlx.QuantizeModelPackToGGUF(ctx, opts)` | Native Go safetensors → GGUF Q8_0 / Q4_0 / Q4_K_M |
| KV snapshot | `snapshot.Save(path)` / `mlx.LoadKVSnapshot(path)` | Portable binary KV cache (Float32 or Q8 symmetric int8) for session restore |
| HF fit | `mlx.PlanHFModelFits(ctx, cfg)` | HuggingFace Hub metadata search to plan what fits on local hardware |
| Attention probe | `inference.AttentionInspector` adapter | Extract post-RoPE K vectors per head per layer for analysis |

See [`docs/`](docs/) and [`examples/`](examples/) for the full surface.

## Documentation

- [Compute Guide](docs/compute.md) — frame-oriented Metal compute sessions, pixel buffers, kernels, metrics
- [Architecture](docs/architecture.md) — CGO binding, model architectures, weight loading, KV cache, attention, batch inference, LoRA training, mlxlm backend
- [Models](docs/models.md) — model loading, supported architectures, tokenisation, chat templates
- [Training](docs/training.md) — LoRA fine-tuning, AdamW, gradient computation, checkpoints, fusion
- [Distillation](docs/distillation.md) — knowledge distillation (KL, soft cross-entropy)
- [GRPO](docs/grpo.md) — group-relative policy optimisation for RL
- [Eval](docs/eval.md) — dataset-native perplexity, quality probes, eval reports
- [Model Operations](docs/model-operations.md) — merge, GGUF quantise, KV snapshot, HF fit
- [Training](docs/training.md) — LoRA fine-tuning, AdamW, gradient computation, checkpoints
- [Development Guide](docs/development.md) — prerequisites (mlx-c CMake build), CGO flags, test patterns, benchmarks
- [Project History](docs/history.md) — completed phases, commit hashes, known limitations
- [Examples](examples/) — runnable usage examples organised by type

## Build & Test

```bash
git submodule update --init --recursive
go generate ./... # builds mlx-c C library (required first time)
go test ./...
go build ./...