feat(l3): dedicated l3Llm config slot for abstraction pass #1959

Open
chiefmojo wants to merge 1 commit into MemTensor:dev-20260604-v2.0.19 from chiefmojo:feat/l3-dedicated-llm

chiefmojo commented Jun 21, 2026

Problem

L3 abstraction currently runs on the main llm. On cheap models (e.g. gemini-2.5-flash-lite) the pass over-extracts and the JSON output gets truncated, producing "'constraints'/'inference' must be an array" errors. The pass is quality-sensitive and runs off the turn-response path, so a slower but more capable model here improves world-model quality without affecting turn latency.

Changes

Adds a top-level l3Llm config slot using the same SkillEvolverSchema shape as skillEvolver. When the slot is blank (the default), behavior is identical to today: deps.l3Llm ?? deps.llm falls through to the main LLM. When it is set, the L3 clustering → abstraction subscriber uses the dedicated client.

l3Llm:
  provider: openai_compatible
  endpoint: https://openrouter.ai/api/v1
  model: anthropic/claude-sonnet-4-5
  apiKey: sk-or-...

Files touched:

  • defaults.ts: blank default (inherits main llm)
  • schema.ts: l3Llm entry, typed as SkillEvolverSchema
  • memory-core.ts: bootstrap client (mirrors reflectLlm path; errors log, don't crash)
  • deps.ts / types.ts: PipelineDeps.l3Llm and PipelineHandle.l3Llm fields
  • orchestrator.ts: threads the field through to the handle
  • orchestrator.test.ts: two tests (dedicated client threads through; null when unconfigured)
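
For reviewers who want the fallback spelled out, here is a minimal sketch of how the `deps.l3Llm ?? deps.llm` selection behaves. The `LlmClient` and `PipelineDeps` shapes and the `resolveL3Client` helper are hypothetical, trimmed down for illustration, and are not the actual definitions in deps.ts / types.ts.

```typescript
// Hypothetical minimal shapes; the real PipelineDeps carries many more fields.
interface LlmClient {
  readonly model: string;
}

interface PipelineDeps {
  llm: LlmClient;
  l3Llm: LlmClient | null; // assumed to be null when the l3Llm slot is blank
}

// Pick the client for the L3 abstraction pass.
// `??` only falls through on null/undefined, so a configured client always wins.
function resolveL3Client(deps: PipelineDeps): LlmClient {
  return deps.l3Llm ?? deps.llm;
}

const mainLlm: LlmClient = { model: "gemini-2.5-flash-lite" };
const dedicated: LlmClient = { model: "anthropic/claude-sonnet-4-5" };

// Unconfigured slot: L3 inherits the main client (no behavior change).
const inherited = resolveL3Client({ llm: mainLlm, l3Llm: null });
// Configured slot: L3 uses the dedicated client.
const overridden = resolveL3Client({ llm: mainLlm, l3Llm: dedicated });

console.log(inherited.model, overridden.model);
```

This matches the two cases the new tests cover: a dedicated client threads through, and an unconfigured slot leaves the field null so the main client is used.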

Config example

Operators running a budget model as their main llm can now point L3 at a stronger model:

llm:
  provider: openai_compatible
  model: gemini-2.5-flash-lite  # cheap, fast, good for scoring

l3Llm:
  provider: openai_compatible
  model: anthropic/claude-sonnet-4-5  # slow ok — L3 is async
  apiKey: sk-or-...
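
With a config like the one above, the bootstrap step might look roughly like the following. This is a sketch under assumptions, not the code in memory-core.ts: `SlotConfig`, `isBlankSlot`, `bootstrapL3Client`, and the injected `ClientFactory` are hypothetical names, the rule for what counts as "blank" is assumed, and returning null on a failed bootstrap (so the main llm is used) is one plausible reading of "errors log, don't crash".

```typescript
// Hypothetical shapes for illustration; the real schema and client types differ.
interface SlotConfig {
  provider: string;
  model: string;
  endpoint?: string;
  apiKey?: string;
}

interface LlmClient {
  readonly model: string;
}

type ClientFactory = (cfg: SlotConfig) => LlmClient;
type Logger = (message: string) => void;

// Assumption: a slot counts as blank when provider or model is empty.
function isBlankSlot(cfg: SlotConfig): boolean {
  return cfg.provider.trim() === "" || cfg.model.trim() === "";
}

// Build the dedicated L3 client, or return null so callers fall back to the main llm.
function bootstrapL3Client(
  cfg: SlotConfig,
  createClient: ClientFactory,
  log: Logger = console.warn,
): LlmClient | null {
  if (isBlankSlot(cfg)) return null;
  try {
    return createClient(cfg);
  } catch (err) {
    // Log and continue instead of crashing startup.
    const reason = err instanceof Error ? err.message : String(err);
    log(`l3Llm bootstrap failed: ${reason}`);
    return null;
  }
}

const logged: string[] = [];
const okFactory: ClientFactory = (cfg) => ({ model: cfg.model });
const failingFactory: ClientFactory = () => {
  throw new Error("bad endpoint");
};
const l3Config: SlotConfig = {
  provider: "openai_compatible",
  model: "anthropic/claude-sonnet-4-5",
};

const blankClient = bootstrapL3Client({ provider: "", model: "" }, okFactory);
const dedicatedClient = bootstrapL3Client(l3Config, okFactory);
const failedClient = bootstrapL3Client(l3Config, failingFactory, (m) => logged.push(m));
```

Injecting the factory keeps the sketch independent of any particular provider SDK and makes the failure path easy to exercise in a test.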

L3 abstraction runs on the main `llm`, which on cheap models
(gemini-2.5-flash-lite) over-extracts and truncates the JSON,
producing "'constraints'/'inference' must be an array" failures —
Violet logged 180 such failures and produced only 2 world-model
facts total.

Add an `l3Llm` config slot (same SkillEvolverSchema shape as
`skillEvolver`). Blank inherits the main `llm` (zero behavior
change); set explicitly to run the clustering → world-model pass
on a stronger model. L3 is async/off the turn-response path, so
a slower-but-correct model has no impact on companion latency.

Wiring mirrors skillEvolver: schema + blank default + secret
redaction + bootstrap client + PipelineDeps/PipelineHandle field
+ orchestrator handle passthrough, consumed at the L3 subscriber
attach (`deps.l3Llm ?? deps.llm`).