
core LLM completion layer#2443

Open
shaj13 wants to merge 1 commit into docker:main from shaj13:pkg/ai

Conversation

@shaj13
Contributor

@shaj13 shaj13 commented Apr 15, 2026

Add new pkg/ai package that extracts and centralizes model interaction logic from runtime.

See #2409

@shaj13 shaj13 requested a review from a team as a code owner April 15, 2026 22:06
Add new pkg/ai package that extracts and centralizes model interaction
logic from runtime.

The package reuses existing types from chat, tools, and provider packages
without moving them.
@shaj13
Contributor Author

shaj13 commented Apr 15, 2026

@dgageot mind taking a look? Extracted the LLM interaction logic from pkg/runtime and other packages into a new reusable pkg/ai package. This includes streaming, fallback handling, retry, and tool execution — enabling consistent model communication across the codebase. The package can also be used as a standalone Go library without requiring the full agent runtime.

@shaj13
Contributor Author

shaj13 commented Apr 21, 2026

@dgageot just a gentle reminder — any update on this PR?

Contributor

@aheritier aheritier left a comment


Good refactoring — the pkg/ai package has a clean API, proper separation of concerns, and solid test coverage with golden files. The interceptor pattern is well designed.

A few items need attention before merge:

  1. Merge conflicts — branch is in a dirty state, needs rebase.
  2. WithRetries semantics mismatch — doc says "n + 1 attempts" but code gives exactly n (see inline).
  3. Token usage accumulation bug — loses data when r.Usage != nil && r2.Usage == nil (see inline).
  4. Unguarded slices.IndexFunc result — can panic if model ID not found (see inline).
  5. Cooldown behavior change — the old code only set cooldown when the primary failed with a non-retryable error; the new code always sets cooldown when any fallback succeeds. This is a semantic change that may cause undesired pinning after transient retryable failures exhaust retries.

CI is green on the last push (lint, test, build all pass).

@aheritier
Contributor

/review

Contributor

@aheritier aheritier left a comment


Inline comments attached.

Comment thread pkg/ai/option.go
// errors (5xx, timeouts). The total attempts per model is n + 1.
func WithRetries(n int) Option {
	return func(c *completion) {
		c.retries = n
Contributor


Blocking: The doc says "The total attempts per model is n + 1" but the code uses for retry := range c.retries which yields exactly n iterations. WithRetries(3) = 3 attempts, not 4.

Either fix the doc to "The total attempts per model is n" or change the loop to range c.retries + 1.

Comment thread pkg/ai/completion.go
Comment on lines +462 to +474

	if r2.Usage != nil && r.Usage != nil {
		r2.Usage = &chat.Usage{
			InputTokens:       r.Usage.InputTokens + r2.Usage.InputTokens,
			OutputTokens:      r.Usage.OutputTokens + r2.Usage.OutputTokens,
			CachedInputTokens: r.Usage.CachedInputTokens + r2.Usage.CachedInputTokens,
			CacheWriteTokens:  r.Usage.CacheWriteTokens + r2.Usage.CacheWriteTokens,
			ReasoningTokens:   r.Usage.ReasoningTokens + r2.Usage.ReasoningTokens,
		}
	}

	return r2, nil
}
Contributor


Non-blocking: If r.Usage != nil but r2.Usage == nil (provider didn't return usage on the recursive call), the first call's token counts are silently lost. Consider:

switch {
case r2.Usage != nil && r.Usage != nil:
    r2.Usage = &chat.Usage{
        InputTokens:       r.Usage.InputTokens + r2.Usage.InputTokens,
        OutputTokens:      r.Usage.OutputTokens + r2.Usage.OutputTokens,
        CachedInputTokens: r.Usage.CachedInputTokens + r2.Usage.CachedInputTokens,
        CacheWriteTokens:  r.Usage.CacheWriteTokens + r2.Usage.CacheWriteTokens,
        ReasoningTokens:   r.Usage.ReasoningTokens + r2.Usage.ReasoningTokens,
    }
case r2.Usage == nil && r.Usage != nil:
    r2.Usage = r.Usage
}

Comment thread pkg/ai/completion.go
Comment on lines +449 to +455
	})

	// Rotate to put the responding model first.
	c.models = append(c.models[idx:], c.models[:idx]...)
}

r2, err := c.generate(ctx)
Contributor


Non-blocking: slices.IndexFunc returns -1 when no match is found. If r.Model doesn't match any entry in c.models (e.g. a router provider exposing a different sub-model ID), this will panic on c.models[-1:].

Add a guard:

idx := slices.IndexFunc(c.models, func(m provider.Provider) bool {
    return m.ID() == r.Model
})
if idx > 0 {
    c.models = append(c.models[idx:], c.models[:idx]...)
}

Comment thread pkg/runtime/fallback.go
Comment on lines +202 to +209
for _, m := range models {
	if m.ID() == res.Model {
		usedModel = m
		break
	}
	return retryDecisionContinue
}

if !retryable {
	slog.Error("Non-retryable error from model",
		"agent", a.Name(),
		"model", modelEntry.provider.ID(),
		"error", err)
	if !modelEntry.isFallback {
		*primaryFailedWithNonRetryable = true
		// Handle cooldown state based on which model succeeded
Contributor


Question: This now unconditionally sets cooldown whenever a fallback succeeds. The old code only set cooldown when primaryFailedWithNonRetryable was true (429 or 4xx). If the primary exhausts retries on transient 5xx errors and a fallback picks up, the runtime will now pin to that fallback for the full cooldown window — previously it wouldn't.

Is this intentional? If so, it's probably fine (simpler), but worth noting as a behavior change. If not, pkg/ai would need to surface why the fallback was triggered (non-retryable vs exhausted retries).

Comment on lines +108 to +121

	models = append(models, titleModel)
}

if lastErr != nil {
	return "", fmt.Errorf("generating title failed: %w", lastErr)

str, err := ai.GenerateText(
	ctx,
	ai.WithModels(models...),
	ai.WithMessages(messages...),
	ai.WithRequireContent(),
	ai.WithLogger(lg),
)
if err != nil {
	return "", fmt.Errorf("generating title failed: %w", err)
}
return "", nil

Contributor


Praise: Nice simplification of the title generator. The old manual stream-drain + per-model retry loop collapses cleanly into a single ai.GenerateText call with WithRequireContent.


@docker-agent docker-agent Bot left a comment


Assessment: 🔴 CRITICAL

1 high-severity panic + 4 medium-severity findings in new pkg/ai completion layer. See inline comments for details.

Comment thread pkg/ai/completion.go
c.messages = append(c.messages, msgs...)

if c.models[0].ID() != r.Model {
	idx := slices.IndexFunc(c.models, func(m provider.Provider) bool {


[HIGH] Panic: slices.IndexFunc result not checked for -1 before use as slice index

slices.IndexFunc returns -1 when no element matches. If r.Model is not found in c.models (e.g., the provider returns an unexpected model ID), idx will be -1 and c.models[-1:] will panic with an index out of range.

The outer if c.models[0].ID() != r.Model guard only checks that the responding model is not the first one — it does not guarantee the model exists anywhere in the slice.

Fix: check idx before use:

idx := slices.IndexFunc(c.models, func(m provider.Provider) bool {
    return m.ID() == r.Model
})
if idx < 0 {
    idx = 0 // responding model not found; keep current order
}
c.models = append(c.models[idx:], c.models[:idx]...)

Comment thread pkg/ai/completion.go
}

if ctx.Err() != nil {
	return nil, err


[MEDIUM] Context cancellation returns stream error instead of ctx.Err(), masking cancellation

When ctx.Err() != nil (context was cancelled or timed out), the code returns err — the stream/model error — rather than ctx.Err(). Callers that check errors.Is(err, context.Canceled) or errors.Is(err, context.DeadlineExceeded) will not recognize this as a cancellation and may incorrectly retry or log it as a transient failure.

Note: the select at the bottom of the retry loop (the case <-ctx.Done() branch) correctly returns ctx.Err(), making this path inconsistent.

Fix:

if ctx.Err() != nil {
    return nil, ctx.Err()  // return the cancellation error, not the stream error
}

Comment thread pkg/ai/completion.go
	c.onModelFallback(c.models[i-1], model, err)
}

for retry := range c.retries {


[MEDIUM] Off-by-one in retry loop: for retry := range c.retries gives c.retries attempts, not c.retries + 1 as documented

The WithRetries option docs say:

The total attempts per model is n + 1.

But for retry := range c.retries (Go 1.22+ range-over-integer) iterates from 0 to c.retries-1, producing exactly c.retries total attempts — one fewer than documented.

Examples:

  • WithRetries(1) → 1 attempt (expected: 2)
  • WithRetries(2) → 2 attempts (expected: 3)
  • Default (c.retries = 1) → 1 attempt total, meaning zero retries on failure

Fix — change the loop to iterate c.retries + 1 times:

for retry := range c.retries + 1 {
    ...
}

Or update the docs to state "n total attempts" if single-attempt behavior is intentional.

Comment thread pkg/ai/completion.go
// Handle tool call deltas
if len(choice.Delta.ToolCalls) > 0 {
	for _, delta := range choice.Delta.ToolCalls {
		idx, ok := toolCallIndex[delta.ID]


[MEDIUM] Empty-string tool call ID used as map key, merging distinct tool calls from streaming providers

toolCallIndex[delta.ID] uses delta.ID directly as a map key. In many streaming LLM protocols, the tool call ID is only sent in the first delta chunk — subsequent delta chunks for the same tool call arrive with delta.ID == "". This is fine for a single tool call per stream.

However, if a streaming provider sends multiple new tool calls where more than one arrives with delta.ID == "" (or a provider that never sends IDs in deltas), all of them are grouped under the same empty-string key. The second new call overwrites the first call's entry in toolCallIndex, and their arguments get merged into a single garbled ToolCall.

Consider adding a guard or using the delta's Index field (if available from the streaming spec) as the map key:

// Use index as primary key, fall back to ID
key := fmt.Sprintf("%d", delta.Index)
if delta.ID != "" {
    key = delta.ID
}
idx, ok := toolCallIndex[key]

Or at minimum validate that new entries (where !ok) have a non-empty ID, and log a warning otherwise.

slog.Debug("Generated session title", "session_id", sessionID, "title", result, "model", baseModel.ID())
return result, nil

models = append(models, titleModel)


[MEDIUM] provider.CloneWithOptions return value is not nil-checked before appending to models

provider.CloneWithOptions may return nil depending on its implementation (e.g., if the base model type doesn't support cloning). A nil provider.Provider interface value appended to models and passed to ai.WithModels would cause a nil-pointer dereference (panic) inside ai.completion.stream() when the interface method is called.

The baseModel == nil guard above only filters nil inputs; it does not protect against a nil output from CloneWithOptions.

Fix: add a nil check on the result:

titleModel := provider.CloneWithOptions(ctx, baseModel, ...)
if titleModel == nil {
    continue
}
models = append(models, titleModel)

@aheritier added labels May 6, 2026:

  • kind/feat — PR adds a new feature (maps to feat: commit prefix)
  • area/agent — For work that has to do with the general agent loop/agentic features of the app
  • priority:medium — Normal priority, standard sprint work
  • effort:large — 2+ days or more, significant complexity
  • status/needs-design — Requires architectural discussion or design review
