feat: Phase 1 attachment system – chat.Document, pkg/attachment, per-provider convertDocument #2639

Merged
simonferquel merged 8 commits into docker:main from simonferquel-clanker:feat/phase1-attachment-system
May 7, 2026

Conversation

@simonferquel-clanker
Contributor

@simonferquel-clanker simonferquel-clanker commented May 5, 2026

Summary

Implements Phase 1 of the structured attachment system. Part of #2595.

What's added

pkg/chat/document.go (new)

  • MessagePartTypeDocument constant
  • DocumentSource struct (InlineText / InlineData fields)
  • Document struct (Name, MimeType, Size, Source)
  • Document *Document field added to MessagePart
  • MessagePartTypeFile and MessagePartTypeImageURL annotated as superseded (not formally deprecated to avoid SA1019 on existing call sites)
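The types listed above can be pictured roughly as follows. This is a minimal sketch, not the code from the PR: the field names come from the description, but the field types, JSON tags, the string value of the constant, and the newTextDocument helper are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MessagePartTypeDocument tags a MessagePart that carries a Document.
// The string value "document" is an assumption for this sketch.
const MessagePartTypeDocument = "document"

// DocumentSource holds the inline payload: either text or raw bytes.
// JSON tags are assumed, not taken from the PR.
type DocumentSource struct {
	InlineText string `json:"inline_text,omitempty"`
	InlineData []byte `json:"inline_data,omitempty"`
}

// Document describes one attachment (field types are assumed).
type Document struct {
	Name     string         `json:"name"`
	MimeType string         `json:"mime_type"`
	Size     int64          `json:"size"`
	Source   DocumentSource `json:"source"`
}

// MessagePart is reduced here to the two fields relevant to documents.
type MessagePart struct {
	Type     string    `json:"type"`
	Document *Document `json:"document,omitempty"`
}

// newTextDocument is a hypothetical helper that builds a text attachment part.
func newTextDocument(name, mimeType, text string) MessagePart {
	return MessagePart{
		Type: MessagePartTypeDocument,
		Document: &Document{
			Name:     name,
			MimeType: mimeType,
			Size:     int64(len(text)),
			Source:   DocumentSource{InlineText: text},
		},
	}
}

func main() {
	part := newTextDocument("notes.md", "text/markdown", "# Notes")
	out, err := json.Marshal(part)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```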

pkg/attachment/ (new package)

  • attachment.go: Strategy type, Decide() routing function, TXTEnvelope() helper (with </document> escape to prevent envelope breakout)
  • modelcaps/modelcaps.go: ModelCapabilities backed by models.dev data (via pkg/modelsdev store); Supports(mimeType) gates image/* on vision modality, application/pdf on pdf modality, text/* always allowed, everything else (audio, video, Office binary formats) returns false
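The gating rules and the three strategy outcomes can be sketched as below. The rules mirror the description (image/* needs vision, application/pdf needs the pdf modality, text/* is always allowed, everything else is rejected). The struct fields, the StrategyTXT name, and the Decide signature are assumptions; the real package is backed by the models.dev store rather than two booleans.

```go
package main

import (
	"fmt"
	"strings"
)

// Strategy is the routing outcome for one attachment.
type Strategy int

const (
	StrategyDrop Strategy = iota // attachment is skipped
	StrategyTXT                  // body is wrapped in a text envelope
	StrategyB64                  // bytes are sent as a native base64 block
)

// ModelCapabilities is a simplified stand-in for the models.dev-backed type.
type ModelCapabilities struct {
	Vision bool // model accepts image input
	PDF    bool // model accepts PDF input
}

func normalize(mimeType string) string {
	return strings.ToLower(strings.TrimSpace(mimeType))
}

// Supports reports whether the model can take the given MIME type.
func (mc ModelCapabilities) Supports(mimeType string) bool {
	mt := normalize(mimeType)
	switch {
	case strings.HasPrefix(mt, "text/"):
		return true
	case strings.HasPrefix(mt, "image/"):
		return mc.Vision
	case mt == "application/pdf":
		return mc.PDF
	default:
		// audio, video, Office binary formats, arbitrary application/*
		return false
	}
}

// Decide picks a strategy (hypothetical signature).
func Decide(mc ModelCapabilities, mimeType string) Strategy {
	if !mc.Supports(mimeType) {
		return StrategyDrop
	}
	if strings.HasPrefix(normalize(mimeType), "text/") {
		return StrategyTXT
	}
	return StrategyB64
}

func main() {
	visionOnly := ModelCapabilities{Vision: true}
	for _, mt := range []string{"text/plain", "image/png", "application/pdf", "audio/mpeg"} {
		fmt.Println(mt, Decide(visionOnly, mt))
	}
}
```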

Per-provider attachments.go (new files)

Added convertDocument(ctx, doc, modelID) and testable convertDocumentWithCaps(ctx, doc, mc) to:

  • pkg/model/provider/oaistream — image → data-URI image part; PDF/other binary → dropped with warning (Chat Completions has no native document block); text → TXTEnvelope; wired into ConvertMultiContent
  • pkg/model/provider/openai — image → OfInputImage; PDF → OfInputFile (native Responses API file block); text → OfInputText with TXTEnvelope
  • pkg/model/provider/anthropic — image → ImageBlockParam (base64); PDF → DocumentBlockParam (gated on application/pdf only, not IsAnthropicDocumentMime); text → TextBlockParam with TXTEnvelope
  • pkg/model/provider/gemini — binary → genai.Blob; text → genai.Text with TXTEnvelope
  • pkg/model/provider/bedrock — image → ContentBlockMemberImage; PDF → ContentBlockMemberDocument; text → ContentBlockMemberText with TXTEnvelope

Signature changes (intentional breaking change)

ConvertMessages and ConvertMultiContent in oaistream now accept ctx context.Context and modelID string parameters for capability routing and context propagation. All in-tree callers updated.

Tests

  • pkg/attachment/decide_test.go — table-driven, all 3 strategy outcomes + MIME-miss + TXTEnvelope escaping
  • pkg/attachment/modelcaps/modelcaps_test.go — in-memory store, vision/text-only/unknown model/Office-rejected cases
  • Per-provider attachments_test.go — TXT strategy, Drop strategy, B64 image and PDF success paths

What's NOT changed

  • Existing MessagePartTypeFile and MessagePartTypeImageURL handling is preserved unchanged
  • No config schema changes (Phase 1 is API-only)

Part of #2595

…provider convertDocument

Implements Phase 1 of the attachment system per spec:

- pkg/chat/document.go: Document, DocumentSource types; MessagePartTypeDocument const
- pkg/chat/chat.go: Document field added to MessagePart; deprecated old ImageURL/File fields
- pkg/attachment/attachment.go: Decide(), TXTEnvelope(), Advisor interface
- pkg/attachment/modelcaps/modelcaps.go: ModelCapabilities.Supports() backed by models.dev data
- Per-provider convertDocument and SupportedMIMETypes in:
  - pkg/model/provider/oaistream (+ backward-compat wrappers for ConvertMessages/ConvertMultiContent)
  - pkg/model/provider/openai (Responses API)
  - pkg/model/provider/anthropic (Beta API)
  - pkg/model/provider/gemini
  - pkg/model/provider/bedrock
- Full test coverage: decide_test.go, modelcaps_test.go, per-provider attachments_test.go

Part of docker#2595

Assisted-By: docker-agent
@simonferquel-clanker simonferquel-clanker requested a review from a team as a code owner May 5, 2026 14:08
Blockers fixed:
- B1: Replace // Deprecated: godoc tags with // Note: superseded comments to
  avoid SA1019 staticcheck errors on all in-tree call sites
- B2: Replace context.Background() with t.Context() in all 5 provider
  attachments_test.go files (also fixed in bedrock/client_test.go and
  gemini/client_test.go)
- B3: Fix gci import ordering in decide_test.go and chat.go (golangci-lint --fix)
- B4: Add B64 success-path tests to all 5 providers using convertDocumentWithCaps
  injection helper; image+PDF cases verified with native block assertions

Suggestions addressed:
- S5: Delete ConvertMessagesLegacy/ConvertMultiContentLegacy (no callers)
- S7: Add convertDocumentWithCaps injection variants in all 5 providers
- S10: Thread request ctx through ConvertMultiContent, ConvertMessages, and
  convertMessagesToResponseInput instead of using context.Background()
- S13: Add TODO(phase2) and stronger constraint comment to DocumentSource.URL

Assisted-By: docker-agent
@docker-agent

docker-agent Bot commented May 5, 2026

PR Review Failed — The review agent encountered an error and could not complete the review. View logs.

Comment thread pkg/attachment/modelcaps/modelcaps.go Outdated
}
// All other MIME types are text-based or can be safely wrapped in a TXT
// envelope, so we allow them unconditionally.
return true
Contributor

We should check that the MIME type has the text/ prefix (nothing prevents the client from sending parts with video/* or audio/* MIME types).

Contributor Author

Fixed in 5dd7103. Supports() now only allows MIME types that start with text/ or are in a specific allowlist of known Office/document formats (application/vnd.openxmlformats-officedocument.*, application/msword, application/rtf, etc.). audio/*, video/*, and arbitrary application/* types now return false unless the model explicitly declares capability. Added TestSupports_AudioVideoRejected to cover this.

Comment thread pkg/attachment/modelcaps/modelcaps.go Outdated

// Use a background context for the lookup; capability detection is best-effort
// and should not block the main request flow.
model, err := store.GetModel(context.Background(), modelID)
Contributor

Can this actually block indefinitely? Or is the store guaranteed to return results quickly? Should we add a timeout?

Contributor Author

Fixed in 5dd7103. Added a 10-second timeout (loadTimeout = 10*time.Second) to both Load and LoadFromStore. If the fetch times out or the store returns an error, the function falls back to conservative text-only caps and logs a slog.Warn with the timeout duration.
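A minimal sketch of the timeout-and-fallback behaviour described in this reply, assuming a simplified Store interface and Caps type. Only the loadTimeout value and the "fall back to text-only caps and warn" behaviour come from the thread; loadWithTimeout, slowStore, and demoTimeoutFallback are hypothetical names for this example.

```go
package main

import (
	"context"
	"fmt"
	"log/slog"
	"time"
)

// loadTimeout matches the value quoted in the reply above.
const loadTimeout = 10 * time.Second

// Caps is a simplified capability set; the zero value means text-only.
type Caps struct{ Vision, PDF bool }

// Store is a hypothetical stand-in for the models.dev-backed store.
type Store interface {
	GetModel(ctx context.Context, modelID string) (Caps, error)
}

// loadWithTimeout bounds the lookup and falls back to text-only caps on error.
func loadWithTimeout(ctx context.Context, store Store, modelID string, timeout time.Duration) Caps {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	caps, err := store.GetModel(ctx, modelID)
	if err != nil {
		slog.Warn("model capability lookup failed; using text-only caps",
			"model", modelID, "timeout", timeout, "error", err)
		return Caps{}
	}
	return caps
}

// slowStore simulates a store that answers after one second.
type slowStore struct{}

func (slowStore) GetModel(ctx context.Context, _ string) (Caps, error) {
	select {
	case <-time.After(time.Second):
		return Caps{Vision: true, PDF: true}, nil
	case <-ctx.Done():
		return Caps{}, ctx.Err()
	}
}

// demoTimeoutFallback uses a short timeout so the fallback path is exercised.
func demoTimeoutFallback() Caps {
	return loadWithTimeout(context.Background(), slowStore{}, "example-model", 20*time.Millisecond)
}

func main() {
	fmt.Printf("%+v\n", demoTimeoutFallback())
	_ = loadTimeout // production callers would pass loadTimeout instead
}
```

Deriving the bounded context from the request context, rather than from context.Background(), also lets a cancelled request abort the lookup early.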

Comment thread pkg/chat/document.go Outdated
// on Documents stored in messages in Phase 1; providers should log a warning
// and skip documents that have only URL set.
// TODO(phase2): implement URL-referenced document fetching.
URL string `json:"url,omitempty"`
Contributor

remove it for now, it is not supported yet

Contributor Author

Done in 5dd7103. Removed the URL field from DocumentSource entirely.

// - application/pdf with InlineData → BetaRequestDocumentBlock (base64)
// - text with InlineText → BetaTextBlockParam with TXTEnvelope
// - unsupported / no content → nil (logged as warning)
func convertDocument(ctx context.Context, doc chat.Document, modelID string) ([]anthropicsdk.BetaContentBlockParamUnion, error) {
Contributor

I think we do not need to use the beta SDK anymore as we are not using File API

Contributor Author

Fixed in 5dd7103. attachments.go now uses standard SDK types throughout: anthropic.ContentBlockParamUnion, anthropic.ImageBlockParam, anthropic.DocumentBlockParam, anthropic.TextBlockParam. The beta import alias is gone. The beta path (convertBetaUserMultiContent) adapts the output via a thin stdBlocksToBeta helper in beta_converter.go.

return nil, errors.New("invalid file attachment: neither path nor file_id provided")
}

case chat.MessagePartTypeDocument:
Contributor

Aren't we missing conversion support for the non-beta path?

Contributor Author

Fixed in 5dd7103. Added case chat.MessagePartTypeDocument: to convertUserMultiContent in client.go (the non-beta path). It calls convertDocument which now returns standard SDK types ([]anthropic.ContentBlockParamUnion), so they wire in directly. Also changed _ context.Context to ctx context.Context so the context is forwarded to convertDocument.

C1: Tighten modelcaps.Supports() — only allow text/* prefix and known
Office/document MIMEs unconditionally. audio/*, video/*, and arbitrary
application/* types now return false unless the model explicitly declares
the capability. Added tests for audio/video rejection.

C2: Remove DocumentSource.URL field — it was Phase 2 only and carries
dead code. Deleted entirely from DocumentSource.

C3: Add 10s timeout to context in modelcaps.Load and LoadFromStore.
Falls back to conservative text-only caps and logs a warning on timeout
or fetch failure.

C4: Switch anthropic/attachments.go from Beta SDK types to standard
(non-beta) Anthropic SDK types: ImageBlockParam, DocumentBlockParam,
TextBlockParam, ContentBlockParamUnion. The beta SDK is not needed
for document conversion since we are not using the Files API.

C5: Wire MessagePartTypeDocument into non-beta convertUserMultiContent
in anthropic/client.go so documents are handled on the standard path.
Updated convertBetaUserMultiContent to call convertDocument and convert
standard → beta blocks via new stdBlocksToBeta helper.

golangci-lint: 0 issues. All tests pass.

Assisted-By: docker-agent
Comment thread pkg/attachment/attachment.go Outdated
type Advisor interface {
// SupportedMIMETypes returns the list of MIME types that the provider's
// current model accepts as document attachments. The list may be empty.
SupportedMIMETypes() []string
Member

This is never called

Contributor

this is for clients, not for internal use. The idea here is to be able to filter out unsupported files in the UI, but we can skip it I think

Member

Let's skip for now

Contributor Author

Fixed in 31e0601. Removed the Advisor interface from attachment.go and all per-provider SupportedMIMETypes()/SupportedMIMETypesForModel() implementations — none of them had any callers.

…ations

Addresses rumpl's inline review: the attachment.Advisor interface and all
per-provider SupportedMIMETypes() / SupportedMIMETypesForModel() methods
are never called. Removed:
- attachment.Advisor interface from pkg/attachment/attachment.go
- Client.SupportedMIMETypes() from anthropic, gemini, openai, bedrock
- SupportedMIMETypesForModel() from oaistream

golangci-lint: 0 issues.

Assisted-By: docker-agent
Comment thread pkg/attachment/modelcaps/modelcaps.go Outdated
if strings.HasPrefix(mt, "text/") {
return true
}
// Known Office document formats can be safely TXT-enveloped.
Contributor

It is not true. docx etc. are basically zip files

Contributor

Change that! use models.dev info for understanding if the model has office documents capabilities

Contributor Author

Fixed in f70acb8.

The isKnownDocMIME unconditional allowlist is removed. Office MIMEs (DOCX, XLSX, PPTX, etc.) now return false from Supports() because models.dev only exposes five input modalities (text, image, pdf, audio, video) — there is no document or office field in the current schema. These formats will remain unsupported (StrategyDrop) until models.dev adds an explicit capability declaration for them.

The officeDocumentFormat() helper in the Bedrock provider is retained as forward-looking code — it will activate once models.dev exposes the relevant modality.

Tests updated: TestLoadFromStore_OfficeDocsAlwaysAllowed → TestLoadFromStore_OfficeDocsNotAllowed, office doc case in decide_test.go now expects StrategyDrop.

…owlist

Simon's review: DOCX/XLSX/PPTX are ZIP-based binary formats and cannot be
naively TXT-enveloped. The isKnownDocMIME unconditional allowlist was wrong.

Changes:
- Remove unconditional Office MIME allowlist from Supports()
- Rename helper to isOfficeMIME() (now returns false from Supports())
- Office MIMEs now return false for all models because models.dev only
  exposes: text, image, pdf, audio, video modalities — no document/office
  field exists yet. Conservative fallback until models.dev schema adds it.
- Updated TestLoadFromStore_OfficeDocsAlwaysAllowed -> NotAllowed
- Updated decide_test.go: office doc case now expects StrategyDrop
- Extended TestSupports_AudioVideoRejected to also cover Office MIMEs

The forward-looking officeDocumentFormat() in bedrock/attachments.go is
retained — it will activate once models.dev adds an office/document modality.

Assisted-By: docker-agent
}

// Other Office docs
if df, ok := officeDocumentFormat(mime); ok {
Contributor

I think this is never going to apply, as the models.dev caps filter those out. Simplify.

Contributor Author

Done in db68b0b. Removed officeDocumentFormat() and the 'Other Office docs' branch entirely. The StrategyB64 path now only handles images and PDFs — the only binary MIMEs that can reach it since modelcaps gates everything else to StrategyDrop. Added a defensive slog.Warn + drop for any unexpected binary that somehow slips through.
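The simplified base64 branch can be sketched as a small classifier. This is illustrative only: the real code returns Bedrock SDK content blocks, whereas b64BlockKind and blockKind here are hypothetical names that just show the "image, PDF, otherwise warn and drop" shape.

```go
package main

import (
	"fmt"
	"log/slog"
	"strings"
)

// blockKind names the native block a binary attachment would map to.
type blockKind string

const (
	blockImage    blockKind = "image"
	blockDocument blockKind = "document"
)

// b64BlockKind returns the block kind for a binary MIME type, or false
// when the attachment should be dropped.
func b64BlockKind(mimeType string) (blockKind, bool) {
	switch {
	case strings.HasPrefix(mimeType, "image/"):
		return blockImage, true
	case mimeType == "application/pdf":
		return blockDocument, true
	default:
		// Should be unreachable because capability gating runs first;
		// kept as a defensive guard.
		slog.Warn("unexpected binary MIME type on base64 path; dropping attachment",
			"mime_type", mimeType)
		return "", false
	}
}

func main() {
	for _, mt := range []string{"image/jpeg", "application/pdf", "application/zip"} {
		kind, ok := b64BlockKind(mt)
		fmt.Println(mt, kind, ok)
	}
}
```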

…rock

Simon's review: the officeDocumentFormat() helper and the 'Other Office docs'
StrategyB64 branch in convertDocumentWithCaps can never be reached — modelcaps
now gates all Office MIMEs to false/StrategyDrop before convertDocument is
called. Simplify by removing both. The StrategyB64 branch now only handles
images and PDFs (the only binary MIMEs that can reach it). An unexpected
binary MIME gets a defensive slog.Warn + drop rather than a TXT envelope.

Assisted-By: docker-agent
@simonferquel
Contributor

/review

@docker-agent

docker-agent Bot commented May 6, 2026

PR Review Failed — The review agent encountered an error and could not complete the review. View logs.

@aheritier aheritier added kind/feat PR adds a new feature (maps to feat: commit prefix) area/tools For features/issues/fixes related to the usage of built-in and MCP tools area/providers For features/issues/fixes related to LLM providers (Bedrock, LiteLLM, Qwen, custom, etc.) priority:medium Normal priority, standard sprint work effort:large 2+ days or more, significant complexity labels May 6, 2026
@simonferquel
Contributor

/review

@docker-agent

PR Review Failed — The review agent encountered an error and could not complete the review. View logs.

rumpl previously approved these changes May 7, 2026
Contributor

@aheritier aheritier left a comment

Solid Phase 1 scaffolding — clean separation between Decide/modelcaps/per-provider convertDocument, good test coverage on the routing layer.

One blocking issue and a few smaller items:

  • Blocking: anthropic/attachments.go will ship text/plain bytes as an application/pdf block because IsAnthropicDocumentMime covers both.
  • Non-blocking: TXTEnvelope doesn't escape the body — </document> in content lets attachments break out of the wrapper.
  • Non-blocking: oaistream/openai fall back to base64-in-text-envelope for non-image binaries; for PDF this is essentially garbage tokens. Either drop, or use OfInputFile on the Responses API.
  • Non-blocking: per-attachment modelcaps.Load cold-path timeout multiplies with N attachments.
  • Nit: sanitizeDocumentName mangles extensions (report.pdf → report-pdf).

Also: the PR description still advertises ConvertMessagesLegacy / ConvertMultiContentLegacy backward-compat aliases — they don't exist in the diff. Either add them or remove the claim from the description so reviewers know ConvertMessages/ConvertMultiContent are intentionally breaking changes.

Comment thread pkg/model/provider/anthropic/attachments.go Outdated
Comment thread pkg/attachment/attachment.go Outdated
Comment thread pkg/model/provider/oaistream/attachments.go
Comment thread pkg/model/provider/bedrock/attachments.go Outdated
B1 (Anthropic text/plain sent as PDF block):
  Gate DocumentBlockParam strictly on mime=="application/pdf", not on
  IsAnthropicDocumentMime which also matches text/plain. Unexpected binary
  MIMEs in StrategyB64 now get a defensive warn+drop.

N1 (TXTEnvelope body escaping):
  Replace </document> with &lt;/document&gt; in the body before wrapping to
  prevent attachment content from breaking out of the envelope. Added
  TestTXTEnvelope_EscapesClosingTag.

N2 (PR description):
  Updated via gh pr edit: removed stale Legacy alias mention, corrected
  Anthropic block type names, noted ConvertMessages/ConvertMultiContent
  signature change as intentional, documented PDF handling per provider.

N3 (sanitizeDocumentName mangles extensions):
  Strip file extension with path.Ext before sanitizing so report.pdf -> report
  instead of report-pdf. Spaces, parens, and square brackets are now allowed
  (Bedrock permits them). Added TestSanitizeDocumentName table test.

PDF/oaistream (Chat Completions):
  Drop application/pdf with slog.Warn instead of sending opaque base64 bytes
  in a TXT envelope that wastes tokens and is meaningless.

PDF/openai Responses API:
  Use OfInputFile with FileData (data URI) for application/pdf — the correct
  native block on the Responses API endpoint.

Assisted-By: docker-agent
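The N3 fix in the commit above can be illustrated with a short sketch. It assumes that disallowed characters become hyphens and that an empty result falls back to "document"; both are guesses, as is the exact allowed character set beyond what the commit message lists (spaces, parens, square brackets).

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// sanitizeDocumentName strips the file extension first, then replaces
// characters outside the allowed set with hyphens.
func sanitizeDocumentName(name string) string {
	base := strings.TrimSuffix(name, path.Ext(name))

	var b strings.Builder
	for _, r := range base {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9':
			b.WriteRune(r)
		case r == ' ', r == '-', r == '(', r == ')', r == '[', r == ']':
			b.WriteRune(r)
		default:
			b.WriteRune('-')
		}
	}

	out := strings.Trim(b.String(), "- ")
	if out == "" {
		return "document" // assumed fallback for this sketch
	}
	return out
}

func main() {
	for _, n := range []string{"report.pdf", "Q3 summary (final).pdf", "my_file.v2.txt"} {
		fmt.Printf("%q -> %q\n", n, sanitizeDocumentName(n))
	}
}
```

Stripping the extension before sanitizing is what avoids the report-pdf result: the dot never reaches the replacement step.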
Comment thread pkg/attachment/attachment.go Outdated
// any occurrence of "</document>" in the body is replaced with "&lt;/document&gt;".
func TXTEnvelope(name, mimeType, body string) string {
- return fmt.Sprintf("<document name=%q mime-type=%q>%s</document>", name, mimeType, body)
+ safe := strings.ReplaceAll(body, "</document>", "&lt;/document&gt;")
Contributor

I do not like this approach. I would rather mangle the envelope tag name,
e.g. <document-name-slugged-mimetype-slugged> </document-name-slugged-mimetype-slugged>, to make a conflict very unlikely.

Contributor Author

Implemented in be25d7d. TXTEnvelope now generates a unique slug-based tag from the document name and MIME type:

report.md + text/markdown → <document-report-md-text-markdown>…</document-report-md-text-markdown>

slugify() lowercases the string, replaces non-alphanumeric runes with hyphens, collapses consecutive hyphens, and trims leading/trailing ones. Replaced TestTXTEnvelope_EscapesClosingTag with TestTXTEnvelope_UniqueTag which verifies open/close tag symmetry and that different name+MIME pairs produce different tags.
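The slug-based envelope described in this reply might look roughly like the sketch below. The slugify rules follow the reply (lowercase, non-alphanumerics to hyphens, collapse runs, trim ends); restricting "alphanumeric" to ASCII and joining the two slugs as document-<name>-<mime> are assumptions inferred from the quoted example.

```go
package main

import (
	"fmt"
	"strings"
)

// slugify lowercases s, turns every run of non-alphanumeric characters
// into a single hyphen, and trims leading and trailing hyphens.
func slugify(s string) string {
	var b strings.Builder
	prevHyphen := false
	for _, r := range strings.ToLower(s) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') {
			b.WriteRune(r)
			prevHyphen = false
			continue
		}
		if !prevHyphen {
			b.WriteByte('-')
			prevHyphen = true
		}
	}
	return strings.Trim(b.String(), "-")
}

// TXTEnvelope wraps body in a tag derived from the name and MIME type,
// so the body does not need to be escaped.
func TXTEnvelope(name, mimeType, body string) string {
	tag := "document-" + slugify(name) + "-" + slugify(mimeType)
	return "<" + tag + ">" + body + "</" + tag + ">"
}

func main() {
	fmt.Println(TXTEnvelope("report.md", "text/markdown", "# Title"))
}
```

A body containing a literal </document> no longer closes the wrapper, because the closing tag has to match the full slug.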

return []responses.ResponseInputContentUnionParam{
{
OfInputFile: &responses.ResponseInputFileParam{
FileData: param.NewOpt(dataURI),
Contributor

Is FileData supposed to be the data URI, or the base64 payload directly?

Contributor Author

Investigated in be25d7d. The SDK ResponseInputFileContentParam.FileData godoc says: "The base64-encoded data of the file to be sent to the model." — raw base64, not a data URI.

The previous implementation was wrong (it was passing data:application/pdf;base64,…). Fixed: now passes base64.StdEncoding.EncodeToString(doc.Source.InlineData) directly.
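The difference between the two encodings discussed here is small but easy to get wrong. The sketch below contrasts them using only the standard library; rawBase64 and dataURI are hypothetical helper names, and no OpenAI SDK types are involved.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// rawBase64 returns the standard base64 encoding of data, the form the
// reply above says FileData expects.
func rawBase64(data []byte) string {
	return base64.StdEncoding.EncodeToString(data)
}

// dataURI wraps the same payload in a data: URI, the form used for
// image parts on the Chat Completions path.
func dataURI(mimeType string, data []byte) string {
	return "data:" + mimeType + ";base64," + rawBase64(data)
}

func main() {
	payload := []byte("%PDF-1.7")
	fmt.Println("raw base64:", rawBase64(payload))
	fmt.Println("data URI:  ", dataURI("application/pdf", payload))
}
```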

…ase64

TXTEnvelope (Simon's preferred approach):
  Replace body-escaping with a unique slug-based tag name derived from the
  document name and MIME type, making accidental tag break-out practically
  impossible without needing to escape the body.
  Example: report.md + text/markdown -> <document-report-md-text-markdown>
  Adds slugify() helper (lowercase, alphanum+hyphens, collapsed runs).
  Removes TestTXTEnvelope_EscapesClosingTag; adds TestTXTEnvelope_UniqueTag.
  Updates per-provider tests to check slug forms instead of name= attributes.

OpenAI Responses API FileData:
  Fix: ResponseInputFileParam.FileData expects raw base64-encoded bytes, NOT
  a data URI. SDK godoc: 'The base64-encoded data of the file to be sent to
  the model.' Removed the data:application/pdf;base64,... wrapping.

Assisted-By: docker-agent
Contributor

@aheritier aheritier left a comment

All four prior blocking/non-blocking points addressed:

  • Anthropic PDF gate: mime == "application/pdf" strict check — text/plain no longer slips into DocumentBlockParam.
  • TXT envelope breakout: tag is now slugged from name + MIME (<document-report-md-text-markdown>…) — content can't forge a closing tag without knowing the slug. Bonus: matches @simonferquel's suggested approach.
  • oaistream PDFs: dropped with a clear warning + comment explaining Chat Completions has no native document block.
  • Bedrock filename: path.Ext stripped before sanitisation, so report.pdf → report instead of report-pdf.

Code reads cleanly, tests cover the routing matrix, CI is green. LGTM.

@simonferquel simonferquel merged commit 22c999b into docker:main May 7, 2026
7 checks passed
Labels

area/providers For features/issues/fixes related to LLM providers (Bedrock, LiteLLM, Qwen, custom, etc.) area/tools For features/issues/fixes related to the usage of built-in and MCP tools effort:large 2+ days or more, significant complexity kind/feat PR adds a new feature (maps to feat: commit prefix) priority:medium Normal priority, standard sprint work

5 participants