feat: Phase 1 attachment system – chat.Document, pkg/attachment, per-provider convertDocument#2639
Conversation
…provider convertDocument Implements Phase 1 of the attachment system per spec: - pkg/chat/document.go: Document, DocumentSource types; MessagePartTypeDocument const - pkg/chat/chat.go: Document field added to MessagePart; deprecated old ImageURL/File fields - pkg/attachment/attachment.go: Decide(), TXTEnvelope(), Advisor interface - pkg/attachment/modelcaps/modelcaps.go: ModelCapabilities.Supports() backed by models.dev data - Per-provider convertDocument and SupportedMIMETypes in: - pkg/model/provider/oaistream (+ backward-compat wrappers for ConvertMessages/ConvertMultiContent) - pkg/model/provider/openai (Responses API) - pkg/model/provider/anthropic (Beta API) - pkg/model/provider/gemini - pkg/model/provider/bedrock - Full test coverage: decide_test.go, modelcaps_test.go, per-provider attachments_test.go Part of docker#2595 Assisted-By: docker-agent
Blockers fixed: - B1: Replace // Deprecated: godoc tags with // Note: superseded comments to avoid SA1019 staticcheck errors on all in-tree call sites - B2: Replace context.Background() with t.Context() in all 5 provider attachments_test.go files (also fixed in bedrock/client_test.go and gemini/client_test.go) - B3: Fix gci import ordering in decide_test.go and chat.go (golangci-lint --fix) - B4: Add B64 success-path tests to all 5 providers using convertDocumentWithCaps injection helper; image+PDF cases verified with native block assertions Suggestions addressed: - S5: Delete ConvertMessagesLegacy/ConvertMultiContentLegacy (no callers) - S7: Add convertDocumentWithCaps injection variants in all 5 providers - S10: Thread request ctx through ConvertMultiContent, ConvertMessages, and convertMessagesToResponseInput instead of using context.Background() - S13: Add TODO(phase2) and stronger constraint comment to DocumentSource.URL Assisted-By: docker-agent
|
❌ PR Review Failed — The review agent encountered an error and could not complete the review. View logs. |
| } | ||
| // All other MIME types are text-based or can be safely wrapped in a TXT | ||
| // envelope, so we allow them unconditionally. | ||
| return true |
There was a problem hiding this comment.
we should check the mime type has text/ prefix (nothing prevent the client to send parts with video/* or audio/* mime types)
There was a problem hiding this comment.
Fixed in 5dd7103. Supports() now only allows MIME types that start with text/ or are in a specific allowlist of known Office/document formats (application/vnd.openxmlformats-officedocument.*, application/msword, application/rtf, etc.). audio/*, video/*, and arbitrary application/* types now return false unless the model explicitly declares capability. Added TestSupports_AudioVideoRejected to cover this.
|
|
||
| // Use a background context for the lookup; capability detection is best-effort | ||
| // and should not block the main request flow. | ||
| model, err := store.GetModel(context.Background(), modelID) |
There was a problem hiding this comment.
can this actually block indifinitely ? or is the store guaranteed to bring quick results ? should we add a timeout ?
There was a problem hiding this comment.
Fixed in 5dd7103. Added a 10-second timeout (loadTimeout = 10*time.Second) to both Load and LoadFromStore. If the fetch times out or the store returns an error, the function falls back to conservative text-only caps and logs a slog.Warn with the timeout duration.
| // on Documents stored in messages in Phase 1; providers should log a warning | ||
| // and skip documents that have only URL set. | ||
| // TODO(phase2): implement URL-referenced document fetching. | ||
| URL string `json:"url,omitempty"` |
There was a problem hiding this comment.
remove it for now, it is not supported yet
There was a problem hiding this comment.
Done in 5dd7103. Removed the URL field from DocumentSource entirely.
| // - application/pdf with InlineData → BetaRequestDocumentBlock (base64) | ||
| // - text with InlineText → BetaTextBlockParam with TXTEnvelope | ||
| // - unsupported / no content → nil (logged as warning) | ||
| func convertDocument(ctx context.Context, doc chat.Document, modelID string) ([]anthropicsdk.BetaContentBlockParamUnion, error) { |
There was a problem hiding this comment.
I think we do not need to use the beta SDK anymore as we are not using File API
There was a problem hiding this comment.
Fixed in 5dd7103. attachments.go now uses standard SDK types throughout: anthropic.ContentBlockParamUnion, anthropic.ImageBlockParam, anthropic.DocumentBlockParam, anthropic.TextBlockParam. The beta import alias is gone. The beta path (convertBetaUserMultiContent) adapts the output via a thin stdBlocksToBeta helper in beta_converter.go.
| return nil, errors.New("invalid file attachment: neither path nor file_id provided") | ||
| } | ||
|
|
||
| case chat.MessagePartTypeDocument: |
There was a problem hiding this comment.
are nt we missing convert support for non beta ?
There was a problem hiding this comment.
Fixed in 5dd7103. Added case chat.MessagePartTypeDocument: to convertUserMultiContent in client.go (the non-beta path). It calls convertDocument which now returns standard SDK types ([]anthropic.ContentBlockParamUnion), so they wire in directly. Also changed _ context.Context to ctx context.Context so the context is forwarded to convertDocument.
C1: Tighten modelcaps.Supports() — only allow text/* prefix and known Office/document MIMEs unconditionally. audio/*, video/*, and arbitrary application/* types now return false unless the model explicitly declares the capability. Added tests for audio/video rejection. C2: Remove DocumentSource.URL field — it was Phase 2 only and carries dead code. Deleted entirely from DocumentSource. C3: Add 10s timeout to context in modelcaps.Load and LoadFromStore. Falls back to conservative text-only caps and logs a warning on timeout or fetch failure. C4: Switch anthropic/attachments.go from Beta SDK types to standard (non-beta) Anthropic SDK types: ImageBlockParam, DocumentBlockParam, TextBlockParam, ContentBlockParamUnion. The beta SDK is not needed for document conversion since we are not using the Files API. C5: Wire MessagePartTypeDocument into non-beta convertUserMultiContent in anthropic/client.go so documents are handled on the standard path. Updated convertBetaUserMultiContent to call convertDocument and convert standard → beta blocks via new stdBlocksToBeta helper. golangci-lint: 0 issues. All tests pass. Assisted-By: docker-agent
| type Advisor interface { | ||
| // SupportedMIMETypes returns the list of MIME types that the provider's | ||
| // current model accepts as document attachments. The list may be empty. | ||
| SupportedMIMETypes() []string |
There was a problem hiding this comment.
this is for clients, not for internal use. The idea here is to be able to filter out unsupported files in the UI, but we can skip it I think
There was a problem hiding this comment.
Fixed in 31e0601. Removed the Advisor interface from attachment.go and all per-provider SupportedMIMETypes()/SupportedMIMETypesForModel() implementations — none of them had any callers.
…ations Addresses rumpl's inline review: the attachment.Advisor interface and all per-provider SupportedMIMETypes() / SupportedMIMETypesForModel() methods are never called. Removed: - attachment.Advisor interface from pkg/attachment/attachment.go - Client.SupportedMIMETypes() from anthropic, gemini, openai, bedrock - SupportedMIMETypesForModel() from oaistream golangci-lint: 0 issues. Assisted-By: docker-agent
| if strings.HasPrefix(mt, "text/") { | ||
| return true | ||
| } | ||
| // Known Office document formats can be safely TXT-enveloped. |
There was a problem hiding this comment.
It is not true. docx etc. are basically zip files
There was a problem hiding this comment.
Change that! use models.dev info for understanding if the model has office documents capabilities
There was a problem hiding this comment.
Fixed in f70acb8.
The isKnownDocMIME unconditional allowlist is removed. Office MIMEs (DOCX, XLSX, PPTX, etc.) now return false from Supports() because models.dev only exposes five input modalities (text, image, pdf, audio, video) — there is no document or office field in the current schema. These formats will remain unsupported (StrategyDrop) until models.dev adds an explicit capability declaration for them.
The officeDocumentFormat() helper in the Bedrock provider is retained as forward-looking code — it will activate once models.dev exposes the relevant modality.
Tests updated: TestLoadFromStore_OfficeDocsAlwaysAllowed → TestLoadFromStore_OfficeDocsNotAllowed, office doc case in decide_test.go now expects StrategyDrop.
…owlist Simon's review: DOCX/XLSX/PPTX are ZIP-based binary formats and cannot be naively TXT-enveloped. The isKnownDocMIME unconditional allowlist was wrong. Changes: - Remove unconditional Office MIME allowlist from Supports() - Rename helper to isOfficeMIME() (now returns false from Supports()) - Office MIMEs now return false for all models because models.dev only exposes: text, image, pdf, audio, video modalities — no document/office field exists yet. Conservative fallback until models.dev schema adds it. - Updated TestLoadFromStore_OfficeDocsAlwaysAllowed -> NotAllowed - Updated decide_test.go: office doc case now expects StrategyDrop - Extended TestSupports_AudioVideoRejected to also cover Office MIMEs The forward-looking officeDocumentFormat() in bedrock/attachments.go is retained — it will activate once models.dev adds an office/document modality. Assisted-By: docker-agent
| } | ||
|
|
||
| // Other Office docs | ||
| if df, ok := officeDocumentFormat(mime); ok { |
There was a problem hiding this comment.
I think this is never going to apply, as the models.dev caps is filtering those out. simplify
There was a problem hiding this comment.
Done in db68b0b. Removed officeDocumentFormat() and the 'Other Office docs' branch entirely. The StrategyB64 path now only handles images and PDFs — the only binary MIMEs that can reach it since modelcaps gates everything else to StrategyDrop. Added a defensive slog.Warn + drop for any unexpected binary that somehow slips through.
…rock Simon's review: the officeDocumentFormat() helper and the 'Other Office docs' StrategyB64 branch in convertDocumentWithCaps can never be reached — modelcaps now gates all Office MIMEs to false/StrategyDrop before convertDocument is called. Simplify by removing both. The StrategyB64 branch now only handles images and PDFs (the only binary MIMEs that can reach it). An unexpected binary MIME gets a defensive slog.Warn + drop rather than a TXT envelope. Assisted-By: docker-agent
|
/review |
|
❌ PR Review Failed — The review agent encountered an error and could not complete the review. View logs. |
|
/review |
|
❌ PR Review Failed — The review agent encountered an error and could not complete the review. View logs. |
aheritier
left a comment
There was a problem hiding this comment.
Solid Phase 1 scaffolding — clean separation between Decide/modelcaps/per-provider convertDocument, good test coverage on the routing layer.
One blocking issue and a few smaller items:
- Blocking:
anthropic/attachments.gowill shiptext/plainbytes as anapplication/pdfblock becauseIsAnthropicDocumentMimecovers both. - Non-blocking:
TXTEnvelopedoesn't escape the body —</document>in content lets attachments break out of the wrapper. - Non-blocking: oaistream/openai fall back to base64-in-text-envelope for non-image binaries; for PDF this is essentially garbage tokens. Either drop, or use
OfInputFileon the Responses API. - Non-blocking: per-attachment
modelcaps.Loadcold-path timeout multiplies with N attachments. - Nit:
sanitizeDocumentNamemangles extensions (report.pdf→report-pdf).
Also: the PR description still advertises ConvertMessagesLegacy / ConvertMultiContentLegacy backward-compat aliases — they don't exist in the diff. Either add them or remove the claim from the description so reviewers know ConvertMessages/ConvertMultiContent are intentionally breaking changes.
B1 (Anthropic text/plain sent as PDF block): Gate DocumentBlockParam strictly on mime=="application/pdf", not on IsAnthropicDocumentMime which also matches text/plain. Unexpected binary MIMEs in StrategyB64 now get a defensive warn+drop. N1 (TXTEnvelope body escaping): Replace </document> with </document> in the body before wrapping to prevent attachment content from breaking out of the envelope. Added TestTXTEnvelope_EscapesClosingTag. N2 (PR description): Updated via gh pr edit: removed stale Legacy alias mention, corrected Anthropic block type names, noted ConvertMessages/ConvertMultiContent signature change as intentional, documented PDF handling per provider. N3 (sanitizeDocumentName mangles extensions): Strip file extension with path.Ext before sanitizing so report.pdf -> report instead of report-pdf. Spaces, parens, and square brackets are now allowed (Bedrock permits them). Added TestSanitizeDocumentName table test. PDF/oaistream (Chat Completions): Drop application/pdf with slog.Warn instead of sending opaque base64 bytes in a TXT envelope that wastes tokens and is meaningless. PDF/openai Responses API: Use OfInputFile with FileData (data URI) for application/pdf — the correct native block on the Responses API endpoint. Assisted-By: docker-agent
| // any occurrence of "</document>" in the body is replaced with "</document>". | ||
| func TXTEnvelope(name, mimeType, body string) string { | ||
| return fmt.Sprintf("<document name=%q mime-type=%q>%s</document>", name, mimeType, body) | ||
| safe := strings.ReplaceAll(body, "</document>", "</document>") |
There was a problem hiding this comment.
I do not like this approach. i would rather generate mangle the enveloppe tag name.
eg: <document-name-slugged-mimetype-slugged> </document-name-slugged-mimetype-slugged> to make it very unlikely it conflicts
There was a problem hiding this comment.
Implemented in be25d7d. TXTEnvelope now generates a unique slug-based tag from the document name and MIME type:
report.md + text/markdown → <document-report-md-text-markdown>…</document-report-md-text-markdown>
slugify() lowercases the string, replaces non-alphanumeric runes with hyphens, collapses consecutive hyphens, and trims leading/trailing ones. Replaced TestTXTEnvelope_EscapesClosingTag with TestTXTEnvelope_UniqueTag which verifies open/close tag symmetry and that different name+MIME pairs produce different tags.
| return []responses.ResponseInputContentUnionParam{ | ||
| { | ||
| OfInputFile: &responses.ResponseInputFileParam{ | ||
| FileData: param.NewOpt(dataURI), |
There was a problem hiding this comment.
if FileData supposed to be the dataURI or to be the base64 payload directly?
There was a problem hiding this comment.
Investigated in be25d7d. The SDK ResponseInputFileContentParam.FileData godoc says: "The base64-encoded data of the file to be sent to the model." — raw base64, not a data URI.
The previous implementation was wrong (it was passing data:application/pdf;base64,…). Fixed: now passes base64.StdEncoding.EncodeToString(doc.Source.InlineData) directly.
…ase64 TXTEnvelope (Simon's preferred approach): Replace body-escaping with a unique slug-based tag name derived from the document name and MIME type, making accidental tag break-out practically impossible without needing to escape the body. Example: report.md + text/markdown -> <document-report-md-text-markdown> Adds slugify() helper (lowercase, alphanum+hyphens, collapsed runs). Removes TestTXTEnvelope_EscapesClosingTag; adds TestTXTEnvelope_UniqueTag. Updates per-provider tests to check slug forms instead of name= attributes. OpenAI Responses API FileData: Fix: ResponseInputFileParam.FileData expects raw base64-encoded bytes, NOT a data URI. SDK godoc: 'The base64-encoded data of the file to be sent to the model.' Removed the data:application/pdf;base64,... wrapping. Assisted-By: docker-agent
aheritier
left a comment
There was a problem hiding this comment.
All four prior blocking/non-blocking points addressed:
- Anthropic PDF gate:
mime == "application/pdf"strict check —text/plainno longer slips intoDocumentBlockParam. - TXT envelope breakout: tag is now slugged from name + MIME (
<document-report-md-text-markdown>…) — content can't forge a closing tag without knowing the slug. Bonus: matches @simonferquel's suggested approach. - oaistream PDFs: dropped with a clear warning + comment explaining Chat Completions has no native document block.
- Bedrock filename:
path.Extstripped before sanitisation, soreport.pdf→reportinstead ofreport-pdf.
Code reads cleanly, tests cover the routing matrix, CI is green. LGTM.
Summary
Implements Phase 1 of the structured attachment system. Part of #2595.
What's added
pkg/chat/document.go(new)MessagePartTypeDocumentconstantDocumentSourcestruct (InlineText / InlineData fields)Documentstruct (Name, MimeType, Size, Source)Document *Documentfield added toMessagePartMessagePartTypeFileandMessagePartTypeImageURLannotated as superseded (not formally deprecated to avoid SA1019 on existing call sites)pkg/attachment/(new package)attachment.go:Strategytype,Decide()routing function,TXTEnvelope()helper (with</document>escape to prevent envelope breakout)modelcaps/modelcaps.go:ModelCapabilitiesbacked by models.dev data (viapkg/modelsdevstore);Supports(mimeType)gates image/* on vision modality, application/pdf on pdf modality, text/* always allowed, everything else (audio, video, Office binary formats) returns falsePer-provider
attachments.go(new files)Added
convertDocument(ctx, doc, modelID)and testableconvertDocumentWithCaps(ctx, doc, mc)to:pkg/model/provider/oaistream— image → data-URI image part; PDF/other binary → dropped with warning (Chat Completions has no native document block); text → TXTEnvelope; wired intoConvertMultiContentpkg/model/provider/openai— image →OfInputImage; PDF →OfInputFile(native Responses API file block); text →OfInputTextwith TXTEnvelopepkg/model/provider/anthropic— image →ImageBlockParam(base64); PDF →DocumentBlockParam(gated onapplication/pdfonly, notIsAnthropicDocumentMime); text →TextBlockParamwith TXTEnvelopepkg/model/provider/gemini— binary →genai.Blob; text →genai.Textwith TXTEnvelopepkg/model/provider/bedrock— image →ContentBlockMemberImage; PDF →ContentBlockMemberDocument; text →ContentBlockMemberTextwith TXTEnvelopeSignature changes (intentional breaking change)
ConvertMessagesandConvertMultiContentinoaistreamnow acceptctx context.ContextandmodelID stringparameters for capability routing and context propagation. All in-tree callers updated.Tests
pkg/attachment/decide_test.go— table-driven, all 3 strategy outcomes + MIME-miss + TXTEnvelope escapingpkg/attachment/modelcaps/modelcaps_test.go— in-memory store, vision/text-only/unknown model/Office-rejected casesattachments_test.go— TXT strategy, Drop strategy, B64 image and PDF success pathsWhat's NOT changed
MessagePartTypeFileandMessagePartTypeImageURLhandling is preserved unchangedPart of #2595