fix: forward image and PDF attachments for DMR models #3197
DMR-hosted models are not in the models.dev catalog, so the capability lookup in modelinfo.LoadCaps always missed for "dmr/<model>" IDs and the conversion path fell back to a text-only strategy, silently dropping image and PDF attachments (issue #2739).

Let each DMR model declare its multimodal support explicitly via provider_opts.supports_images and provider_opts.supports_pdf. The DMR client parses and validates these at creation time, stores them as the model's attachment capabilities, and injects them into message conversion through the new oaistream.ConvertMessagesWithCaps, bypassing the models.dev lookup that can never resolve a DMR model.

While here, the oaistream Chat Completions path now forwards PDFs as a file content part (base64 data URI plus filename) instead of dropping them. This is gated by the resolved capabilities, so it only triggers for models declared PDF-capable: DMR via provider_opts, or catalog models that models.dev reports as PDF-capable on the legacy Chat Completions path.

The conservative text-only default is unchanged: a model with no declared capability still drops image and PDF attachments.

Fixes #2739.
❌ PR Review Failed: The review agent encountered an error and could not complete the review.
docker-agent left a comment
Assessment: 🟡 NEEDS ATTENTION
One MEDIUM severity finding in the new PDF attachment forwarding path. The fix for DMR attachment capabilities is well-structured overall — capabilities are parsed cleanly, threaded correctly through Client.attachmentCaps, and ConvertMessagesWithCaps resolves caps once per batch as intended. The image forwarding path is correct. The concern is in how PDF content is encoded for the Chat Completions file content part.
    // upstream and never reach this branch.
    return []openai.ChatCompletionContentPartUnionParam{
        openai.FileContentPart(openai.ChatCompletionContentPartFileFileParam{
            FileData: openai.String(dataURI(doc.MimeType, doc.Source.InlineData)),
[MEDIUM] PDF file_data passes a data URI; OpenAI Chat Completions expects plain base64
The new PDF branch builds the file_data value with dataURI(...), producing a data:application/pdf;base64,<encoded> string:
    openai.FileContentPart(openai.ChatCompletionContentPartFileFileParam{
        FileData: openai.String(dataURI(doc.MimeType, doc.Source.InlineData)),
        Filename: openai.String(doc.Name),
    })

The OpenAI Chat Completions file content part spec expects file_data to be a plain base64-encoded string (no data: URI prefix), identical to how the Responses API input_file.file_data is specified. The data:mimetype;base64,... format is only valid for the url field of image_url content parts (used correctly in the image branch above).
Sending a data URI prefix in file_data will likely result in a 400/422 from any endpoint that follows the OpenAI Chat Completions spec strictly — including DMR backends that proxy to an OpenAI-compatible runtime. The code comment claims "Chat Completions accepts PDFs as a file content part carrying a base64 data URI", but this is not supported by the OpenAI API reference.
Suggested fix: strip the data URI prefix and pass only the raw base64 payload:
FileData: openai.String(base64.StdEncoding.EncodeToString(doc.Source.InlineData)),
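To make the two encodings under discussion concrete, here is a minimal, self-contained sketch using only the Go standard library. The `dataURI` function below is a hypothetical stand-in for the helper of the same name in the diff (its real implementation is not shown in this thread), and `rawBase64` and `stripDataURI` are illustrative names. The sketch only shows how the two strings differ; which form a given endpoint accepts should be checked against that endpoint's documentation.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// dataURI builds a "data:<mime>;base64,<payload>" string from raw bytes.
// Hypothetical stand-in for the PR's helper of the same name.
func dataURI(mimeType string, raw []byte) string {
	return "data:" + mimeType + ";base64," + base64.StdEncoding.EncodeToString(raw)
}

// rawBase64 returns only the base64 payload, with no "data:" prefix.
func rawBase64(raw []byte) string {
	return base64.StdEncoding.EncodeToString(raw)
}

// stripDataURI returns the payload that follows ";base64," in a data URI.
// Input that is not a base64 data URI is returned unchanged.
func stripDataURI(s string) string {
	if !strings.HasPrefix(s, "data:") {
		return s
	}
	if _, payload, ok := strings.Cut(s, ";base64,"); ok {
		return payload
	}
	return s
}

func main() {
	pdf := []byte("%PDF-1.7") // first bytes of a PDF, used as sample input
	fmt.Println("data URI:  ", dataURI("application/pdf", pdf))
	fmt.Println("raw base64:", rawBase64(pdf))
	fmt.Println("same payload:", stripDataURI(dataURI("application/pdf", pdf)) == rawBase64(pdf))
}
```

Both forms carry the same payload, so switching between them is a one-line change at the call site.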
Summary
DMR-hosted models are absent from the models.dev catalog, so modelinfo.LoadCaps always missed for dmr/<model> IDs, and conversion fell back to a text-only strategy that silently dropped image and PDF attachments. This adds a per-model capability opt-in so DMR vision/PDF models can forward attachments, and forwards PDFs as a native file content part on the Chat Completions path.
Fixes #2739.
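As a rough illustration of the per-model opt-in, the sketch below parses two optional booleans from a provider_opts map and rejects values of the wrong type. It is not the PR's code: the `AttachmentCaps` type, the `parseAttachmentCaps` function, the `map[string]any` shape of provider_opts, and the signature given to `parseBoolOpt` are all assumptions made for this example, as is the choice to accept string values such as "true".

```go
package main

import (
	"fmt"
	"strconv"
)

// AttachmentCaps is a hypothetical stand-in for the capabilities the
// DMR client stores per model.
type AttachmentCaps struct {
	Images bool
	PDF    bool
}

// parseBoolOpt reads an optional boolean from opts. A missing key yields
// false, which keeps the conservative text-only default. Accepting string
// values is an assumption of this sketch.
func parseBoolOpt(opts map[string]any, key string) (bool, error) {
	v, ok := opts[key]
	if !ok || v == nil {
		return false, nil
	}
	switch t := v.(type) {
	case bool:
		return t, nil
	case string:
		b, err := strconv.ParseBool(t)
		if err != nil {
			return false, fmt.Errorf("provider_opts.%s: invalid boolean %q", key, t)
		}
		return b, nil
	default:
		return false, fmt.Errorf("provider_opts.%s: expected boolean, got %T", key, v)
	}
}

// parseAttachmentCaps validates both options up front, so that a bad
// value fails at client creation rather than at request time.
func parseAttachmentCaps(opts map[string]any) (AttachmentCaps, error) {
	images, err := parseBoolOpt(opts, "supports_images")
	if err != nil {
		return AttachmentCaps{}, err
	}
	pdf, err := parseBoolOpt(opts, "supports_pdf")
	if err != nil {
		return AttachmentCaps{}, err
	}
	return AttachmentCaps{Images: images, PDF: pdf}, nil
}

func main() {
	caps, err := parseAttachmentCaps(map[string]any{"supports_images": true})
	fmt.Printf("caps=%+v err=%v\n", caps, err)

	_, err = parseAttachmentCaps(map[string]any{"supports_pdf": 1})
	fmt.Println("err:", err)
}
```

Returning an error for a wrongly typed value, rather than silently treating it as false, keeps a typo in the config from quietly disabling attachments.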
Issue expectations mapped to the implementation
dmr provider, so the lookup always misses: addressed by provider_opts.supports_images / supports_pdf
How it works
provider_opts is parsed and validated at client creation, stored as Client.attachmentCaps, and injected via oaistream.ConvertMessagesWithCaps. The models.dev lookup is skipped on the DMR path because it can never resolve a dmr/<model> ID.
Changes
pkg/model/provider/dmr/configure.go: parseBoolOpt, plus parse and validate supports_images / supports_pdf
pkg/model/provider/dmr/client.go: Client.attachmentCaps set in NewClient, injected in convertMessages
pkg/model/provider/oaistream/messages.go: ConvertMessagesWithCaps; internals thread modelinfo.ModelCapabilities, resolving caps once per batch
pkg/model/provider/oaistream/attachments.go: convertDocument; PDFs forwarded as a file content part instead of dropped
agent-schema.json, examples/dmr.yaml: qwen_vision example
Behavior
supports_images: true
supports_pdf: true
Testing
pkg/model/provider/dmr, pkg/model/provider/oaistream, pkg/modelinfo, pkg/attachment, pkg/model/provider/openai: pass.
pkg/config example parsing (TestParseExamples) including the new dmr.yaml entry: pass.
Note for reviewers
The PDF forwarding change also affects the OpenAI legacy Chat Completions path:
a model that models.dev reports as PDF-capable now receives PDFs as a file
content part rather than having them dropped (newer OpenAI models already handle
PDFs via the Responses API). The behavior is gated by resolved capabilities, so
nothing changes for models without PDF support. If restricting this strictly to
DMR seems more appropriate, it can be changed.
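The gating described in this note can be sketched as a small filter over attachments. Everything here is hypothetical and simplified: `Caps`, `Attachment`, `forwardable`, and `partition` are illustrative names, not identifiers from the PR, and dropping every MIME type other than images and PDFs is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// Caps mirrors the two declared capabilities (hypothetical type).
type Caps struct {
	Images bool
	PDF    bool
}

// Attachment is a simplified stand-in for an inline attachment.
type Attachment struct {
	Name     string
	MimeType string
}

// forwardable reports whether an attachment would be forwarded under
// caps. Unknown MIME types are dropped in this sketch.
func forwardable(caps Caps, a Attachment) bool {
	switch {
	case strings.HasPrefix(a.MimeType, "image/"):
		return caps.Images
	case a.MimeType == "application/pdf":
		return caps.PDF
	default:
		return false
	}
}

// partition splits attachments into those kept and those dropped.
func partition(caps Caps, atts []Attachment) (kept, dropped []Attachment) {
	for _, a := range atts {
		if forwardable(caps, a) {
			kept = append(kept, a)
		} else {
			dropped = append(dropped, a)
		}
	}
	return kept, dropped
}

func main() {
	atts := []Attachment{
		{Name: "chart.png", MimeType: "image/png"},
		{Name: "report.pdf", MimeType: "application/pdf"},
	}

	// Zero-value Caps models a model with no declared capability.
	kept, dropped := partition(Caps{}, atts)
	fmt.Println("no caps:   kept", len(kept), "dropped", len(dropped))

	kept, dropped = partition(Caps{Images: true}, atts)
	fmt.Println("images only: kept", len(kept), "dropped", len(dropped))
}
```

Because the zero value of `Caps` forwards nothing, a model without declared capabilities keeps the text-only behavior unless it opts in.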