videos#18

Conversation

@coderabbitai full review

✅ Actions performed
Full review triggered.
Walkthrough

This PR extends OpenAI provider support by adding cost estimation for audio transcription/translation and video processing, introducing corresponding type definitions, implementing proxy handlers to route new audio models to specialized processors, and registering new video proxy routes with cost estimation.

Changes
Sequence Diagram(s)

sequenceDiagram
participant Client
participant Handler as Audio Handler
participant Processor as GPT Audio<br/>Processor
participant OpenAI as OpenAI API
participant Estimator
Client->>Handler: POST /audio/transcriptions<br/>(model: gpt-4o-transcribe)
Handler->>Handler: Extract model from form
Handler->>Handler: Route check for gpt-4o-*
Handler->>Processor: processGPTTranscriptions(...)
Processor->>Processor: Validate request & context
Processor->>Processor: Build http.Request<br/>(multipart form data)
Processor->>Processor: Detect streaming mode
Processor->>Processor: Modify request<br/>(response_format handling)
Processor->>OpenAI: Execute request
OpenAI-->>Processor: Non-streaming: 200 OK<br/>TranscriptionResponse
Processor->>Processor: Unmarshal response
Processor->>Estimator: EstimateTranscriptionCost<br/>(secs, model, usage)
Estimator-->>Processor: costInUsd
Processor->>Processor: Store costInUsd in context
Processor-->>Client: JSON or text response
Note over Processor,OpenAI: Streaming path:
OpenAI-->>Processor: newline-delimited chunks
loop For each SSE chunk
Processor->>Processor: Unmarshal TranscriptionStreamChunk
Processor->>Processor: Extract delta/text
Processor->>Processor: Check if IsDone()
alt Chunk is done
Processor->>Estimator: EstimateTranscriptionCost<br/>(accumulated usage)
Estimator-->>Processor: final costInUsd
end
Processor-->>Client: SSE event
end
Processor-->>Client: SSE [DONE]
sequenceDiagram
participant Client
participant Handler as Video Handler
participant Validator as URL Builder
participant OpenAI as OpenAI API
participant Estimator
Client->>Handler: POST/GET/DELETE<br/>/api/providers/openai/v1/videos
Handler->>Handler: Validate request & context
Handler->>Validator: constructVideoURL(path)
Validator-->>Handler: https://api.openai.com/...
Handler->>Handler: Create http.Request<br/>(copy method, body, headers)
Handler->>OpenAI: Execute request
alt Success (200)
OpenAI-->>Handler: VideoResponseMetadata
Handler->>Handler: Unmarshal response
alt POST request (paid)
Handler->>Estimator: EstimateVideoCost<br/>(metadata)
Estimator-->>Handler: costInUsd
Handler->>Handler: Store costInUsd in context
end
Handler-->>Client: Status 200 + response body
else Error (non-200)
OpenAI-->>Handler: Error response
Handler->>Handler: Unmarshal ErrorResponse
Handler->>Handler: Log error details
Handler-->>Client: Original status + error body
end
Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 5
🧹 Nitpick comments (1)
internal/server/web/proxy/audio.go (1)
172-176: Extract the GPT audio-model check into one helper. The same hard-coded model list now drives branching in both handlers. A shared predicate keeps transcription/translation routing from drifting when this allowlist changes again.
Also applies to: 342-346
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/server/web/proxy/audio.go` around lines 172 - 176, Extract the hard-coded allowlist into a single helper predicate (e.g., isGPTTranscriptionModel(model string) bool) that returns true for "gpt-4o-transcribe", "gpt-4o-transcribe-diarize", and "gpt-4o-mini-transcribe"; then replace the inline checks in the handler around the model variable and the other duplicated branch (the block that currently calls processGPTTranscriptions(c, prod, client, e, model) and the similar block at the later location) to call this helper instead so both routing points share the same source of truth.
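The suggested predicate could look like the sketch below. The model names are taken from the review comment; the surrounding handler code and the helper name are the reviewer's suggestion, not the actual implementation.

```go
package main

import "fmt"

// isGPTTranscriptionModel reports whether the model should be routed to the
// GPT transcription processor, replacing the duplicated inline checks in
// both audio handlers.
func isGPTTranscriptionModel(model string) bool {
	switch model {
	case "gpt-4o-transcribe", "gpt-4o-transcribe-diarize", "gpt-4o-mini-transcribe":
		return true
	}
	return false
}

func main() {
	fmt.Println(isGPTTranscriptionModel("gpt-4o-transcribe")) // routed to GPT processor
	fmt.Println(isGPTTranscriptionModel("whisper-1"))         // falls through to legacy path
}
```

Both routing points would then call this helper, so adding a model to the allowlist is a one-line change.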
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/provider/openai/cost.go`:
- Around line 820-838: EstimateVideoCost currently mandates a model-size lookup
and errors when metadata.Size is absent, but the cost map may contain fallback
keys like "sora-2" (model-only). Change EstimateVideoCost to try lookups in
order: 1) if size is present/normalized, try "model-size"; 2) if that fails (or
size missing/normalization returns an empty/expected-error), try the model-only
key "model"; and only return an error if neither key exists in
ce.tokenCostMap["video"]. Handle normalization errors by treating missing size
as absent (do not immediately return), and update the same lookup logic in the
analogous image pricing function (the one around lines 841-852) so both video
and image cost resolution use the model-then-model-only fallback.
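The model-size-then-model-only fallback could be sketched as below. The map stands in for `ce.tokenCostMap["video"]`; the key format and cost values are illustrative assumptions, not BricksLLM's actual pricing data.

```go
package main

import "fmt"

// lookupVideoCost tries the size-specific key first, then falls back to the
// model-only key (e.g. "sora-2"), and errors only if neither exists.
func lookupVideoCost(costMap map[string]float64, model, size string) (float64, error) {
	if size != "" {
		if cost, ok := costMap[model+"-"+size]; ok {
			return cost, nil
		}
	}
	if cost, ok := costMap[model]; ok {
		return cost, nil
	}
	return 0, fmt.Errorf("no video cost entry for model %q (size %q)", model, size)
}

func main() {
	costs := map[string]float64{"sora-2": 0.1}
	// Size present but no size-specific entry: falls back to the model key.
	cost, err := lookupVideoCost(costs, "sora-2", "720x1280")
	fmt.Println(cost, err)
}
```

A missing or unparseable size is treated as absent rather than as an immediate error, matching the review's suggestion for both the video and image lookup paths.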
In `@internal/provider/openai/types.go`:
- Around line 101-106: The current GetSecondsAsFloat silently returns 0 on parse
failure which causes EstimateVideoCost to under-bill; change GetSecondsAsFloat
to return (float64, error) (or add a new GetSecondsAsFloatSafe that returns
(float64, error)) and propagate/handle the error in callers like
EstimateVideoCost and any other call sites, validating v.Seconds before using it
and returning/propagating the parse error instead of treating malformed or
missing seconds as 0 so billing is correct.
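The error-returning variant might look like this. `VideoResponseMetadata` is a minimal stand-in for the real type in `internal/provider/openai/types.go`; only the `Seconds` field is modeled here.

```go
package main

import (
	"fmt"
	"strconv"
)

// VideoResponseMetadata is a minimal stand-in; the real type has more fields.
type VideoResponseMetadata struct {
	Seconds string
}

// GetSecondsAsFloat propagates parse failures instead of silently returning 0,
// so EstimateVideoCost cannot under-bill on malformed metadata.
func (v VideoResponseMetadata) GetSecondsAsFloat() (float64, error) {
	if v.Seconds == "" {
		return 0, fmt.Errorf("video metadata is missing seconds")
	}
	secs, err := strconv.ParseFloat(v.Seconds, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid seconds value %q: %w", v.Seconds, err)
	}
	return secs, nil
}

func main() {
	v := VideoResponseMetadata{Seconds: "8"}
	secs, err := v.GetSecondsAsFloat()
	fmt.Println(secs, err)
}
```

Callers such as `EstimateVideoCost` would then check the error and refuse to compute a cost from malformed metadata.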
In `@internal/server/web/proxy/audio_extended.go`:
- Around line 65-67: The call to modifyGPTTranscriptionsRequest currently
swallows multipart-rewrite failures by writing responses internally and
returning void, causing processGPTAudio to continue and later call
client.Do(req) which can double-write the response; change
modifyGPTTranscriptionsRequest to return an error (or a bool + error) and in
processGPTAudio (where modifyGPTTranscriptionsRequest(ginCtx, prod, log, req,
handler) is invoked) check that return value and immediately return from
processGPTAudio if an error/non-ok is returned (so you don't proceed to
client.Do(req)); update all other call sites in the same file (including the
block around lines 235-286) to handle the new return and propagate or log the
error appropriately.
- Around line 57-67: The streaming branch reads form values (ginCtx.PostForm)
which drains multipart request bodies, so when isStreaming is true the audio
payload can be lost because modifyGPTTranscriptionsRequest (which rebuilds the
multipart body) is skipped; ensure the multipart body is reconstructed after any
form parsing regardless of isStreaming by calling or inlining the same
body-rebuild logic used in modifyGPTTranscriptionsRequest (or factoring that
logic into a helper) before proxying in the streaming path so req.Body contains
the full multipart payload for upstream.
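The helper that rebuilds the multipart payload after form parsing has drained the body might look like the sketch below. The field and file names are illustrative; the real code would pull them from the parsed gin form.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
)

// rebuildMultipartBody reconstructs a full multipart payload (form fields
// plus the audio file) so the streaming path can proxy a complete body
// upstream even after ginCtx.PostForm has consumed the original.
func rebuildMultipartBody(fields map[string]string, fileName string, fileBytes []byte) (*bytes.Buffer, string, error) {
	var buf bytes.Buffer
	w := multipart.NewWriter(&buf)
	for k, v := range fields {
		if err := w.WriteField(k, v); err != nil {
			return nil, "", err
		}
	}
	fw, err := w.CreateFormFile("file", fileName)
	if err != nil {
		return nil, "", err
	}
	if _, err := fw.Write(fileBytes); err != nil {
		return nil, "", err
	}
	if err := w.Close(); err != nil {
		return nil, "", err
	}
	// The returned content type carries the boundary and must be set on the
	// upstream request's Content-Type header.
	return &buf, w.FormDataContentType(), nil
}

func main() {
	body, contentType, err := rebuildMultipartBody(
		map[string]string{"model": "gpt-4o-transcribe", "stream": "true"},
		"audio.wav", []byte("fake-audio-bytes"))
	fmt.Println(body.Len() > 0, contentType != "", err)
}
```

Calling this (or the factored-out equivalent) in both the streaming and non-streaming branches keeps `req.Body` complete regardless of `isStreaming`.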
In `@internal/server/web/proxy/proxy.go`:
- Around line 107-114: The video routes call getVideoHandler but never set the
request's model value before the later isModelAllowed / isModelSupported checks,
letting empty model pass; update the routing/middleware so the model is
extracted and set on the request context for all video endpoints (e.g., in the
same middleware that other routes use) by reading the model from the incoming
payload/form or defaulting to the intended video model, ensuring getVideoHandler
sees a non-empty model and that isModelAllowed / isModelSupported are enforced
for routes such as the POST/GET/DELETE handlers for
"/api/providers/openai/v1/videos" and its subpaths.
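The model-resolution step the comment asks for could reduce to something like this. The default model name "sora-2" is an assumption for illustration (it appears in the cost-map discussion above); the real middleware would set the resolved value on the gin request context.

```go
package main

import "fmt"

// resolveVideoModel returns the model from the incoming payload, or an
// assumed default video model when the payload omits it, so downstream
// isModelAllowed / isModelSupported checks never see an empty string.
func resolveVideoModel(payloadModel string) string {
	if payloadModel != "" {
		return payloadModel
	}
	return "sora-2" // assumed default; the real default is a config decision
}

func main() {
	fmt.Println(resolveVideoModel(""))
	fmt.Println(resolveVideoModel("sora-2-pro"))
}
```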
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7dce883e-09e8-40eb-ac83-14fcdb5c7c91
📒 Files selected for processing (7)
internal/provider/openai/cost.go
internal/provider/openai/types.go
internal/server/web/proxy/audio.go
internal/server/web/proxy/audio_extended.go
internal/server/web/proxy/middleware.go
internal/server/web/proxy/proxy.go
internal/server/web/proxy/video.go
https://bugtracker.codiodev.com/issue/codio-17481/Add-BricksLLM-support-for-OpenAI-Text-to-Speech-and-Speech-to-Text-model-allowlist-audio-routing
Summary by CodeRabbit