
videos #18

Merged
sergei-bronnikov merged 8 commits into main from 17481_audio_video
Apr 15, 2026
Conversation

sergei-bronnikov commented Mar 16, 2026

https://bugtracker.codiodev.com/issue/codio-17481/Add-BricksLLM-support-for-OpenAI-Text-to-Speech-and-Speech-to-Text-model-allowlist-audio-routing

Summary by CodeRabbit

  • New Features
    • Added support for new audio transcription models (gpt-4o-transcribe and gpt-4o-mini variants).
    • Introduced video generation and processing endpoints with full proxy support.
    • Enhanced transcription cost calculation using token-based usage tracking.
    • Added video cost estimation based on model and resolution.

destitutus previously approved these changes Mar 18, 2026
sergei-bronnikov (Author)

@coderabbitai full review

coderabbitai Bot commented Apr 10, 2026

✅ Actions performed

Full review triggered.

coderabbitai Bot commented Apr 10, 2026

Walkthrough

This PR extends OpenAI provider support: it adds cost estimation for audio transcription/translation and video processing, introduces the corresponding type definitions, implements proxy handlers that route the new audio models to specialized processors, and registers new video proxy routes with cost estimation.

Changes

  • Audio & Video Cost Estimation (internal/provider/openai/cost.go, internal/provider/openai/types.go): Added audio price entries for gpt-4o-transcribe, gpt-4o-transcribe-diarize, gpt-4o-mini-transcribe, and gpt-4o-mini-tts; introduced transcription-input and transcription-output cost maps; added a video cost map keyed by normalized resolution (720/1024/1080). New types added for VideoResponseMetadata, TranscriptionResponse, and TranscriptionStreamChunk, plus helper methods for parsing metadata and classifying chunks.
  • Transcription/Translation Proxy (internal/server/web/proxy/audio.go, internal/server/web/proxy/audio_extended.go): Updated getTranscriptionsHandler and getTranslationsHandler to branch on model name and delegate gpt-4o-transcribe* and gpt-4o-mini-transcribe models to new processors. Implemented processGPTTranscriptions, processGPTTranslations, and a shared processGPTAudio handler supporting non-streaming JSON/text responses and streaming SSE responses; cost is estimated from token usage when available.
  • Video Proxy Handler (internal/server/web/proxy/video.go): Implemented getVideoHandler to proxy video requests to OpenAI, handling cost estimation for POST requests via EstimateVideoCost, forwarding response headers, and handling both success and error responses with telemetry recording.
  • Proxy Interface & Routing (internal/server/web/proxy/middleware.go, internal/server/web/proxy/proxy.go): Updated the estimator interface to add a usage parameter to EstimateTranscriptionCost and a new EstimateVideoCost method. Registered new HTTP routes for video collection and resource endpoints (/api/providers/openai/v1/videos and variants) pointing to getVideoHandler.
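For orientation, the new cost maps described above might be shaped roughly like this. The model names come from this PR's summary, but every price, key, and variable name below is a placeholder for illustration, not the repository's actual values:

```go
package main

// Hypothetical shapes for the new cost maps. Model names are taken from the
// PR summary; all prices and identifier names here are illustrative only.

// USD per million input tokens (placeholder rates).
var transcriptionInputCost = map[string]float64{
	"gpt-4o-transcribe":         6.00,
	"gpt-4o-transcribe-diarize": 6.00,
	"gpt-4o-mini-transcribe":    3.00,
}

// USD per second of generated video, keyed by model plus the normalized
// resolution (720/1024/1080) mentioned in the change summary.
var videoCost = map[string]float64{
	"sora-2-720":  0.10,
	"sora-2-1024": 0.30,
	"sora-2-1080": 0.50,
}
```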

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Handler as Audio Handler
    participant Processor as GPT Audio<br/>Processor
    participant OpenAI as OpenAI API
    participant Estimator

    Client->>Handler: POST /audio/transcriptions<br/>(model: gpt-4o-transcribe)
    Handler->>Handler: Extract model from form
    Handler->>Handler: Route check for gpt-4o-*
    Handler->>Processor: processGPTTranscriptions(...)
    Processor->>Processor: Validate request & context
    Processor->>Processor: Build http.Request<br/>(multipart form data)
    Processor->>Processor: Detect streaming mode
    Processor->>Processor: Modify request<br/>(response_format handling)
    Processor->>OpenAI: Execute request
    OpenAI-->>Processor: Non-streaming: 200 OK<br/>TranscriptionResponse
    Processor->>Processor: Unmarshal response
    Processor->>Estimator: EstimateTranscriptionCost<br/>(secs, model, usage)
    Estimator-->>Processor: costInUsd
    Processor->>Processor: Store costInUsd in context
    Processor-->>Client: JSON or text response
    
    Note over Processor,OpenAI: Streaming path:
    OpenAI-->>Processor: newline-delimited chunks
    loop For each SSE chunk
        Processor->>Processor: Unmarshal TranscriptionStreamChunk
        Processor->>Processor: Extract delta/text
        Processor->>Processor: Check if IsDone()
        alt Chunk is done
            Processor->>Estimator: EstimateTranscriptionCost<br/>(accumulated usage)
            Estimator-->>Processor: final costInUsd
        end
        Processor-->>Client: SSE event
    end
    Processor-->>Client: SSE [DONE]
sequenceDiagram
    participant Client
    participant Handler as Video Handler
    participant Validator as URL Builder
    participant OpenAI as OpenAI API
    participant Estimator

    Client->>Handler: POST/GET/DELETE<br/>/api/providers/openai/v1/videos
    Handler->>Handler: Validate request & context
    Handler->>Validator: constructVideoURL(path)
    Validator-->>Handler: https://api.openai.com/...
    Handler->>Handler: Create http.Request<br/>(copy method, body, headers)
    Handler->>OpenAI: Execute request
    alt Success (200)
        OpenAI-->>Handler: VideoResponseMetadata
        Handler->>Handler: Unmarshal response
        alt POST request (paid)
            Handler->>Estimator: EstimateVideoCost<br/>(metadata)
            Estimator-->>Handler: costInUsd
            Handler->>Handler: Store costInUsd in context
        end
        Handler-->>Client: Status 200 + response body
    else Error (non-200)
        OpenAI-->>Handler: Error response
        Handler->>Handler: Unmarshal ErrorResponse
        Handler->>Handler: Log error details
        Handler-->>Client: Original status + error body
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • 17073 gpt 5 support #9: Modifies OpenAI provider cost estimation and types alongside the estimator interface in middleware.go, enabling token-based cost calculation.
  • images/ #16: Extends OpenAI provider type definitions and OpenAiPerThousandTokenCost maps for additional media processing features (video in this PR, images in the referenced PR).

Suggested reviewers

  • destitutus
  • AndreyNikitin
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Title check (❓ Inconclusive): The title 'videos' is vague and generic; a single non-descriptive term does not convey the changeset's primary objectives. Resolution: consider a more descriptive title, such as 'Add OpenAI videos, transcription, and translation proxy endpoints' or 'Support GPT-4o audio/video models and streaming transcriptions'.

✅ Passed checks (1 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.



coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (1)
internal/server/web/proxy/audio.go (1)

172-176: Extract the GPT audio-model check into one helper.

The same hard-coded model list now drives branching in both handlers. A shared predicate keeps transcription/translation routing from drifting when this allowlist changes again.

Also applies to: 342-346

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/web/proxy/audio.go` around lines 172 - 176, Extract the
hard-coded allowlist into a single helper predicate (e.g.,
isGPTTranscriptionModel(model string) bool) that returns true for
"gpt-4o-transcribe", "gpt-4o-transcribe-diarize", and "gpt-4o-mini-transcribe";
then replace the inline checks in the handler around the model variable and the
other duplicated branch (the block that currently calls
processGPTTranscriptions(c, prod, client, e, model) and the similar block at the
later location) to call this helper instead so both routing points share the
same source of truth.
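As a sketch, the suggested predicate might look like this; the function name comes from the review's own suggestion, and everything else is illustrative:

```go
package main

// isGPTTranscriptionModel reports whether a model should be routed to the
// GPT-specific audio processor rather than the legacy Whisper path. The
// allowlist mirrors the three models named in the review comment, so both
// the transcription and translation handlers can share one source of truth.
func isGPTTranscriptionModel(model string) bool {
	switch model {
	case "gpt-4o-transcribe",
		"gpt-4o-transcribe-diarize",
		"gpt-4o-mini-transcribe":
		return true
	}
	return false
}
```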
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/provider/openai/cost.go`:
- Around line 820-838: EstimateVideoCost currently mandates a model-size lookup
and errors when metadata.Size is absent, but the cost map may contain fallback
keys like "sora-2" (model-only). Change EstimateVideoCost to try lookups in
order: 1) if size is present/normalized, try "model-size"; 2) if that fails (or
size missing/normalization returns an empty/expected-error), try the model-only
key "model"; and only return an error if neither key exists in
ce.tokenCostMap["video"]. Handle normalization errors by treating missing size
as absent (do not immediately return), and update the same lookup logic in the
analogous image pricing function (the one around lines 841-852) so both video
and image cost resolution use the model-then-model-only fallback.
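The model-size-then-model-only lookup order the comment asks for could be sketched as follows, using a hypothetical stand-in for ce.tokenCostMap["video"] with placeholder prices:

```go
package main

import "fmt"

// videoCostPerSecond is a stand-in for ce.tokenCostMap["video"]; keys may be
// either "model-size" or a model-only fallback such as "sora-2". Prices are
// placeholders for illustration.
var videoCostPerSecond = map[string]float64{
	"sora-2-720": 0.10,
	"sora-2":     0.10,
}

// lookupVideoRate tries the "model-size" key first when a size is available,
// then falls back to the model-only key, and errors only when neither exists.
func lookupVideoRate(model, size string) (float64, error) {
	if size != "" {
		if rate, ok := videoCostPerSecond[model+"-"+size]; ok {
			return rate, nil
		}
	}
	if rate, ok := videoCostPerSecond[model]; ok {
		return rate, nil
	}
	return 0, fmt.Errorf("no video price for model %q (size %q)", model, size)
}
```

A missing or unnormalizable size simply skips the first lookup instead of failing outright, which is the behavior the comment asks for.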

In `@internal/provider/openai/types.go`:
- Around line 101-106: The current GetSecondsAsFloat silently returns 0 on parse
failure which causes EstimateVideoCost to under-bill; change GetSecondsAsFloat
to return (float64, error) (or add a new GetSecondsAsFloatSafe that returns
(float64, error)) and propagate/handle the error in callers like
EstimateVideoCost and any other call sites, validating v.Seconds before using it
and returning/propagating the parse error instead of treating malformed or
missing seconds as 0 so billing is correct.
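A minimal sketch of the error-propagating variant, using a trimmed, hypothetical stand-in for the real VideoResponseMetadata type:

```go
package main

import (
	"fmt"
	"strconv"
)

// VideoResponseMetadata is a trimmed stand-in for the real type; the field
// name assumes the API reports the duration as a string.
type VideoResponseMetadata struct {
	Seconds string `json:"seconds"`
}

// GetSecondsAsFloat propagates parse failures instead of silently returning
// 0, so a malformed or missing duration surfaces as an error in
// EstimateVideoCost rather than producing a zero-cost bill.
func (v VideoResponseMetadata) GetSecondsAsFloat() (float64, error) {
	if v.Seconds == "" {
		return 0, fmt.Errorf("video metadata missing seconds")
	}
	secs, err := strconv.ParseFloat(v.Seconds, 64)
	if err != nil {
		return 0, fmt.Errorf("parse seconds %q: %w", v.Seconds, err)
	}
	return secs, nil
}
```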

In `@internal/server/web/proxy/audio_extended.go`:
- Around line 65-67: The call to modifyGPTTranscriptionsRequest currently
swallows multipart-rewrite failures by writing responses internally and
returning void, causing processGPTAudio to continue and later call
client.Do(req) which can double-write the response; change
modifyGPTTranscriptionsRequest to return an error (or a bool + error) and in
processGPTAudio (where modifyGPTTranscriptionsRequest(ginCtx, prod, log, req,
handler) is invoked) check that return value and immediately return from
processGPTAudio if an error/non-ok is returned (so you don't proceed to
client.Do(req)); update all other call sites in the same file (including the
block around lines 235-286) to handle the new return and propagate or log the
error appropriately.
- Around line 57-67: The streaming branch reads form values (ginCtx.PostForm)
which drains multipart request bodies, so when isStreaming is true the audio
payload can be lost because modifyGPTTranscriptionsRequest (which rebuilds the
multipart body) is skipped; ensure the multipart body is reconstructed after any
form parsing regardless of isStreaming by calling or inlining the same
body-rebuild logic used in modifyGPTTranscriptionsRequest (or factoring that
logic into a helper) before proxying in the streaming path so req.Body contains
the full multipart payload for upstream.

In `@internal/server/web/proxy/proxy.go`:
- Around line 107-114: The video routes call getVideoHandler but never set the
request's model value before the later isModelAllowed / isModelSupported checks,
letting empty model pass; update the routing/middleware so the model is
extracted and set on the request context for all video endpoints (e.g., in the
same middleware that other routes use) by reading the model from the incoming
payload/form or defaulting to the intended video model, ensuring getVideoHandler
sees a non-empty model and that isModelAllowed / isModelSupported are enforced
for routes such as the POST/GET/DELETE handlers for
"/api/providers/openai/v1/videos" and its subpaths.
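A minimal sketch of the suggested default; the fallback model name is an assumption borrowed from the cost-map keys mentioned earlier in this review, not the actual code:

```go
package main

// resolveVideoModel sketches the review's suggestion: take the model from the
// incoming payload when present, otherwise fall back to the intended video
// model, so isModelAllowed / isModelSupported never see an empty string.
// The default "sora-2" is a hypothetical placeholder.
func resolveVideoModel(payloadModel string) string {
	if payloadModel == "" {
		return "sora-2"
	}
	return payloadModel
}
```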


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7dce883e-09e8-40eb-ac83-14fcdb5c7c91

📥 Commits

Reviewing files that changed from the base of the PR and between ecbe3e2 and 9289ffe.

📒 Files selected for processing (7)
  • internal/provider/openai/cost.go
  • internal/provider/openai/types.go
  • internal/server/web/proxy/audio.go
  • internal/server/web/proxy/audio_extended.go
  • internal/server/web/proxy/middleware.go
  • internal/server/web/proxy/proxy.go
  • internal/server/web/proxy/video.go

Comment threads:
  • internal/provider/openai/cost.go (outdated)
  • internal/provider/openai/types.go (outdated)
  • internal/server/web/proxy/audio_extended.go
  • internal/server/web/proxy/audio_extended.go (outdated)
  • internal/server/web/proxy/proxy.go
sergei-bronnikov merged commit 42cf848 into main Apr 15, 2026
2 checks passed