
videos #18

Merged
sergei-bronnikov merged 8 commits into main from 17481_audio_video
Apr 15, 2026
Conversation

sergei-bronnikov commented Mar 16, 2026

https://bugtracker.codiodev.com/issue/codio-17481/Add-BricksLLM-support-for-OpenAI-Text-to-Speech-and-Speech-to-Text-model-allowlist-audio-routing

Summary by CodeRabbit

  • New Features
    • Added support for new audio transcription models (gpt-4o-transcribe and gpt-4o-mini variants).
    • Introduced video generation and processing endpoints with full proxy support.
    • Enhanced transcription cost calculation using token-based usage tracking.
    • Added video cost estimation based on model and resolution.

destitutus previously approved these changes Mar 18, 2026
sergei-bronnikov (Author)

@coderabbitai full review

coderabbitai Bot commented Apr 10, 2026

✅ Actions performed

Full review triggered.

coderabbitai Bot commented Apr 10, 2026

Walkthrough

This PR extends OpenAI provider support: it adds cost estimation for audio transcription/translation and video processing, introduces the corresponding type definitions, implements proxy handlers that route the new audio models to specialized processors, and registers new video proxy routes with cost estimation.

Changes

  • Audio & Video Cost Estimation (internal/provider/openai/cost.go, internal/provider/openai/types.go): Added audio price entries for gpt-4o-transcribe, gpt-4o-transcribe-diarize, gpt-4o-mini-transcribe, and gpt-4o-mini-tts; introduced transcription-input and transcription-output cost maps; added a video cost map keyed by normalized resolution (720/1024/1080). New types added for VideoResponseMetadata, TranscriptionResponse, and TranscriptionStreamChunk, plus helper methods for parsing metadata and classifying chunks.
  • Transcription/Translation Proxy (internal/server/web/proxy/audio.go, internal/server/web/proxy/audio_extended.go): Updated getTranscriptionsHandler and getTranslationsHandler to branch on model name and delegate gpt-4o-transcribe* and gpt-4o-mini-transcribe models to new processors. Implemented processGPTTranscriptions, processGPTTranslations, and a shared processGPTAudio handler supporting non-streaming JSON/text responses and streaming SSE responses; cost is estimated from token usage when available.
  • Video Proxy Handler (internal/server/web/proxy/video.go): Implemented getVideoHandler to proxy video requests to OpenAI, handling cost estimation for POST requests via EstimateVideoCost, forwarding response headers, and handling both success and error responses with telemetry recording.
  • Proxy Interface & Routing (internal/server/web/proxy/middleware.go, internal/server/web/proxy/proxy.go): Updated the estimator interface to add a usage parameter to EstimateTranscriptionCost and a new EstimateVideoCost method. Registered new HTTP routes for video collection and resource endpoints (/api/providers/openai/v1/videos and variants) pointing to getVideoHandler.
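For orientation, the new cost maps described above might be shaped roughly like this. The model names come from this PR's summary, but every price, key, and variable name below is a placeholder for illustration, not the repository's actual values:

```go
package main

// Hypothetical shapes for the new cost maps. Model names are taken from the
// PR summary; all prices and identifier names here are illustrative only.

// USD per million input tokens (placeholder rates).
var transcriptionInputCost = map[string]float64{
	"gpt-4o-transcribe":         6.00,
	"gpt-4o-transcribe-diarize": 6.00,
	"gpt-4o-mini-transcribe":    3.00,
}

// USD per second of generated video, keyed by model plus the normalized
// resolution (720/1024/1080) mentioned in the change summary.
var videoCost = map[string]float64{
	"sora-2-720":  0.10,
	"sora-2-1024": 0.30,
	"sora-2-1080": 0.50,
}
```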

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Handler as Audio Handler
    participant Processor as GPT Audio<br/>Processor
    participant OpenAI as OpenAI API
    participant Estimator

    Client->>Handler: POST /audio/transcriptions<br/>(model: gpt-4o-transcribe)
    Handler->>Handler: Extract model from form
    Handler->>Handler: Route check for gpt-4o-*
    Handler->>Processor: processGPTTranscriptions(...)
    Processor->>Processor: Validate request & context
    Processor->>Processor: Build http.Request<br/>(multipart form data)
    Processor->>Processor: Detect streaming mode
    Processor->>Processor: Modify request<br/>(response_format handling)
    Processor->>OpenAI: Execute request
    OpenAI-->>Processor: Non-streaming: 200 OK<br/>TranscriptionResponse
    Processor->>Processor: Unmarshal response
    Processor->>Estimator: EstimateTranscriptionCost<br/>(secs, model, usage)
    Estimator-->>Processor: costInUsd
    Processor->>Processor: Store costInUsd in context
    Processor-->>Client: JSON or text response
    
    Note over Processor,OpenAI: Streaming path:
    OpenAI-->>Processor: newline-delimited chunks
    loop For each SSE chunk
        Processor->>Processor: Unmarshal TranscriptionStreamChunk
        Processor->>Processor: Extract delta/text
        Processor->>Processor: Check if IsDone()
        alt Chunk is done
            Processor->>Estimator: EstimateTranscriptionCost<br/>(accumulated usage)
            Estimator-->>Processor: final costInUsd
        end
        Processor-->>Client: SSE event
    end
    Processor-->>Client: SSE [DONE]
sequenceDiagram
    participant Client
    participant Handler as Video Handler
    participant Validator as URL Builder
    participant OpenAI as OpenAI API
    participant Estimator

    Client->>Handler: POST/GET/DELETE<br/>/api/providers/openai/v1/videos
    Handler->>Handler: Validate request & context
    Handler->>Validator: constructVideoURL(path)
    Validator-->>Handler: https://api.openai.com/...
    Handler->>Handler: Create http.Request<br/>(copy method, body, headers)
    Handler->>OpenAI: Execute request
    alt Success (200)
        OpenAI-->>Handler: VideoResponseMetadata
        Handler->>Handler: Unmarshal response
        alt POST request (paid)
            Handler->>Estimator: EstimateVideoCost<br/>(metadata)
            Estimator-->>Handler: costInUsd
            Handler->>Handler: Store costInUsd in context
        end
        Handler-->>Client: Status 200 + response body
    else Error (non-200)
        OpenAI-->>Handler: Error response
        Handler->>Handler: Unmarshal ErrorResponse
        Handler->>Handler: Log error details
        Handler-->>Client: Original status + error body
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • 17073 gpt 5 support #9: Modifies OpenAI provider cost estimation and types alongside the estimator interface in middleware.go, enabling token-based cost calculation.
  • images/ #16: Extends OpenAI provider type definitions and OpenAiPerThousandTokenCost maps for additional media processing features (video in this PR, images in the referenced PR).

Suggested reviewers

  • destitutus
  • AndreyNikitin
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Title check (❓ Inconclusive): The title 'videos' is vague and generic; a single non-descriptive term does not convey the changeset's primary objectives. Resolution: consider a more descriptive title, such as 'Add OpenAI videos, transcription, and translation proxy endpoints' or 'Support GPT-4o audio/video models and streaming transcriptions'.

✅ Passed checks (1 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.



coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (1)
internal/server/web/proxy/audio.go (1)

172-176: Extract the GPT audio-model check into one helper.

The same hard-coded model list now drives branching in both handlers. A shared predicate keeps transcription/translation routing from drifting when this allowlist changes again.

Also applies to: 342-346

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/web/proxy/audio.go` around lines 172 - 176, Extract the
hard-coded allowlist into a single helper predicate (e.g.,
isGPTTranscriptionModel(model string) bool) that returns true for
"gpt-4o-transcribe", "gpt-4o-transcribe-diarize", and "gpt-4o-mini-transcribe";
then replace the inline checks in the handler around the model variable and the
other duplicated branch (the block that currently calls
processGPTTranscriptions(c, prod, client, e, model) and the similar block at the
later location) to call this helper instead so both routing points share the
same source of truth.
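As a sketch, the suggested predicate might look like this; the function name comes from the review's own suggestion, and everything else is illustrative:

```go
package main

// isGPTTranscriptionModel reports whether a model should be routed to the
// GPT-specific audio processor rather than the legacy Whisper path. The
// allowlist mirrors the three models named in the review comment, so both
// the transcription and translation handlers can share one source of truth.
func isGPTTranscriptionModel(model string) bool {
	switch model {
	case "gpt-4o-transcribe",
		"gpt-4o-transcribe-diarize",
		"gpt-4o-mini-transcribe":
		return true
	}
	return false
}
```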
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/provider/openai/cost.go`:
- Around line 820-838: EstimateVideoCost currently mandates a model-size lookup
and errors when metadata.Size is absent, but the cost map may contain fallback
keys like "sora-2" (model-only). Change EstimateVideoCost to try lookups in
order: 1) if size is present/normalized, try "model-size"; 2) if that fails (or
size missing/normalization returns an empty/expected-error), try the model-only
key "model"; and only return an error if neither key exists in
ce.tokenCostMap["video"]. Handle normalization errors by treating missing size
as absent (do not immediately return), and update the same lookup logic in the
analogous image pricing function (the one around lines 841-852) so both video
and image cost resolution use the model-then-model-only fallback.
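The model-size-then-model-only lookup order the comment asks for could be sketched as follows, using a hypothetical stand-in for ce.tokenCostMap["video"] with placeholder prices:

```go
package main

import "fmt"

// videoCostPerSecond is a stand-in for ce.tokenCostMap["video"]; keys may be
// either "model-size" or a model-only fallback such as "sora-2". Prices are
// placeholders for illustration.
var videoCostPerSecond = map[string]float64{
	"sora-2-720": 0.10,
	"sora-2":     0.10,
}

// lookupVideoRate tries the "model-size" key first when a size is available,
// then falls back to the model-only key, and errors only when neither exists.
func lookupVideoRate(model, size string) (float64, error) {
	if size != "" {
		if rate, ok := videoCostPerSecond[model+"-"+size]; ok {
			return rate, nil
		}
	}
	if rate, ok := videoCostPerSecond[model]; ok {
		return rate, nil
	}
	return 0, fmt.Errorf("no video price for model %q (size %q)", model, size)
}
```

A missing or unnormalizable size simply skips the first lookup instead of failing outright, which is the behavior the comment asks for.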

In `@internal/provider/openai/types.go`:
- Around line 101-106: The current GetSecondsAsFloat silently returns 0 on parse
failure which causes EstimateVideoCost to under-bill; change GetSecondsAsFloat
to return (float64, error) (or add a new GetSecondsAsFloatSafe that returns
(float64, error)) and propagate/handle the error in callers like
EstimateVideoCost and any other call sites, validating v.Seconds before using it
and returning/propagating the parse error instead of treating malformed or
missing seconds as 0 so billing is correct.
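A minimal sketch of the error-propagating variant, using a trimmed, hypothetical stand-in for the real VideoResponseMetadata type:

```go
package main

import (
	"fmt"
	"strconv"
)

// VideoResponseMetadata is a trimmed stand-in for the real type; the field
// name assumes the API reports the duration as a string.
type VideoResponseMetadata struct {
	Seconds string `json:"seconds"`
}

// GetSecondsAsFloat propagates parse failures instead of silently returning
// 0, so a malformed or missing duration surfaces as an error in
// EstimateVideoCost rather than producing a zero-cost bill.
func (v VideoResponseMetadata) GetSecondsAsFloat() (float64, error) {
	if v.Seconds == "" {
		return 0, fmt.Errorf("video metadata missing seconds")
	}
	secs, err := strconv.ParseFloat(v.Seconds, 64)
	if err != nil {
		return 0, fmt.Errorf("parse seconds %q: %w", v.Seconds, err)
	}
	return secs, nil
}
```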

In `@internal/server/web/proxy/audio_extended.go`:
- Around line 65-67: The call to modifyGPTTranscriptionsRequest currently
swallows multipart-rewrite failures by writing responses internally and
returning void, causing processGPTAudio to continue and later call
client.Do(req) which can double-write the response; change
modifyGPTTranscriptionsRequest to return an error (or a bool + error) and in
processGPTAudio (where modifyGPTTranscriptionsRequest(ginCtx, prod, log, req,
handler) is invoked) check that return value and immediately return from
processGPTAudio if an error/non-ok is returned (so you don't proceed to
client.Do(req)); update all other call sites in the same file (including the
block around lines 235-286) to handle the new return and propagate or log the
error appropriately.
- Around line 57-67: The streaming branch reads form values (ginCtx.PostForm)
which drains multipart request bodies, so when isStreaming is true the audio
payload can be lost because modifyGPTTranscriptionsRequest (which rebuilds the
multipart body) is skipped; ensure the multipart body is reconstructed after any
form parsing regardless of isStreaming by calling or inlining the same
body-rebuild logic used in modifyGPTTranscriptionsRequest (or factoring that
logic into a helper) before proxying in the streaming path so req.Body contains
the full multipart payload for upstream.

In `@internal/server/web/proxy/proxy.go`:
- Around line 107-114: The video routes call getVideoHandler but never set the
request's model value before the later isModelAllowed / isModelSupported checks,
letting empty model pass; update the routing/middleware so the model is
extracted and set on the request context for all video endpoints (e.g., in the
same middleware that other routes use) by reading the model from the incoming
payload/form or defaulting to the intended video model, ensuring getVideoHandler
sees a non-empty model and that isModelAllowed / isModelSupported are enforced
for routes such as the POST/GET/DELETE handlers for
"/api/providers/openai/v1/videos" and its subpaths.
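A minimal sketch of the suggested default; the fallback model name is an assumption borrowed from the cost-map keys mentioned earlier in this review, not the actual code:

```go
package main

// resolveVideoModel sketches the review's suggestion: take the model from the
// incoming payload when present, otherwise fall back to the intended video
// model, so isModelAllowed / isModelSupported never see an empty string.
// The default "sora-2" is a hypothetical placeholder.
func resolveVideoModel(payloadModel string) string {
	if payloadModel == "" {
		return "sora-2"
	}
	return payloadModel
}
```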


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7dce883e-09e8-40eb-ac83-14fcdb5c7c91

📥 Commits

Reviewing files that changed from the base of the PR and between ecbe3e2 and 9289ffe.

📒 Files selected for processing (7)
  • internal/provider/openai/cost.go
  • internal/provider/openai/types.go
  • internal/server/web/proxy/audio.go
  • internal/server/web/proxy/audio_extended.go
  • internal/server/web/proxy/middleware.go
  • internal/server/web/proxy/proxy.go
  • internal/server/web/proxy/video.go

Comment threads:
  • internal/provider/openai/cost.go (outdated)
  • internal/provider/openai/types.go (outdated)
  • internal/server/web/proxy/audio_extended.go
  • internal/server/web/proxy/audio_extended.go (outdated)
  • internal/server/web/proxy/proxy.go
sergei-bronnikov merged commit 42cf848 into main Apr 15, 2026
2 checks passed