Skip to content

fix: resolve benchmark prompt assets without asserting concrete asset types#625

Open
RapidPoseidon wants to merge 1 commit into
mainfrom
fix/benchmark-prompt-asset-resolution
Open

fix: resolve benchmark prompt assets without asserting concrete asset types#625
RapidPoseidon wants to merge 1 commit into
mainfrom
fix/benchmark-prompt-asset-resolution

Conversation

@RapidPoseidon

Copy link
Copy Markdown
Contributor

Problem

RapidataBenchmark.identifiers (and prompts / prompt_assets / tags) crashes on any benchmark whose prompts have file assets — which blocks add_model and add_prompts, since both read self.identifiers internally.

__instantiate_prompts did:

assert isinstance(prompt.prompt_asset.actual_instance, FileAssetModel)
source_url = prompt.prompt_asset.actual_instance.metadata["sourceUrl"].actual_instance
assert isinstance(source_url, SourceUrlMetadataModel)

Neither assumption holds against the current API response:

  1. prompt.prompt_asset is an IAssetModel oneOf wrapper; its .actual_instance is an IAssetModel* variant (IAssetModelFileAssetModel / …MultiAssetModel / …TextAssetModel / …NullAssetModel) — never the bare FileAssetModel. → AssertionError.
  2. metadata is Dict[str, IMetadataModel] and only carries a "sourceUrl" entry for URL-registered assets. File-uploaded prompts only have originalFilename / imageDimension, so metadata["sourceUrl"] would KeyError.
  3. The metadata variant is IMetadataModelSourceUrlMetadataModel, not the bare SourceUrlMetadataModel, so the second assert is wrong too.

Reproduced live against a real file-asset benchmark (499 prompts): benchmark.identifiers raised AssertionError before this change.

Fix

Resolve the source URL defensively instead of asserting concrete types, mirroring the existing pattern in RapidataFlowItem._extract_asset_key():

@staticmethod
def __source_url(asset):
    instance = getattr(asset, "actual_instance", None)
    metadata = getattr(instance, "metadata", None)
    if not metadata:
        return None
    source_url = metadata.get("sourceUrl")
    actual = getattr(source_url, "actual_instance", None)
    return getattr(actual, "url", None)

prompt_assets now yields the source URL when present, else None for file-uploaded assets that have no public URL in the response — the old code could never have returned a URL for those anyway; it crashed.

Verification

  • Live against the real benchmark (read-only, creates nothing): identifiers, prompts, prompt_assets, tags all populate (499 each) with no exception; add_model's identifier-membership check passes.
  • uv run black src/rapidata/rapidata_client → unchanged.
  • uv run pyright src/rapidata/rapidata_client/benchmark/rapidata_benchmark.py → 0 errors, 0 warnings.

Notes

  • No version bump — versioning/publishing is CI-automated (release_and_publish.yml).
  • No test added — the repo has no test framework yet; a regression test for this (IAssetModel* wrapper + missing sourceUrl) would be worth adding once test infra exists.
  • Discovered while adding participants to a mock benchmark in Image-Editing-Benchmarks (RapidataAI/Image-Editing-Benchmarks#6), which currently ships a consumer-side cache-injection workaround that can be removed once this releases.

🔗 Session: ${POSEIDON_SESSION_URL:-$HOSTNAME}

🤖 Generated with Claude Code

… types

RapidataBenchmark.__instantiate_prompts asserted that each prompt's
prompt_asset.actual_instance was a FileAssetModel and then indexed
metadata["sourceUrl"]. Neither holds for the current API response: the
prompts endpoint returns an IAssetModel oneOf wrapper whose actual_instance
is an IAssetModel* variant (file/multi/text/null), and only URL-registered
assets carry a "sourceUrl" metadata entry. File-uploaded prompts only have
originalFilename/imageDimension, so the assert raised AssertionError (and the
index would otherwise KeyError) on any benchmark with file assets.

Because identifiers/prompts/prompt_assets/tags all lazily call this method, and
add_model/add_prompts read self.identifiers, the crash blocked the whole
participant-adding flow.

Resolve the source URL defensively (mirroring RapidataFlowItem._extract_asset_key):
navigate actual_instance -> metadata -> "sourceUrl" with getattr/.get and fall
back to None when no URL is present, rather than asserting concrete types.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: karl <karl@rapidata.ai>
@claude

claude Bot commented Jun 18, 2026

Copy link
Copy Markdown

Code Review

Overview

This PR fixes a real crash in RapidataBenchmark.__instantiate_prompts when processing benchmarks whose prompts have file assets. The old code made two incorrect assumptions about the auto-generated API client's response structure — asserting bare FileAssetModel/SourceUrlMetadataModel types that are actually wrapped in IAssetModel*/IMetadataModel* oneOf variants — causing an AssertionError on every file-asset benchmark and blocking add_model/add_prompts as a result. The fix is targeted and correct.


Correctness

  • The root cause is accurately diagnosed: the OpenAPI client wraps concrete types in oneOf holder objects (IAssetModel), so the old assert isinstance(..., FileAssetModel) could never pass.
  • Passing None to __source_url is handled gracefully — getattr(None, "actual_instance", None) returns None, so the explicit if prompt.prompt_asset is None branch is correctly collapsed into the single call.
  • The metadata.get("sourceUrl") call is safe after the if not metadata guard.
  • Returning None for file-uploaded assets (which have no sourceUrl in the response) preserves the pre-existing semantics — the old code would have crashed before ever returning anything for those cases anyway.

Code Quality

Minor style concern — docstring length on a private method. The __source_url docstring is 6 prose lines explaining the oneOf wrapper structure. CLAUDE.md asks to keep comments to a minimum and avoid multi-line blocks unless the why is non-obvious. Here the why is genuinely non-obvious (a subtle API contract), so some comment is warranted, but the current block is verbose for a private static helper. Consider condensing to one or two lines:

# IAssetModel is a oneOf wrapper; only URL-registered assets carry a "sourceUrl"
# metadata entry — navigate defensively rather than asserting concrete types.

That captures the essential constraint without repeating the full type hierarchy already visible in the surrounding code.

TYPE_CHECKING Import

IAssetModel is correctly placed under TYPE_CHECKING, consistent with the project convention from CLAUDE.md.

Tooling

PR description confirms:

  • uv run black — no changes
  • uv run pyright src/rapidata/rapidata_client/benchmark/rapidata_benchmark.py — 0 errors, 0 warnings

Potential Follow-ups (not blocking)

  1. Multi-asset / text-asset prompt typesIAssetModelMultiAssetModel and IAssetModelTextAssetModel variants presumably also lack a sourceUrl; they will silently return None here. That is likely correct behavior, but worth a note so callers of prompt_assets know a None entry can mean "file-uploaded or non-URL asset type", not just "no asset".
  2. Test coverage — the PR correctly notes the repo has no test framework yet. A regression test covering the IAssetModel oneOf wrapper + missing sourceUrl path would be a high-value first test once infra exists.

Summary

Clean, well-scoped fix. The defensive getattr navigation mirrors the existing pattern in RapidataFlowItem._extract_asset_key and is the right approach for working with auto-generated oneOf wrappers. The only non-blocking suggestion is trimming the docstring to match the project's comment style. Approved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant