
feat(evals): add skia category with 20 react-native-skia evals#377

Open
andriicallstack wants to merge 15 commits into main from feat/skia-evals

@andriicallstack
Collaborator

Summary

Adds a new skia eval category with 20 evaluations covering the core @shopify/react-native-skia API surface.

Each eval includes a focused prompt, atomic requirements, and a reference implementation. A category README.md with a best-practice inventory and eval traceability matrix is also included.

Evals added

| # | Eval | Focus |
|---|------|-------|
| 01 | canvas-fill-background | Canvas, Fill, useCanvasSize |
| 02 | shape-primitives | Rect, Circle, RoundedRect, Line |
| 03 | path-drawing | Path, Skia.Path.Make() |
| 04 | paint-stroke-fill | Paint, stroke vs fill |
| 05 | linear-gradient | LinearGradient, vec |
| 06 | radial-gradient | RadialGradient |
| 07 | image-display | Image, useImage |
| 08 | text-rendering | Text, matchFont |
| 09 | blur-filter | Blur |
| 10 | color-matrix-filter | ColorMatrix |
| 11 | reanimated-basic-animation | useSharedValue, withRepeat, withTiming |
| 12 | derived-value-animation | useDerivedValue |
| 13 | animated-color-interpolation | interpolateColors |
| 14 | gesture-pan | GestureDetector, Gesture.Pan |
| 15 | transforms | transform, Group |
| 16 | clip-rect-and-path | ClipRect, ClipPath |
| 17 | blend-mode | blendMode |
| 18 | svg-path-rendering | Skia.Path.MakeFromSVGString |
| 19 | runtime-effect-shader | Skia.RuntimeEffect.Make, GLSL |
| 20 | canvas-snapshot | useCanvasRef, makeImageSnapshot |

Runner fixes

  • Apply extractJsonMiddleware in both the solver and the judge client so that JSON responses Claude wraps in markdown code fences are parsed instead of failing with AI_NoObjectGeneratedError
  • Improve ensureOpencodeServerStarted to reuse an already-running OpenCode server instead of attempting a duplicate startup, and forward ANTHROPIC_API_KEY to newly spawned servers
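
To illustrate the first fix, here is a minimal standalone sketch of the kind of cleanup a JSON-extraction step performs on a fenced model reply. It does not call extractJsonMiddleware and is not the runner's code; `extractJsonText` and `parseModelJson` are hypothetical names used only for this example.

```typescript
// Hypothetical helpers for illustration; not the runner's actual middleware.

// Returns the JSON portion of a model reply that may be wrapped in a
// markdown code fence or surrounded by extra prose.
function extractJsonText(raw: string): string {
  const trimmed = raw.trim();

  // Case 1: the reply contains a fenced block (with or without a "json" tag).
  const fenced = trimmed.match(/```(?:json)?\s*([\s\S]*?)\s*```/i);
  if (fenced && fenced[1] !== undefined) {
    return fenced[1].trim();
  }

  // Case 2: JSON embedded in prose; take the outermost {...} or [...] span.
  const start = trimmed.search(/[{\[]/);
  const end = Math.max(trimmed.lastIndexOf("}"), trimmed.lastIndexOf("]"));
  if (start !== -1 && end > start) {
    return trimmed.slice(start, end + 1);
  }

  // Case 3: nothing to strip; hand the text back unchanged.
  return trimmed;
}

// Parses the cleaned text; JSON.parse still throws on invalid JSON.
function parseModelJson<T = unknown>(raw: string): T {
  return JSON.parse(extractJsonText(raw)) as T;
}

// Usage: a fenced reply of the kind described above.
const reply = "```json\n{\"score\": 1}\n```";
const parsed = parseModelJson<{ score: number }>(reply);
```

The sketch only handles a single fenced block or a single embedded object; a production middleware would likely need to cover more shapes than this.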

Baseline scores

Solver and judge: anthropic/claude-sonnet-4-5

| Score | Evals |
|-------|-------|
| 100% | 01, 02, 04, 06, 07, 09, 10, 11, 12, 13, 16, 17 |
| 75% | 03, 05, 08, 14, 15, 18, 20 |
| 50% | 19 |

Average: ~88%
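
For reference, the average can be recomputed from the score buckets. This is a small sketch that assumes each eval scored exactly its bucket value, which the table above does not confirm; `weightedAverage` is a hypothetical helper.

```typescript
// Bucket sizes are counted from the baseline score table:
// 12 evals at 100%, 7 evals at 75%, 1 eval at 50%.
interface ScoreBucket {
  score: number; // percentage score shared by every eval in the bucket
  evals: number; // number of evals in the bucket
}

const buckets: ScoreBucket[] = [
  { score: 100, evals: 12 },
  { score: 75, evals: 7 },
  { score: 50, evals: 1 },
];

// Mean score across all evals, weighting each bucket by its size.
function weightedAverage(items: ScoreBucket[]): number {
  const total = items.reduce((sum, b) => sum + b.score * b.evals, 0);
  const count = items.reduce((sum, b) => sum + b.evals, 0);
  return total / count;
}

const average = weightedAverage(buckets); // (1200 + 525 + 50) / 20 = 88.75
```

Under that assumption the result is 88.75%, in line with the approximate figure reported above.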

artus9033 requested review from artus9033 and grabbou and removed request for artus9033 May 19, 2026 13:04
@artus9033
Contributor

I'll take a look at the runner changes. @lech-kalinowski can you take a look at those opencode-related changes?

artus9033 self-requested a review May 20, 2026 12:36
artus9033 self-assigned this May 20, 2026
artus9033 added the eval-content (Eval category content) label May 20, 2026
artus9033 requested a review from Copilot May 20, 2026 12:50
Comment thread evals/skia/01-rn-skia-canvas-fill-background/reference/App.tsx

Copilot AI left a comment

Pull request overview

Adds a new evals/skia category to expand the benchmark suite with 20 focused React Native Skia evaluations, and updates the runner’s OpenCode integration to be more robust (JSON extraction middleware + server reuse/config forwarding).

Changes:

  • Add evals/skia category README plus 20 evals (each with prompt, requirements, and reference implementation).
  • Update solver and judge LLM clients to use extractJsonMiddleware via wrapLanguageModel to handle fenced/embedded JSON.
  • Improve OpenCode server startup to reuse an existing server and forward ANTHROPIC_API_KEY when spawning.
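
The reuse-or-spawn behaviour in the last bullet can be sketched as follows. This is an illustration under stated assumptions, not the contents of runner/utils/opencode.ts: `ensureServerStarted`, `buildServerEnv`, and the injected `isRunning` and `spawn` callbacks are hypothetical names, and the real health check and spawn logic may differ.

```typescript
// Hypothetical names throughout; the real implementation may differ.
type Env = Record<string, string | undefined>;

interface ServerDeps {
  // Resolves true when a server already answers on the given port.
  isRunning: (port: number) => Promise<boolean>;
  // Starts a new server process with the given environment.
  spawn: (port: number, env: Env) => Promise<void>;
}

// Copies only the provider key that a newly spawned server needs.
function buildServerEnv(parentEnv: Env): Env {
  const env: Env = {};
  const key = parentEnv["ANTHROPIC_API_KEY"];
  if (key !== undefined) {
    env["ANTHROPIC_API_KEY"] = key;
  }
  return env;
}

// Reuses a running server when one is found; otherwise spawns exactly one.
async function ensureServerStarted(
  port: number,
  parentEnv: Env,
  deps: ServerDeps,
): Promise<"reused" | "spawned"> {
  if (await deps.isRunning(port)) {
    return "reused"; // skip the duplicate startup
  }
  await deps.spawn(port, buildServerEnv(parentEnv));
  return "spawned";
}
```

Injecting the probe and spawn functions keeps the decision logic testable without starting a real process; the port number and the way the server is probed are left to the caller.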

Reviewed changes

Copilot reviewed 64 out of 64 changed files in this pull request and generated 4 comments.

| File | Description |
|------|-------------|
| runner/utils/opencode.ts | Reuse existing OpenCode server on a port; add default port and provider config forwarding. |
| runner/solver/index.ts | Apply JSON-extraction middleware to solver model calls. |
| runner/evaluators/llm/judge-client.ts | Apply JSON-extraction middleware to judge model calls. |
| evals/skia/README.md | Category documentation: best-practice inventory, traceability matrix, and issue clusters. |
| evals/skia/01-rn-skia-canvas-fill-background/prompt.md | Eval 01 prompt. |
| evals/skia/01-rn-skia-canvas-fill-background/requirements.yaml | Eval 01 requirements. |
| evals/skia/01-rn-skia-canvas-fill-background/reference/App.tsx | Eval 01 reference implementation. |
| evals/skia/02-rn-skia-shape-primitives/prompt.md | Eval 02 prompt. |
| evals/skia/02-rn-skia-shape-primitives/requirements.yaml | Eval 02 requirements. |
| evals/skia/02-rn-skia-shape-primitives/reference/App.tsx | Eval 02 reference implementation. |
| evals/skia/03-rn-skia-path-drawing/prompt.md | Eval 03 prompt. |
| evals/skia/03-rn-skia-path-drawing/requirements.yaml | Eval 03 requirements. |
| evals/skia/03-rn-skia-path-drawing/reference/App.tsx | Eval 03 reference implementation. |
| evals/skia/04-rn-skia-paint-stroke-fill/prompt.md | Eval 04 prompt. |
| evals/skia/04-rn-skia-paint-stroke-fill/requirements.yaml | Eval 04 requirements. |
| evals/skia/04-rn-skia-paint-stroke-fill/reference/App.tsx | Eval 04 reference implementation. |
| evals/skia/05-rn-skia-linear-gradient/prompt.md | Eval 05 prompt. |
| evals/skia/05-rn-skia-linear-gradient/requirements.yaml | Eval 05 requirements. |
| evals/skia/05-rn-skia-linear-gradient/reference/App.tsx | Eval 05 reference implementation. |
| evals/skia/06-rn-skia-radial-gradient/prompt.md | Eval 06 prompt. |
| evals/skia/06-rn-skia-radial-gradient/requirements.yaml | Eval 06 requirements. |
| evals/skia/06-rn-skia-radial-gradient/reference/App.tsx | Eval 06 reference implementation. |
| evals/skia/07-rn-skia-image-display/prompt.md | Eval 07 prompt. |
| evals/skia/07-rn-skia-image-display/requirements.yaml | Eval 07 requirements. |
| evals/skia/07-rn-skia-image-display/reference/App.tsx | Eval 07 reference implementation. |
| evals/skia/08-rn-skia-text-rendering/prompt.md | Eval 08 prompt. |
| evals/skia/08-rn-skia-text-rendering/requirements.yaml | Eval 08 requirements. |
| evals/skia/08-rn-skia-text-rendering/reference/App.tsx | Eval 08 reference implementation. |
| evals/skia/09-rn-skia-blur-filter/prompt.md | Eval 09 prompt. |
| evals/skia/09-rn-skia-blur-filter/requirements.yaml | Eval 09 requirements. |
| evals/skia/09-rn-skia-blur-filter/reference/App.tsx | Eval 09 reference implementation. |
| evals/skia/10-rn-skia-color-matrix-filter/prompt.md | Eval 10 prompt. |
| evals/skia/10-rn-skia-color-matrix-filter/requirements.yaml | Eval 10 requirements. |
| evals/skia/10-rn-skia-color-matrix-filter/reference/App.tsx | Eval 10 reference implementation. |
| evals/skia/11-rn-skia-reanimated-basic-animation/prompt.md | Eval 11 prompt. |
| evals/skia/11-rn-skia-reanimated-basic-animation/requirements.yaml | Eval 11 requirements. |
| evals/skia/11-rn-skia-reanimated-basic-animation/reference/App.tsx | Eval 11 reference implementation. |
| evals/skia/12-rn-skia-derived-value-animation/prompt.md | Eval 12 prompt. |
| evals/skia/12-rn-skia-derived-value-animation/requirements.yaml | Eval 12 requirements. |
| evals/skia/12-rn-skia-derived-value-animation/reference/App.tsx | Eval 12 reference implementation. |
| evals/skia/13-rn-skia-animated-color-interpolation/prompt.md | Eval 13 prompt. |
| evals/skia/13-rn-skia-animated-color-interpolation/requirements.yaml | Eval 13 requirements. |
| evals/skia/13-rn-skia-animated-color-interpolation/reference/App.tsx | Eval 13 reference implementation. |
| evals/skia/14-rn-skia-gesture-pan/prompt.md | Eval 14 prompt. |
| evals/skia/14-rn-skia-gesture-pan/requirements.yaml | Eval 14 requirements. |
| evals/skia/14-rn-skia-gesture-pan/reference/App.tsx | Eval 14 reference implementation. |
| evals/skia/15-rn-skia-transforms/prompt.md | Eval 15 prompt. |
| evals/skia/15-rn-skia-transforms/requirements.yaml | Eval 15 requirements. |
| evals/skia/15-rn-skia-transforms/reference/App.tsx | Eval 15 reference implementation. |
| evals/skia/16-rn-skia-clip-rect-and-path/prompt.md | Eval 16 prompt. |
| evals/skia/16-rn-skia-clip-rect-and-path/requirements.yaml | Eval 16 requirements. |
| evals/skia/16-rn-skia-clip-rect-and-path/reference/App.tsx | Eval 16 reference implementation. |
| evals/skia/17-rn-skia-blend-mode/prompt.md | Eval 17 prompt. |
| evals/skia/17-rn-skia-blend-mode/requirements.yaml | Eval 17 requirements. |
| evals/skia/17-rn-skia-blend-mode/reference/App.tsx | Eval 17 reference implementation. |
| evals/skia/18-rn-skia-svg-path-rendering/prompt.md | Eval 18 prompt. |
| evals/skia/18-rn-skia-svg-path-rendering/requirements.yaml | Eval 18 requirements. |
| evals/skia/18-rn-skia-svg-path-rendering/reference/App.tsx | Eval 18 reference implementation. |
| evals/skia/19-rn-skia-runtime-effect-shader/prompt.md | Eval 19 prompt. |
| evals/skia/19-rn-skia-runtime-effect-shader/requirements.yaml | Eval 19 requirements. |
| evals/skia/19-rn-skia-runtime-effect-shader/reference/App.tsx | Eval 19 reference implementation. |
| evals/skia/20-rn-skia-canvas-snapshot/prompt.md | Eval 20 prompt. |
| evals/skia/20-rn-skia-canvas-snapshot/requirements.yaml | Eval 20 requirements. |
| evals/skia/20-rn-skia-canvas-snapshot/reference/App.tsx | Eval 20 reference implementation. |

Comment thread runner/utils/opencode.ts
Comment thread evals/skia/03-rn-skia-path-drawing/reference/App.tsx Outdated
Comment thread evals/skia/04-rn-skia-paint-stroke-fill/reference/App.tsx Outdated
Comment thread evals/skia/README.md Outdated
@artus9033
Contributor

artus9033 commented May 21, 2026

CC @wcandillon 👀

Copilot AI left a comment

Pull request overview

Copilot reviewed 101 out of 101 changed files in this pull request and generated 7 comments.

Comment thread runner/utils/opencode.ts
Comment thread runner/solver/index.ts
Comment thread runner/evaluators/llm/judge-client.ts
Comment thread evals/skia/README.md
Comment thread evals/skia/README.md
Comment thread evals/skia/21-rn-skia-sweep-gradient/reference/App.tsx
Comment thread evals/skia/22-rn-skia-group-layer-effect/reference/App.tsx

Labels

eval-content Eval category content

3 participants