feat(evals): add skia category with 20 react-native-skia evals #377
Open
andriicallstack wants to merge 15 commits into
I'll take a look at the runner changes. @lech-kalinowski can you take a look at those opencode-related changes?
artus9033
reviewed
May 20, 2026
Pull request overview
Adds a new evals/skia category to expand the benchmark suite with 20 focused React Native Skia evaluations, and updates the runner’s OpenCode integration to be more robust (JSON extraction middleware + server reuse/config forwarding).
Changes:
- Add `evals/skia` category README plus 20 evals (each with prompt, requirements, and reference implementation).
- Update solver and judge LLM clients to use `extractJsonMiddleware` via `wrapLanguageModel` to handle fenced/embedded JSON.
- Improve OpenCode server startup to reuse an existing server and forward `ANTHROPIC_API_KEY` when spawning.
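To illustrate what "fenced/embedded JSON" handling involves, here is a minimal, dependency-free sketch of the idea. It is not the middleware used in this PR: `extractJsonText` and `parseModelJson` are hypothetical names, and the real `extractJsonMiddleware` may behave differently in edge cases.

```typescript
// Hypothetical helpers for illustration only; not the runner's middleware.
// Built at runtime so this snippet contains no literal fence marker.
const FENCE = "`".repeat(3);

// Pull the JSON text out of a model reply that may wrap it in a markdown
// code fence or surround it with prose.
function extractJsonText(reply: string): string {
  // Case 1: a fenced block, with an optional "json" language tag.
  const fencedPattern = new RegExp(
    FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE,
    "i",
  );
  const fenced = reply.match(fencedPattern);
  if (fenced) return (fenced[1] ?? "").trim();

  // Case 2: JSON embedded in prose. Take the span from the first opening
  // bracket to the last closing bracket of the same kind.
  const start = reply.search(/[{[]/);
  if (start === -1) return reply.trim();
  const close = reply.charAt(start) === "{" ? "}" : "]";
  const end = reply.lastIndexOf(close);
  return end > start ? reply.slice(start, end + 1) : reply.trim();
}

// Parse the extracted text; throws a SyntaxError if it is still not JSON.
function parseModelJson<T = unknown>(reply: string): T {
  return JSON.parse(extractJsonText(reply)) as T;
}
```

This sketch only recognises untagged fences and fences tagged `json`, and its prose fallback is deliberately naive, so it should be read as a description of the failure mode rather than a replacement for the library middleware.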
Reviewed changes
Copilot reviewed 64 out of 64 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| runner/utils/opencode.ts | Reuse existing OpenCode server on a port; add default port and provider config forwarding. |
| runner/solver/index.ts | Apply JSON-extraction middleware to solver model calls. |
| runner/evaluators/llm/judge-client.ts | Apply JSON-extraction middleware to judge model calls. |
| evals/skia/README.md | Category documentation: best-practice inventory, traceability matrix, and issue clusters. |
| evals/skia/01-rn-skia-canvas-fill-background/prompt.md | Eval 01 prompt. |
| evals/skia/01-rn-skia-canvas-fill-background/requirements.yaml | Eval 01 requirements. |
| evals/skia/01-rn-skia-canvas-fill-background/reference/App.tsx | Eval 01 reference implementation. |
| evals/skia/02-rn-skia-shape-primitives/prompt.md | Eval 02 prompt. |
| evals/skia/02-rn-skia-shape-primitives/requirements.yaml | Eval 02 requirements. |
| evals/skia/02-rn-skia-shape-primitives/reference/App.tsx | Eval 02 reference implementation. |
| evals/skia/03-rn-skia-path-drawing/prompt.md | Eval 03 prompt. |
| evals/skia/03-rn-skia-path-drawing/requirements.yaml | Eval 03 requirements. |
| evals/skia/03-rn-skia-path-drawing/reference/App.tsx | Eval 03 reference implementation. |
| evals/skia/04-rn-skia-paint-stroke-fill/prompt.md | Eval 04 prompt. |
| evals/skia/04-rn-skia-paint-stroke-fill/requirements.yaml | Eval 04 requirements. |
| evals/skia/04-rn-skia-paint-stroke-fill/reference/App.tsx | Eval 04 reference implementation. |
| evals/skia/05-rn-skia-linear-gradient/prompt.md | Eval 05 prompt. |
| evals/skia/05-rn-skia-linear-gradient/requirements.yaml | Eval 05 requirements. |
| evals/skia/05-rn-skia-linear-gradient/reference/App.tsx | Eval 05 reference implementation. |
| evals/skia/06-rn-skia-radial-gradient/prompt.md | Eval 06 prompt. |
| evals/skia/06-rn-skia-radial-gradient/requirements.yaml | Eval 06 requirements. |
| evals/skia/06-rn-skia-radial-gradient/reference/App.tsx | Eval 06 reference implementation. |
| evals/skia/07-rn-skia-image-display/prompt.md | Eval 07 prompt. |
| evals/skia/07-rn-skia-image-display/requirements.yaml | Eval 07 requirements. |
| evals/skia/07-rn-skia-image-display/reference/App.tsx | Eval 07 reference implementation. |
| evals/skia/08-rn-skia-text-rendering/prompt.md | Eval 08 prompt. |
| evals/skia/08-rn-skia-text-rendering/requirements.yaml | Eval 08 requirements. |
| evals/skia/08-rn-skia-text-rendering/reference/App.tsx | Eval 08 reference implementation. |
| evals/skia/09-rn-skia-blur-filter/prompt.md | Eval 09 prompt. |
| evals/skia/09-rn-skia-blur-filter/requirements.yaml | Eval 09 requirements. |
| evals/skia/09-rn-skia-blur-filter/reference/App.tsx | Eval 09 reference implementation. |
| evals/skia/10-rn-skia-color-matrix-filter/prompt.md | Eval 10 prompt. |
| evals/skia/10-rn-skia-color-matrix-filter/requirements.yaml | Eval 10 requirements. |
| evals/skia/10-rn-skia-color-matrix-filter/reference/App.tsx | Eval 10 reference implementation. |
| evals/skia/11-rn-skia-reanimated-basic-animation/prompt.md | Eval 11 prompt. |
| evals/skia/11-rn-skia-reanimated-basic-animation/requirements.yaml | Eval 11 requirements. |
| evals/skia/11-rn-skia-reanimated-basic-animation/reference/App.tsx | Eval 11 reference implementation. |
| evals/skia/12-rn-skia-derived-value-animation/prompt.md | Eval 12 prompt. |
| evals/skia/12-rn-skia-derived-value-animation/requirements.yaml | Eval 12 requirements. |
| evals/skia/12-rn-skia-derived-value-animation/reference/App.tsx | Eval 12 reference implementation. |
| evals/skia/13-rn-skia-animated-color-interpolation/prompt.md | Eval 13 prompt. |
| evals/skia/13-rn-skia-animated-color-interpolation/requirements.yaml | Eval 13 requirements. |
| evals/skia/13-rn-skia-animated-color-interpolation/reference/App.tsx | Eval 13 reference implementation. |
| evals/skia/14-rn-skia-gesture-pan/prompt.md | Eval 14 prompt. |
| evals/skia/14-rn-skia-gesture-pan/requirements.yaml | Eval 14 requirements. |
| evals/skia/14-rn-skia-gesture-pan/reference/App.tsx | Eval 14 reference implementation. |
| evals/skia/15-rn-skia-transforms/prompt.md | Eval 15 prompt. |
| evals/skia/15-rn-skia-transforms/requirements.yaml | Eval 15 requirements. |
| evals/skia/15-rn-skia-transforms/reference/App.tsx | Eval 15 reference implementation. |
| evals/skia/16-rn-skia-clip-rect-and-path/prompt.md | Eval 16 prompt. |
| evals/skia/16-rn-skia-clip-rect-and-path/requirements.yaml | Eval 16 requirements. |
| evals/skia/16-rn-skia-clip-rect-and-path/reference/App.tsx | Eval 16 reference implementation. |
| evals/skia/17-rn-skia-blend-mode/prompt.md | Eval 17 prompt. |
| evals/skia/17-rn-skia-blend-mode/requirements.yaml | Eval 17 requirements. |
| evals/skia/17-rn-skia-blend-mode/reference/App.tsx | Eval 17 reference implementation. |
| evals/skia/18-rn-skia-svg-path-rendering/prompt.md | Eval 18 prompt. |
| evals/skia/18-rn-skia-svg-path-rendering/requirements.yaml | Eval 18 requirements. |
| evals/skia/18-rn-skia-svg-path-rendering/reference/App.tsx | Eval 18 reference implementation. |
| evals/skia/19-rn-skia-runtime-effect-shader/prompt.md | Eval 19 prompt. |
| evals/skia/19-rn-skia-runtime-effect-shader/requirements.yaml | Eval 19 requirements. |
| evals/skia/19-rn-skia-runtime-effect-shader/reference/App.tsx | Eval 19 reference implementation. |
| evals/skia/20-rn-skia-canvas-snapshot/prompt.md | Eval 20 prompt. |
| evals/skia/20-rn-skia-canvas-snapshot/requirements.yaml | Eval 20 requirements. |
| evals/skia/20-rn-skia-canvas-snapshot/reference/App.tsx | Eval 20 reference implementation. |
…ady up, deintegrate provider-specific code
32b7003 to
c44d24d
Compare
CC @wcandillon 👀
Summary
Adds a new `skia` eval category with 20 evaluations covering the core `@shopify/react-native-skia` API surface.

Each eval includes a focused prompt, atomic requirements, and a reference implementation. A category `README.md` with a best-practice inventory and eval traceability matrix is also included.

Evals added
| Eval | APIs covered |
|---|---|
| canvas-fill-background | `Canvas`, `Fill`, `useCanvasSize` |
| shape-primitives | `Rect`, `Circle`, `RoundedRect`, `Line` |
| path-drawing | `Path`, `Skia.Path.Make()` |
| paint-stroke-fill | `Paint`, stroke vs fill |
| linear-gradient | `LinearGradient`, `vec` |
| radial-gradient | `RadialGradient` |
| image-display | `Image`, `useImage` |
| text-rendering | `Text`, `matchFont` |
| blur-filter | `Blur` |
| color-matrix-filter | `ColorMatrix` |
| reanimated-basic-animation | `useSharedValue`, `withRepeat`, `withTiming` |
| derived-value-animation | `useDerivedValue` |
| animated-color-interpolation | `interpolateColors` |
| gesture-pan | `GestureDetector`, `Gesture.Pan` |
| transforms | `transform`, `Group` |
| clip-rect-and-path | `ClipRect`, `ClipPath` |
| blend-mode | `blendMode` |
| svg-path-rendering | `Skia.Path.MakeFromSVGString` |
| runtime-effect-shader | `Skia.RuntimeEffect.Make`, GLSL |
| canvas-snapshot | `useCanvasRef`, `makeImageSnapshot` |

Runner fixes
- `extractJsonMiddleware` in both the solver and judge client to handle Claude wrapping JSON responses in markdown code fences (`AI_NoObjectGeneratedError`)
- `ensureOpencodeServerStarted` to reuse an already-running OpenCode server instead of attempting a duplicate startup, and forward `ANTHROPIC_API_KEY` to newly spawned servers

Baseline scores
Solver and judge: `anthropic/claude-sonnet-4-5`
Average: ~88%
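
The server-reuse behaviour listed under Runner fixes can be summarised as a small decision step: reuse when something is already listening, otherwise spawn and forward the API key. The sketch below is an assumption-laden illustration, not the code in `runner/utils/opencode.ts`: `planOpencodeStartup`, `StartupPlan`, and the `DEFAULT_PORT` value are hypothetical, and the check for whether a server is already up is left to the caller.

```typescript
// Hypothetical sketch of the reuse-or-spawn decision; names and the port
// value are illustrative assumptions, not taken from the runner.
type Env = Record<string, string | undefined>;

interface StartupPlan {
  action: "reuse" | "spawn";
  port: number;
  // Extra environment variables to pass to a newly spawned server.
  env: Record<string, string>;
}

// Placeholder default; the PR adds a default port but its value is not
// stated in this description.
const DEFAULT_PORT = 4096;

function planOpencodeStartup(
  serverAlreadyUp: boolean,
  parentEnv: Env,
  port: number = DEFAULT_PORT,
): StartupPlan {
  // An existing server is reused as-is, so nothing needs to be forwarded.
  if (serverAlreadyUp) {
    return { action: "reuse", port, env: {} };
  }

  // Otherwise spawn a new server, forwarding the key only when it is set.
  const env: Record<string, string> = {};
  const key = parentEnv["ANTHROPIC_API_KEY"];
  if (key) env["ANTHROPIC_API_KEY"] = key;
  return { action: "spawn", port, env };
}
```

Keeping the decision separate from the actual port probe and process spawn makes the reuse path easy to unit-test without starting a server.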