Skip to content

fix: improve Android snapshot fidelity#580

Open
thymikee wants to merge 4 commits into
mainfrom
codex/android-snapshot-fidelity
Open

fix: improve Android snapshot fidelity#580
thymikee wants to merge 4 commits into
mainfrom
codex/android-snapshot-fidelity

Conversation

@thymikee
Copy link
Copy Markdown
Member

@thymikee thymikee commented May 21, 2026

Summary

Improve Android snapshot -i and screenshot --overlay-refs fidelity by replacing Android-helper-specific noise cleanup with generic row promotion/collapse heuristics and by making Android overlay refs use screenshot-pixel coordinates directly.

Before/after cases checked during verification:

  • Android Settings row overlays: before, refs for rows such as Google / Storage could draw boxes that were too tall or misaligned with the screenshot. After, Android uiautomator bounds are treated as screenshot pixels and unlabeled row containers are visually trimmed around their visible content, while overlayRefs[].rect still reports the underlying accessibility bounds.
  • Test app bottom tabs: before, unlabeled Android ViewGroup tab controls were visible but did not get clickable overlay refs. After, bounded unlabeled clickable controls get refs without inventing labels, so agents can target bottom navigation from the screenshot.
  • Android Settings / YouTube list rows: before, unlabeled clickable rows exposed text/icon children as separate noisy nodes, and some row text was easy to miss. After, passive contained text is promoted into the row label, decorative child nodes collapse, and trailing actionable controls remain separate.
  • React Native list rows: before, parallel image/text/button descendants produced ambiguous repeated entries such as separate Adam / Hello from Adam controls inside the same row. After, repeated descendants collapse into the row when they are contained in the same hittable target, preserving distinct sibling controls.
  • Scrollable lists: before, hidden-content hints could be ambiguous when depth was flattened. After, [content above scroll-area hidden] renders at the top of the scroll area and [content below scroll-area hidden] renders at the bottom.
  • Raw fidelity: snapshot --raw still bypasses the Android presentation filters, so the full helper hierarchy remains available for debugging and exact inspection.

Why this is better:

The interactive snapshot is smaller and more agent-oriented without reducing raw information fidelity. Agents see fewer duplicate passive nodes, clearer row-level labels, clickable screenshot refs for visible unlabeled controls, and less ambiguous scroll context. The cleanup stays generic rather than baking in app-specific cases like email rows or composer-adjacent bottom navigation.

Scope: 18 touched files. Android snapshot presentation, screenshot overlay refs, shared snapshot output formatting, focused tests, and SkillGym runner-environment guard/docs.

Validation

Focused tests passed:

  • pnpm exec vitest run src/utils/__tests__/output.test.ts
  • pnpm exec vitest run src/daemon/__tests__/screenshot-overlay.test.ts
  • node --test test/skillgym/runner-environment.test.ts
  • pnpm check:quick
  • pnpm build
  • git diff --check

Manual verification covered Android Settings, built-in Android apps, YouTube, Expensify with Metro prepared on port 8082, and the test app across more than six screens with scrolling, snapshot -i, snapshot --raw, and screenshot --overlay-refs comparisons.

SkillGym:

  • Single-case run across both configured runners reproduced the sandbox issue: pnpm exec skillgym run ./test/skillgym/suites/agent-device-smoke-suite.ts --config ./test/skillgym/skillgym.config.ts --case open-and-snapshot failed before assertions because both external runners crashed in the network-restricted Codex sandbox.
  • Added a SkillGym runner-environment guard so pnpm test:skillgym, pnpm test:skillgym:case open-and-snapshot, and direct skillgym run --config ./test/skillgym/skillgym.config.ts fail fast in CODEX_SANDBOX_NETWORK_DISABLED=1 with instructions to run from a normal authenticated local shell.
  • The same guard is bypassed in normal local shells, and can be explicitly overridden with SKILLGYM_ALLOW_EXTERNAL_RUNNERS_IN_SANDBOX=1 for sandboxes that have approved network access.
  • Confirmed sandbox behavior with pnpm test:skillgym:case open-and-snapshot and direct skillgym run --case open-and-snapshot; both now stop before build/runner launch with the explanatory message.

Known local gap: pnpm check:unit still fails locally on the pre-existing MP4 fallback test in src/utils/__tests__/video.test.ts; sandbox-only listener failures cleared when rerun outside the sandbox.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 21, 2026

PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://callstackincubator.github.io/agent-device/pr-preview/pr-580/

Built to branch gh-pages at 2026-05-22 12:16 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@thymikee thymikee force-pushed the codex/android-snapshot-fidelity branch 3 times, most recently from ec3e5c5 to 3645382 Compare May 21, 2026 19:26
@thymikee thymikee force-pushed the codex/android-snapshot-fidelity branch from 3645382 to 5c2d9f6 Compare May 22, 2026 05:50
Copy link
Copy Markdown
Member Author

Code review

Overview

Refactors Android helper snapshot presentation into a new src/utils/android-helper-presentation/ module set and reworks Android screenshot-overlay ref generation. The headline correctness change — making Android overlay rects use raw uiautomator pixel coordinates instead of scaling through snapshotBounds — is sound: parseBounds (src/platforms/android/ui-hierarchy.ts) yields absolute screen-pixel rects ([x1,y1][x2,y2]), the same coordinate space as the screenshot PNG, so skipping the bounds-based scale factor in projectRectToScreenshot is correct. The raw-snapshot-preservation claim also holds: the text filters only mutate a removed set / replacements map and rebuild a new array via .filter().map() with spreads (never mutating original nodes), and the raw path bypasses presentation entirely in output.ts.

The new heuristics are intricate but generally well-structured. A few asymmetries and behavioral reductions deserve attention.

Findings

Bugs

1 — Passive descendants are removed without label promotion (asymmetric → can silently drop visible text). (structural-noise.ts, markUnlabeledActionRowsForPromotion) — confirmed by reading the file.
collectPromotableRowLabels only promotes a descendant's label when isRectContainedBy(descendant.rect, parent.rect) is true (and it's passive). But the removal loop removes every isPassiveRowContent(descendant) regardless of containment, and isPassiveRowContent does not look at the rect at all:

for (const descendant of descendants) {
  if (isPassiveRowContent(descendant)) {
    markNodeAndDescendantsForRemoval(nodes, descendant.index, removed);
  }
}

So a passive text/image child whose rect falls outside the parent bounds — or that is rectless (isRectContainedBy returns false for it) — gets deleted from the agent-facing snapshot without its text promoted into the parent label. Tests only cover the contained case. Recommend gating removal on the same containment predicate used for promotion (or only removing descendants whose label was actually collected).

Issues to verify

2 — rect field semantics change on Android. (screenshot-overlay.ts, candidate construction) The candidate's rect is now set to overlaySourceRect (the trimmed/balanced row rect) instead of target.rect. For non-Android backends resolveOverlaySourceRect returns target.rect!, so only Android changes. Combined with the no-scale change, Android rect and overlayRect are now effectively identical pixel rects, and the reported rect no longer reflects the true accessibility-element bounds for trimmed rows. Downstream consumers I checked use overlayRect/center (e.g. screenshot-diff-overlay-matches.ts), so this is likely safe — please confirm nothing relies on rect being the untrimmed element bounds.

3 — Overlay labels can now be undefined; suppression dedupes on equal undefined. (resolveOverlayLabel dropped the ?? resolveRefLabel(...) fallback; Android now emits unlabeled clickable sources.) suppressContainedCandidates dedupes when current.label === candidate.label AND one rect contains the other. With multiple unlabeled controls, undefined === undefined is true, so two overlapping/nested unlabeled Android controls would be collapsed to one. The bottom-tab test passes only because the tabs don't overlap. Worth verifying this doesn't hide nested unlabeled controls.

4 — Loss of [likely navigation] and email-dedup signals. The removed markBottomNavNodesNearComposerForRemoval and markDuplicateEmailButtonsForRemoval eliminate the previously-emitted [likely navigation] hint and the email-button dedup (output.test.ts was updated to no longer expect [likely navigation]). Confirm these are intentionally subsumed by the new collapsing/promotion paths rather than accidental regressions.

5 — More aggressive collapsing of existing scenarios. The RN-row test changed from "8 visible / Collapsed 2" to "6 visible / Collapsed 4", merging what were previously separate @e5 [button] "Adam" / @e7 [button] "Hello from Adam" rows into one @e3 [button] "Adam, 9:41 AM, Hello from Adam". This is a real change to agent-facing output for existing inputs — confirm it's the desired product behavior and not over-collapsing that could merge two genuinely separate tap targets.

6 — inferRectlessNodeDirection infers above/below from sibling index order, applied to a possibly-distant scroll ancestor. (android-helper-snapshot-presentation.ts) Direction is derived from the rectless node's direct siblings' index ordering, then attached to the nearest scrollable ancestor. This assumes index order tracks visual order and that local position implies the same above/below for the ancestor scroll area. Usually true for lists, but it's a heuristic that can mislabel — worth a comment.

Nits

  • Two parentIndex resolution styles coexist: the new tree.ts (findAncestor/collectDescendants) and screenshot-overlay-android.ts resolve parents via a Map keyed by node.index, while snapshot-processing.ts findNearestAncestor uses array position (nodes[current.parentIndex]). Both are correct only because reindexSnapshotNodes makes index === array position. This invariant is load-bearing and undocumented — consider standardizing on the index-keyed Map form.
  • Overlay code checks snapshot.backend === 'android' while presentation checks data.androidSnapshot.backend === 'android-helper' — two different fields; the 'android' vs 'android-helper' proximity is easy to confuse, a brief comment would help.
  • unionRects returns null only for empty input, but callers already guard contentRects.length < 2, so the null branch is effectively dead.

Test coverage

The cited tests genuinely exercise most new behavior: Android overlay raw-pixel alignment, unlabeled bottom-tab inclusion (no invented labels), tall-row trimming (balanceAndroidActionRowRect), unlabeled action-row promotion, duplicate collapsing (RN-row + a negative "keeps single repeated child control"), rectless offscreen → above/below scroll hints (incl. flattened-depth placement), and raw-output preservation on every presentation test.

Gaps:

  • No test for Finding fix: skill should work, even if the npm package is not installed #1 (passive descendant outside parent rect, or rectless passive descendant) — the exact case where promotion and removal diverge.
  • No test for Finding bug: --session flag does not take effect #3 (two overlapping/nested unlabeled controls and the undefined === undefined suppression).
  • No test for inferRectlessNodeDirection returning null (rectless node with no rendered siblings → content dropped with no hint).
  • markRepeatedActionRowDescendantsForRemoval's active-control removal branch (repeatedControls.length >= 2 && labels.size >= 2) isn't directly exercised; only the negative < 2 path is tested.
  • The PR description cites snapshot-processing.test.ts, but that file is not in the diff; the only snapshot-processing.ts change (new findNearestAncestor) is covered indirectly via overlay tests, not a dedicated unit test.

Summary

The core coordinate-system fix is correct (Android uiautomator bounds == screenshot pixels, so dropping snapshotBounds scaling is right) and raw output is genuinely preserved. The refactor is reasonable and well-covered by the cited tests. The one thing I'd fix before merge is the asymmetric passive-descendant removal vs. promotion (Finding #1), which can silently drop visible text. Secondary items to confirm are intentional: dropped [likely navigation]/email-dedup signals (#4), more aggressive row collapsing of existing inputs (#5), the rect semantics shift (#2), and the undefined-label suppression edge (#3). The parentIndex index-vs-position invariant is correct today but fragile and undocumented.

Reviewed with Claude Code.


Generated by Claude Code

Copy link
Copy Markdown
Member Author

Addressed the review points in 96ac61fcf.

What changed:

  • Fixed the passive-descendant asymmetry: unlabeled Android action-row promotion now removes only contained passive descendants, so visible text outside the parent bounds is no longer silently dropped.
  • Added regression coverage for that case in output.test.ts.
  • Tightened overlay suppression so nested/overlapping unlabeled Android controls are not deduped just because both labels are absent.
  • Added regression coverage for nested unlabeled overlay controls.
  • Restored overlayRefs[].rect to the underlying accessibility element rect; overlayRect remains the drawn/trimmed screenshot box. This keeps the metadata semantics stable while preserving the shorter visual overlay.
  • Added a comment documenting the rectless offscreen-content direction heuristic.

Confirmed intentional:

  • Dropping [likely navigation] and the email-specific deduper is intentional; those were replaced by generic row promotion/collapse behavior.
  • The more aggressive RN-row collapse is intentional for repeated label/action descendants inside the same hittable row; distinct sibling tap targets remain preserved by the containment and repeated-label checks.

Validation:

  • pnpm exec vitest run src/utils/__tests__/output.test.ts passed.
  • pnpm exec vitest run src/daemon/__tests__/screenshot-overlay.test.ts passed.
  • pnpm check:quick passed.
  • pnpm build passed.
  • git diff --check passed.
  • pnpm check:unit still has the known local MP4 fallback failure in src/utils/__tests__/video.test.ts; the sandbox-only listener failures cleared when rerun outside the sandbox.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant