Skip to content

feat(cli): capture-video on-demand fetcher + capture pipeline robustness#1447

Open
ukimsanov wants to merge 1 commit into
mainfrom
feat/cli-capture-video
Open

feat(cli): capture-video on-demand fetcher + capture pipeline robustness#1447
ukimsanov wants to merge 1 commit into
mainfrom
feat/cli-capture-video

Conversation

@ukimsanov

@ukimsanov ukimsanov commented Jun 14, 2026

Copy link
Copy Markdown
Collaborator

What

  • New hyperframes capture video <project> subcommand — on-demand mp4 fetch from the capture manifest.
  • Capture pipeline robustness: structural logo signals, content-hash SVG slugs, SVG → PNG rasterization for Gemini Vision, regex-escape fix.
  • New lintMissingLocalAsset rule — flags <video> / <img> / <source> src pointing at missing local files.

Why

heygen-logo.svg on main today contains Google's wordmark — label-derived slugs picked the wrong DOM context. The new website-to-video skill inherits this and would ship off-brand videos. Same root for: class-substring isLogo catching 0/32 SVGs on heygen.com, Vision hallucinating wordmarks from raw SVG path text, and capture being 100% blocked by a \/ regex collapse in page.evaluate.

How

  • capture video: nested under capture per the team's auth/cloud/lambda subcommand pattern. safeFetch (SSRF + redirect-hop revalidation), 250 MB cap, content-type whitelist, flag: "wx" exclusive-create write, lookup by entry.index (not array offset — manifest can have gaps).
  • Content-hash SVG slugs (svg-<8-char-sha1>.svg) so filename can never drift from content.
  • sharp rasterizes SVGs to PNG before Vision; polarity detection flips white-glyph SVGs to a dark background.
  • lintMissingLocalAsset masks <!-- --> / <style> / <script> ranges before scanning so commented-out examples aren't false positives.

Test plan

  • bun run --filter @hyperframes/cli typecheck + test pass (17 capture/video cases + 51 lintProject cases)
  • Manual: captured 9 sites (heygen, vercel, stripe, linear, notion, huggingface, remotion, posthog, anthropic, airbnb) — all clean; brand mark findable via description grep on each
  • Manual: hyperframes capture video <dir> --list + --index 0 downloads mp4 from heygen + notion + webm from airbnb — all valid per file(1)

Comment thread packages/cli/src/commands/capture/video.ts Fixed
@ukimsanov ukimsanov force-pushed the feat/cli-capture-video branch 4 times, most recently from e4e8b44 to b56b7c5 Compare June 14, 2026 23:47
Comment thread packages/cli/src/cli.ts Outdated
Comment on lines +137 to +138
capture: () => import("./commands/capture.js").then((m) => m.default),
"capture-video": () => import("./commands/capture-video.js").then((m) => m.default),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if capture-video could be done via a subcommand of capture?

capture video

or capture --video

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah actually capture video should work

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me do that real quick

@ukimsanov ukimsanov force-pushed the feat/cli-capture-video branch from b56b7c5 to 8404517 Compare June 15, 2026 05:23
const buf = await fetchToBuffer(entry.url);
// `flag: "wx"` is exclusive-create — atomic check-and-create, no TOCTOU window.
try {
writeFileSync(outPath, buf, { flag: "wx" });
@ukimsanov ukimsanov force-pushed the feat/cli-capture-video branch from 8404517 to 51c9c6c Compare June 15, 2026 05:34
For the hyperframes.dev website-to-video flow. Real-AI-test runs against
heygen.com, huly.io, and heygen-showcase surfaced two gaps: (1) capture's
logo / asset-captioning signals missed modern React/Tailwind builds; and
(2) there was no CLI surface to pull the videos the manifest references.

New command:

  • `hyperframes capture-video <project>` — on-demand downloader for
    entries in capture/extracted/video-manifest.json. Capture writes the
    manifest + preview PNGs but skips the mp4s; this pulls one entry by
    `--index N` (matched against the entry's `index` field, NOT array
    offset — gaps are possible when a preview screenshot fails). SSRF-safe
    via safeFetch, 250 MB cap, content-type whitelist, race-free
    exclusive-create write. Layout-aware (handles both standalone capture
    and W2H project layouts).

Capture pipeline fixes:

  • Structural logo signals (assetCataloger + tokenExtractor): inBanner /
    inHomeLink / matchesTitleBrand. Class-substring alone caught 0/32 SVGs
    on heygen.com — modern builds don't put 'logo' / 'brand' in any
    className.

  • Content-hash SVG slugs (assetDownloader): `svg-<8char-sha1>.svg` —
    label-derived slugs mis-attributed partner-logo carousels
    (heygen-logo.svg actually contained Google, hubspot-logo.svg contained
    Trivago, etc.). Content-hash names are invariant by construction.

  • SVG → PNG rasterization before Gemini Vision (contentExtractor): the
    raw-SVG-as-text path was hallucinating wordmarks (VIVIENNE for HubSpot,
    'wrestling' for Workday). Adds polarity detection so a white-glyph SVG
    flattened to a blank PNG gets inverted before captioning. LOGO tag in
    asset-descriptions.md when structural signals fire (independent of
    Gemini key presence).

  • Double-escape \/ inside the page.evaluate template literal in
    assetCataloger + tokenExtractor: the original `/^https?:\/\/.../`
    collapsed to `/` mid-template and threw `Unexpected token ^`. Capture
    was 100% blocked on this until the escape was fixed.

  • `asset-descriptions.md` header branches on Gemini-key presence with
    an explicit 'Vision OFF — catalog-derived descriptions' warning.

New lint rule:

  • `lintMissingLocalAsset` (cli/utils/lintProject): scans <video> / <img>
    / <source> src for local files that don't exist in the project.
    Empirically the most common sub-agent mistake across multi-URL runs
    (~5+ per run). Uses `resolveExistingLocalAsset` so the existence check
    matches the bundler's notion of 'resolves'. Masks comment / style /
    script ranges before scanning so a literal `<img src=missing.png>`
    inside a tutorial comment isn't reported.

Tests: 17 new for capture-video (safeFilename decoding/sanitization,
VIDEO_CONTENT_TYPE_RE accept/reject, pickManifestEntry index-field lookup
with gaps, URL-mismatch + bad-index rejection, --index over --url
priority); 70 cases under lintProject.test.ts covering the new rule and
existing rules.

Sibling PRs in this stack:
  • #PR_A1 — fix(producer): __dirname ESM banner shim
  • #PR_A2 — fix(core/lint): findRootTag masks comment/style/script
@ukimsanov ukimsanov force-pushed the feat/cli-capture-video branch from 51c9c6c to 343d5ac Compare June 15, 2026 05:43
@ukimsanov ukimsanov requested a review from jrusso1020 June 15, 2026 05:49

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified at HEAD 343d5ac — layering on @via's review at b56b7c5 and James's inline ask.

Verdict: approve-with-concerns. James's "make it a capture subcommand" landed cleanly; 2 of @via's 6 items resolved, 2 partial, 2 open. None blocking, but the buffering-vs-streaming gap is worth a follow-up since it shifts from "cosmetic duplication" to "actual OOM surface" on a hostile CDN.


Status table

James's askcapture-videocapture video subcommand

subCommands: { video: ... } wired in capture.ts; "capture-video" entry removed from cli.ts registry; file moved to commands/capture/video.ts; example updated to hyperframes capture video ./linear-video --index 0. Clean follow-through.

@via's 6 findings at HEAD

# Finding Status Notes
1 Duplicate download primitive (buffers full 250 MB via arrayBuffer() vs cloud/download.ts:downloadToFile's streaming) ❌ open Still Buffer.from(await r.arrayBuffer()) at video.ts:55. See concern below — this is now a real safety gap, not just a duplication.
2 run() untested (manifest discovery, --list, EEXIST short-circuit, exit codes) ❌ open video.test.ts still covers 4 helpers; integration surface uncovered. Acceptable for an author-time CLI but worth a follow-up.
3 maskNonScannableRanges masking has no regression test ✅ resolved New case at lintProject.test.ts: "does not flag <img>/<video> tokens inside <!-- -->, <style>, or <script>". Good.
4 SVG polarity light/dark fill regex asymmetric ⚠️ partial Light glyph regex broadened to #fff(fff)?|white|#[ef][ef][ef]|#[ef]{6} — now catches #eee, #efe, etc. Still misses CSS named colors (gainsboro, silver), rgb()/hsl(), and class-driven fills. Light/dark sides still asymmetric in shape, but the heygen-on-#eee failure mode @via flagged is now caught.
5 Sharp failure swallows to empty caption silently ⚠️ partial Now increments svgsSkipped and emits progress("design", "skipped rasterizing N SVG(s) — fell back to label-derived") after the loop. User learns something failed. Open: bare catch {} still drops the actual error, so when one user reports "Vision says X" you can't diagnose which SVG broke or why. Suggest logging err.message per skipped file at progress("design", ...) verbosity.
6 CodeQL network data written to file needs suppression w/ rationale ✅ resolved video.ts:223 now reads // lgtm[js/http-to-file-access] — manifest-vetted URL, content-type whitelist, 250MB cap, SSRF-safe fetch. Rationale explicit.

My additional findings (independent read)

Concerns

video.ts:55Buffering 250 MB before checking byteLength is a real OOM surface, not just cosmetic. MAX_VIDEO_BYTES content-length pre-check (line ~38) is advisory: a hostile or misconfigured CDN can send content-length: 100 and then stream 2 GB. The post-arrayBuffer() byteLength check fires AFTER node has buffered the whole response — process memory has already spiked. Streaming with a running byte cap (the shape cloud/download.ts:downloadToFile uses) would refuse mid-stream. This strengthens @via's #1 from "duplicate primitive" to "shared streaming primitive would actually be safer." Not blocking for v1 against trusted CDNs, but flag for a follow-up before this is used against arbitrary user-controlled manifests.

video.ts:217safeFilename is non-injective; collisions silently EEXIST. decodeURIComponent + [^A-Za-z0-9._-]+_ collapse means "hero 1.mp4" and "hero_1.mp4" both produce "hero_1.mp4". If a manifest has two such entries, the second writeFileSync({ flag: "wx" }) throws EEXIST and the code logs "already downloaded: ... (skipping)" — misleading because the bytes on disk are entry A's, not entry B's. The PR's own test covers "a b___c" → "a_b___c", which proves the collapse is lossy. Suggest prefixing the filename with entry.index (e.g. 0-hero_1.mp4) so the manifest's own gap-tolerant key disambiguates on disk too.

commands/capture.ts:56Manifest schema is duck-typed. JSON.parse then directly indexed via manifest.map((e) => e.index) etc. If capture's manifest shape ever changes (extra fields fine, but renamed indexentryIndex, or wrapped in { entries: [...] }), capture-video silently breaks or misreports. No schemaVersion on the manifest, no shape check on read. Author-time tool, so a runtime crash is acceptable — but a one-line "expected manifest version N" check on read would future-proof this cheaply.

Nits

commands/capture.ts:56if (url === "video") return; early-return relies on citty's "parent run fires after subcommand routing" behavior. Comment is helpful but consider: if a user ever captures https://video.example.com and the URL is normalized somewhere upstream, this could short-circuit incorrectly. Worth a if (args._?.[0] === "video") return;-style guard against subCommand name only (not URL value). (nit)

contentExtractor.ts:285 — bare catch {} (no (err)) means sharp's real failure cause is unrecoverable for diagnosis. Combined with the new svgsSkipped count, you get "I skipped 3 SVGs" but no breadcrumb to which SVGs or why. Pair with @via's #5 fix: at minimum catch (err) and emit a progress("design", ...) line per file. (nit, builds on Via #5)

video.ts:223 — synchronous writeFileSync of a 250 MB Buffer blocks the event loop for the duration of the disk write. Fine for a CLI, but if this primitive ever gets reused server-side, prefer the fs/promises writeFile async version. (nit, forward-looking)

Questions

video.ts:38application/octet-stream in the accept-list is broad. A lot of static-asset CDNs do send application/octet-stream for .mp4, so this is pragmatic — but it also means any 500-page HTML error served with that mimetype passes the check. Do you have a fallback "first 4 bytes are an mp4/webm/mov magic header" sniff, or is the URL trust model strong enough that magic-byte verification isn't worth the complexity?

• Cross-package coupling — @via flagged the download primitive duplication. Are there other ones emerging? sharp is now used in contentExtractor.ts for SVG rasterization; is it used anywhere else in the cli/core packages? If so, a packages/core/src/imageOps.ts-shaped landing zone might be worth carving before the third caller appears.


What I didn't verify

• I didn't run the new test suite locally — assumed the green CI claim in the PR body is accurate.
• I didn't manually exercise capture video against a real captured project; trusted the manual test plan in the PR body and the helper-level unit coverage.
• I didn't trace the new beats CLI surface that landed alongside this PR (visible in the b56b7c5..343d5ac compare diff) — it's adjacent feature work, not the capture-video review scope. Worth a separate look if it's in this PR's surface.

Review by Rames D Jusso

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants