From 6a024a367b007d837827e9a5d01cdb2ce76619d5 Mon Sep 17 00:00:00 2001 From: ukimsanov Date: Sun, 14 Jun 2026 15:02:08 -0700 Subject: [PATCH] feat(cli): capture-video on-demand fetcher + capture pipeline robustness MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit For the hyperframes.dev website-to-video flow. Real-AI-test runs against heygen.com, huly.io, and heygen-showcase surfaced two gaps: (1) capture's logo / asset-captioning signals missed modern React/Tailwind builds; and (2) there was no CLI surface to pull the videos the manifest references. New command: • `hyperframes capture-video ` — on-demand downloader for entries in capture/extracted/video-manifest.json. Capture writes the manifest + preview PNGs but skips the mp4s; this pulls one entry by `--index N` (matched against the entry's `index` field, NOT array offset — gaps are possible when a preview screenshot fails). SSRF-safe via safeFetch, 250 MB cap, content-type whitelist, race-free exclusive-create write. Layout-aware (handles both standalone capture and W2H project layouts). Capture pipeline fixes: • Structural logo signals (assetCataloger + tokenExtractor): inBanner / inHomeLink / matchesTitleBrand. Class-substring alone caught 0/32 SVGs on heygen.com — modern builds don't put 'logo' / 'brand' in any className. • Content-hash SVG slugs (assetDownloader): `svg-<8char-sha1>.svg` — label-derived slugs mis-attributed partner-logo carousels (heygen-logo.svg actually contained Google, hubspot-logo.svg contained Trivago, etc.). Content-hash names are invariant by construction. • SVG → PNG rasterization before Gemini Vision (contentExtractor): the raw-SVG-as-text path was hallucinating wordmarks (VIVIENNE for HubSpot, 'wrestling' for Workday). Adds polarity detection so a white-glyph SVG flattened to a blank PNG gets inverted before captioning. LOGO tag in asset-descriptions.md when structural signals fire (independent of Gemini key presence). • Double-escape \/ inside the page.evaluate template literal in assetCataloger + tokenExtractor: the original `/^https?:\/\/.../` collapsed to `/` mid-template and threw `Unexpected token ^`. Capture was 100% blocked on this until the escape was fixed. • `asset-descriptions.md` header branches on Gemini-key presence with an explicit 'Vision OFF — catalog-derived descriptions' warning. New lint rule: • `lintMissingLocalAsset` (cli/utils/lintProject): scans