Extract Van Gogh Paintings Carousel by Simar-malhotra09 · Pull Request #389 · serpapi/code-challenge

Simar-malhotra09 · 2026-06-17T22:09:06Z

Extract Van Gogh Paintings Carousel

Ruby parser that reads files/van-gogh-paintings.html and extracts each painting's:

Name
Extensions
Link
Image

The output matches the provided expected-array.json exactly.

All data is parsed from the local HTML file. No network requests are made.

Approach

Identifying carousel items

The main challenge is determining which elements belong to the carousel.

Google's CSS class names are generated and obfuscated (pgNMRc, iELo6, etc.). They do not carry meaning and can change at any time, so the parser does not rely on them.

Instead, carousel items are identified by the stick= query parameter in the link URL: any <a> element whose href is a /search?...&stick=... URL and that wraps an <img> is treated as a carousel item.

This relies on Google's search URL structure rather than presentation-specific CSS classes, which makes it a more stable signal.
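
As a rough illustration of the URL check described above, here is a minimal sketch using only Ruby's standard library. The helper name carousel_link? is hypothetical and not taken from the PR; the check that the anchor also wraps an <img> is left to the HTML parser and is not shown.

```ruby
require "uri"

# Hypothetical helper (not from the PR): true when href looks like a
# Google carousel link, i.e. a /search URL whose query string carries
# a non-empty stick= parameter.
def carousel_link?(href)
  return false if href.nil? || href.empty?

  uri = URI.parse(href)
  return false unless uri.path == "/search" && uri.query

  # decode_www_form returns [key, value] pairs; to_h keeps the last value per key.
  params = URI.decode_www_form(uri.query).to_h
  !params["stick"].to_s.empty?
rescue URI::InvalidURIError, ArgumentError
  # Malformed hrefs are simply not carousel links.
  false
end
```

Parsing the query string instead of matching the raw href with a regex avoids false positives from parameters whose names merely end in "stick".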

Extracting metadata

The painting name and year are read from the innermost text nodes within each anchor element.

The year is added to extensions only when the text is exactly four digits:

/\A\d{4}\z/

This prevents title fragments from being incorrectly included in the extensions field.
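
A small sketch of how that guard might be applied, assuming the innermost text nodes have already been collected into an array of strings. The helper name name_and_extensions and the choice of the first non-year text as the name are assumptions for illustration, not the PR's actual code.

```ruby
# The pattern from the description: \A and \z anchor the whole string,
# so only a bare four-digit value matches.
YEAR_PATTERN = /\A\d{4}\z/

# Hypothetical helper (not from the PR). texts holds the innermost
# text-node strings found under one carousel anchor.
def name_and_extensions(texts)
  cleaned = texts.map(&:strip).reject(&:empty?)
  years, rest = cleaned.partition { |text| text.match?(YEAR_PATTERN) }

  item = { name: rest.first } # assumption: first non-year text is the name
  item[:extensions] = years unless years.empty?
  item
end
```

With this guard, a fragment such as "c. 1889" or "Vase with 12 Sunflowers" stays out of extensions because it is not exactly four digits.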

Image extraction

The HTML contains two thumbnail formats.

Images embedded in scripts

The first set of paintings stores image data inside inline <script> blocks through _setImagesSrc(ii, s) calls.

During initialization, the parser:

Extracts the image data from the scripts
Decodes escaped characters such as \x3d to =
Builds an ID to image lookup table
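
The three steps above could be sketched as follows. The exact shape of the inline scripts is an assumption here (a var s='...' data URI followed by a var ii=[...] ID list and the _setImagesSrc(ii, s) call), and the names SCRIPT_IMAGE_PATTERN, decode_js_escapes and build_image_lookup are hypothetical.

```ruby
# Assumed script shape (not confirmed by the PR):
#   var s='data:image/jpeg;base64,...';var ii=['dimg_1'];_setImagesSrc(ii,s);
SCRIPT_IMAGE_PATTERN = /
  var\s+s\s*=\s*'([^']*)'\s*;\s*      # capture 1: the escaped image data
  var\s+ii\s*=\s*\[([^\]]*)\]\s*;\s*  # capture 2: the quoted ID list
  _setImagesSrc\(ii,\s*s\)
/x

# Turns JavaScript \xNN escapes into literal characters, e.g. \x3d becomes "=".
def decode_js_escapes(str)
  str.gsub(/\\x([0-9a-fA-F]{2})/) { Regexp.last_match(1).hex.chr(Encoding::UTF_8) }
end

# Builds an { image_id => image_data } table from concatenated script text.
def build_image_lookup(script_text)
  lookup = {}
  script_text.scan(SCRIPT_IMAGE_PATTERN) do |data, ids|
    image = decode_js_escapes(data)
    ids.scan(/'([^']+)'/).flatten.each { |id| lookup[id] = image }
  end
  lookup
end
```

Building the table once during initialization keeps each per-item image lookup a constant-time hash access.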

Images from `data-src`

The remaining paintings are lazy loaded and expose the image URL directly through the data-src attribute.

These images are read directly from the corresponding <img> elements.
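
One way the two sources might be combined per item is shown below. This is only a sketch: the precedence (script lookup first, data-src second) and the helper names resolve_image and build_item are assumptions. Dropping nil keys mirrors the commit that omits extensions and image when they are nil.

```ruby
# Hypothetical helper (not from the PR): prefer the script-embedded image
# for this <img> id, otherwise fall back to the lazy-load data-src value.
def resolve_image(img_id, data_src, lookup)
  lookup[img_id] || data_src
end

# Hypothetical helper: assembles one carousel item and omits any key
# whose value is nil, so items without a year or thumbnail stay compact.
def build_item(name:, link:, extensions: nil, image: nil)
  { name: name, extensions: extensions, link: link, image: image }.compact
end
```

Hash#compact removes only nil values, so an empty string or empty array would still be emitted and would need its own handling.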

Validation

The parser was tested against three carousel layouts:

File	Items	Content Type
`files/van-gogh-paintings.html`	47	Artworks, exact match with provided JSON
`spec/fixtures/deniro-movies.html`	12	Movies
`spec/fixtures/shinkai-books.html`	12	Books

Running

bin/setup
# Verifies Ruby >= 3.1 and installs dependencies

bundle exec bin/parse files/van-gogh-paintings.html
# Outputs JSON to stdout

bundle exec rspec
# 16 examples, 0 failures

Simar-malhotra09 added 8 commits June 17, 2026 17:45

build(deps): add nokolexbor and rspec, pin ruby 4.0.1

a9b7c2d

Signed-off-by: Simar Malhotra <malhotrasimar009@gmail.com>

feat(parser): implement structure-based carousel extractor and image …

dc629d6

…resolver Signed-off-by: Simar Malhotra <malhotrasimar009@gmail.com>

test(spec): add RSpec suite with exact-match and structural assertions

e5fd29e

Signed-off-by: Simar Malhotra <malhotrasimar009@gmail.com>

fix(parser): omit extensions and image keys when nil to match expecte…

30c4d0b

…d schema Signed-off-by: Simar Malhotra <malhotrasimar009@gmail.com>

test(fixtures): add De Niro movies and Shinkai books carousels, exten…

78e8550

…d spec Signed-off-by: Simar Malhotra <malhotrasimar009@gmail.com>

feat(bin): add parse entry point, outputs carousel JSON to stdout

d07b267

Signed-off-by: Simar Malhotra <malhotrasimar009@gmail.com>

fix(bin): add bundler/setup, drop BUNDLED WITH constraint for portabi…

5c33109

…lity Signed-off-by: Simar Malhotra <malhotrasimar009@gmail.com>

feat(bin): add setup script with ruby version guard

cfa8d9b

Signed-off-by: Simar Malhotra <malhotrasimar009@gmail.com>

Simar-malhotra09 wants to merge 8 commits into serpapi:master from Simar-malhotra09:master
