Skip to content

Implement CSSOM and Enhanced Visibility Filtering#1797

Merged
karlseguin merged 43 commits intomainfrom
css-improvements
Mar 23, 2026
Merged

Implement CSSOM and Enhanced Visibility Filtering#1797
karlseguin merged 43 commits intomainfrom
css-improvements

Conversation

@arrufat
Copy link
Copy Markdown
Contributor

@arrufat arrufat commented Mar 12, 2026

This PR upgrades our CSS Object Model (CSSOM) to support stylesheets and dynamic rules, enabling the Semantic Tree to accurately prune hidden elements.

Core Improvements:

  • CSSOM Support: Implemented insertRule, deleteRule, replace, and replaceSync.
  • Automatic Parsing: Standard <style> tags now automatically parse their content and register rules with the document upon insertion.
  • Visibility Engine: Updated Element.checkVisibility to match elements against all active document stylesheets, not just inline styles.

Impact on Semantic Tree & test_semantic_css.html:

Previously, elements hidden via CSS classes or <style> tags were incorrectly included in the Semantic Tree. With these changes:

  1. Correct Pruning: Elements like .hidden-via-style-tag { display: none; } are now correctly identified as invisible and removed from the tree.
  2. Dynamic JS Support: Rules added via sheet.insertRule() (common in modern web apps) are now respected by the tree walker.
  3. Cleaner Output: Significantly reduces token noise by eliminating "phantom" elements that are technically in the DOM but hidden from the user.

Verification: New JS tests added for CSSStyleSheet APIs and verified pruning in test_semantic_css.html.

test_semantic_css.html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>CSS Semantic Tree Showcase</title>
</head>
<body>

  <h1>Visible Content</h1>
  <p>This paragraph is visible and should be in the tree.</p>

  <div class="hidden-via-style-tag">
    <p>This should NOT be in the semantic tree because it's hidden via a style tag!</p>
    <button>Secret Button</button>
  </div>

  <div id="hidden-via-js">
    <p>This should NOT be in the semantic tree because it's hidden via JS insertRule!</p>
    <a href="https://example.com">[Secret Link](https://example.com/)</a>
  </div>

  <div class="hidden-via-visibility">
    <p>This should NOT be in the semantic tree because it has visibility: hidden!</p>
    <button>Hidden Button</button>
  </div>

  <div class="hidden-via-opacity">
    <p>This should NOT be in the semantic tree because it has opacity: 0!</p>
    <input type="text" value="Secret Input">
  </div>

  <div class="non-interactive-via-pointer-events">
    <p>This should be in the tree, but the button should NOT be interactive!</p>
    <button>Unclickable Button</button>
  </div>

  <script>
    // 1. Hide using a standard <style> tag.
    // Our new CSS parser automatically reads the text content of this tag
    // and populates the document's stylesheet list.
    const styleEl = document.createElement('style');
    styleEl.textContent = `
      .hidden-via-style-tag { display: none; }
      .hidden-via-visibility { visibility: hidden; }
      .hidden-via-opacity { opacity: 0; }
      .non-interactive-via-pointer-events { pointer-events: none; }
    `;
    document.head.appendChild(styleEl);

    // 2. Hide using JavaScript sheet.insertRule().
    // This API now correctly parses the rule and adds it to the list,
    // where checkVisibility can now find it.
    const dynamicStyle = document.createElement('style');
    document.head.appendChild(dynamicStyle);
    dynamicStyle.sheet.insertRule('#hidden-via-js { display: none; }', 0);
  </script>
</body>
</html>

arrufat added 11 commits March 12, 2026 16:27
Adds a CSS rule parser and implements `insertRule`, `deleteRule`, and
`replaceSync` in `CSSStyleSheet`. Also updates `CSSRuleList` to use
dynamic storage and populates sheets from `<style>` elements.
Refactors `CSSRule` to a union type for better type safety and updates
`CSSStyleRule` to use `CSSStyleProperties`. Adds comprehensive tests for
`insertRule`, `deleteRule`, and `replaceSync`.
Updates `Element.checkVisibility` to iterate through document
stylesheets and check for matching rules with `display: none`.
Also ensures `<style>` elements register their sheets and
initializes them immediately upon addition to the DOM.
@arrufat arrufat marked this pull request as draft March 13, 2026 01:29
@karlseguin
Copy link
Copy Markdown
Collaborator

Didn't go over it in detail, since it's still pending, but two thoughts:

1 - This only handles inline styles, not external. That seems of limited use, but a good first step. Is the goal to eventually support external style sheets too?

2 - checkVisibility can be called a lot, normally through other APIs, e.g. getBoundingClientRect, so if we go down this route, do we need a better-for-querying in-memory data representation of the "computed" styles? It's called +20K times on an amazon product page. (wow, now I want to see if we can optimize the current code).

arrufat added 2 commits March 13, 2026 13:33
Adds support for `pointer-events: none` in interactivity classification
and expands `checkVisibility` to include `visibility` and `opacity`.
Refactors CSS property lookup into a shared helper.
Introduces `CssCache` to store computed CSS properties, avoiding
redundant stylesheet lookups during DOM traversals.
@arrufat arrufat marked this pull request as ready for review March 13, 2026 05:32
@arrufat arrufat marked this pull request as draft March 13, 2026 05:35
@arrufat arrufat force-pushed the css-improvements branch 3 times, most recently from fcdc540 to f37862a Compare March 13, 2026 11:17
@arrufat arrufat marked this pull request as ready for review March 13, 2026 11:17
@krichprollsch
Copy link
Copy Markdown
Member

I agree not loading external stylesheets limits the benefit of the change, but yes, it's a good start.

The question will be: how important is it to load external stylesheets, I suppose it is for accuracy, but it will slow the browser loading...

And yes, thinking of perf for visiblity check is important. I guess it can be done in a following PR?

So since here we are talking only of inline CSS + pure JS changes, the impact of the PR is limited.

Copy link
Copy Markdown
Collaborator

@karlseguin karlseguin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I run

./pandpanda fetch \
 --http_proxy "http://127.0.0.1:3000" \
 --insecure_disable_tls_host_verification \
 "https://www.amazon.com/A_PRODUCT_PAGe"

On main, it takes ~1.5 - 2 seconds, I get a number of errors, but it seems to dump properly.

If I run it on this branch, I get fewer errors, but it takes 5-6 seconds. I'm using our implementation proxy to reduce the impact of random network differences.

I don't think this is a result of checkVisibility now being much slower. I profiled it, and while it shows up, I don't think it explains a 3x slowdown. Given that we get fewer errors now, I imagine we're doing a better job. Which is good, but...Can anyone replicate my numbers and, if so, it would e really nice to know what's going on.

Comment thread src/browser/tests/css/stylesheet.html Outdated
Comment thread src/browser/webapi/css/CSSStyleSheet.zig
Comment thread src/browser/css/Parser.zig
Comment thread src/browser/webapi/css/CSSStyleSheet.zig
Comment thread src/browser/webapi/css/CSSStyleSheet.zig
Comment thread src/browser/webapi/Element.zig Outdated
Comment thread src/browser/webapi/Element.zig Outdated
Comment thread src/browser/webapi/Element.zig Outdated
@arrufat
Copy link
Copy Markdown
Contributor Author

arrufat commented Mar 14, 2026

I am observing a similar behavior: 5 vs 8 seconds, but with fewer errors. Isn't it expected (to a certain point) because we're executing more JS/CSS?

Because we fixed CSSStyleSheet.replace() returning the sheet itself and insertRule() handling edge-cases, Amazon's complex JS bundles are no longer crashing early. They are now successfully executing complex logic and layout queries (like getBoundingClientRect).

Because Element.checkVisibility() currently passes null for the CSS cache, every layout query triggers a full, un-cached linear scan of every CSS rule in the document to run el.matches().

We should definitely try to close the gap, though, probably by implementing a proper layout-scoped CssCache that invalidates on DOM/CSS changes.

@karlseguin karlseguin marked this pull request as draft March 16, 2026 09:57
karlseguin and others added 11 commits March 18, 2026 17:52
A Page now has a StyleManager. The StyleManager currently answers two questions:
1 - Is an element hidden
2 - Does an element have pointer-events == none

This is used in calls such as element.checkVisibility which, on some pages, can
be called tens of thousands of times (often through other methods, like
element.getBoundingClientRect). This _can_ be a bottleneck.

The StyleManager keeps a list of rules. The rules include the selector,
specificity, and properties that we care about. Rules in a stylesheet that
contain no properties of interest are ignored. This is the first and likely
most significant optimization. Presumably, most CSS rules don't have a
display/visibility/opacity or pointer-events property.

The list is rules is cached until stylesheets are modified or delete. When this
happens, the StyleManager is flagged as "dirty" and rebuilt on-demand in the
next query.  This is our second major optimization.

For now, to check if an element is visible, we still need to scan all rules.
But having a pre-build subset of all the rules is a first step.

The next step might be to optimize the matching, or possibly optimizing common
cases (e.g. id and/or simple class selector)
Co-authored-by: Adrià Arrufat <1671644+arrufat@users.noreply.github.com>
StyleManager: restore dirty state on rebuild allocation failure
In the first iteration of this, we kept an ArrayList of all rules with
visibility properties. Why bother evaluating if a rule's selector matches an
element if that rule doesn't have any meanignful (i.e. visibility) properties?

This commit enhances that approach by bucketing the rules. Given the following
selectors:

.hidden {....}
.footer > .small {...}

We can store the rules based on their right-most selector. So, given an element
we can do:

if (getId(el)) |id| {
   const rules = id_lookup.get(id) orelse continue;
   // check rules
}

if (getClasses(el)) |classes| {
   for (classes) |c| {
     const rules = class_lookup(c) orelse continue;
     // chck rules
   }
}
...

On an amazon product page, the total list of visibility-related rules was ~230.
Now, scanning 230 rules for a match isn't _aweful_, but remember that this has
to be done up the ancestor tree AND, for Amazon, this is called over 20K times.

This change requires that the StyleManager becomes more matching/parsing-aware
but a typical visibility check on that same Amazon product page only has to
check 2 rules (down from 230) and often has to check 0 rules.

Also, we now filter out a few more interactive-related pseudo-elements, e.g.
:hover. These aren't supported by the browser as a whole (i.e. they can _never_
match), so they can be filtered out early, when building the rules lookup.
@karlseguin karlseguin marked this pull request as ready for review March 19, 2026 12:11
@karlseguin karlseguin merged commit 60c2359 into main Mar 23, 2026
12 checks passed
@karlseguin karlseguin deleted the css-improvements branch March 23, 2026 11:11
@github-actions github-actions Bot locked and limited conversation to collaborators Mar 23, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants