Implement CSSOM and Enhanced Visibility Filtering#1797
Conversation
Adds a CSS rule parser and implements `insertRule`, `deleteRule`, and `replaceSync` in `CSSStyleSheet`. Also updates `CSSRuleList` to use dynamic storage and populates sheets from `<style>` elements.
Refactors `CSSRule` to a union type for better type safety and updates `CSSStyleRule` to use `CSSStyleProperties`. Adds comprehensive tests for `insertRule`, `deleteRule`, and `replaceSync`.
Updates `Element.checkVisibility` to iterate through document stylesheets and check for matching rules with `display: none`. Also ensures `<style>` elements register their sheets and initializes them immediately upon addition to the DOM.
|
Didn't go over it in detail, since it's still pending, but two thoughts: 1 - This only handles inline styles, not external. That seems of limited use, but a good first step. Is the goal to eventually support external style sheets too? 2 - checkVisibility can be called a lot, normally through other APIs, e.g. getBoundingClientRect, so if we go down this route, do we need a better-for-querying in-memory data representation of the "computed" styles? It's called +20K times on an amazon product page. (wow, now I want to see if we can optimize the current code). |
Adds support for `pointer-events: none` in interactivity classification and expands `checkVisibility` to include `visibility` and `opacity`. Refactors CSS property lookup into a shared helper.
Introduces `CssCache` to store computed CSS properties, avoiding redundant stylesheet lookups during DOM traversals.
fcdc540 to
f37862a
Compare
|
I agree not loading external stylesheets limits the benefit of the change, but yes, it's a good start. The question will be: how important is it to load external stylesheets, I suppose it is for accuracy, but it will slow the browser loading... And yes, thinking of perf for visiblity check is important. I guess it can be done in a following PR? So since here we are talking only of inline CSS + pure JS changes, the impact of the PR is limited. |
karlseguin
left a comment
There was a problem hiding this comment.
If I run
./pandpanda fetch \
--http_proxy "http://127.0.0.1:3000" \
--insecure_disable_tls_host_verification \
"https://www.amazon.com/A_PRODUCT_PAGe"
On main, it takes ~1.5 - 2 seconds, I get a number of errors, but it seems to dump properly.
If I run it on this branch, I get fewer errors, but it takes 5-6 seconds. I'm using our implementation proxy to reduce the impact of random network differences.
I don't think this is a result of checkVisibility now being much slower. I profiled it, and while it shows up, I don't think it explains a 3x slowdown. Given that we get fewer errors now, I imagine we're doing a better job. Which is good, but...Can anyone replicate my numbers and, if so, it would e really nice to know what's going on.
Co-authored-by: Karl Seguin <karlseguin@users.noreply.github.com>
Co-authored-by: Karl Seguin <karlseguin@users.noreply.github.com>
Co-authored-by: Karl Seguin <karlseguin@users.noreply.github.com>
|
I am observing a similar behavior: 5 vs 8 seconds, but with fewer errors. Isn't it expected (to a certain point) because we're executing more JS/CSS? Because we fixed Because We should definitely try to close the gap, though, probably by implementing a proper layout-scoped |
A Page now has a StyleManager. The StyleManager currently answers two questions: 1 - Is an element hidden 2 - Does an element have pointer-events == none This is used in calls such as element.checkVisibility which, on some pages, can be called tens of thousands of times (often through other methods, like element.getBoundingClientRect). This _can_ be a bottleneck. The StyleManager keeps a list of rules. The rules include the selector, specificity, and properties that we care about. Rules in a stylesheet that contain no properties of interest are ignored. This is the first and likely most significant optimization. Presumably, most CSS rules don't have a display/visibility/opacity or pointer-events property. The list is rules is cached until stylesheets are modified or delete. When this happens, the StyleManager is flagged as "dirty" and rebuilt on-demand in the next query. This is our second major optimization. For now, to check if an element is visible, we still need to scan all rules. But having a pre-build subset of all the rules is a first step. The next step might be to optimize the matching, or possibly optimizing common cases (e.g. id and/or simple class selector)
Co-authored-by: Adrià Arrufat <1671644+arrufat@users.noreply.github.com>
Introduce StyleManager
StyleManager: restore dirty state on rebuild allocation failure
In the first iteration of this, we kept an ArrayList of all rules with
visibility properties. Why bother evaluating if a rule's selector matches an
element if that rule doesn't have any meanignful (i.e. visibility) properties?
This commit enhances that approach by bucketing the rules. Given the following
selectors:
.hidden {....}
.footer > .small {...}
We can store the rules based on their right-most selector. So, given an element
we can do:
if (getId(el)) |id| {
const rules = id_lookup.get(id) orelse continue;
// check rules
}
if (getClasses(el)) |classes| {
for (classes) |c| {
const rules = class_lookup(c) orelse continue;
// chck rules
}
}
...
On an amazon product page, the total list of visibility-related rules was ~230.
Now, scanning 230 rules for a match isn't _aweful_, but remember that this has
to be done up the ancestor tree AND, for Amazon, this is called over 20K times.
This change requires that the StyleManager becomes more matching/parsing-aware
but a typical visibility check on that same Amazon product page only has to
check 2 rules (down from 230) and often has to check 0 rules.
Also, we now filter out a few more interactive-related pseudo-elements, e.g.
:hover. These aren't supported by the browser as a whole (i.e. they can _never_
match), so they can be filtered out early, when building the rules lookup.
Bucket stylesheet rules
Optimize CSS visibility engine with lazy parsing and cache-friendly evaluation
This PR upgrades our CSS Object Model (CSSOM) to support stylesheets and dynamic rules, enabling the Semantic Tree to accurately prune hidden elements.
Core Improvements:
insertRule,deleteRule,replace, andreplaceSync.<style>tags now automatically parse their content and register rules with the document upon insertion.Element.checkVisibilityto match elements against all active document stylesheets, not just inline styles.Impact on Semantic Tree &
test_semantic_css.html:Previously, elements hidden via CSS classes or
<style>tags were incorrectly included in the Semantic Tree. With these changes:.hidden-via-style-tag { display: none; }are now correctly identified as invisible and removed from the tree.sheet.insertRule()(common in modern web apps) are now respected by the tree walker.Verification: New JS tests added for
CSSStyleSheetAPIs and verified pruning intest_semantic_css.html.test_semantic_css.html