ENH prototype histogram splitter for RandomForest benchmarks #6
Open
cakedev0 wants to merge 10 commits into
Summary
This is an experimental branch for investigating whether a histogram-style splitter can close the RandomForest fit-time gap with sklearnex / oneDAL on CPU.

It adds:

- `max_bins=None` parameter to dense `DecisionTreeClassifier`, `DecisionTreeRegressor`, `RandomForestClassifier`, and `RandomForestRegressor`;
- `HistBestSplitter` for supported criteria (`gini`, `entropy`/`log_loss`, `squared_error`);
- `max_bins` ordered bins, using exact value bins when `n_unique <= max_bins` and quantile-derived thresholds otherwise;
- `absolute_error`, `poisson`, and `friedman_mse`;
- `benchmarks/rf_intelex/` and Markdown reports under `reports/rf_intelex/`.

How We Got Here
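The binning rule listed in the Summary (exact value bins when `n_unique <= max_bins`, quantile-derived thresholds otherwise) can be illustrated with a small pure-Python sketch. This is not the branch's Cython implementation: the helper names `bin_thresholds` and `bin_index`, the midpoint placement of exact-value thresholds, and the use of `statistics.quantiles` are all assumptions made for illustration.

```python
from bisect import bisect_left
from statistics import quantiles


def bin_thresholds(values, max_bins=255):
    """Return sorted thresholds that split `values` into at most `max_bins` ordered bins.

    Hypothetical helper: the real splitter may place thresholds differently.
    """
    unique = sorted(set(values))
    if len(unique) <= max_bins:
        # Exact value bins: one bin per distinct value.
        # Assumption: thresholds sit at midpoints between neighbouring values.
        return [(lo + hi) / 2 for lo, hi in zip(unique, unique[1:])]
    # Quantile-derived thresholds: at most max_bins - 1 cut points.
    cuts = quantiles(values, n=max_bins, method="inclusive")
    # Duplicated cut points (heavy ties) would create empty bins, so drop them.
    return sorted(set(cuts))


def bin_index(value, thresholds):
    """Map a value to its bin; bin i holds values in (thresholds[i-1], thresholds[i]]."""
    return bisect_left(thresholds, value)


# Low-cardinality feature: every distinct value gets its own bin.
low_card = [1.0, 2.0, 2.0, 5.0]
low_card_thresholds = bin_thresholds(low_card, max_bins=255)
low_card_codes = [bin_index(v, low_card_thresholds) for v in low_card]

# High-cardinality feature: values are compressed into max_bins quantile bins.
high_card = [float(i) for i in range(1000)]
high_card_thresholds = bin_thresholds(high_card, max_bins=4)
high_card_codes = [bin_index(v, high_card_thresholds) for v in high_card]
```

With bin codes like these, a splitter only needs to evaluate at most `max_bins - 1` candidate thresholds per feature instead of one per distinct value.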
The initial benchmark suite compared scikit-learn RandomForest fit time against `sklearnex.ensemble` while avoiding unsupported/fallback sklearnex configurations. The suite keeps `n_estimators=20`, `n_jobs=1`, per-fit timings under 10s, and the retained suite under 3 minutes.

Profiling and source inspection suggested two main sklearnex advantages:
The branch first explored global sorted indices, then moved to a histogram-style splitter. Important iterations included:

- `max_bins` semantics to mean actual feature binning, similar in spirit to HistGradientBoosting, rather than only enabling exact low-cardinality bins;
- `BestSplitter.init` in the histogram path;
- `memset`.

Performance
Final retained-suite benchmark, `max_bins=255`, `n_estimators=20`, `n_jobs=1`, with 30s warm-up and 10 timed repeats. `var` is `(max - min) / median` across repeats.

Benchmark cases: `clf_12f_full_deep`, `clf_12f_shallow_bootstrap`, `clf_24f_low_card`, `clf_96f_sqrt_leaf8`, `reg_12f_full_deep`, `reg_12f_full_f64`, `reg_12f_shallow_bootstrap`, `reg_1f_deep_full`, `reg_24f_low_card`, `reg_80f_sqrt_leaf8`.

A follow-up run forcing every generated `X` to `float64` produced the same conclusion: all retained cases stayed below 2x slower than sklearnex, with worst ratio 1.84x.

Raw benchmark outputs are committed under `reports/rf_intelex/results_max_bins_255_warmup30_repeats10/` and `reports/rf_intelex/results_max_bins_255_warmup30_repeats10_xfloat64/`.

Validation
Local checks run during development:

- `pytest sklearn/tree/tests/test_tree.py -q`
- `pytest sklearn/ensemble/tests/test_forest.py -k 'max_bins' -q`
- `pre-commit run ruff-check --files benchmarks/rf_intelex/bench_rf_fit.py sklearn/ensemble/_forest.py sklearn/tree/_classes.py`
- `pre-commit run ruff-format --files benchmarks/rf_intelex/bench_rf_fit.py sklearn/ensemble/_forest.py sklearn/tree/_classes.py`
- `pre-commit run cython-lint --files sklearn/tree/_splitter.pyx`

The commit hooks also passed when creating the final benchmark commit.
Notes / Follow-ups
This is intentionally a prototype branch, not a polished upstream-ready API proposal. Likely follow-ups:
- `uint8`/`uint16`) instead of `int32`;
- `max_bins=None`;

Branch vs main benchmark
I also reran the retained benchmark suite comparing this branch directly against `main` (commit `ffc6cdc20b8d5eb58e38042fd90a2aeecc33dfb8`). The branch uses `max_bins=255`; `main` uses the standard exact RandomForest implementation. Both runs used `n_estimators=20`, `n_jobs=1`, 30s warm-up, and 10 timed repeats. `var` is `(max - min) / median` across repeats.

Benchmark cases: `clf_12f_full_deep`, `clf_12f_shallow_bootstrap`, `clf_24f_low_card`, `clf_96f_sqrt_leaf8`, `reg_12f_full_deep`, `reg_12f_full_f64`, `reg_12f_shallow_bootstrap`, `reg_1f_deep_full`, `reg_24f_low_card`, `reg_80f_sqrt_leaf8`.

This branch is faster than `main` on every retained case in this run, with speedups ranging from 1.55x to 9.23x. The wide/sqrt regression case (`reg_80f_sqrt_leaf8`) has high branch variability, so the exact point estimate there should be treated with some caution.
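For readers who want to reproduce the summary statistics, here is a minimal sketch of the timing protocol and the `var` metric described above, using only the standard library. It is not the code in `benchmarks/rf_intelex/bench_rf_fit.py`: the function names are hypothetical, the warm-up is assumed to mean "keep fitting until the time budget is spent", and the speedup is assumed to be a ratio of median fit times.

```python
import time
from statistics import median


def time_repeats(fit, warmup_seconds=30.0, repeats=10):
    """Run `fit()` during a warm-up window, then return `repeats` timed runs.

    Hypothetical harness; the warm-up semantics are an assumption.
    """
    deadline = time.perf_counter() + warmup_seconds
    while time.perf_counter() < deadline:
        fit()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fit()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Median fit time and var = (max - min) / median across repeats."""
    med = median(timings)
    return {"median": med, "var": (max(timings) - min(timings)) / med}


def speedup(baseline_timings, candidate_timings):
    """Assumed definition: baseline median divided by candidate median."""
    return median(baseline_timings) / median(candidate_timings)


# Illustrative numbers only, not measurements from this PR.
example_summary = summarize([1.0, 2.0, 3.0])
example_speedup = speedup([4.0, 4.0, 4.0], [2.0, 2.0, 2.0])
quick_timings = time_repeats(lambda: None, warmup_seconds=0.0, repeats=3)
```

A large `var` relative to the gap between two medians, as noted for the wide/sqrt regression case, means the speedup for that case is only loosely determined by a single run.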