ENH prototype histogram splitter for RandomForest benchmarks by cakedev0 · Pull Request #6 · cakedev0/scikit-learn

cakedev0 · 2026-05-19T16:56:43Z

Summary

This is an experimental branch for investigating whether a histogram-style splitter can close the RandomForest fit-time gap with sklearnex / oneDAL on CPU.

It adds:

an opt-in max_bins parameter, defaulting to None, on the dense DecisionTreeClassifier, DecisionTreeRegressor, RandomForestClassifier, and RandomForestRegressor;
a dense HistBestSplitter for supported criteria (gini, entropy / log_loss, squared_error);
per-feature binning into up to max_bins ordered bins, using exact value bins when n_unique <= max_bins and quantile-derived thresholds otherwise;
RandomForest-level bin-code precomputation so all newly grown trees reuse the same binned representation;
support for sample weights and missing values;
explicit errors for unsupported cases: sparse input, random splitter, multi-output, monotonic constraints, absolute_error, poisson, and friedman_mse;
a compact benchmark/profiling harness under benchmarks/rf_intelex/ and Markdown reports under reports/rf_intelex/.
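
For readers who want a concrete picture of the binning rule in the list above, here is a minimal NumPy sketch. It is not the branch's Cython code: the function name bin_feature, the midpoint thresholds on the exact path, and the use of np.quantile are all assumptions made for illustration, and missing values are left out.

```python
import numpy as np


def bin_feature(x, max_bins=255):
    """Bin one dense, NaN-free feature column into ordered integer codes.

    Hypothetical helper, not part of scikit-learn or of this branch.
    Returns (codes, thresholds) where code k means
    thresholds[k - 1] < x <= thresholds[k].
    """
    x = np.asarray(x, dtype=np.float64)
    uniques = np.unique(x)
    if uniques.size <= max_bins:
        # Exact path: one bin per distinct value, with thresholds placed
        # halfway between neighbouring values (an assumption of this sketch).
        thresholds = (uniques[:-1] + uniques[1:]) / 2.0
    else:
        # Quantile path: interior quantiles become thresholds; duplicates
        # are dropped, so fewer than max_bins bins may be produced.
        probs = np.linspace(0.0, 1.0, max_bins + 1)[1:-1]
        thresholds = np.unique(np.quantile(x, probs))
    # searchsorted(side="left") returns i with thresholds[i-1] < x <= thresholds[i].
    codes = np.searchsorted(thresholds, x, side="left")
    return codes.astype(np.int32), thresholds
```

Because the codes are ordered, a split of the form "code <= b" corresponds to the real-valued split "x <= thresholds[b]", which is what lets a tree built on codes still predict from raw feature values.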

How We Got Here

The initial benchmark suite compared scikit-learn RandomForest fit time against sklearnex.ensemble while avoiding unsupported/fallback sklearnex configurations. The suite fixes n_estimators=20 and n_jobs=1, keeps each individual fit under 10s, and keeps the full retained suite under 3 minutes.

Profiling and source inspection suggested two main sklearnex advantages:

high-cardinality dense data benefits from a precomputed indexed/binned feature representation;
low-cardinality data avoids repeated per-node sorting by accumulating split statistics over compact bins.
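
To illustrate the second point, the sketch below finds the best split for one feature at one node from a per-bin class histogram, with no sorting of the node's samples. It is a NumPy illustration under assumptions, not the HistBestSplitter code: the function name, the Gini-only criterion, and the tie-breaking (first minimum) are choices made for the example.

```python
import numpy as np


def best_hist_split_gini(codes, y, n_bins, n_classes, sample_weight=None):
    """Best split "code <= b" for one feature at one node (weighted Gini).

    Hypothetical helper for illustration. codes and y are integer arrays
    holding bin codes and class indices for the samples in the node.
    Returns (b, cost), or None if no split leaves both children non-empty.
    """
    codes = np.asarray(codes)
    y = np.asarray(y)
    if sample_weight is None:
        sample_weight = np.ones(codes.shape[0])
    # Single pass over the samples: hist[b, c] = total weight of class c in bin b.
    hist = np.zeros((n_bins, n_classes))
    np.add.at(hist, (codes, y), sample_weight)
    # Prefix sums over bins give the left-child class weights for every
    # candidate threshold at once; the right child is the remainder.
    left = np.cumsum(hist, axis=0)[:-1]
    right = hist.sum(axis=0) - left
    w_left = left.sum(axis=1)
    w_right = right.sum(axis=1)
    valid = (w_left > 0) & (w_right > 0)
    if not valid.any():
        return None
    # w * gini = w - sum_c(n_c^2) / w, summed over both children.
    with np.errstate(divide="ignore", invalid="ignore"):
        cost = (w_left - (left**2).sum(axis=1) / w_left) + (
            w_right - (right**2).sum(axis=1) / w_right
        )
    cost[~valid] = np.inf
    b = int(np.argmin(cost))
    return b, float(cost[b])
```

The per-node work is one histogram pass over the samples plus a scan over at most n_bins - 1 thresholds, which is where the saving over per-node sorting comes from when the number of bins is small relative to the node size.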

The branch first explored global sorted indices, then moved to a histogram-style splitter. Important iterations included:

changing max_bins semantics to mean actual feature binning, similar in spirit to HistGradientBoosting, rather than only enabling exact low-cardinality bins;
moving bin-code computation to the RandomForest fit level to avoid recomputing codes per tree;
using C pointers in hot histogram loops rather than generic memoryview indexing;
removing dead sorted-index setup inherited from BestSplitter.init in the histogram path;
clearing only criterion-relevant histogram workspaces with memset.

Performance

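Final retained-suite benchmark with max_bins=255, n_estimators=20, and n_jobs=1, after a 30s warm-up and with 10 timed repeats. ratio is branch time divided by sklearnex time, so values above 1x mean the branch is slower. var is (max - min) / median across repeats.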

Case	branch (s)	sklearnex (s)	ratio	branch var	sklearnex var
`clf_12f_full_deep`	0.768	0.576	1.33x	0.043	0.049
`clf_12f_shallow_bootstrap`	0.386	0.304	1.27x	0.075	0.034
`clf_24f_low_card`	0.747	0.511	1.46x	0.110	0.163
`clf_96f_sqrt_leaf8`	0.477	0.287	1.66x	0.161	0.040
`reg_12f_full_deep`	1.772	1.196	1.48x	0.039	0.118
`reg_12f_full_f64`	1.154	0.789	1.46x	0.034	0.047
`reg_12f_shallow_bootstrap`	0.390	0.287	1.36x	0.047	0.028
`reg_1f_deep_full`	0.144	0.104	1.38x	0.010	0.016
`reg_24f_low_card`	1.765	1.319	1.34x	0.215	0.127
`reg_80f_sqrt_leaf8`	0.310	0.165	1.88x	0.149	0.128

A follow-up run that forced every generated X to float64 led to the same conclusion: every retained case stayed under 2x the sklearnex fit time, with a worst ratio of 1.84x.

Raw benchmark outputs are committed under reports/rf_intelex/results_max_bins_255_warmup30_repeats10/ and reports/rf_intelex/results_max_bins_255_warmup30_repeats10_xfloat64/.
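
For reference, the derived columns in the table above can be reproduced from per-repeat timings roughly as follows. This is a sketch, not the harness code: the function name is hypothetical, and it assumes the reported time for each case is the median over repeats, which the description does not state.

```python
from statistics import median


def summarize_case(branch_times, reference_times):
    """Summarize per-repeat fit times (in seconds) for one benchmark case.

    Hypothetical helper. Assumes the reported time is the median over
    repeats; var follows the definition used in this PR,
    (max - min) / median.
    """
    branch_med = median(branch_times)
    reference_med = median(reference_times)
    return {
        "branch_s": branch_med,
        "reference_s": reference_med,
        # ratio > 1 means the branch is slower than the reference.
        "ratio": branch_med / reference_med,
        "branch_var": (max(branch_times) - min(branch_times)) / branch_med,
        "reference_var": (max(reference_times) - min(reference_times))
        / reference_med,
    }
```

Since var is a range divided by the median, a single slow repeat is enough to inflate it, so it is best read as a rough stability indicator rather than a variance estimate.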

Validation

Local checks run during development:

pytest sklearn/tree/tests/test_tree.py -q
pytest sklearn/ensemble/tests/test_forest.py -k 'max_bins' -q
pre-commit run ruff-check --files benchmarks/rf_intelex/bench_rf_fit.py sklearn/ensemble/_forest.py sklearn/tree/_classes.py
pre-commit run ruff-format --files benchmarks/rf_intelex/bench_rf_fit.py sklearn/ensemble/_forest.py sklearn/tree/_classes.py
pre-commit run cython-lint --files sklearn/tree/_splitter.pyx

The commit hooks also passed when creating the final benchmark commit.

Notes / Follow-ups

This is intentionally a prototype branch, not a polished upstream-ready API proposal. Likely follow-ups:

compact bin-code dtype (uint8 / uint16) instead of int32;
more specialized histogram update kernels for regression/classification;
sparse workspace clearing / touched-bin lists;
possible auto-detection of low-cardinality features when max_bins=None;
broader criterion support if this direction is pursued.
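
As a rough illustration of the first follow-up, the bin-code dtype could be picked from max_bins as sketched below. The helper name is hypothetical, and the sketch assumes codes occupy [0, max_bins - 1] with no extra code reserved (for example for missing values), which may not match the branch.

```python
import numpy as np


def bin_code_dtype(max_bins):
    """Smallest unsigned integer dtype that can hold codes 0..max_bins - 1.

    Hypothetical helper; assumes no extra code is reserved beyond the bins.
    """
    return np.min_scalar_type(max_bins - 1)


# Memory comparison on an illustrative (n_samples, n_features) code matrix.
codes_int32 = np.zeros((100_000, 12), dtype=np.int32)
codes_compact = codes_int32.astype(bin_code_dtype(255))
print(codes_int32.nbytes, codes_compact.nbytes)
```

With max_bins=255 the codes fit in uint8, so the precomputed matrix shrinks to a quarter of its int32 size, which should also reduce memory traffic in the histogram loops.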

Branch vs main benchmark

I also reran the retained benchmark suite comparing this branch directly against main (commit ffc6cdc20b8d5eb58e38042fd90a2aeecc33dfb8). The branch uses max_bins=255; main uses the standard exact RandomForest implementation. Both runs used n_estimators=20, n_jobs=1, 30s warm-up, and 10 timed repeats. var is (max - min) / median across repeats.

Case	branch (s)	main (s)	speedup vs main	branch var	main var
`clf_12f_full_deep`	1.083	5.590	5.16x	0.088	0.035
`clf_12f_shallow_bootstrap`	0.549	2.951	5.37x	0.141	0.092
`clf_24f_low_card`	1.103	2.465	2.23x	0.178	0.012
`clf_96f_sqrt_leaf8`	0.585	1.551	2.65x	0.129	0.041
`reg_12f_full_deep`	1.936	4.754	2.46x	0.238	0.263
`reg_12f_full_f64`	1.607	2.943	1.83x	0.108	0.041
`reg_12f_shallow_bootstrap`	0.424	2.116	4.99x	0.181	0.082
`reg_1f_deep_full`	0.181	1.667	9.23x	0.210	0.037
`reg_24f_low_card`	2.519	3.901	1.55x	0.085	0.040
`reg_80f_sqrt_leaf8`	0.427	1.062	2.49x	0.445	0.125

This branch is faster than main on every retained case in this run, with speedups ranging from 1.55x to 9.23x. The wide/sqrt regression case (`reg_80f_sqrt_leaf8`) has high branch variability (var 0.445), so its point estimate should be treated with caution.
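
The warm-up and repeat protocol used for both tables can be sketched as below. This is an assumption about the shape of the loop, not the code under benchmarks/rf_intelex/: the function name is hypothetical, and warming up by repeatedly calling the same fit until the time budget is spent is a guess at what "30s warm-up" means.

```python
import time


def time_fit(fit, warmup_seconds=30.0, n_repeats=10):
    """Time a zero-argument fit callable after a warm-up phase.

    Hypothetical helper. Returns the list of per-repeat wall-clock
    durations in seconds.
    """
    # Warm-up: run untimed fits until the warm-up budget is spent, so that
    # caches, thread pools, and CPU frequency have settled before timing.
    deadline = time.perf_counter() + warmup_seconds
    while time.perf_counter() < deadline:
        fit()
    timings = []
    for _ in range(n_repeats):
        start = time.perf_counter()
        fit()
        timings.append(time.perf_counter() - start)
    return timings
```

A caller would pass something like `lambda: estimator.fit(X, y)` for each implementation and then compare the resulting timing lists; the speedup column above is main time divided by branch time.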

cakedev0 added 10 commits May 15, 2026 11:00

intel hardware support

6699654

update

c47ea87

rm details

dd9e815

update

d264f18

Merge remote-tracking branch 'upstream/main' into doc/dpnp_xpu_support

989575f

EXP global sorted index with intp samples

c2f9993

EXP add histogram best splitter

83a5f44

ENH add histogram random forest splitter benchmark

e6666a4

Merge remote-tracking branch 'upstream/main' into rfopt/hist-best-splitter

f1955fa

BENCH add float64 random forest benchmark results

b64d08c

cakedev0 wants to merge 10 commits into main from rfopt/hist-best-splitter
