shuf: use memchr in split_seps #11358

Draft
xtqqczze wants to merge 1 commit into uutils:main from xtqqczze:memchr/split_seps
@xtqqczze
Contributor

No description provided.

@github-actions

GNU testsuite comparison:

Skipping an intermittent issue tests/date/date-locale-hour (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/seq/seq-epipe is now being skipped but was previously passing.
Congrats! The gnu test tests/expand/bounded-memory is now passing!

@codspeed-hq

codspeed-hq Bot commented Mar 17, 2026

Merging this PR will degrade performance by 13.4%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

⚡ 2 improved benchmarks
❌ 1 regressed benchmark
✅ 306 untouched benchmarks
⏩ 46 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | shuf_lines[100000] | 27.4 ms | 31.7 ms | -13.4% |
| Memory | shuf_lines[100000] | 10.3 MB | 9.8 MB | +5.31% |
| Memory | shuf_repeat_sampling[50000] | 1.2 MB | 1.1 MB | +4.72% |

Comparing xtqqczze:memchr/split_seps (651812e) with main (2d5e77d) [2]

Footnotes

  1. 46 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  2. No successful run was found on main (02104f3) during the generation of this report, so 2d5e77d was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@oech3
Contributor

oech3 commented Mar 17, 2026

It touches the data twice. I think that is why performance is down 13.49%.
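To illustrate what "touching the data twice" can mean, here is a minimal sketch. It assumes the two-pass variant counts separators first to size the vector and then walks the input again to cut it; the PR's actual diff is not shown in this thread, so this may not match it. The function names are hypothetical, and the standard library's `position` stands in for the `memchr` crate so the snippet has no dependencies.

```rust
/// Two-pass split (hypothetical): count separators first so the Vec is
/// allocated once, then walk the input again to cut it into items.
fn split_two_pass(data: &[u8], sep: u8) -> Vec<&[u8]> {
    // Pass 1: reads every byte only to count separators.
    let count = data.iter().filter(|&&b| b == sep).count();
    let mut items = Vec::with_capacity(count + 1);

    // Pass 2: reads every byte again to find the cut points.
    let mut rest = data;
    while let Some(pos) = rest.iter().position(|&b| b == sep) {
        items.push(&rest[..pos]);
        rest = &rest[pos + 1..];
    }
    // Keep a final item that has no trailing separator.
    if !rest.is_empty() {
        items.push(rest);
    }
    items
}

/// Single-pass split (hypothetical): one walk over the input; the Vec
/// starts empty and reallocates as it grows.
fn split_one_pass(data: &[u8], sep: u8) -> Vec<&[u8]> {
    let mut items = Vec::new();
    let mut rest = data;
    while let Some(pos) = rest.iter().position(|&b| b == sep) {
        items.push(&rest[..pos]);
        rest = &rest[pos + 1..];
    }
    if !rest.is_empty() {
        items.push(rest);
    }
    items
}
```

Both functions return the same items. The two-pass version trades a second read of the input for a single exact allocation, which would be consistent with lower memory use and higher runtime, though whether that explains the numbers reported here is only a guess.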

@oech3
Contributor

oech3 commented Mar 17, 2026

You might try scanning the first few bytes to predict the capacity.
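One way to read this suggestion, as a rough sketch only: count separators in a short prefix, extrapolate that density to the whole input, and use the result as the initial capacity for a single-pass split. The names `estimate_items`, `split_with_estimate`, and `SAMPLE_LEN` are hypothetical, the sample size of 4096 bytes is an arbitrary assumption, and std's `position` again stands in for `memchr`.

```rust
/// Hypothetical prefix size to sample; 4096 is an arbitrary choice.
const SAMPLE_LEN: usize = 4096;

/// Estimate the number of separator-delimited items in `data` by counting
/// separators in the first `sample_len` bytes and scaling to the full length.
fn estimate_items(data: &[u8], sep: u8, sample_len: usize) -> usize {
    let sample = &data[..data.len().min(sample_len)];
    if sample.is_empty() {
        return 0;
    }
    let seps = sample.iter().filter(|&&b| b == sep).count();
    // Scale the sampled separator density to the whole input, rounding up,
    // and add one for a possible final item without a trailing separator.
    (seps * data.len() + sample.len() - 1) / sample.len() + 1
}

/// Single-pass split whose Vec is pre-sized from the prefix estimate.
/// The estimate is only a hint: the Vec still grows if it was too low.
fn split_with_estimate(data: &[u8], sep: u8) -> Vec<&[u8]> {
    let mut items = Vec::with_capacity(estimate_items(data, sep, SAMPLE_LEN));
    let mut rest = data;
    while let Some(pos) = rest.iter().position(|&b| b == sep) {
        items.push(&rest[..pos]);
        rest = &rest[pos + 1..];
    }
    if !rest.is_empty() {
        items.push(rest);
    }
    items
}
```

Only the prefix is read twice, so the extra scanning cost is bounded by the sample size. The estimate can be far off when line lengths vary a lot across the input, in which case the vector either over-allocates or falls back to normal growth.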

@xtqqczze
Contributor Author

xtqqczze commented Mar 17, 2026

First we should improve the benchmarks to include inputs with a short average line length: #11364

@oech3
Contributor

oech3 commented Mar 17, 2026

> You might try scanning the first few bytes to predict the capacity.

Can we get the length of one SIMD scan? An average is enough (no need to get it from the system).

xtqqczze force-pushed the memchr/split_seps branch from 40cc40e to 651812e on April 15, 2026 14:26

@github-actions

GNU testsuite comparison:

GNU test failed: tests/tail/retry. tests/tail/retry is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/pr/bounded-memory (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/cut/bounded-memory (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/date/date-locale-hour (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/tail/tail-n0f is now being skipped but was previously passing.
Congrats! The gnu test tests/unexpand/bounded-memory is now passing!
