SymDB: extraction performance benchmark by p-datadog · Pull Request #5698 · DataDog/dd-trace-rb

p-datadog · 2026-05-06T21:14:39Z

What does this PR do?

Adds a Symbol Database extraction performance benchmark. The harness generates 2500 user-code classes in a tmpdir, requires them, then runs Extractor#extract_all once and captures memory + CPU + wall time, writing symbol_database_extraction-results.json.

Wires into existing benchmark infrastructure:

- new symbol_database_ prefix in benchmarks/README.md
- symbol_database_extraction.rb added to the &other group in benchmarks/execution.yml so it runs in dtr CI
- validate spec at spec/datadog/symbol_database/validate_benchmarks_spec.rb runs the harness with VALIDATE_BENCHMARK=true (10 classes) to catch bitrot
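The class-generation step and the validate-mode switch might look roughly like the sketch below. It is illustrative only and is not the harness code from this PR: the helper name `generate_and_require_classes`, the generated class names, and the file naming are assumptions. Only the counts (2500 and 10) and the VALIDATE_BENCHMARK=true switch come from the description.

```ruby
require 'tmpdir'

# Hypothetical helper (not from the PR): writes `count` small class files
# into `dir`, requires each one, and returns the generated class names.
def generate_and_require_classes(dir, count)
  (0...count).map do |i|
    suffix = format('%04d', i)
    name = "SymdbBenchClass#{suffix}"
    path = File.join(dir, "symdb_bench_class_#{suffix}.rb")
    File.write(path, <<~RUBY)
      class #{name}
        def initialize(value)
          @value = value
        end

        def value
          @value
        end
      end
    RUBY
    require path
    name
  end
end

# Assumption: validate mode shrinks the workload from 2500 classes to 10.
class_count = ENV['VALIDATE_BENCHMARK'] == 'true' ? 10 : 2500

# The tmpdir is removed when the block exits; the loaded classes stay defined.
generated = Dir.mktmpdir('symdb_bench') do |dir|
  generate_and_require_classes(dir, class_count)
end
puts "generated #{generated.size} classes"
```

Writing real files and requiring them, rather than defining classes with `Class.new`, gives each class a source location on disk, which is presumably what makes it count as user code for the extractor.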

Stdlib only (Process.times, /proc/self/status, GC.stat) — no new gem deps.
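As a rough illustration of what a stdlib-only measurement can look like, the sketch below wraps a block with `Process.times` and `GC.stat`. The helper name `measure` and the returned field names are assumptions, as is the use of `Process.clock_gettime` for wall time; the actual harness may capture these values differently.

```ruby
# Hypothetical helper (not from the PR): measures wall time, process CPU time
# (user + system), and allocated objects around a block, using only stdlib.
def measure
  GC.start # start from a settled heap so allocation counts are less noisy
  allocated_before = GC.stat(:total_allocated_objects)
  cpu_before = Process.times
  wall_before = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  yield

  wall_after = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  cpu_after = Process.times
  allocated_after = GC.stat(:total_allocated_objects)

  {
    wall_s: wall_after - wall_before,
    cpu_s: (cpu_after.utime - cpu_before.utime) +
           (cpu_after.stime - cpu_before.stime),
    allocated_objects: allocated_after - allocated_before
  }
end

# Stand-in workload; the real benchmark would call Extractor#extract_all here.
result = measure { Array.new(100_000) { |i| i.to_s } }
puts result
```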

Motivation:

Verifies the SymDB performance requirements: memory overhead < 50 MB and CPU overhead < 5% during extraction. Backlog item "Performance testing" in the symdb project.

Change log entry

None.

Additional Notes:

Local run on Ruby 3.2.3, x86_64-linux: 269 ms wall, 27 MB peak memory overhead. The CPU% emitted by the harness is single-core utilisation, which is near 100% by construction for a one-shot CPU-bound operation. The < 5% requirement is meaningful only when the extraction cost is amortized over a long-running process, so the harness emits raw CPU time and wall time and leaves the PASS/FAIL decision to the results doc.
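To make the amortization argument concrete, the sketch below computes CPU% over two different windows. The numbers are illustrative assumptions: CPU time is taken to be roughly equal to the 269 ms wall time (the run is CPU-bound), and the 60-second process window is hypothetical. Neither value is a reported measurement.

```ruby
# CPU time as a percentage of a chosen time window.
def cpu_percent(cpu_s, window_s)
  100.0 * cpu_s / window_s
end

cpu_s = 0.269 # assumption: CPU time is about equal to the 269 ms wall time

# Window = the extraction itself: utilisation is about 100% by construction.
during_extraction = cpu_percent(cpu_s, 0.269)

# Window = a hypothetical 60 s slice of a long-running process.
amortized = cpu_percent(cpu_s, 60.0)

puts format('during extraction: %.1f%%', during_extraction)
puts format('amortized over 60 s: %.2f%%', amortized)
```

Under these assumptions the amortized figure is about 0.45%, well under the 5% requirement, while the in-window figure stays near 100% regardless of how cheap the extraction is.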

Branched off symbol-database-upload (#5431) so the perf measurement targets the same code as the main tracer PR.

How to test the change?

# Validate spec (fast — 10 classes, run in fork)
bundle exec rspec spec/datadog/symbol_database/validate_benchmarks_spec.rb

# Full benchmark (2500 classes)
bundle exec ruby benchmarks/symbol_database_extraction.rb
cat benchmarks/symbol_database_extraction-results.json

Generates 2500 user-code classes in a tmpdir, requires them, then runs Extractor#extract_all once and captures memory + CPU + wall time. Outputs symbol_database_extraction-results.json.

Wires into existing benchmark infrastructure:
- new symbol_database_ prefix added to benchmarks/README.md
- symbol_database_extraction.rb added to the &other group in execution.yml so it runs in dtr CI
- validate spec at spec/datadog/symbol_database/validate_benchmarks_spec.rb runs the harness with VALIDATE_BENCHMARK=true (10 classes) to catch bitrot

Verifies the performance requirements in projects/symdb/requirements.md (memory < 50 MB overhead during extraction; CPU < 5%). Plan in projects/symdb/testing/performance-test-plan.md.

Local run on Ruby 3.2.3, x86_64-linux: 269 ms wall, 27 MB peak overhead.

Stdlib only (Process.times, /proc/self/status, GC.stat), no new gem deps.

Measures how SymDB extraction running on a background thread impacts a concurrent main-thread workload. Runs three arms (baseline / treatment / baseline_post) and reports the p99 latency ratio as the headline statistic.

Addresses the deferred sub-item of the SymDB performance plan: "Extraction must not block the application's request handling" (requirements.md item 23). Originally scoped to require Rails; a synthetic non-allocating CPU workload is sufficient since the impact vector is GVL hold time during ObjectSpace traversal.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
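The three-arm design and the p99 ratio described in this commit message can be sketched as follows. This is not the benchmark from the PR: `percentile`, `timed_work_unit`, `run_arm`, the iteration counts, and the ObjectSpace-walking stand-in for extraction are all assumptions made for illustration.

```ruby
# Nearest-rank percentile over an array of latency samples.
def percentile(samples, pct)
  sorted = samples.sort
  rank = (pct * sorted.size / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# Hypothetical CPU-bound work unit that avoids allocating in its loop;
# returns its own latency in seconds.
def timed_work_unit
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  x = 0
  20_000.times { |i| x ^= i }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
end

# Runs `iterations` work units on the calling thread, optionally while a
# background thread executes `background`.
def run_arm(iterations, background: nil)
  thread = background && Thread.new(&background)
  latencies = Array.new(iterations) { timed_work_unit }
  thread&.join
  latencies
end

# Stand-in for extraction (assumption): walk every loaded class.
extraction_stand_in = lambda do
  ObjectSpace.each_object(Class) { |klass| klass.instance_methods(false) }
end

baseline      = run_arm(500)
treatment     = run_arm(500, background: extraction_stand_in)
baseline_post = run_arm(500)

ratio = percentile(treatment, 99) / percentile(baseline, 99)
drift = percentile(baseline_post, 99) / percentile(baseline, 99)
puts format('p99 ratio (treatment / baseline): %.2f', ratio)
puts format('p99 drift (baseline_post / baseline): %.2f', drift)
```

Comparing baseline_post against baseline gives a sense of run-to-run noise, so a treatment ratio can be read against that drift rather than against an ideal 1.0.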

datadog-datadog-prod-us1 · 2026-05-18T16:58:15Z

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
• Patch Coverage: 92.86%
• Overall Coverage: 97.15% (-0.00%)

This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 8d1101b

The benchmark read /proc/self/status to get VmRSS, which doesn't exist on macOS. Test failed on all seven macOS CI configs with Errno::ENOENT @ rb_sysopen - /proc/self/status. Fall back to `ps -o rss=` on platforms without /proc. Both return RSS in KB. Linux fast-path (file read, no fork) preserved. Verified VALIDATE_BENCHMARK=true on x86_64-linux-gnu Ruby 3.2.3 — both benchmarks pass and produce the same fields. macOS execution to be verified by CI re-run since this host has no macOS.
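A minimal sketch of the fallback described in this commit message is shown below. The method name `rss_kb` is an assumption, as is passing `-p` with the current PID to `ps`; the commit message only states that the fallback uses `ps -o rss=` and that both paths report RSS in KB.

```ruby
# Hypothetical helper (not from the PR): returns this process's resident
# set size in KB.
def rss_kb
  # Linux fast path: read VmRSS from procfs, no fork needed.
  if File.readable?('/proc/self/status')
    line = File.foreach('/proc/self/status').find { |l| l.start_with?('VmRSS:') }
    return line.split[1].to_i if line
  end

  # Fallback for platforms without /proc (e.g. macOS): ask ps for RSS in KB.
  `ps -o rss= -p #{Process.pid}`.strip.to_i
end

puts "RSS: #{rss_kb} KB"
```

Checking for the file first, instead of rescuing Errno::ENOENT, keeps the Linux path free of exception handling and makes the platform split explicit.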

p-datadog added the "AI Generated" label (largely based on code generated by an AI or LLM; this label is the same across all dd-trace-* repos) May 6, 2026

dd-octo-sts Bot added the "dev/testing" label (involves testing processes, e.g. RSpec) May 6, 2026

p-datadog force-pushed the symbol-database-perf-benchmark branch from 8e4b46b to 6341a20 May 11, 2026 15:21

p-datadog force-pushed the symbol-database-perf-benchmark branch from 6341a20 to 9e875c6 May 18, 2026 16:14

p-datadog changed the base branch from symbol-database-upload to master May 18, 2026 16:19

p-datadog wants to merge 3 commits into master from symbol-database-perf-benchmark
