Database Benchmarking Framework

A parametrized and extensible benchmarking framework for testing PostgreSQL-compatible branchable database backends (Dolt, Neon, etc.) with support for branching, schema, and data related operations. Includes both macrobenchmark and microbenchmark workloads.

Quick Start

# 1. Setup environment
python3 -m venv venv
source venv/bin/activate
pip3 install .

# 2. Run a macrobenchmark 
# Mini config, always start with this
./run_macrobench.sh --mini --outdir run_stats software_dev dolt 1 db_setup/ch-w1.sql

# Full config with 2hr timeout
./run_macrobench.sh --outdir run_stats --max-runtime-sec 7200 software_dev dolt 5 db_setup/ch-w5.sql

# 3. Generate comparison plots
python scripts/macro_comparison.py --dolt-dir run_stats_final/macro/dolt_full --neon-dir run_stats_final/macro/neon_full --outdir figures/

# 4. Run a microbenchmark (latency)
./run_single_thread_bench.sh dolt db_setup/tpcc_schema.sql 16

# 5. Run a microbenchmark (throughput)
./run_throughput_bench.sh dolt db_setup/ch-w1.sql --sweep-proportional

Macrobenchmarks

Macrobenchmarks simulate real-world workflows with multiple concurrent workers performing sequences of database operations.

Running Macrobenchmarks

Use the run_macrobench.sh script in the root directory:

./run_macrobench.sh [OPTIONS] <workflow> <backend> <db_scale> <sql_path>

Arguments

Argument	Description	Options
`workflow`	Workflow type	`software_dev`, `failure_repro`, `data_cleaning`, `mcts`, `simulation`
`backend`	Database backend	`dolt`, `neon`, `kpg`, `xata`, `file_copy`, `txn`
`db_scale`	Database scale (number of warehouses)	Integer (e.g., `1`, `5`, `10`)
`sql_path`	Path to SQL schema dump	e.g., `db_setup/ch-w1.sql`, `db_setup/ch-w5.sql`

Options

Option	Description
`--mini`	Use mini config (fewer workers/steps, suitable for testing)
`--outdir DIR`	Output directory (default: `run_stats/`)
`--max-runtime-sec N`	Cap total runtime in seconds (0 = no limit)
`--measure-storage`	Enable Neon storage measurement (15-min sleep before/after)

Examples

# Run software development workflow on Dolt with 5 warehouses
./run_macrobench.sh software_dev dolt 5 db_setup/ch-w5.sql

# Run MCTS workflow on Neon with mini config
./run_macrobench.sh --mini mcts neon 1 db_setup/ch-w1.sql

# Run simulation workflow on Neon with custom output directory
./run_macrobench.sh --outdir run_stats/neon_mini simulation neon 1 db_setup/ch-w1.sql

# Run with runtime limit (10 minutes)
./run_macrobench.sh --max-runtime-sec 600 data_cleaning dolt 5 db_setup/ch-w5.sql

# Run with storage measurement for Neon
./run_macrobench.sh --measure-storage mcts neon 5 db_setup/ch-w5.sql

Output Files

Macrobenchmark results are saved to the output directory (default: run_stats/):

run_stats/
├── macro_<workflow>_<backend>_<scale>.parquet           # Operation-level latency data
└── macro_<workflow>_<backend>_<scale>_e2e_stats.json    # End-to-end statistics

Microbenchmarks

Microbenchmarks measure latency and throughput for specific database operations under controlled conditions.

Latency Benchmarks

Measure operation latency with varying numbers of branches and threads.

Single-Threaded Latency

Use run_single_thread_bench.sh to measure single-threaded operation latency:

./run_single_thread_bench.sh <backend> <sql_dump_path> <num_branches> [OPTIONS]

Arguments

Argument	Description
`backend`	Database backend: `dolt`, `neon`, `kpg`, `xata`, `file_copy`, `txn`, `tiger`
`sql_dump_path`	Path to SQL dump file (e.g., `db_setup/tpcc_schema.sql`)
`num_branches`	Number of branches to create for testing

Options

Option	Description
`--seed <seed>`	Random seed for reproducibility
`--shape <shape>`	Branch tree shape: `spine`, `bushy`, or `fan_out` (default: `spine`)
`--operations <ops>`	Comma-separated list (e.g., `UPDATE,RANGE_UPDATE`)
`--range-size <n>`	Range size for RANGE_UPDATE operation (default: 200)
`--num-ops <n>`	Number of operations to perform
`--output-dir <dir>`	Output directory (default: `/tmp/run_stats`)

Examples

# Run with 16 branches
./run_single_thread_bench.sh dolt db_setup/tpcc_schema.sql 16

# Run with custom seed and bushy branch shape
./run_single_thread_bench.sh neon db_setup/tpcc_schema.sql 32 --seed 12345 --shape bushy

# Run with custom range size
./run_single_thread_bench.sh dolt db_setup/tpcc_schema.sql 16 --operations RANGE_UPDATE --range-size 500

Multi-Threaded Latency

Use run_multithread_bench.sh to measure multi-threaded operation latency:

./run_multithread_bench.sh <backend> <sql_dump_path> [OPTIONS]

Arguments

Argument	Description
`backend`	Database backend: `dolt`, `neon`, `kpg`, `xata`, `file_copy`, `txn`, `tiger`
`sql_dump_path`	Path to SQL dump file

Options

Option	Description
`--seed <seed>`	Random seed for reproducibility
`--max-branches <max>`	Maximum number of branches (default: 1024)
`--shape <shape>`	Branch tree shape: `spine`, `bushy`, or `fan_out` (default: `spine`)
`--num-ops <n>`	Number of operations to perform
`--operations <ops>`	Comma-separated list (e.g., `READ,UPDATE`)
`--output-dir <dir>`	Output directory (default: `/tmp/run_stats`)

Examples

# Sweep from 2 to 1024 branches (threads = branches at each configuration)
./run_multithread_bench.sh dolt db_setup/tpcc_schema.sql

# Test up to 128 branches
./run_multithread_bench.sh dolt db_setup/tpcc_schema.sql --max-branches 128

# Run only READ and UPDATE operations with 100 ops per test
./run_multithread_bench.sh neon db_setup/tpcc_schema.sql --operations READ,UPDATE --num-ops 100

Note: In multi-threaded latency benchmarks, the number of threads always equals the number of branches. For independent thread/branch control, use throughput benchmarks.

Output Files

Latency benchmark results are saved to the output directory:

<output_dir>/
├── single_thread/
│   ├── branch/
│   │   ├── <backend>_<dataset>_<N>_<shape>_branch.parquet
│   │   ├── <backend>_<dataset>_<N>_<shape>_branch_setup.parquet
│   │   └── <backend>_<dataset>_<N>_<shape>_branch_summary.json
│   └── connect/
│       └── (similar structure)
└── multithread/
    └── (similar structure for multi-threaded runs)

Throughput Benchmarks

Measure throughput (operations per second) with independent control over threads and branches.

Running Throughput Benchmarks

Use run_throughput_bench.sh with one of three sweep modes:

# Sweep threads (fix branches, vary threads)
./run_throughput_bench.sh <backend> <sql_dump_path> --sweep-threads --branches <N> [OPTIONS]

# Sweep branches (fix threads, vary branches)
./run_throughput_bench.sh <backend> <sql_dump_path> --sweep-branches --threads <N> [OPTIONS]

# Sweep proportionally (vary both threads and branches together)
./run_throughput_bench.sh <backend> <sql_dump_path> --sweep-proportional [OPTIONS]

Arguments

Argument	Description
`backend`	Database backend: `dolt`, `neon`, `kpg`, `xata`, `txn`, `file_copy`, `tiger`
`sql_dump_path`	Path to SQL dump file
`--sweep-threads`	Fix branches, vary threads (requires `--branches`)
`--sweep-branches`	Fix threads, vary branches (requires `--threads`)
`--sweep-proportional`	Vary both threads and branches proportionally

Options

Option	Description
`--threads <N>`	Fixed thread count (for `--sweep-branches` mode)
`--branches <N>`	Fixed branch count (for `--sweep-threads` mode)
`--threads-per-branch <N>`	Threads per branch ratio for `--sweep-proportional` (default: 4)
`--thread-list <list>`	Comma-separated thread counts (e.g., `1,2,4,8,16`)
`--branch-list <list>`	Comma-separated branch counts (e.g., `1,2,4,8,16`)
`--seed <seed>`	Random seed for reproducibility
`--num-ops <n>`	Number of operations per test
`--operations <ops>`	Comma-separated list (e.g., `READ,RANGE_READ`)
`--output-dir <dir>`	Output directory (default: `/tmp/run_stats`)

Examples

# Fix branches at 1, vary threads: 1,2,4,8,16,32,64,128
./run_throughput_bench.sh dolt db_setup/ch-w1.sql --sweep-threads --branches 1

# Fix threads at 128, vary branches: 1,2,4,8,16,32
./run_throughput_bench.sh dolt db_setup/ch-w1.sql --sweep-branches --threads 128

# Vary both proportionally (default: 4 threads per branch)
./run_throughput_bench.sh neon db_setup/ch-w1.sql --sweep-proportional

# Custom thread/branch lists
./run_throughput_bench.sh dolt db_setup/ch-w1.sql --sweep-threads --branches 16 --thread-list "1,2,4,8,16,32"
./run_throughput_bench.sh dolt db_setup/ch-w1.sql --sweep-branches --threads 128 --branch-list "1,2,4,8,16"

# Proportional with custom ratio (8 threads per branch)
./run_throughput_bench.sh neon db_setup/ch-w1.sql --sweep-proportional --threads-per-branch 8

# Run only specific operations
./run_throughput_bench.sh dolt db_setup/ch-w1.sql --sweep-proportional --operations READ,RANGE_READ

Output Files

Throughput benchmark results are saved to the output directory:

<output_dir>/
├── <backend>_<dataset>_tp_t<threads>_b<branches>.parquet
├── <backend>_<dataset>_tp_t<threads>_b<branches>_<operation>_threads<threads>_summary.json
└── <backend>_<dataset>_tp_t<threads>_b<branches>_setup.parquet

Plotting Results

After running benchmarks, use the plotting scripts in the scripts/ directory to generate visualizations.

Macrobenchmark Comparison Plots

Compare macrobenchmark results between Dolt and Neon:

python scripts/macro_comparison.py \
    --dolt-dir <dolt_results_dir> \
    --neon-dir <neon_results_dir> \
    --outdir <output_figures_dir>

Arguments

Argument	Description
`--dolt-dir`	Directory with Dolt parquet files
`--neon-dir`	Directory with Neon parquet files
`--outdir`	Directory to save figures (default: `macro-analysis/figures_comparison`)
`--label-position`	Position for step labels as `x,y` in axes coordinates (default: `0.98,0.05`)
`--label-fontsize`	Font size for step labels (default: 16)

Generated Plots

The script generates the following figures in the output directory:

latency_boxplot_comparison.png - Box plots of latency by operation type
time_breakdown_comparison.png - Stacked bar chart of time breakdown by operation
heatmap_comparison.png - Heatmap showing latency comparison with ratios
elapsed_time_comparison.png - Elapsed time comparison by workflow
steps_over_time.png - Steps completion over time

Microbenchmark Latency Plots

Plot microbenchmark latency results:

python scripts/plot_branch_latency_micro.py \
    --data-dir <data_directory> \
    --outdir <output_figures_dir> \
    --operation <operation_type>

Arguments

Argument	Description
`--data-dir`	Directory with microbenchmark parquet files
`--branch-dir`	Directory with branch operation data (for combined plots)
`--connect-dir`	Directory with connect operation data (for combined plots)
`--outdir`	Directory to save figures
`--operation`	Operation type: `branch`, `connect`, `both`, or `combined`

Examples

# Plot branch creation latency only
python scripts/plot_branch_latency_micro.py \
    --data-dir run_stats_final/micro/single_thread/branch \
    --outdir figures/ \
    --operation branch

# Plot branch connection latency only
python scripts/plot_branch_latency_micro.py \
    --data-dir run_stats_final/micro/single_thread/connect \
    --outdir figures/ \
    --operation connect

# Plot combined (branch and connect on single plot)
python scripts/plot_branch_latency_micro.py \
    --branch-dir run_stats_final/micro/single_thread/branch \
    --connect-dir run_stats_final/micro/single_thread/connect \
    --outdir figures/ \
    --operation combined

Microbenchmark Throughput Plots

Plot throughput vs threads/branches:

python scripts/plot_throughput_experiments.py \
    --data-dir <data_directory> \
    --output <output_figures_path>

Output Files

Benchmark results are saved as Parquet files and JSON summaries.

Macrobenchmark Output Structure

run_stats_final/macro/
├── dolt_full/          # Full-scale Dolt runs
│   ├── macro_software_dev_dolt_5.parquet
│   ├── macro_software_dev_dolt_5_e2e_stats.json
│   ├── macro_failure_repro_dolt_5.parquet
│   ├── macro_data_cleaning_dolt_5.parquet
│   └── macro_mcts_dolt_5.parquet
├── dolt_mini/          # Mini-scale Dolt runs (for testing)
├── neon_full/          # Full-scale Neon runs
└── neon_mini/          # Mini-scale Neon runs

Microbenchmark Output Structure

run_stats_final/micro/
├── single_thread/      # Single-threaded latency benchmarks
│   ├── branch/
│   └── connect/
├── multithread/        # Multi-threaded latency benchmarks
├── tp_fix_branch/      # Throughput: fixed branches, varying threads
├── tp_fix_thread/      # Throughput: fixed threads, varying branches
└── tp_proportional/    # Throughput: proportional threads and branches

Parquet Schema (TODO)

Prerequisites

Python 3.11+ with virtual environment
PostgreSQL-compatible backend:
- Dolt: Follow setup at https://github.com/dolthub/doltgresql
- Neon: Configure via Neon console
psql client for database setup
Required Python packages: Install with pip install .

Environment Variables

Create a .env file for backend-specific configuration:

# Neon API key for programmatic access
NEON_API_KEY_ORG=your_key_here

# Database connection strings (if needed)
DOLT_CONNECTION_STRING=postgresql://user:pass@localhost:5432/dbname
NEON_CONNECTION_STRING=postgresql://user:pass@host.neon.tech/dbname

Additional Resources

Workflow configurations: See macrobench/configs/ for workflow definitions
Microbenchmark configs: See microbench/configs/ for example configurations
Database schemas: See db_setup/ for SQL dump files

Name		Name	Last commit message	Last commit date
Latest commit History 204 Commits
agent		agent
db_setup		db_setup
dblib		dblib
macrobench		macrobench
microbench		microbench
run_stats_final		run_stats_final
scripts		scripts
util		util
.gitignore		.gitignore
README.md		README.md
bench_lib.sh		bench_lib.sh
build_protos.py		build_protos.py
pyproject.toml		pyproject.toml
run_macrobench.sh		run_macrobench.sh
run_multithread_bench.sh		run_multithread_bench.sh
run_single_thread_bench.sh		run_single_thread_bench.sh
run_storage_all.sh		run_storage_all.sh
run_throughput_bench.sh		run_throughput_bench.sh

Folders and files

Latest commit

History

Repository files navigation

Database Benchmarking Framework

Quick Start

Table of Contents

Macrobenchmarks

Running Macrobenchmarks

Arguments

Options

Examples

Output Files

Microbenchmarks

Latency Benchmarks

Single-Threaded Latency

Arguments

Options

Examples

Multi-Threaded Latency

Arguments

Options

Examples

Output Files

Throughput Benchmarks

Running Throughput Benchmarks

Arguments

Options

Examples

Output Files

Plotting Results

Macrobenchmark Comparison Plots

Arguments

Generated Plots

Microbenchmark Latency Plots

Arguments

Examples

Microbenchmark Throughput Plots

Output Files

Macrobenchmark Output Structure

Microbenchmark Output Structure

Parquet Schema (TODO)

Prerequisites

Environment Variables

Additional Resources

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages