API benchmark platform for comparing backend service implementations. Define your APIs, generate implementations across different tech stacks using Claude Code, then measure and compare their performance.
- Node.js >= 22
- Docker with Compose v2
- Claude Code for scaffolding implementations
- A Grafana Cloud account (optional, for publishing results)
k6 load tests run inside Docker (grafana/k6:1.6.1) — no local k6 installation required.
npm install
To publish benchmark results to Grafana Cloud, create a .env file at the repository root:
cp .env.example .env
Then fill in the standard OpenTelemetry env vars:
OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-us-central-0.grafana.net/otlp
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic <token>
To find these values, sign in to Grafana Cloud, open your stack, and go to Connections > OpenTelemetry (OTLP). Generate an API token and copy the two environment variables shown on the page.
The toolkit uses the official OpenTelemetry JS SDK to export metrics, so it reads these env vars natively.
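To make the expected format of the header variable concrete, here is a minimal sketch of how a comma-separated list of key=value pairs splits into headers. The SDK does this parsing itself; the `parseOtlpHeaders` helper is hypothetical and only illustrates why the value must keep everything after the first `=`.

```typescript
// Hypothetical helper, not part of the toolkit: the OpenTelemetry SDK parses
// OTEL_EXPORTER_OTLP_HEADERS on its own. This only illustrates the format.
function parseOtlpHeaders(raw: string): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const pair of raw.split(",")) {
    const idx = pair.indexOf("=");
    if (idx <= 0) continue; // skip empty or malformed entries
    const key = pair.slice(0, idx).trim();
    // Split on the first "=" only, so "=" padding inside the token survives.
    const value = pair.slice(idx + 1).trim();
    headers[key] = value;
  }
  return headers;
}

console.log(parseOtlpHeaders("Authorization=Basic abc123=="));
// → { Authorization: 'Basic abc123==' }
```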
Use the /define-api slash command in Claude Code to create a new API project:
> /define-api
# Claude will walk you through:
# 1. Name the project → user-api
# 2. Describe the domain → User entity with name, email, role
# 3. Pick API styles → openapi, protobuf, graphql
# 4. Generate specs + k6 → projects/user-api/_shared/
This creates the shared API specs and k6 load test scripts under projects/<project>/_shared/:
projects/user-api/
  _shared/
    openapi/
      api-spec.yaml        # OpenAPI 3.1 spec
      k6/user-api.js       # k6 load test script
    protobuf/
      user.proto           # Protocol Buffers service definition
      k6/user-api.js       # k6 load test script
    graphql/
      schema.graphql       # GraphQL schema
      k6/user-api.js       # k6 load test script
Use the /scaffold-implementation slash command in Claude Code to generate a complete implementation for any tech stack:
> /scaffold-implementation
# Claude will walk you through:
# 1. Pick a project → content-api
# 2. Pick an API style → protobuf
# 3. Name it → connect-rpc
# 4. Language + framework → go + connect-rpc
# 5. Generate everything → Dockerfile, docker-compose, source code, build files
# 6. Register the target → benchmark.config.json updated
Repeat for as many stacks as you want to compare (e.g., spring-boot in Java, express in TypeScript, ktor in Kotlin).
# list all configured targets
npm run benchmark -- list
# run a full benchmark (build → deploy → loadtest → collect → cleanup)
npm run benchmark -- run content-api/connect-rpc
# benchmark multiple implementations and compare
npm run benchmark -- run content-api/connect-rpc
npm run benchmark -- run content-api/spring-boot
npm run benchmark -- compare results/content-api-connect-rpc-*.json results/content-api-spring-boot-*.json
# publish results to Grafana Cloud
npm run benchmark -- publish results/content-api-connect-rpc-*.json
All targets are defined in benchmark.config.json at the repo root. Each target points to a directory containing a docker-compose.yml.
{
  "targets": {
    "my-go-api": {
      "path": "./examples/go-api",
      "composeFile": "docker-compose.yml",
      "service": "api",
      "port": 8080,
      "protocol": "http",
      "readinessProbe": {
        "httpGet": {
          "path": "/health",
          "port": 8080,
          "expectedStatus": 200
        },
        "initialDelayMs": 2000,
        "intervalMs": 1000,
        "timeoutMs": 120000
      },
      "k6": {
        "script": "toolkit/k6/scripts/default-http.js",
        "vus": 50,
        "duration": "30s",
        "env": {
          "BASE_URL": "http://localhost:8080"
        }
      },
      "tags": {
        "language": "go",
        "framework": "stdlib"
      }
    }
  },
  "defaults": {
    "k6": { "vus": 50, "duration": "30s" },
    "readinessProbe": {
      "initialDelayMs": 2000,
      "intervalMs": 1000,
      "timeoutMs": 120000
    }
  },
  "output": { "dir": "./results" }
}

| Field | Required | Default | Description |
|---|---|---|---|
| `path` | yes | | Path to the project directory containing the compose file |
| `composeFile` | no | `docker-compose.yml` | Compose file name |
| `service` | yes | | Primary service name in the compose file |
| `port` | yes | | Port the service exposes on localhost |
| `protocol` | no | `http` | `http` or `grpc` |
| `readinessProbe` | no | see defaults | How to check if the service is ready |
| `k6` | no | see defaults | k6 load test configuration |
| `tags` | no | `{}` | Metadata labels (language, framework, etc.) |
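The fields above can be read as a small type plus a merge step that fills in defaults. The sketch below is an illustration only: the interface names and `resolveTarget` are hypothetical, it is not the toolkit's actual zod schema, and it assumes that values set on a target take precedence over the `defaults` block.

```typescript
// Field names mirror the table; the types are an illustrative assumption,
// not the toolkit's real schema.
interface K6Config {
  script?: string;
  vus?: number;
  duration?: string;
  env?: Record<string, string>;
}
interface ReadinessProbe {
  httpGet?: { path: string; port: number; expectedStatus: number };
  initialDelayMs?: number;
  intervalMs?: number;
  timeoutMs?: number;
}
interface Target {
  path: string;
  composeFile?: string;
  service: string;
  port: number;
  protocol?: "http" | "grpc";
  readinessProbe?: ReadinessProbe;
  k6?: K6Config;
  tags?: Record<string, string>;
}
interface Defaults {
  k6?: K6Config;
  readinessProbe?: ReadinessProbe;
}

// Assumption: per-target values override the shared defaults.
function resolveTarget(target: Target, defaults: Defaults): Target {
  return {
    ...target,
    composeFile: target.composeFile ?? "docker-compose.yml",
    protocol: target.protocol ?? "http",
    tags: target.tags ?? {},
    k6: { ...defaults.k6, ...target.k6 },
    readinessProbe: { ...defaults.readinessProbe, ...target.readinessProbe },
  };
}
```

With this merge, a target that sets only `"k6": { "vus": 100 }` would still pick up `duration` from the defaults.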
Grafana Cloud credentials are read from the standard OTEL_EXPORTER_OTLP_* environment variables when publishing results. The CLI automatically loads a .env file from the repository root (see Setup).
| Variable | Description |
|---|---|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Grafana Cloud OTLP gateway URL |
| `OTEL_EXPORTER_OTLP_HEADERS` | Auth header (`Authorization=Basic <token>`) |
Runs the full benchmark pipeline: build → deploy → loadtest → collect → cleanup.
npm run benchmark -- run my-api
npm run benchmark -- run my-api --tag "go-v1.22" --publish
npm run benchmark -- run my-api --skip-build --k6-vus 100 --k6-duration 60s

| Option | Description |
|---|---|
| `--skip-build` | Skip the Docker build step |
| `--skip-loadtest` | Skip the k6 load test step |
| `--k6-vus <n>` | Override virtual users count |
| `--k6-duration <d>` | Override test duration (e.g. `30s`, `1m`) |
| `--tag <label>` | Label this run for comparison |
| `--publish` | Push results to Grafana Cloud after the run |
| `--cache` | Allow Docker build cache (default: no cache for fair benchmarks) |
Results are written to results/<target>-<timestamp>.json.
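As a sketch of that naming scheme, the function below builds a result path from a target name and a timestamp. The exact rules are an assumption inferred from the example file names in this README (a `/` in the target becomes `-`, and `:` and `.` in the ISO timestamp become `-`); `resultFilePath` is a hypothetical name.

```typescript
// Assumed naming scheme, inferred from example file names in this README.
function resultFilePath(target: string, when: Date, outputDir = "results"): string {
  const safeTarget = target.replace(/\//g, "-"); // content-api/connect-rpc → content-api-connect-rpc
  const stamp = when.toISOString().replace(/[:.]/g, "-"); // filesystem-friendly timestamp
  return `${outputDir}/${safeTarget}-${stamp}.json`;
}

console.log(resultFilePath("my-api", new Date("2026-03-04T12:00:00.000Z")));
// → results/my-api-2026-03-04T12-00-00-000Z.json
```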
Runs only the Docker Compose build step and reports the build time.
npm run benchmark -- build my-api
npm run benchmark -- build my-api --cache
Starts the service with docker compose up -d and waits for it to become healthy. Reports the time from start to ready.
npm run benchmark -- deploy my-api
Runs a k6 load test against an already-running target.
npm run benchmark -- loadtest my-api
npm run benchmark -- loadtest my-api --k6-vus 100 --k6-duration 1m
Publishes one or more results JSON files to Grafana Cloud. Accepts multiple files so you can batch-publish results from previous runs.
# publish a single result
npm run benchmark -- publish results/my-api-2026-03-04T12-00-00-000Z.json
# publish multiple results at once
npm run benchmark -- publish results/connect-rpc-*.json results/spring-boot-*.json
Compares multiple result files side-by-side in a terminal table. Best values are highlighted in green.
npm run benchmark -- compare results/go-api-*.json results/java-api-*.json
Lists all targets defined in the config file.
npm run benchmark -- list
Tears down a target with docker compose down.
npm run benchmark -- clean my-api
npm run benchmark -- clean my-api --no-volumes # keep volumes

| Option | Default | Description |
|---|---|---|
| `-c, --config <path>` | `benchmark.config.json` | Path to config file |
| `-o, --output <dir>` | `./results` | Directory for result files |
| `-v, --verbose` | `false` | Enable debug logging |
Wall-clock time of docker compose build --no-cache, measured with process.hrtime.bigint().
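A minimal sketch of this kind of measurement, assuming the timed step is wrapped in an async function. The `timeStep` helper is hypothetical; only `process.hrtime.bigint()` comes from the description above.

```typescript
// Hypothetical helper: times any async step with Node's monotonic
// high-resolution clock.
async function timeStep<T>(
  step: () => Promise<T>
): Promise<{ result: T; durationMs: number }> {
  const start = process.hrtime.bigint(); // nanoseconds, as a bigint
  const result = await step();
  const elapsedNs = process.hrtime.bigint() - start;
  // Convert to milliseconds; precision loss is negligible at these scales.
  return { result, durationMs: Number(elapsedNs) / 1e6 };
}

// Usage: time a 50 ms placeholder step standing in for the real build.
timeStep(() => new Promise<void>((resolve) => setTimeout(resolve, 50))).then(
  ({ durationMs }) => console.log(`step took ${durationMs.toFixed(1)} ms`)
);
```

A monotonic clock is preferable to `Date.now()` here because it is not affected by system clock adjustments during a long build.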
Time from docker compose up -d until the service health check passes. Health is verified by polling the configured readinessProbe.httpGet endpoint.
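The polling behaviour can be sketched as a loop driven by the three `readinessProbe` timings. This is an illustration under assumptions, not the toolkit's code: `waitUntilReady` and `httpCheck` are hypothetical names, and the sketch assumes the initial delay counts toward the timeout.

```typescript
interface ProbeTimings {
  initialDelayMs: number;
  intervalMs: number;
  timeoutMs: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// `check` resolves to true once the service is healthy. Returns the elapsed
// time in milliseconds, or throws when the timeout is exceeded.
async function waitUntilReady(
  check: () => Promise<boolean>,
  timings: ProbeTimings
): Promise<number> {
  const start = Date.now();
  await sleep(timings.initialDelayMs);
  while (Date.now() - start < timings.timeoutMs) {
    // A rejected check (e.g. connection refused) counts as "not ready yet".
    if (await check().catch(() => false)) return Date.now() - start;
    await sleep(timings.intervalMs);
  }
  throw new Error(`service not ready after ${timings.timeoutMs} ms`);
}

// Example check built on the global fetch that ships with recent Node.js.
const httpCheck = (url: string, expectedStatus: number) => async () =>
  (await fetch(url)).status === expectedStatus;
```

With the example config, a call would look like `waitUntilReady(httpCheck("http://localhost:8080/health", 200), { initialDelayMs: 2000, intervalMs: 1000, timeoutMs: 120000 })`.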
Parsed from k6's --summary-export JSON output. Supports both HTTP (http_req_duration) and gRPC (grpc_req_duration) protocols:
| Metric | Description |
|---|---|
| `reqs` | Total requests (iterations) |
| `reqsPerSec` | Throughput (requests/second) |
| `reqDuration.avg` | Average response time (ms) |
| `reqDuration.med` | Median / p50 response time (ms) |
| `reqDuration.p90` | 90th percentile response time (ms) |
| `reqDuration.p95` | 95th percentile response time (ms) |
| `reqDuration.p99` | 99th percentile response time (ms) |
| `reqFailedRate` | Error rate (0.0 - 1.0) |
| `checksPassRate` | k6 check pass rate (0.0 - 1.0) |
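To show how these fields might be derived, here is a sketch that maps a k6 summary export onto the metrics in the table. Several things are assumptions rather than confirmed toolkit behaviour: the summary shape (a `metrics` object whose trend entries use keys such as `avg`, `med` and `p(95)`, and whose rate entries use `value`), the use of the `iterations` metric for `reqs` and `reqsPerSec`, and the function name `parseK6Summary`. Key names can vary between k6 versions, and `p(99)` only appears when k6 is asked to compute it.

```typescript
// Assumed shape of the k6 summary export; verify against your k6 version.
type K6Metric = Record<string, number | undefined>;
interface K6Summary {
  metrics: Record<string, K6Metric | undefined>;
}

interface LoadTestMetrics {
  reqs: number;
  reqsPerSec: number;
  reqDuration: { avg: number; med: number; p90: number; p95: number; p99: number };
  reqFailedRate: number;
  checksPassRate: number;
}

function parseK6Summary(summary: K6Summary, protocol: "http" | "grpc"): LoadTestMetrics {
  // Missing metrics or keys fall back to 0 to keep the sketch short.
  const num = (metric: string, key: string): number =>
    summary.metrics[metric]?.[key] ?? 0;
  // Pick the duration trend that matches the target's protocol.
  const duration = protocol === "grpc" ? "grpc_req_duration" : "http_req_duration";
  return {
    reqs: num("iterations", "count"),
    reqsPerSec: num("iterations", "rate"),
    reqDuration: {
      avg: num(duration, "avg"),
      med: num(duration, "med"),
      p90: num(duration, "p(90)"),
      p95: num(duration, "p(95)"),
      p99: num(duration, "p(99)"),
    },
    reqFailedRate: num("http_req_failed", "value"),
    checksPassRate: num("checks", "value"),
  };
}
```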
Results are pushed to Grafana Cloud using the OpenTelemetry JS SDK via OTLP/HTTP. Each metric is sent as a gauge data point with the target name, tag, and environment info as resource attributes. The SDK reads OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS from the environment (loaded via .env).
Metrics appear in Grafana with the benchmark.* prefix:
- benchmark.build.duration
- benchmark.deploy.duration
- benchmark.reqs_per_sec
- benchmark.req.duration.{avg,med,p90,p95,p99}
- benchmark.req_failed_rate
- benchmark.checks_pass_rate
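The brace notation expands to one metric per latency statistic. As a sketch, the function below flattens a single result into (name, value) pairs using those metric names. The `BenchmarkResult` shape, its field names and `toGaugePoints` are hypothetical; the actual export goes through the OpenTelemetry SDK, which is not shown here.

```typescript
// Hypothetical result shape; only the metric names come from this README.
interface BenchmarkResult {
  buildDurationMs: number;
  deployDurationMs: number;
  reqsPerSec: number;
  reqDuration: { avg: number; med: number; p90: number; p95: number; p99: number };
  reqFailedRate: number;
  checksPassRate: number;
}

function toGaugePoints(r: BenchmarkResult): Array<[string, number]> {
  // One point per latency statistic: benchmark.req.duration.avg, .med, ...
  const latency = Object.entries(r.reqDuration).map(
    ([stat, value]): [string, number] => [`benchmark.req.duration.${stat}`, value]
  );
  return [
    ["benchmark.build.duration", r.buildDurationMs],
    ["benchmark.deploy.duration", r.deployDurationMs],
    ["benchmark.reqs_per_sec", r.reqsPerSec],
    ...latency,
    ["benchmark.req_failed_rate", r.reqFailedRate],
    ["benchmark.checks_pass_rate", r.checksPassRate],
  ];
}
```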
A pre-built dashboard is included at grafana/benchmark-dashboard.json. To import it:
- In Grafana, go to Dashboards > New > Import
- Upload `grafana/benchmark-dashboard.json`
- Select your Prometheus data source
- Click Import
The dashboard includes:
- Overview — stat panels for throughput, error rate, checks pass rate, build and deploy times
- Latency Comparison — bar chart of avg/med/p90/p95/p99 grouped by target
- Throughput & Volume — bar gauges for requests/sec and iterations/sec
- Build & Deploy Comparison — bar gauges comparing build and deploy times
- Summary Table — all metrics in a sortable table
Use the Target and Tag dropdowns at the top to filter by implementation and run tag.
The toolkit ships with default k6 scripts for HTTP and gRPC APIs in toolkit/k6/scripts/. To use a custom script, set the k6.script path in your target config:
{
  "k6": {
    "script": "./my-custom-k6-script.js",
    "env": {
      "BASE_URL": "http://localhost:8080",
      "ENDPOINT": "/api/v1/users"
    }
  }
}
Custom scripts receive environment variables defined in k6.env. The bundled scripts support BASE_URL and ENDPOINT.
The recommended way is to use Claude Code's /scaffold-implementation command, which handles everything end-to-end.
To add one manually:
- Create your project under `projects/<project-name>/<implementation>/` with a `docker-compose.yml`
- Make sure the service has a health endpoint
- Add a target entry to `benchmark.config.json`
- Run `npm run benchmark -- list` to verify
- Run `npm run benchmark -- run <target-name>`
This repo ships with Claude Code slash commands and architecture agents for scaffolding benchmark implementations.
Interactive command that walks you through defining a new API project with shared specs and k6 load test scripts:
- Name the project in kebab-case (e.g., `user-api`, `order-api`)
- Describe the domain model (entity, fields, types)
- Choose API styles to generate (`openapi`, `graphql`, `protobuf`; pick one or more)
- Generate specs and k6 scripts under `projects/<project>/_shared/`
Once defined, use /scaffold-implementation to generate implementations for the API.
Interactive command that walks you through creating a new benchmark implementation:
- Pick a project (e.g., `content-api`)
- Pick an API style (`openapi`, `graphql`, `protobuf`)
- Name the implementation in kebab-case (e.g., `connect-rpc`, `spring-boot`)
- Specify language and framework
- Generate the full implementation (Dockerfile, docker-compose, source code, build files)
- Register the target in `benchmark.config.json`
- Verify with `npm run benchmark -- list`
If a specialized architecture agent exists at .claude/agents/<implementation>-<language>.md, the command delegates to it for code generation. Otherwise it falls back to generic scaffolding.
Architecture agents live under .claude/agents/ and define opinionated, layered code generation guides for specific tech stacks.
| Agent | Stack | Description |
|---|---|---|
| `connect-rpc-go.md` | Go + Connect RPC | Layered architecture (APP/API/DOMAIN/OUTBOX) with sqlc, River queue, goose migrations, zerolog, godotenv, and buf validate |
To add a new agent, create .claude/agents/<implementation>-<language>.md following the same conventions. The scaffold command will automatically delegate to it.
benchmark.config.json        # central config (targets, grafana, defaults)
toolkit/                     # the benchmark CLI toolkit
  bin/benchmark.js           # CLI entry point
  src/
    cli/commands/            # command implementations
    config/                  # zod schema, loader, defaults
    core/                    # docker, k6, timer, health check
    metrics/                 # result collector, k6 parser
    publish/                 # OpenTelemetry OTLP metrics export
    report/                  # comparison logic, terminal table, JSON writer
  k6/scripts/                # bundled k6 test scripts
projects/                    # benchmark target projects
  <project>/
    _shared/                 # shared API specs and k6 scripts per style
    <implementation>/        # individual implementations to benchmark
.claude/
  commands/                  # Claude Code slash commands
  agents/                    # architecture agents for code generation
grafana/                     # Grafana dashboard JSON (importable)
results/                     # benchmark output (gitignored)
Apache-2.0