Run benchmarks and configure Grafana dashboard #5
Merged
Run k6 load tests using grafana/k6 Docker image instead of requiring a local k6 binary. Mounts workspace at /workspace (read-only) and remaps file paths (PROTO_DIR etc.) to container paths. Uses --network host for target access. Replace remote randomString import with built-in crypto.randomUUID() in the k6 script. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
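As a rough illustration of the Docker invocation this commit describes, the argument list might be assembled as below. This is a minimal sketch, not the toolkit's actual code: `buildK6DockerArgs` and its option names are hypothetical, the image tag is the one named in the review overview, and host paths are assumed to be absolute with no trailing slash.

```javascript
// Hypothetical helper (not from the PR): builds the `docker run` arguments
// for a k6 run with the workspace mounted read-only at /workspace.
function buildK6DockerArgs({ workspaceDir, scriptPath, env = {}, image = 'grafana/k6:1.6.1' }) {
  // Remap any value that points inside the workspace to its container path.
  const toContainerPath = (value) => {
    if (typeof value !== 'string') return value;
    if (value === workspaceDir) return '/workspace';
    return value.startsWith(`${workspaceDir}/`)
      ? `/workspace${value.slice(workspaceDir.length)}`
      : value;
  };

  const envArgs = Object.entries(env).flatMap(([key, value]) => [
    '-e',
    `${key}=${toContainerPath(value)}`,
  ]);

  return [
    'run',
    '--rm',
    '--network', 'host', // lets the container reach the target on the host
    '-v', `${workspaceDir}:/workspace:ro`, // read-only workspace mount
    ...envArgs,
    image,
    'run', // the image's entrypoint is k6, so this is `k6 run <script>`
    toContainerPath(scriptPath),
  ];
}
```

The resulting array can be handed to a process runner such as execa as the arguments to `docker`.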
Add connectrpc.com/grpcreflect to the server so k6 can discover services at runtime via reflect:true. Remove proto file loading from the k6 script and PROTO_DIR from benchmark config. Update agent and scaffold command accordingly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use comma-separated string format for google.protobuf.FieldMask as required by protobuf JSON encoding spec, fixing serialization error. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
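For reference, the protobuf JSON mapping encodes a FieldMask as one string, with paths joined by commas and each field name written in lowerCamelCase. A minimal sketch of that conversion follows; the function names and the example paths are hypothetical and not taken from the k6 script in this PR.

```javascript
// Convert one snake_case path segment to lowerCamelCase, as the
// protobuf JSON mapping requires for FieldMask paths.
function toLowerCamel(segment) {
  return segment.replace(/_([a-z0-9])/g, (_, ch) => ch.toUpperCase());
}

// Hypothetical helper: turn a list of field paths into the single
// comma-separated string used for google.protobuf.FieldMask in JSON.
function fieldMaskToJson(paths) {
  return paths
    .map((path) => path.split('.').map(toLowerCamel).join('.'))
    .join(',');
}

// Example with made-up field names:
// fieldMaskToJson(['title', 'author.display_name']) === 'title,author.displayName'
```

Sending the mask as an object such as `{ paths: [...] }` is the shape that typically triggers the serialization error this commit fixes.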
- Return execa result from exec() so version collection works
- Parse gRPC metrics (grpc_req_duration) alongside HTTP metrics
- Remove .values wrapper missing in k6 summary-export format
- Use protocol-agnostic field names (reqs, reqDuration, reqFailedRate)
- Get k6 version from Docker image instead of local binary
- Update all consumers (compare, formatter, run, loadtest commands)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add dotenv to auto-load .env file on CLI startup
- Create .env.example with Grafana Cloud variables
- Update README: add Grafana setup section, remove local k6 prerequisite, update metric names to protocol-agnostic format
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace instanceId/apiKey with a single base64 token that Grafana Cloud generates directly on the OTLP configuration page. Also make the endpoint configurable via env var. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Align on OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS which are the exact env vars Grafana Cloud generates on the OTLP configuration page. Parse the headers string at publish time. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
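To illustrate what parsing the headers string involves: OTEL_EXPORTER_OTLP_HEADERS is a comma-separated list of key=value pairs whose values may be percent-encoded. The sketch below assumes that format; `parseOtlpHeaders` is a hypothetical name, not the function in this commit.

```javascript
// Hypothetical parser for an OTEL_EXPORTER_OTLP_HEADERS-style string,
// e.g. "Authorization=Basic%20<token>,X-Other=value".
function parseOtlpHeaders(raw = '') {
  const headers = {};
  for (const pair of raw.split(',')) {
    // Split on the first "=" only: base64 tokens may end in "=" padding.
    const idx = pair.indexOf('=');
    if (idx <= 0) continue; // skip empty or malformed entries
    const key = pair.slice(0, idx).trim();
    const value = pair.slice(idx + 1).trim();
    try {
      headers[key] = decodeURIComponent(value);
    } catch {
      headers[key] = value; // keep the raw value if it is not valid percent-encoding
    }
  }
  return headers;
}
```

A call such as `parseOtlpHeaders(process.env.OTEL_EXPORTER_OTLP_HEADERS)` would then yield an object suitable for passing as HTTP request headers.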
Accept variadic arguments so results can be batch-published without needing to re-run benchmarks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use @opentelemetry/exporter-metrics-otlp-http which natively reads OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS from env vars. Remove custom header parsing, OTLP formatter, and grafana config section from benchmark.config.json. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lues Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix resourceFromAttributes import (OTel resources v2 API change)
- Use static import for PeriodicExportingMetricReader
- Add build/fix scripts to toolkit and root package.json
- Move OTel and dotenv deps from root to toolkit workspace
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add importable Grafana dashboard (grafana/benchmark-dashboard.json) with overview stats, latency comparison, throughput, and summary table
- Move benchmark attributes from resource to data point attributes so they appear as Prometheus labels without requiring promotion
- Remove metric units to avoid Grafana Cloud appending suffixes
- Update README with dashboard import instructions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
This PR updates the benchmark toolkit to run k6 load tests via a pinned Docker image, publish results to Grafana Cloud using the OpenTelemetry JS SDK and standard OTEL env vars, and adds an importable Grafana dashboard for comparing benchmark runs.
Changes:
- Run k6 inside Docker (`grafana/k6:1.6.1`) and enable gRPC reflection-based scripts (no local k6 / proto load required)
- Replace the custom Grafana Cloud OTLP publisher with OpenTelemetry JS OTLP/HTTP metrics export + `.env` loading
- Add a Grafana dashboard JSON and update CLI/docs/config accordingly (including multi-file `publish`)
Reviewed changes
Copilot reviewed 23 out of 26 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| toolkit/src/util/exec.js | Return execa() result so callers can read stdout/stderr (needed for version/result collection). |
| toolkit/src/report/compare.js | Align comparison output fields with the new parsed k6 summary shape. |
| toolkit/src/publish/otlp.js | New OpenTelemetry-based OTLP metrics publisher using standard OTEL_EXPORTER_OTLP_* env vars. |
| toolkit/src/publish/grafana-cloud.js | Removed custom Grafana Cloud client in favor of OTel SDK. |
| toolkit/src/publish/formatter.js | Removed custom OTLP JSON formatting in favor of OTel SDK. |
| toolkit/src/metrics/k6-parser.js | Update k6 summary parsing to support HTTP vs gRPC metrics and renamed fields. |
| toolkit/src/metrics/collector.js | Collect k6 version via Docker image rather than local k6 binary. |
| toolkit/src/core/k6.js | Run k6 via Docker with mounted workspace/results; map env paths into container. |
| toolkit/src/config/schema.js | Remove Grafana config from the benchmark config schema. |
| toolkit/src/cli/commands/run.js | Update logging to new summary field names and publish via new OTLP exporter. |
| toolkit/src/cli/commands/publish.js | Support publishing multiple result files; publish via new OTLP exporter (no config load). |
| toolkit/src/cli/commands/loadtest.js | Update logging to new summary field names. |
| toolkit/package.json | Add OTel metric exporter deps and devloop scripts (build, fix). |
| toolkit/bin/benchmark.js | Load .env automatically via dotenv/config. |
| projects/content-api/connect-rpc/go.mod | Add connectrpc.com/grpcreflect (and adjust direct/indirect deps). |
| projects/content-api/connect-rpc/go.sum | Add checksums for new grpcreflect dependency. |
| projects/content-api/connect-rpc/cmd/server/setup_gateway.go | Enable gRPC reflection endpoints for tooling/k6 reflection mode. |
| projects/content-api/_shared/protobuf/k6/content-api.js | Switch to reflect: true and remove manual proto loading. |
| package.json | Add root fix script that runs workspace fixes. |
| package-lock.json | Lockfile updates for new OTel dependencies and dotenv workspace placement. |
| grafana/benchmark-dashboard.json | Add importable Grafana dashboard for benchmark overview/latency/throughput comparisons. |
| benchmark.config.json | Remove gRPC PROTO_DIR env since scripts use reflection now; remove grafana config block. |
| README.md | Document Docker-based k6, OTEL env var publishing, dashboard import, and updated commands. |
| .env.example | Add example OTEL exporter env vars for Grafana Cloud. |
| .claude/commands/scaffold-implementation.md | Update scaffolding guidance to remove PROTO_DIR for gRPC targets. |
| .claude/agents/connect-rpc-go.md | Update agent docs/snippets to include gRPC reflection wiring. |
- Fix k6 Docker networking: use --network host on Linux, --add-host host.docker.internal on macOS/Windows for Docker Desktop compatibility
- Use k6 image tag as version instead of running a container to detect it
- Restore .values accessor in k6 summary parser (--summary-export format)
- Fix checksPassRate to use checks.values.rate instead of checks.value
- Fix dashboard panel descriptions: "in seconds" → "in milliseconds"
- Update README: clarify OTLP credentials are read at publish time
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Parse BuildKit --progress=plain output to extract per-stage durations (e.g. generate, builder, runtime)
- Publish per-stage metrics as benchmark.build.stage.duration with benchmark.build.stage label for Grafana filtering
- Add stage columns to comparison table
- Add Build Stage Breakdown panel to Grafana dashboard
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
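A minimal sketch of how per-stage durations could be pulled from plain progress output. It assumes the usual BuildKit line shapes, `#<step> [<stage> <i>/<n>] ...` for a step header and `#<step> DONE <seconds>s` for its completion; `parseStageDurations` is a hypothetical name and the real parser in this commit may differ.

```javascript
// Hypothetical parser: sums DONE timings per named build stage from
// `docker build --progress=plain` output (line formats are an assumption).
function parseStageDurations(output) {
  const stageByStep = new Map(); // step id -> stage name
  const durations = {}; // stage name -> total seconds

  for (const line of output.split('\n')) {
    // Step header, e.g. "#8 [builder 3/5] RUN go build ./..."
    const header = line.match(/^#(\d+) \[([\w.-]+)\s+\d+\/\d+\]/);
    if (header) {
      stageByStep.set(header[1], header[2]);
      continue;
    }
    // Step completion, e.g. "#8 DONE 12.3s"
    const done = line.match(/^#(\d+) DONE (\d+(?:\.\d+)?)s/);
    if (done && stageByStep.has(done[1])) {
      const stage = stageByStep.get(done[1]);
      durations[stage] = (durations[stage] ?? 0) + Number(done[2]);
    }
  }
  return durations;
}
```

Steps reported as CACHED and internal steps such as `[internal] load build definition` carry no stage timing in this sketch and are ignored.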
- Restore --network host on all platforms (works on Docker Desktop)
- Rewrite localhost to host.docker.internal in k6 env vars on non-Linux so k6 container can reach services published on the host
- Add health check to standalone loadtest command to prevent running k6 against a service that is not ready
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reduce from 12 panels to 6 by removing redundant stat panels, duplicate bar gauges, and the summary table. Use consistent bar gauge and bar chart panels with clear labels and palette-classic-by-name coloring for easy target comparison.
Layout:
- Performance: Throughput + Error Rate (bar gauges)
- Latency: percentile distribution (bar chart)
- Build & Deploy: build time + deploy time (bar gauges)
- Build Stages: per-stage breakdown (bar chart)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
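The localhost rewrite could look roughly like the sketch below. `rewriteLocalhostForDocker` is a hypothetical name, and the platform argument is injectable only so the behavior can be exercised on any machine.

```javascript
// Hypothetical helper: on non-Linux hosts, point k6 env vars at
// host.docker.internal so the container can reach host-published services.
function rewriteLocalhostForDocker(env, platform = process.platform) {
  if (platform === 'linux') {
    return { ...env }; // host networking already reaches localhost on Linux
  }
  return Object.fromEntries(
    Object.entries(env).map(([key, value]) => [
      key,
      String(value).replace(/\blocalhost\b/g, 'host.docker.internal'),
    ]),
  );
}
```

For example, a value of `localhost:8080` becomes `host.docker.internal:8080` when the platform is `darwin` and is left untouched when it is `linux`.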
BuildKit progress output may go to stdout or stderr depending on Docker version and compose configuration. Parse both to reliably extract per-stage durations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BuildKit natively exports per-stage build traces via the same OTEL env vars already configured for metrics publishing. This removes the custom --progress=plain output parser which was unreliable, and delegates build stage visibility to Grafana Cloud Tempo. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
k6 1.6.1 --summary-export puts metrics directly on the metric object (e.g. metrics.grpc_req_duration.avg) without a .values wrapper. Also fix checksPassRate to read checks.value instead of checks.rate. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
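Because the summary shape flipped between commits, a parser that tolerates both layouts is one way to guard against this kind of regression. The sketch below is an assumption, not the code in k6-parser.js: `parseK6Summary` and the `protocol` field are hypothetical, while `reqDuration` and `checksPassRate` follow the field names used in this PR.

```javascript
// Accept either a flat metric object or one wrapped in `.values`.
function unwrap(metric) {
  return metric && metric.values ? metric.values : metric;
}

// Hypothetical parser for a k6 --summary-export JSON object.
function parseK6Summary(summary) {
  const metrics = summary.metrics ?? {};
  // Prefer gRPC timings when present, otherwise fall back to HTTP.
  const duration = unwrap(metrics.grpc_req_duration ?? metrics.http_req_duration) ?? {};
  const checks = unwrap(metrics.checks) ?? {};

  return {
    protocol: metrics.grpc_req_duration ? 'grpc' : 'http',
    reqDuration: { avg: duration.avg, p95: duration['p(95)'] },
    // Flat exports expose `value`; wrapped summaries expose `rate`.
    checksPassRate: checks.value ?? checks.rate,
  };
}
```

Reading `checks.value` first matches the flat layout described in this commit, with `rate` kept as a fallback for the wrapped layout.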
Remove row separator panels and tighten grid positions for a cleaner 5-panel layout. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- Run k6 via Docker (`grafana/k6:1.6.1`) with gRPC reflection mode support
- Publish results using the `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_EXPORTER_OTLP_HEADERS` env vars
- Add `.env` support for Grafana Cloud credentials with `dotenv`
- Add a Grafana dashboard (`grafana/benchmark-dashboard.json`) with overview stats, latency comparison, throughput, and summary table
- Add `build`/`fix` devloop scripts to toolkit and root
- Update the `publish` command for multiple result files

Test plan
- Run `npm run build` to verify lint/format passes
- Run `npm run benchmark -- run content-api/connect-rpc` to execute a full benchmark
- Run `npm run benchmark -- publish results/content-api-connect-rpc-*.json` to publish results
- Import `grafana/benchmark-dashboard.json` into Grafana Cloud and verify panels populate

🤖 Generated with Claude Code