Run benchmarks and configure Grafana dashboard #5

Merged
viqueen merged 23 commits into main from feature/run-benchmark-and-configure-grafana on Mar 8, 2026

@viqueen viqueen commented Mar 8, 2026

Summary

  • Run k6 load tests via Docker (grafana/k6:1.6.1) with gRPC reflection mode support
  • Fix benchmark result collection: exec return values, gRPC metric parsing, version detection
  • Replace custom OTLP publisher with OpenTelemetry JS SDK, using standard OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS env vars
  • Add .env support for Grafana Cloud credentials with dotenv
  • Add importable Grafana dashboard (grafana/benchmark-dashboard.json) with overview stats, latency comparison, throughput, and summary table
  • Add build/fix devloop scripts to toolkit and root
  • Pin Docker image versions, support standalone publish command for multiple result files

Test plan

  • Run npm run build to verify lint/format passes
  • Run npm run benchmark -- run content-api/connect-rpc to execute a full benchmark
  • Run npm run benchmark -- publish results/content-api-connect-rpc-*.json to publish results
  • Import grafana/benchmark-dashboard.json into Grafana Cloud and verify panels populate

🤖 Generated with Claude Code

viqueen and others added 14 commits March 8, 2026 09:22
Run k6 load tests using grafana/k6 Docker image instead of requiring
a local k6 binary. Mounts workspace at /workspace (read-only) and
remaps file paths (PROTO_DIR etc.) to container paths. Uses
--network host for target access.

Replace remote randomString import with built-in crypto.randomUUID()
in the k6 script.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
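
As a rough illustration of the Docker invocation this commit describes, the argument list could be assembled along these lines. This is a sketch only: `buildK6DockerArgs` and `toContainerPath` are hypothetical names, not the functions in toolkit/src/core/k6.js, and the image tag is the one pinned in the PR summary.

```javascript
const K6_IMAGE = "grafana/k6:1.6.1"; // pinned tag taken from the PR summary
const CONTAINER_ROOT = "/workspace"; // mount point named in the commit message

// Map a host path inside the workspace to its location inside the container.
// Hypothetical helper; throws when the path would not be visible in the mount.
function toContainerPath(hostPath, workspaceDir) {
  const root = workspaceDir.replace(/\/+$/, "");
  if (hostPath !== root && !hostPath.startsWith(root + "/")) {
    throw new Error(`${hostPath} is outside the mounted workspace`);
  }
  return CONTAINER_ROOT + hostPath.slice(root.length);
}

// Build the argument list for `docker`, i.e. everything after the binary name.
function buildK6DockerArgs({ workspaceDir, scriptPath, env = {} }) {
  const args = [
    "run", "--rm",
    "--network", "host",                           // reach the target on the host
    "-v", `${workspaceDir}:${CONTAINER_ROOT}:ro`,  // read-only workspace mount
  ];
  for (const [key, value] of Object.entries(env)) {
    args.push("-e", `${key}=${value}`);            // exposed to the script as __ENV
  }
  // The grafana/k6 image uses k6 as its entrypoint, so "run <script>" follows.
  args.push(K6_IMAGE, "run", toContainerPath(scriptPath, workspaceDir));
  return args;
}
```

The resulting array can be handed to any process runner; env values that hold file paths would need the same `toContainerPath` remapping before being passed in.
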
Add connectrpc.com/grpcreflect to the server so k6 can discover
services at runtime via reflect:true. Remove proto file loading
from the k6 script and PROTO_DIR from benchmark config. Update
agent and scaffold command accordingly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use the comma-separated string format for google.protobuf.FieldMask, as
required by the protobuf JSON encoding spec, fixing a serialization error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
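
For readers unfamiliar with the FieldMask JSON mapping: the protobuf JSON encoding represents a mask as one string, with paths joined by commas and each field name converted to lowerCamelCase. A minimal sketch of that conversion follows; `fieldMaskToJson` is a hypothetical helper name, not something taken from the k6 script in this PR.

```javascript
// Convert one snake_case field name to lowerCamelCase.
function snakeToLowerCamel(name) {
  return name.replace(/_([a-z0-9])/g, (_, ch) => ch.toUpperCase());
}

// Encode a list of FieldMask paths as the single string used in protobuf JSON.
function fieldMaskToJson(paths) {
  return paths
    .map((path) => path.split(".").map(snakeToLowerCamel).join("."))
    .join(",");
}

// Example request payload: the mask is a string, not { paths: [...] }.
const updateRequest = {
  updateMask: fieldMaskToJson(["title", "author.display_name"]),
};
```
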
- Return execa result from exec() so version collection works
- Parse gRPC metrics (grpc_req_duration) alongside HTTP metrics
- Remove the .values accessor, which the k6 summary-export format does not include
- Use protocol-agnostic field names (reqs, reqDuration, reqFailedRate)
- Get k6 version from Docker image instead of local binary
- Update all consumers (compare, formatter, run, loadtest commands)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add dotenv to auto-load .env file on CLI startup
- Create .env.example with Grafana Cloud variables
- Update README: add Grafana setup section, remove local k6 prerequisite,
  update metric names to protocol-agnostic format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace instanceId/apiKey with a single base64 token that Grafana Cloud
generates directly on the OTLP configuration page. Also make the
endpoint configurable via env var.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Align on OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS
which are the exact env vars Grafana Cloud generates on the OTLP
configuration page. Parse the headers string at publish time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
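
The header parsing mentioned in this commit could look roughly like the sketch below (a later commit in this PR hands the job to the OTLP exporter instead). It assumes the OpenTelemetry convention of comma-separated `key=value` pairs with URL-encoded values; `parseOtlpHeaders` is a hypothetical name.

```javascript
// Parse an OTEL_EXPORTER_OTLP_HEADERS-style string into a plain object.
// Splits each pair on the FIRST "=" only, so base64 padding ("==") in the
// value survives intact.
function parseOtlpHeaders(raw = "") {
  const headers = {};
  for (const pair of raw.split(",")) {
    const idx = pair.indexOf("=");
    if (idx <= 0) continue; // skip empty or malformed entries
    const key = pair.slice(0, idx).trim();
    const value = pair.slice(idx + 1).trim();
    // Values may be percent-encoded (for example "%20" for a space).
    // Note: decodeURIComponent throws on malformed escape sequences.
    headers[key] = decodeURIComponent(value);
  }
  return headers;
}
```

Called as `parseOtlpHeaders(process.env.OTEL_EXPORTER_OTLP_HEADERS)`, this would yield an object suitable for an HTTP client's headers option.
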
Accept variadic arguments so results can be batch-published without
needing to re-run benchmarks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use @opentelemetry/exporter-metrics-otlp-http which natively reads
OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS from env
vars. Remove custom header parsing, OTLP formatter, and grafana config
section from benchmark.config.json.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lues

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix resourceFromAttributes import (OTel resources v2 API change)
- Use static import for PeriodicExportingMetricReader
- Add build/fix scripts to toolkit and root package.json
- Move OTel and dotenv deps from root to toolkit workspace

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add importable Grafana dashboard (grafana/benchmark-dashboard.json)
  with overview stats, latency comparison, throughput, and summary table
- Move benchmark attributes from resource to data point attributes so
  they appear as Prometheus labels without requiring promotion
- Remove metric units to avoid Grafana Cloud appending suffixes
- Update README with dashboard import instructions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 8, 2026 00:22
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

This PR updates the benchmark toolkit to run k6 load tests via a pinned Docker image, publish results to Grafana Cloud using the OpenTelemetry JS SDK and standard OTEL env vars, and adds an importable Grafana dashboard for comparing benchmark runs.

Changes:

  • Run k6 inside Docker (grafana/k6:1.6.1) and enable gRPC reflection-based scripts (no local k6 / proto load required)
  • Replace the custom Grafana Cloud OTLP publisher with OpenTelemetry JS OTLP/HTTP metrics export + .env loading
  • Add a Grafana dashboard JSON and update CLI/docs/config accordingly (including multi-file publish)

Reviewed changes

Copilot reviewed 23 out of 26 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| toolkit/src/util/exec.js | Return execa() result so callers can read stdout/stderr (needed for version/result collection). |
| toolkit/src/report/compare.js | Align comparison output fields with the new parsed k6 summary shape. |
| toolkit/src/publish/otlp.js | New OpenTelemetry-based OTLP metrics publisher using standard OTEL_EXPORTER_OTLP_* env vars. |
| toolkit/src/publish/grafana-cloud.js | Removed custom Grafana Cloud client in favor of OTel SDK. |
| toolkit/src/publish/formatter.js | Removed custom OTLP JSON formatting in favor of OTel SDK. |
| toolkit/src/metrics/k6-parser.js | Update k6 summary parsing to support HTTP vs gRPC metrics and renamed fields. |
| toolkit/src/metrics/collector.js | Collect k6 version via Docker image rather than local k6 binary. |
| toolkit/src/core/k6.js | Run k6 via Docker with mounted workspace/results; map env paths into container. |
| toolkit/src/config/schema.js | Remove Grafana config from the benchmark config schema. |
| toolkit/src/cli/commands/run.js | Update logging to new summary field names and publish via new OTLP exporter. |
| toolkit/src/cli/commands/publish.js | Support publishing multiple result files; publish via new OTLP exporter (no config load). |
| toolkit/src/cli/commands/loadtest.js | Update logging to new summary field names. |
| toolkit/package.json | Add OTel metric exporter deps and devloop scripts (build, fix). |
| toolkit/bin/benchmark.js | Load .env automatically via dotenv/config. |
| projects/content-api/connect-rpc/go.mod | Add connectrpc.com/grpcreflect (and adjust direct/indirect deps). |
| projects/content-api/connect-rpc/go.sum | Add checksums for new grpcreflect dependency. |
| projects/content-api/connect-rpc/cmd/server/setup_gateway.go | Enable gRPC reflection endpoints for tooling/k6 reflection mode. |
| projects/content-api/_shared/protobuf/k6/content-api.js | Switch to reflect: true and remove manual proto loading. |
| package.json | Add root fix script that runs workspace fixes. |
| package-lock.json | Lockfile updates for new OTel dependencies and dotenv workspace placement. |
| grafana/benchmark-dashboard.json | Add importable Grafana dashboard for benchmark overview/latency/throughput comparisons. |
| benchmark.config.json | Remove gRPC PROTO_DIR env since scripts use reflection now; remove grafana config block. |
| README.md | Document Docker-based k6, OTEL env var publishing, dashboard import, and updated commands. |
| .env.example | Add example OTEL exporter env vars for Grafana Cloud. |
| .claude/commands/scaffold-implementation.md | Update scaffolding guidance to remove PROTO_DIR for gRPC targets. |
| .claude/agents/connect-rpc-go.md | Update agent docs/snippets to include gRPC reflection wiring. |

Comment thread toolkit/src/core/k6.js
Comment thread toolkit/src/metrics/collector.js Outdated
Comment thread projects/content-api/connect-rpc/cmd/server/setup_gateway.go
Comment thread README.md Outdated
Comment thread grafana/benchmark-dashboard.json Outdated
Comment thread grafana/benchmark-dashboard.json Outdated
Comment thread toolkit/src/metrics/k6-parser.js
Comment thread toolkit/src/metrics/k6-parser.js
viqueen and others added 8 commits March 8, 2026 12:42
- Fix k6 Docker networking: use --network host on Linux, --add-host
  host.docker.internal on macOS/Windows for Docker Desktop compatibility
- Use k6 image tag as version instead of running a container to detect it
- Restore .values accessor in k6 summary parser (--summary-export format)
- Fix checksPassRate to use checks.values.rate instead of checks.value
- Fix dashboard panel descriptions: "in seconds" → "in milliseconds"
- Update README: clarify OTLP credentials are read at publish time

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
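
Deriving the version from the image tag, as this commit describes, avoids starting a container just to ask k6 for its version. One possible shape for that lookup is sketched here; `versionFromImage` is a hypothetical name, and falling back to "latest" for untagged references is an assumption that mirrors Docker's default.

```javascript
// Extract the tag from an image reference such as "grafana/k6:1.6.1".
function versionFromImage(image) {
  // Drop an optional digest suffix ("@sha256:...").
  const withoutDigest = image.split("@")[0];
  // Only look at the last path segment, so a registry port such as
  // "localhost:5000/grafana/k6" is not mistaken for a tag.
  const lastSegment = withoutDigest.slice(withoutDigest.lastIndexOf("/") + 1);
  const colon = lastSegment.indexOf(":");
  return colon === -1 ? "latest" : lastSegment.slice(colon + 1);
}
```
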
- Parse BuildKit --progress=plain output to extract per-stage durations
  (e.g. generate, builder, runtime)
- Publish per-stage metrics as benchmark.build.stage.duration with
  benchmark.build.stage label for Grafana filtering
- Add stage columns to comparison table
- Add Build Stage Breakdown panel to Grafana dashboard

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
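
To make the per-stage parsing concrete, here is a minimal sketch of how durations might be pulled out of `--progress=plain` output. It assumes step lines of the form `#8 [builder 3/4] RUN ...` followed by `#8 DONE 12.4s`; that text format is not a stable interface, which is consistent with a later commit in this PR removing the parser as unreliable. `parseStageDurations` is a hypothetical name.

```javascript
// Sum DONE durations (in seconds) per named build stage.
function parseStageDurations(output) {
  const stageByVertex = new Map(); // vertex id ("8") -> stage name ("builder")
  const totals = {};
  for (const line of output.split(/\r?\n/)) {
    // Step header, e.g. "#8 [builder 3/4] RUN go build ./..."
    const step = line.match(/^#(\d+) \[([\w.-]+) +\d+\/\d+\]/);
    if (step) {
      stageByVertex.set(step[1], step[2]);
      continue;
    }
    // Completion line, e.g. "#8 DONE 12.4s". CACHED steps carry no duration.
    const done = line.match(/^#(\d+) DONE (\d+(?:\.\d+)?)s$/);
    if (done && stageByVertex.has(done[1])) {
      const stage = stageByVertex.get(done[1]);
      totals[stage] = (totals[stage] ?? 0) + Number(done[2]);
    }
  }
  return totals;
}
```

Lines such as `#1 [internal] load build definition` are ignored because they carry no `n/m` step counter, and output prefixed with a compose service name would need a looser pattern.
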
- Restore --network host on all platforms (works on Docker Desktop)
- Rewrite localhost to host.docker.internal in k6 env vars on non-Linux
  so k6 container can reach services published on the host
- Add health check to standalone loadtest command to prevent running
  k6 against a service that is not ready

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
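
The localhost rewrite described in this commit might be sketched as follows. `rewriteLocalhostForDocker` is a hypothetical name, and the exact matching rule (only `localhost` followed by a port, a slash, or the end of the value) is an assumption rather than the toolkit's actual behavior.

```javascript
// On non-Linux hosts, containers cannot reach host services via "localhost",
// so point k6 env values at Docker Desktop's host alias instead.
function rewriteLocalhostForDocker(env, platform = process.platform) {
  if (platform === "linux") return { ...env }; // host networking works as-is
  const rewritten = {};
  for (const [key, value] of Object.entries(env)) {
    rewritten[key] = String(value).replace(
      /\blocalhost(?=[:/]|$)/g,
      "host.docker.internal",
    );
  }
  return rewritten;
}
```

Passing the platform as a parameter keeps the function easy to exercise on any machine.
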
Reduce from 12 panels to 6 by removing redundant stat panels,
duplicate bar gauges, and the summary table. Use consistent
bar gauge and bar chart panels with clear labels and
palette-classic-by-name coloring for easy target comparison.

Layout:
- Performance: Throughput + Error Rate (bar gauges)
- Latency: percentile distribution (bar chart)
- Build & Deploy: build time + deploy time (bar gauges)
- Build Stages: per-stage breakdown (bar chart)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BuildKit progress output may go to stdout or stderr depending on
Docker version and compose configuration. Parse both to reliably
extract per-stage durations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BuildKit natively exports per-stage build traces via the same OTEL env
vars already configured for metrics publishing. This removes the custom
--progress=plain output parser which was unreliable, and delegates
build stage visibility to Grafana Cloud Tempo.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
k6 1.6.1 --summary-export puts metrics directly on the metric object
(e.g. metrics.grpc_req_duration.avg) without a .values wrapper. Also
fix checksPassRate to read checks.value instead of checks.rate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
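
Since the presence of the `.values` wrapper changed more than once during this PR, a parser that tolerates both shapes is one way to avoid the flip-flop. The sketch below is hypothetical and is not the code in toolkit/src/metrics/k6-parser.js; in particular, using `iterations` as the request count for gRPC runs and reporting no failure rate for gRPC are assumptions.

```javascript
// Return the object holding a metric's fields, with or without `.values`.
function metricFields(metric) {
  if (!metric) return undefined;
  return metric.values ?? metric;
}

// Reduce a k6 summary to protocol-agnostic fields.
function parseK6Summary(summary) {
  const metrics = summary.metrics ?? {};
  const isGrpc = "grpc_req_duration" in metrics;
  const duration =
    metricFields(isGrpc ? metrics.grpc_req_duration : metrics.http_req_duration) ?? {};
  // Assumption: gRPC runs have no http_reqs counter, so count iterations.
  const reqs = metricFields(isGrpc ? metrics.iterations : metrics.http_reqs) ?? {};
  const failed = metricFields(metrics.http_req_failed);
  const checks = metricFields(metrics.checks);
  return {
    protocol: isGrpc ? "grpc" : "http",
    reqs: { count: reqs.count ?? 0, rate: reqs.rate ?? 0 },
    reqDuration: {
      avg: duration.avg,
      med: duration.med,
      p95: duration["p(95)"],
      max: duration.max,
    },
    reqFailedRate: failed ? (failed.value ?? failed.rate ?? 0) : null,
    checksPassRate: checks ? (checks.value ?? checks.rate ?? null) : null,
  };
}
```
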
Remove row separator panels and tighten grid positions for a cleaner
5-panel layout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@viqueen viqueen enabled auto-merge (squash) March 8, 2026 03:14
@viqueen viqueen merged commit 77e0d61 into main Mar 8, 2026
2 checks passed
@viqueen viqueen deleted the feature/run-benchmark-and-configure-grafana branch March 8, 2026 03:16
2 participants