Summary
The backend (API + indexer) currently has only plain-text logs to stdout via tracing. There are no metrics, no distributed tracing, no structured log output, and minimal health checks. This issue tracks adding proper observability instrumentation.
Current State
- Logging: tracing + tracing-subscriber with fmt::layer() (text). The json feature is compiled but unused.
- Metrics: None. No Prometheus, no /metrics endpoint.
- Distributed tracing: None. No OpenTelemetry, no span propagation.
- Health checks: GET /health returns "OK" with no dependency verification. Indexer has no health endpoint at all.
- #[instrument]: Not used, so there are no request-scoped spans and no correlation IDs.
- tower_http::TraceLayer: Configured but effectively silent at production log levels (info).
Proposed Work
1. Prometheus Metrics (/metrics endpoint)
API server:
- Request count by route, method, status code
- Request latency histogram by route
- Active connections gauge
- Database query latency histogram
- Database connection pool stats (active, idle, max)
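The latency histograms above would normally come from whichever metrics crate is chosen. As a rough, std-only illustration of what a Prometheus-style histogram tracks, the sketch below keeps cumulative bucket counts plus a running sum and count. The `Histogram` type and the bucket bounds are hypothetical and are not part of the codebase or of any crate.

```rust
/// Hypothetical std-only histogram with Prometheus-style cumulative buckets.
pub struct Histogram {
    bounds: Vec<f64>, // upper bounds in seconds, ascending
    counts: Vec<u64>, // counts[i] = observations <= bounds[i]
    sum: f64,
    count: u64,
}

impl Histogram {
    pub fn new(bounds: &[f64]) -> Self {
        Histogram {
            bounds: bounds.to_vec(),
            counts: vec![0; bounds.len()],
            sum: 0.0,
            count: 0,
        }
    }

    /// Record one observation, e.g. a request or query latency in seconds.
    pub fn observe(&mut self, value: f64) {
        for (i, bound) in self.bounds.iter().enumerate() {
            // Buckets are cumulative: every bucket whose bound covers the
            // value is incremented, not only the smallest one.
            if value <= *bound {
                self.counts[i] += 1;
            }
        }
        self.sum += value;
        self.count += 1;
    }

    /// Bucket counts in order, followed by the implicit "+Inf" bucket,
    /// which always equals the total observation count.
    pub fn cumulative_counts(&self) -> Vec<u64> {
        let mut out = self.counts.clone();
        out.push(self.count);
        out
    }

    pub fn sum(&self) -> f64 {
        self.sum
    }
}
```

Per-route latency would then be one such histogram per route label; in practice the chosen crate handles labels and export.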
Indexer:
- blocks_indexed_total counter
- blocks_per_second gauge
- batch_duration_seconds histogram
- failed_blocks_total counter
- rpc_requests_total counter (by status: success/failure)
- rpc_request_duration_seconds histogram
- indexer_head_block gauge (current indexed height)
- chain_head_block gauge (latest chain height)
- indexer_lag_blocks gauge (chain head - indexed head)
- db_insert_duration_seconds histogram
Crate candidates: metrics + metrics-exporter-prometheus or prometheus-client.
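To make the relationship between the head and lag gauges concrete, here is a minimal std-only sketch that tracks a few of the indexer metrics and renders them in the Prometheus text exposition format. It is not how either candidate crate is used; `IndexerMetrics` and its methods are hypothetical names chosen for illustration.

```rust
use std::fmt::Write;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical in-process holder for a subset of the indexer metrics.
#[derive(Default)]
pub struct IndexerMetrics {
    blocks_indexed_total: AtomicU64,
    failed_blocks_total: AtomicU64,
    indexer_head_block: AtomicU64,
    chain_head_block: AtomicU64,
}

impl IndexerMetrics {
    /// Record one successfully indexed block at `height`.
    pub fn record_block(&self, height: u64) {
        self.blocks_indexed_total.fetch_add(1, Ordering::Relaxed);
        // Blocks may complete out of order; keep the highest height seen.
        self.indexer_head_block.fetch_max(height, Ordering::Relaxed);
    }

    pub fn record_failed_block(&self) {
        self.failed_blocks_total.fetch_add(1, Ordering::Relaxed);
    }

    pub fn set_chain_head(&self, height: u64) {
        self.chain_head_block.store(height, Ordering::Relaxed);
    }

    /// indexer_lag_blocks = chain head - indexed head, clamped at zero.
    pub fn lag_blocks(&self) -> u64 {
        let chain = self.chain_head_block.load(Ordering::Relaxed);
        let head = self.indexer_head_block.load(Ordering::Relaxed);
        chain.saturating_sub(head)
    }

    /// Render the metrics as Prometheus text exposition lines.
    pub fn render(&self) -> String {
        let rows = [
            ("blocks_indexed_total", "counter", self.blocks_indexed_total.load(Ordering::Relaxed)),
            ("failed_blocks_total", "counter", self.failed_blocks_total.load(Ordering::Relaxed)),
            ("indexer_head_block", "gauge", self.indexer_head_block.load(Ordering::Relaxed)),
            ("chain_head_block", "gauge", self.chain_head_block.load(Ordering::Relaxed)),
            ("indexer_lag_blocks", "gauge", self.lag_blocks()),
        ];
        let mut out = String::new();
        for (name, kind, value) in rows {
            let _ = writeln!(out, "# TYPE {name} {kind}");
            let _ = writeln!(out, "{name} {value}");
        }
        out
    }
}
```

Deriving the lag at render time keeps it consistent with the two head gauges in the same scrape. Whether to export it separately or compute it in the dashboard is a design choice left open here.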
2. Structured JSON Logging
- Activate tracing-subscriber's json formatter behind a config flag (e.g., LOG_FORMAT=json)
- Ensure batch-complete stats are emitted as named tracing fields, not embedded in format strings
- Add #[instrument] to API handler functions and key indexer methods for automatic span context
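The config flag could be read along these lines. This is a std-only sketch of the selection logic only: the `LogFormat` enum is a hypothetical name, and the fallback to text for unknown values is an assumption rather than a decided behavior.

```rust
/// Hypothetical log format selector driven by the LOG_FORMAT variable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LogFormat {
    Text,
    Json,
}

impl LogFormat {
    /// Parse a raw flag value. Anything other than "json" (case-insensitive,
    /// surrounding whitespace ignored) falls back to text, which is assumed
    /// here to be the safe default.
    pub fn parse(raw: Option<&str>) -> LogFormat {
        match raw.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
            Some("json") => LogFormat::Json,
            _ => LogFormat::Text,
        }
    }

    /// Read LOG_FORMAT from the process environment.
    pub fn from_env() -> LogFormat {
        let raw = std::env::var("LOG_FORMAT").ok();
        LogFormat::parse(raw.as_deref())
    }
}

// At startup, the Json variant would select tracing-subscriber's JSON
// formatter (for example fmt::layer().json(), which needs the already
// compiled `json` feature) and Text would keep the current fmt::layer().
```

Keeping the parsing separate from subscriber setup makes the fallback behavior easy to unit test without installing a global subscriber.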
3. Improved Health Checks
API:
- GET /health (liveness): keep as-is
- GET /health/ready (readiness): verify DB connectivity + indexer_state freshness (e.g., last update < 5 min)
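The readiness decision can be kept as a pure function, separate from the HTTP handler and the database calls. The sketch below assumes the handler has already pinged the database and computed the age of the latest indexer_state update. The `Readiness` enum, the function name, and the 200/503 mapping are illustrative assumptions.

```rust
use std::time::Duration;

/// Hypothetical outcome of the readiness probe.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Readiness {
    Ready,
    DbUnavailable,
    IndexerStateMissing,
    IndexerStale { age_secs: u64 },
}

impl Readiness {
    /// Assumed mapping: 200 when ready, 503 for every failure mode.
    pub fn http_status(self) -> u16 {
        match self {
            Readiness::Ready => 200,
            _ => 503,
        }
    }
}

/// Decide readiness from the results of the dependency checks.
///
/// `db_ok`: whether a trivial DB query succeeded.
/// `indexer_state_age`: time since indexer_state was last updated, if a row exists.
/// `max_age`: freshness threshold, e.g. 5 minutes.
pub fn evaluate_readiness(
    db_ok: bool,
    indexer_state_age: Option<Duration>,
    max_age: Duration,
) -> Readiness {
    if !db_ok {
        return Readiness::DbUnavailable;
    }
    match indexer_state_age {
        None => Readiness::IndexerStateMissing,
        // "last update < 5 min" means strictly younger than the threshold.
        Some(age) if age < max_age => Readiness::Ready,
        Some(age) => Readiness::IndexerStale { age_secs: age.as_secs() },
    }
}
```

Returning a distinct variant per failure mode lets the handler include the reason in the response body, which helps when a probe starts failing.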
Indexer:
- Add a lightweight HTTP server (separate port) with /health that reports:
  - Process is alive
  - Last successful block indexed + timestamp
  - Current lag from chain head
  - failed_blocks table row count
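One possible shape for the indexer's /health payload is sketched below. The `IndexerHealth` struct, the field names, and the JSON layout are assumptions made for illustration. The JSON is built by hand only to keep the example dependency-free, and a real implementation would more likely use a serializer.

```rust
/// Hypothetical snapshot reported by the indexer's /health endpoint.
pub struct IndexerHealth {
    /// Height of the last successfully indexed block, if any.
    pub last_indexed_block: Option<u64>,
    /// Unix timestamp (seconds) of that block's indexing, if any.
    pub last_indexed_at: Option<u64>,
    /// Latest known chain height.
    pub chain_head_block: u64,
    /// Current row count of the failed_blocks table.
    pub failed_blocks: u64,
}

impl IndexerHealth {
    /// Lag from chain head; None until at least one block has been indexed.
    pub fn lag_blocks(&self) -> Option<u64> {
        self.last_indexed_block
            .map(|head| self.chain_head_block.saturating_sub(head))
    }

    /// Render the snapshot as a JSON object. All values are numbers,
    /// booleans, or null, so no string escaping is needed.
    pub fn to_json(&self) -> String {
        fn num(v: Option<u64>) -> String {
            v.map_or_else(|| "null".to_string(), |n| n.to_string())
        }
        format!(
            "{{\"alive\":true,\"last_indexed_block\":{},\"last_indexed_at\":{},\"lag_blocks\":{},\"failed_blocks\":{}}}",
            num(self.last_indexed_block),
            num(self.last_indexed_at),
            num(self.lag_blocks()),
            self.failed_blocks,
        )
    }
}
```

The "alive" field is trivially true whenever the endpoint responds; it is included only to mirror the first item in the list above.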
4. OpenTelemetry Integration (optional / future)
- Wire tracing spans to OTLP exporter via tracing-opentelemetry
- Propagate trace context through RPC calls
- Export to Jaeger/Tempo/etc.
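Trace context propagation over HTTP-based RPC is commonly done with the W3C Trace Context `traceparent` header. The OpenTelemetry SDK's propagators would normally handle this. The std-only sketch below only illustrates the header layout (version, 16-byte trace ID, 8-byte span ID, flags), and both function names are hypothetical.

```rust
/// Format a W3C `traceparent` header value (version 00).
pub fn format_traceparent(trace_id: u128, span_id: u64, sampled: bool) -> String {
    format!("00-{:032x}-{:016x}-{:02x}", trace_id, span_id, sampled as u8)
}

/// True if `s` consists only of lowercase hex digits, as the header requires.
fn is_lower_hex(s: &str) -> bool {
    s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Parse a version-00 `traceparent` value into (trace_id, span_id, sampled).
/// Returns None for malformed input or all-zero IDs, which are invalid.
pub fn parse_traceparent(header: &str) -> Option<(u128, u64, bool)> {
    let mut parts = header.trim().split('-');
    let version = parts.next()?;
    let trace = parts.next()?;
    let span = parts.next()?;
    let flags = parts.next()?;
    if version != "00" || parts.next().is_some() {
        return None;
    }
    if trace.len() != 32 || span.len() != 16 || flags.len() != 2 {
        return None;
    }
    if !(is_lower_hex(trace) && is_lower_hex(span) && is_lower_hex(flags)) {
        return None;
    }
    let trace_id = u128::from_str_radix(trace, 16).ok()?;
    let span_id = u64::from_str_radix(span, 16).ok()?;
    let flags = u8::from_str_radix(flags, 16).ok()?;
    if trace_id == 0 || span_id == 0 {
        return None;
    }
    // Bit 0 of the flags byte is the "sampled" flag.
    Some((trace_id, span_id, flags & 0x01 == 0x01))
}
```

An outgoing RPC request would carry the formatted value in a `traceparent` header so that the receiving side can continue the same trace.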
Priority
Prometheus metrics and structured logging are the highest priority — they unblock dashboards, alerting, and log aggregation. Health check improvements are a close second. OTEL tracing is a nice-to-have for later.
References
- metrics-rs
- tracing-opentelemetry
- tower-http TraceLayer docs