Server-layer observability (uvicorn/granian/hypercorn) + live admin Observability dashboard #146
Merged
Add server-observability backend: a ServerStatsPort outbound port (best-effort native stats on the serve_async path), per-adapter sample() implementations, a pure-ASGI server-metrics middleware (the uniform primary source for connections/in-flight/requests across all servers and worker counts), and a ServerMetricsBinder that emits worker/uptime/lifecycle meters from the in-worker ASGI lifespan. Gated by pyfly.server.observability.* (on under web/core starters). Wired into both the Starlette and FastAPI create_app lifespans.
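The pure-ASGI middleware described in this commit can be pictured with a minimal sketch. It is not the PR's code: `ServerMetricsSketch` and `demo_app` are hypothetical names, and plain integers stand in for the `server_in_flight_requests` and `server_requests_total` meters so the example runs without `prometheus_client`.

```python
import asyncio


class ServerMetricsSketch:
    """Pure-ASGI middleware that tracks in-flight and total HTTP requests.

    Hypothetical stand-in: plain integers replace the Prometheus meters.
    """

    def __init__(self, app):
        self.app = app
        self.in_flight = 0       # stand-in for server_in_flight_requests
        self.requests_total = 0  # stand-in for server_requests_total

    async def __call__(self, scope, receive, send):
        # Lifespan and websocket scopes pass straight through in this sketch.
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        self.in_flight += 1
        self.requests_total += 1
        try:
            await self.app(scope, receive, send)
        finally:
            # Decrement even if the app raises, so the gauge cannot leak.
            self.in_flight -= 1


async def demo_app(scope, receive, send):
    """Trivial ASGI app that answers 200 with a two-byte body."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def run_demo():
    """Drive one fake HTTP request through the middleware."""
    metrics = ServerMetricsSketch(demo_app)
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await metrics({"type": "http", "method": "GET", "path": "/"}, receive, send)
    return metrics, sent
```

Because the wrapper depends only on the ASGI callable signature, the same object can sit in front of any ASGI app regardless of which server runs it, which is presumably why the commit calls it the uniform source.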
Add pyfly.observability.multiprocess (init dir before workers fork, build an aggregating MultiProcessCollector registry). /actuator/prometheus aggregates across workers when PROMETHEUS_MULTIPROC_DIR is set; cli run enables it for workers>1. Fixes the pre-existing per-worker gap for http_server_requests + server_* meters.
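For orientation, the general `prometheus_client` multiprocess pattern this commit builds on looks roughly like the sketch below. The helper names `init_multiproc_dir` and `build_aggregating_registry` are hypothetical and are not taken from `pyfly.observability.multiprocess`; returning `None` when the library is missing is an assumption that mirrors the no-op degradation mentioned elsewhere in this PR.

```python
import os
import tempfile


def init_multiproc_dir():
    """Create the shared directory and export it; call before workers fork."""
    path = tempfile.mkdtemp(prefix="prom-multiproc-")
    os.environ["PROMETHEUS_MULTIPROC_DIR"] = path
    return path


def build_aggregating_registry(multiproc_dir):
    """Return a registry that aggregates every worker's samples, or None.

    None is this sketch's way of signalling that prometheus_client is absent.
    """
    try:
        from prometheus_client import CollectorRegistry
        from prometheus_client import multiprocess
    except ImportError:
        return None
    registry = CollectorRegistry()
    # The collector reads the per-process files in multiproc_dir at scrape time.
    multiprocess.MultiProcessCollector(registry, path=multiproc_dir)
    return registry
```

A scrape handler would then render `prometheus_client.generate_latest(registry)`, and a worker that exits can be retired with `multiprocess.mark_process_dead(pid)`.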
Add an ObservabilityProvider (reads server_* meters, multiprocess-aware, with a per-worker breakdown), REST + SSE routes (/admin/api/observability[,/sse]), and a live observability.js view (stat cards, rolling charts, per-worker table, links to Metrics/Traces) registered in the SPA router + sidebar.
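As a rough illustration of the SSE side of these routes, the sketch below encodes snapshots as Server-Sent Events frames. It uses only the standard library; the event name `observability`, the snapshot fields, and both function names are assumptions made for the example, not the provider's actual payload.

```python
import asyncio
import json


def sse_frame(event, payload):
    """Encode one Server-Sent Events frame with a JSON data line."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


async def observability_stream(snapshot, interval_seconds, limit):
    """Yield `limit` frames, taking one snapshot per interval."""
    for _ in range(limit):
        yield sse_frame("observability", snapshot())
        await asyncio.sleep(interval_seconds)


async def collect_frames():
    """Collect two frames from a hypothetical snapshot function."""

    def snapshot():
        # Illustrative field names only.
        return {"workers": 2, "in_flight": 0}

    return [frame async for frame in observability_stream(snapshot, 0, 2)]
```

A real route would stream the generator's output with the `text/event-stream` content type instead of collecting it into a list.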
…stack
End-to-end test boots a real uvicorn server via serve_async, fires HTTP traffic, and asserts the server_* meters move and are served at the exposition endpoint. Add a prometheus + grafana docker-compose stack (ops/prometheus/prometheus.yml) scraping /actuator/prometheus.
…ity view
Update observability/server/admin module docs, README + ROADMAP, and the observability book chapter (EN + ES) with the server_* metric catalog, the pyfly.server.observability.* config, multi-worker aggregation, and the live admin Observability dashboard section.
…urity review
- Binder: guard all gauge writes + run sample() off-thread; _run never dies
silently; stop() always records server_stopped_total and cleans up even if the
sampling task died (was: stop() re-raised a dead task's exception, breaking
graceful shutdown). Mark workers dead in multiprocess mode on graceful stop.
- Resolve the concrete server type ('auto' -> uvicorn/granian/hypercorn) so the
server_* metric label is meaningful; binder falls back off the 'auto' sentinel.
- Admin provider: honor pyfly.server.observability.enabled (disabled -> dashboard
empty-state); fix falsy-zero native_connections; move requests/sec to per-stream
state (was corrupted by sharing one provider across REST + SSE + tabs).
- Multiprocess: graceful scrape fallback when the dir is missing; atexit cleanup +
stale-dir sweep so mmap dirs don't accumulate across restarts.
- ASGI exclusion matches /api/sse/ as a substring (custom admin paths too).
- docker-compose: bind prometheus/grafana to loopback, drop the hardcoded admin
password, downgrade anonymous Grafana to read-only Viewer (security review).
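Two of the admin-provider fixes in the list above lend themselves to a small sketch: keeping requests/sec state per stream, and treating a native connection count of 0 as a real value. `RateTracker` and `native_connections_or_none` are hypothetical names used only for illustration; the PR's provider may be structured differently.

```python
class RateTracker:
    """Per-stream requests/sec derived from a monotonically increasing counter."""

    def __init__(self):
        self._last_total = None
        self._last_time = None

    def update(self, total, now):
        """Return requests/sec since the previous call (0.0 on the first call)."""
        rate = 0.0
        if self._last_total is not None and now > self._last_time:
            # max() guards against a counter reset after a restart.
            rate = max(total - self._last_total, 0) / (now - self._last_time)
        self._last_total, self._last_time = total, now
        return rate


def native_connections_or_none(value):
    """Keep a genuine 0; only a missing sample maps to None."""
    return None if value is None else int(value)
```

Giving each REST poller or SSE stream its own tracker means one consumer's call never overwrites the previous sample another consumer is measuring against, and the explicit `is None` check avoids the `value or None` style bug where a count of 0 is reported as missing.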
Bump version 26.06.112 -> 26.06.113 and add the CHANGELOG entry for the server-layer observability feature (metrics across uvicorn/granian/hypercorn, multi-worker aggregation, live admin Observability dashboard).
CI runs 'ruff format --check'; format the 4 new files to satisfy it.
Summary
Adds observability for the ASGI server layer. Until now pyfly only observed the application layer (the `http_server_requests_seconds` filter, tracing/correlation, process metrics). This surfaces metrics about the server itself across Uvicorn, Granian, and Hypercorn, with correct multi-worker aggregation, and a live admin Observability dashboard section.
Targets release v26.06.113.
How it works (3 cooperating mechanisms)
All write to the Prometheus registry and are auto-exposed at `/actuator/prometheus`; everything is gated on `pyfly.server.observability.enabled` (on under the web/core starters) and degrades to a no-op without `prometheus_client`.
- `ServerMetricsASGIMiddleware` (`web/adapters/starlette/asgi_server_metrics.py`): the primary, uniform source, installed outermost so it runs in every worker for every server and worker count. Emits `server_active_connections`, `server_in_flight_requests`, `server_requests_total`.
- `ServerMetricsBinder` (`observability/server_metrics.py`): bound from the in-worker ASGI lifespan (beside `register_process_metrics`/`ManagementServer`). Emits `server_workers`, `server_uptime_seconds`, `server_started_total`/`server_stopped_total`, and optional `server_native_connections`.
- `ServerStatsPort` (`server/ports/server_stats.py`): best-effort per-adapter native stats; uvicorn surfaces true socket counts on the `serve_async` path, granian/hypercorn report workers+uptime only.
Multi-worker aggregation
`pyfly run` enables `prometheus_client` multiprocess mode (`PROMETHEUS_MULTIPROC_DIR` set before forking) for `workers > 1`, so one scrape aggregates across all workers via `MultiProcessCollector`. This also fixes the prior per-worker gap for `http_server_requests_*`.
Admin dashboard
New live Observability view (Monitoring group): stat cards (workers, uptime, active connections, in-flight, requests/sec), rolling charts, a per-worker breakdown table, lifecycle, and links to Metrics/Traces. Backed by `GET /admin/api/observability` + the `observability` SSE stream.
Config
`pyfly.server.observability.{enabled,sample-interval-seconds,access-log}`. Local Prometheus+Grafana stack added to `docker-compose.yml` (loopback-bound; `ops/prometheus/prometheus.yml`).
Scope
Gunicorn is intentionally not added (stack stays async-only ASGI), but the `ServerStatsPort` + multiprocess design is gunicorn-ready.
Quality
- `mypy --strict` clean (683 files).
- End-to-end test (`tests/server/test_server_observability_e2e.py`) boots uvicorn via `serve_async`, fires HTTP traffic, and asserts the `server_*` meters move and are served.
- … `auto` server-label resolution) and a security review (loopback-bound, no hardcoded Grafana admin password, anonymous read-only).
Docs updated: observability/server/admin module docs, README, ROADMAP, and the observability book chapter (EN + ES).
🤖 Generated with Claude Code