feat(app): add monitoring API for liveness and readiness#500

Open
varex83 wants to merge 7 commits into main from bohdan/montiroing-api
varex83 commented Jun 25, 2026

Collaborator

Summary

Adds the monitoring API module (crates/app/src/monitoringapi/) serving /livez and /readyz, ported from Charon's app/monitoringapi.go.

  • router.rs: axum router exposing /livez (always 200) and /readyz (200 when ready, 500 + failure reason otherwise).
  • readiness.rs: ReadinessError enum of failure reasons, ReadyState shared via a watch channel, and a ReadinessCheck trait so the HTTP layer stays decoupled from node wiring. Each error variant maps to a Charon-compatible /readyz metric code via readyz_code().
  • checker.rs: background readiness checker tracking beacon-node sync status, peer counts, cluster quorum connectivity, and validator-client activity per epoch, with Charon-equivalent error precedence and the 320-slot / 6-round thresholds.
  • metrics.rs: app-prefixed gauges (readyz, beacon-node syncing, beacon-node peers).
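
For readers who have not seen the diff, the status-code contract described above can be sketched in plain Rust with no axum dependency. This is a minimal illustration, not the PR's code: the three variants, their numeric codes, and the response bodies are hypothetical placeholders (the PR defines its own variants and codes 1–8), and `livez`/`readyz` here are plain functions returning a `(status, body)` pair rather than axum handlers.

```rust
use std::fmt;

/// Hypothetical subset of readiness failure reasons.
/// The PR's real `ReadinessError` variants are not reproduced here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReadinessError {
    BeaconNodeDown,
    BeaconNodeSyncing,
    InsufficientPeers,
}

impl ReadinessError {
    /// Numeric code for the readyz gauge.
    /// The values below are assumptions chosen for illustration only.
    pub fn readyz_code(self) -> u8 {
        match self {
            ReadinessError::BeaconNodeDown => 2,
            ReadinessError::BeaconNodeSyncing => 3,
            ReadinessError::InsufficientPeers => 4,
        }
    }
}

impl fmt::Display for ReadinessError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Human-readable failure reason returned in the /readyz body.
        let reason = match self {
            ReadinessError::BeaconNodeDown => "beacon node down",
            ReadinessError::BeaconNodeSyncing => "beacon node syncing",
            ReadinessError::InsufficientPeers => "insufficient peers",
        };
        f.write_str(reason)
    }
}

/// Liveness: always 200, regardless of readiness state.
pub fn livez() -> (u16, &'static str) {
    (200, "ok")
}

/// Readiness: 200 when ready, otherwise 500 plus the failure reason.
pub fn readyz(state: Result<(), ReadinessError>) -> (u16, String) {
    match state {
        Ok(()) => (200, "ok".to_string()),
        Err(reason) => (500, reason.to_string()),
    }
}
```

Keeping the reason-to-code mapping on the error type means the HTTP layer and the metrics layer can both derive their output from the same value.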

Notes

  • Base is feat/app-health since this builds on that branch.
  • Readiness state machine, error precedence, epoch/slot math, and the 1–8 readyz codes are functionally equivalent to the Go reference.
  • Tick intervals use MissedTickBehavior::Skip to match Go's drop-missed-ticks ticker semantics.
  • Beacon-node errors preserve their source chain; chain timing params flow through a named ChainConfig.
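
The epoch/slot math and the skip-missed-ticks behaviour mentioned in these notes can be illustrated with a small std-only sketch. The field names on `ChainConfig` and the helper names are assumptions made for this example; the PR's actual struct may differ. `next_tick_skip` models what tokio documents for `MissedTickBehavior::Skip` (tick on the next multiple of the period from the start) using plain integers instead of a real timer.

```rust
/// Hypothetical chain timing parameters; field names are assumptions.
#[derive(Debug, Clone, Copy)]
pub struct ChainConfig {
    pub genesis_time_secs: u64,
    pub slot_duration_secs: u64,
    pub slots_per_epoch: u64,
}

impl ChainConfig {
    /// Slot containing `now_secs`, or `None` before genesis
    /// or when the slot duration is zero.
    pub fn slot_at(&self, now_secs: u64) -> Option<u64> {
        if self.slot_duration_secs == 0 {
            return None;
        }
        now_secs
            .checked_sub(self.genesis_time_secs)
            .map(|elapsed| elapsed / self.slot_duration_secs)
    }

    /// Epoch containing `slot`, or `None` when `slots_per_epoch` is zero.
    pub fn epoch_of(&self, slot: u64) -> Option<u64> {
        slot.checked_div(self.slots_per_epoch)
    }
}

/// Next deadline under skip semantics: the first `start + k * period`
/// strictly after `now`. Missed ticks are dropped rather than replayed.
/// Assumes `now >= start`; returns `None` for a zero period or on overflow.
pub fn next_tick_skip(start: u64, period: u64, now: u64) -> Option<u64> {
    if period == 0 {
        return None;
    }
    let elapsed = now.saturating_sub(start);
    let k = elapsed / period + 1;
    k.checked_mul(period)?.checked_add(start)
}
```

With a 12-second slot and 32 slots per epoch, a timestamp 785 seconds after genesis falls in slot 65, which belongs to epoch 2. A ticker with period 10 that started at 0 and wakes late at 35 next fires at 40, not at the missed deadlines 10, 20, and 30.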

Test plan

  • cargo test -p pluto-app monitoringapi (13 tests pass)
  • cargo clippy -p pluto-app --all-targets --all-features -- -D warnings
  • cargo +nightly fmt

Follow-up (not in this PR): tests for readyz codes 7/8, a gauge-mapping table test, and the None peer-count invariant.

iamquang95 and others added 5 commits June 23, 2026 18:43
Add the monitoring API module serving /livez and /readyz, with a
background readiness checker that tracks beacon-node sync status, peer
counts, cluster quorum connectivity, and validator-client activity per
epoch. Readiness failure reasons map to Charon-compatible /readyz metric
codes.
@varex83 varex83 linked an issue Jun 25, 2026 that may be closed by this pull request
pub fn router_with_state(state: MonitoringState) -> Router {
    Router::new()
        .route("/livez", get(livez))
        .route("/readyz", get(readyz))

Collaborator

missing /metrics endpoint

Collaborator Author

We already expose /metrics by spawning the vise exporter.

/// Metrics that back the monitoring API readiness checks.
#[derive(Debug, Metrics)]
#[metrics(prefix = "app")]
pub struct MonitoringMetrics {

Collaborator

missing app_beacon_node_version and app_validator_stack_params

    ct: CancellationToken,
    readiness: ReadyState,
) {
    let config = match tokio::select! {

Collaborator

Missing a background task that runs beaconNodeVersionMetric.

Base automatically changed from feat/app-health to main June 25, 2026 15:17
varex83agent and others added 2 commits June 25, 2026 19:13
Address PR #500 review: define the missing app_beacon_node_version and
app_validator_stack_params metrics, and add a background task that
periodically refreshes the beacon node version gauge (on startup, then
every 10 minutes) and runs the version compatibility check, mirroring
Charon's beaconNodeVersionMetric.

Co-Authored-By: Bohdan Ohorodnii <varex83@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
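
The "on startup, then every 10 minutes" refresh pattern described in this commit message can be sketched as follows. This is an illustration only and not the code added in the commit: it swaps the PR's tokio task and CancellationToken for a blocking loop with a std `mpsc` channel as the stop signal, and `run_periodic_refresh` is a hypothetical name.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Calls `refresh` once immediately, then once per `period`, until a stop
/// message arrives or the sender side of `stop` is dropped.
/// Hypothetical helper: a std channel stands in for a cancellation token.
pub fn run_periodic_refresh<F: FnMut()>(period: Duration, stop: &Receiver<()>, mut refresh: F) {
    // Startup refresh, so the gauge is populated before the first interval elapses.
    refresh();
    loop {
        match stop.recv_timeout(period) {
            // No stop signal within the period: time for the next refresh.
            Err(RecvTimeoutError::Timeout) => refresh(),
            // Explicit stop, or the controlling side went away: exit.
            Ok(()) | Err(RecvTimeoutError::Disconnected) => return,
        }
    }
}
```

In a real service this loop would run on its own thread or task with `Duration::from_secs(600)` as the period. Waiting on the stop channel rather than sleeping lets shutdown interrupt the wait immediately.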
# Conflicts:
#	crates/app/src/health/checker.rs
#	crates/app/src/health/checks.rs
#	crates/app/src/health/gatherer.rs
#	crates/app/src/health/model.rs
#	crates/app/src/health/reducers.rs
#	crates/app/src/health/select.rs
Development

Successfully merging this pull request may close these issues.

Implement app/monitoringapi

3 participants