Use Stopwatch for INFO uptime and master_sync_last_io_seconds_ago #1771

Open
unsafePtr wants to merge 1 commit into microsoft:main from unsafePtr:fix/info-durations-stopwatch
Conversation

@unsafePtr

uptime_in_seconds, uptime_in_days, and master_sync_last_io_seconds_ago in INFO are computed from DateTime.UtcNow.Ticks deltas. A wall-clock adjustment (NTP step, manual time change) shifts the reported duration: backward steps make the values shrink or go negative, and forward steps make them overshoot. Sentinel and similar tooling read master_sync_last_io_seconds_ago to decide failover candidacy.
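
The arithmetic behind the skew can be sketched with fixed tick values. This is an illustration only: `WallClockSkewSketch`, `ElapsedSeconds`, and `ReportedSeconds` are hypothetical names, not Garnet code, and the clock step is simulated by adding an offset to a fixed `DateTime` rather than by touching the host clock.

```csharp
using System;

public static class WallClockSkewSketch
{
    // Elapsed seconds computed the wall-clock way: a delta between two UTC tick readings.
    public static long ElapsedSeconds(long startUtcTicks, long nowUtcTicks) =>
        (nowUtcTicks - startUtcTicks) / TimeSpan.TicksPerSecond;

    // Simulates a reading taken after `realSeconds` of real time, with the wall clock
    // stepped by `stepSeconds` in between (negative = backward, positive = forward).
    public static long ReportedSeconds(double realSeconds, double stepSeconds)
    {
        var start = new DateTime(2026, 1, 1, 12, 0, 0, DateTimeKind.Utc);
        var now = start.AddSeconds(realSeconds + stepSeconds);
        return ElapsedSeconds(start.Ticks, now.Ticks);
    }
}
```

With 10 real seconds elapsed, `ReportedSeconds(10, 0)` yields 10, a 60-second backward step yields -50, and a 60-second forward step yields 70.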

These fields are defined by Redis as durations ("seconds since X"), not timestamps, so monotonic time is what the spec actually means. Switch to Stopwatch.GetTimestamp() / Stopwatch.GetElapsedTime for the elapsed-time math. INFO server and INFO replication continue to expose the same field names and semantics, just sourced from a monotonic clock.
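
A minimal sketch of the monotonic pattern described above, assuming .NET 7 or later for `Stopwatch.GetElapsedTime`. `UptimeTracker` and its members are hypothetical stand-ins for illustration and do not reproduce Garnet's `StoreWrapper` or `GarnetInfoMetrics`.

```csharp
using System;
using System.Diagnostics;

// Hypothetical holder for a startup anchor; not Garnet's actual type.
public sealed class UptimeTracker
{
    // Captured once at construction. The value is in Stopwatch ticks
    // (Stopwatch.Frequency per second), not DateTime ticks, and it is
    // unaffected by wall-clock adjustments.
    private readonly long startupTimestamp = Stopwatch.GetTimestamp();

    // Elapsed time since the anchor, measured on the monotonic clock.
    public TimeSpan Uptime => Stopwatch.GetElapsedTime(startupTimestamp);

    // Both INFO-style values derive from the same elapsed TimeSpan.
    public long UptimeInSeconds => (long)Uptime.TotalSeconds;
    public long UptimeInDays => (long)Uptime.TotalDays;
}
```

Because the anchor is an opaque Stopwatch timestamp, it is only meaningful as the start of an elapsed-time calculation; it should not be formatted or compared against `DateTime` values.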

Same NTP-resilience reasoning was applied recently in dotnet/runtime#127303 for EventCounter's polling timer.

Changes

  • StoreWrapper.startupTime → startupTimestamp (Stopwatch tick anchor).
  • GarnetInfoMetrics.PopulateServerInfo reads uptime via Stopwatch.GetElapsedTime(startupTimestamp).
  • ReplicationManager.primary_sync_last_time → primary_sync_last_timestamp; LastPrimarySyncSeconds and UpdateLastPrimarySyncTime use Stopwatch.
  • New test RespInfoTests.UptimeIncreasesAcrossInfoCalls asserts uptime monotonically increases between two INFO calls.

Test plan

  • New test passes on net8.0 and net10.0.
  • Garnet.server and Garnet.cluster build clean.
  • CI

Copilot AI review requested due to automatic review settings May 5, 2026 23:07
Contributor

Copilot AI left a comment

Pull request overview

This PR replaces wall-clock-based elapsed-time calculations in Garnet's INFO output with Stopwatch-based monotonic timing so uptime and replica sync age are not distorted by system clock adjustments. It affects both the standalone server metrics path and the cluster replication metrics path, with a new INFO test intended to cover the uptime behavior.

Changes:

  • Replaced server startup elapsed-time math with a monotonic startupTimestamp anchor in StoreWrapper and GarnetInfoMetrics.
  • Replaced replica sync age tracking with monotonic timestamps in ReplicationManager for master_sync_last_io_seconds_ago.
  • Added a new INFO test that checks uptime_in_seconds increases across two calls.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| test/Garnet.test/RespInfoTests.cs | Adds a new INFO uptime test. |
| libs/server/StoreWrapper.cs | Renames/stores startup anchor as a stopwatch timestamp. |
| libs/server/Metrics/Info/GarnetInfoMetrics.cs | Computes server uptime from monotonic elapsed time. |
| libs/cluster/Server/Replication/ReplicationManager.cs | Computes replica sync age from monotonic elapsed time. |

Comment on lines +79 to 96
    public void UptimeIncreasesAcrossInfoCalls()
    {
        using var redis = ConnectionMultiplexer.Connect(TestUtils.GetConfig());
        var db = redis.GetDatabase(0);

        static long ParseUptime(string info) =>
            long.Parse(info.Split("\r\n").First(x => x.StartsWith("uptime_in_seconds:")).Split(':')[1]);

        var first = ParseUptime(db.Execute("INFO", "SERVER").ToString());
        ClassicAssert.GreaterOrEqual(first, 0);

        Thread.Sleep(TimeSpan.FromSeconds(1.1));

        var second = ParseUptime(db.Execute("INFO", "SERVER").ToString());
        ClassicAssert.Greater(second, first, "uptime_in_seconds should increase between INFO calls");
    }

    [Test]
Author

Intentional — catches regressions in the new code path without simulating a clock jump (would need host-clock control or a production injection seam). Same shape dotnet/runtime#127303 and opentelemetry-dotnet#7193 shipped with.
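
For context, one shape an injection seam could take is sketched below, assuming .NET 8's `System.TimeProvider`. Nothing here is part of this PR: `ManualTimeProvider` and `SyncAgeTracker` are hypothetical types, and the sketch is not thread-safe. It only shows how a controllable monotonic clock would let a test advance time deterministically instead of sleeping.

```csharp
using System;

// Hypothetical test clock: only the monotonic timestamp is controllable.
public sealed class ManualTimeProvider : TimeProvider
{
    private long timestamp;

    // One timestamp tick per TimeSpan tick (100 ns) keeps the conversion exact.
    public override long TimestampFrequency => TimeSpan.TicksPerSecond;

    public override long GetTimestamp() => timestamp;

    // Moves the monotonic clock forward by the given amount.
    public void Advance(TimeSpan delta) => timestamp += delta.Ticks;
}

// Hypothetical component that tracks "seconds since last sync" through the seam.
public sealed class SyncAgeTracker
{
    private readonly TimeProvider time;
    private long lastSyncTimestamp;

    public SyncAgeTracker(TimeProvider time)
    {
        this.time = time;
        lastSyncTimestamp = time.GetTimestamp();
    }

    // Re-anchors the age calculation at the current monotonic timestamp.
    public void UpdateLastSync() => lastSyncTimestamp = time.GetTimestamp();

    // Elapsed whole seconds since the last anchor, per the injected clock.
    public long SecondsSinceLastSync =>
        (long)time.GetElapsedTime(lastSyncTimestamp).TotalSeconds;
}
```

In production such a component would receive `TimeProvider.System`, whose timestamps come from the same monotonic source as `Stopwatch`. The trade-off is an extra constructor parameter on production types, which is the cost the reply above refers to.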

Comment on lines +37 to +39
    + internal long LastPrimarySyncSeconds => IsRecovering ? (long)Stopwatch.GetElapsedTime(primary_sync_last_timestamp).TotalSeconds : 0;

    - internal void UpdateLastPrimarySyncTime() => this.primary_sync_last_time = DateTime.UtcNow.Ticks;
    + internal void UpdateLastPrimarySyncTime() => this.primary_sync_last_timestamp = Stopwatch.GetTimestamp();