Use Stopwatch for INFO uptime and master_sync_last_io_seconds_ago #1771
unsafePtr wants to merge 1 commit into microsoft:main
Pull request overview
This PR replaces wall-clock-based elapsed-time calculations in Garnet's INFO output with Stopwatch-based monotonic timing so uptime and replica sync age are not distorted by system clock adjustments. It affects both the standalone server metrics path and the cluster replication metrics path, with a new INFO test intended to cover the uptime behavior.
Changes:
- Replaced server startup elapsed-time math with a monotonic `startupTimestamp` anchor in `StoreWrapper` and `GarnetInfoMetrics`.
- Replaced replica sync age tracking with monotonic timestamps in `ReplicationManager` for `master_sync_last_io_seconds_ago`.
- Added a new INFO test that checks `uptime_in_seconds` increases across two calls.
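
The first change above can be sketched roughly as follows. This is a minimal illustration, not Garnet's code: the `UptimeTracker` class and its members are hypothetical stand-ins for the anchor held by `StoreWrapper` and the values reported by `GarnetInfoMetrics`. Only `Stopwatch.GetTimestamp()` and `Stopwatch.GetElapsedTime(long)` (available since .NET 7) are real APIs.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for the startup anchor; not Garnet's actual StoreWrapper.
public sealed class UptimeTracker
{
    // Captured once at startup from the monotonic clock, not from DateTime.UtcNow.
    private readonly long startupTimestamp = Stopwatch.GetTimestamp();

    // Elapsed time since startup, unaffected by wall-clock adjustments.
    public TimeSpan Uptime => Stopwatch.GetElapsedTime(startupTimestamp);

    // Values in the shape INFO reports them: whole seconds and whole days.
    public long UptimeInSeconds => (long)Uptime.TotalSeconds;
    public long UptimeInDays => (long)Uptime.TotalDays;
}
```

The anchor is an opaque tick count, so it is only meaningful when passed back to `Stopwatch`; it cannot be formatted as a date.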
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `test/Garnet.test/RespInfoTests.cs` | Adds a new INFO uptime test. |
| `libs/server/StoreWrapper.cs` | Renames/stores startup anchor as a stopwatch timestamp. |
| `libs/server/Metrics/Info/GarnetInfoMetrics.cs` | Computes server uptime from monotonic elapsed time. |
| `libs/cluster/Server/Replication/ReplicationManager.cs` | Computes replica sync age from monotonic elapsed time. |

    public void UptimeIncreasesAcrossInfoCalls()
    {
        using var redis = ConnectionMultiplexer.Connect(TestUtils.GetConfig());
        var db = redis.GetDatabase(0);

        static long ParseUptime(string info) =>
            long.Parse(info.Split("\r\n").First(x => x.StartsWith("uptime_in_seconds:")).Split(':')[1]);

        var first = ParseUptime(db.Execute("INFO", "SERVER").ToString());
        ClassicAssert.GreaterOrEqual(first, 0);

        Thread.Sleep(TimeSpan.FromSeconds(1.1));

        var second = ParseUptime(db.Execute("INFO", "SERVER").ToString());
        ClassicAssert.Greater(second, first, "uptime_in_seconds should increase between INFO calls");
    }

    [Test]

Intentional: the test catches regressions in the new code path without simulating a clock jump, which would need host-clock control or a production injection seam. dotnet/runtime#127303 and opentelemetry-dotnet#7193 shipped with the same shape.
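
For reference, the kind of injection seam mentioned above might look roughly like the sketch below. Everything here is an assumption for illustration: the PR deliberately does not add such a seam, and `ManualTimeProvider` and `InjectedUptime` are hypothetical names. The sketch relies only on the `System.TimeProvider` base class from .NET 8, whose `GetElapsedTime` is derived from `GetTimestamp()` and `TimestampFrequency`.

```csharp
using System;

// Hypothetical manual clock for tests: the wall clock and the monotonic
// timestamp can be moved independently to imitate an NTP step.
public sealed class ManualTimeProvider : TimeProvider
{
    private DateTimeOffset utcNow = new(2025, 1, 1, 0, 0, 0, TimeSpan.Zero);
    private long timestamp;

    public override DateTimeOffset GetUtcNow() => utcNow;
    public override long GetTimestamp() => timestamp;

    // One timestamp unit equals one TimeSpan tick, which keeps the math simple.
    public override long TimestampFrequency => TimeSpan.TicksPerSecond;

    // Real time passing moves both clocks.
    public void Advance(TimeSpan delta)
    {
        utcNow += delta;
        timestamp += delta.Ticks;
    }

    // A clock adjustment moves only the wall clock.
    public void StepWallClock(TimeSpan delta) => utcNow += delta;
}

// Hypothetical component that receives the seam through its constructor.
public sealed class InjectedUptime
{
    private readonly TimeProvider time;
    private readonly long startTimestamp;

    public InjectedUptime(TimeProvider time)
    {
        this.time = time;
        startTimestamp = time.GetTimestamp();
    }

    public long UptimeInSeconds => (long)time.GetElapsedTime(startTimestamp).TotalSeconds;
}
```

With this arrangement a test could call `Advance(TimeSpan.FromSeconds(10))`, then `StepWallClock(TimeSpan.FromHours(-1))`, and still expect `UptimeInSeconds` to read 10. The cost is threading a `TimeProvider` through production constructors, which is the trade-off the reply above declines.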

    + internal long LastPrimarySyncSeconds => IsRecovering ? (long)Stopwatch.GetElapsedTime(primary_sync_last_timestamp).TotalSeconds : 0;

    - internal void UpdateLastPrimarySyncTime() => this.primary_sync_last_time = DateTime.UtcNow.Ticks;
    + internal void UpdateLastPrimarySyncTime() => this.primary_sync_last_timestamp = Stopwatch.GetTimestamp();

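
The replica-side pattern in this hunk can be sketched in isolation as below. The `SyncAgeTracker` class is hypothetical and only loosely mirrors `ReplicationManager`. The use of `Interlocked` is an added assumption (the diff uses a plain field assignment), included to make the cross-thread read and write explicit.

```csharp
using System.Diagnostics;
using System.Threading;

// Hypothetical sketch of sync-age tracking; not Garnet's ReplicationManager.
public sealed class SyncAgeTracker
{
    // Monotonic timestamp of the last sync with the primary.
    private long lastSyncTimestamp = Stopwatch.GetTimestamp();

    // Called whenever data arrives from the primary.
    public void UpdateLastSync() =>
        Interlocked.Exchange(ref lastSyncTimestamp, Stopwatch.GetTimestamp());

    // Whole seconds since the last update, measured on the monotonic clock.
    public long SecondsSinceLastSync =>
        (long)Stopwatch.GetElapsedTime(Interlocked.Read(ref lastSyncTimestamp)).TotalSeconds;
}
```

Because both the stored value and the current reading come from `Stopwatch`, a wall-clock step between them should not affect the reported age.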
`uptime_in_seconds`, `uptime_in_days`, and `master_sync_last_io_seconds_ago` in `INFO` are computed from `DateTime.UtcNow.Ticks` deltas. A wall-clock adjustment (NTP step, manual time change) shifts the reported duration: forward steps make the values overshoot, and backward steps make them go small or negative. Sentinel and similar tooling read `master_sync_last_io_seconds_ago` to decide failover candidacy.

These fields are defined by Redis as durations ("seconds since X"), not timestamps, so monotonic time is what the spec actually means. Switch to `Stopwatch.GetTimestamp()` / `Stopwatch.GetElapsedTime` for the elapsed-time math. `INFO server` and `INFO replication` continue to expose the same field names and semantics, just sourced from a monotonic clock.

The same NTP-resilience reasoning was applied recently in dotnet/runtime#127303 for `EventCounter`'s polling timer.

Changes
- `StoreWrapper.startupTime` → `startupTimestamp` (Stopwatch tick anchor).
- `GarnetInfoMetrics.PopulateServerInfo` reads uptime via `Stopwatch.GetElapsedTime(startupTimestamp)`.
- `ReplicationManager.primary_sync_last_time` → `primary_sync_last_timestamp`; `LastPrimarySyncSeconds` and `UpdateLastPrimarySyncTime` use `Stopwatch`.
- `RespInfoTests.UptimeIncreasesAcrossInfoCalls` asserts uptime monotonically increases between two `INFO` calls.

Test plan
- `net8.0` and `net10.0`.
- `Garnet.server` and `Garnet.cluster` build clean.
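
To make the direction of the skew concrete, the wall-clock arithmetic described at the top of this description can be worked through with fixed values. This is an illustrative sketch only: `ClockStepDemo` and `WallClockElapsedSeconds` are hypothetical names, and the method simply mimics a `now - start` tick delta of the kind the old code used.

```csharp
using System;

public static class ClockStepDemo
{
    // Elapsed whole seconds as a wall-clock tick delta would compute them.
    public static long WallClockElapsedSeconds(DateTime startUtc, DateTime nowUtc) =>
        (long)new TimeSpan(nowUtc.Ticks - startUtc.Ticks).TotalSeconds;
}
```

Suppose the server started at 12:00:00 UTC and 60 seconds of real time have passed. With an undisturbed clock the delta is 60. If the clock was stepped forward by 5 minutes in the meantime, the same calculation yields 360; if it was stepped back by 5 minutes, it yields -240.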