Fix the memory-pressure throttle and its RSS metric #13219
Merged
9992445 to 111ef74
Contributor
Pull request overview
This PR repairs the memory-pressure connection throttle (proxy.config.memory.max_usage) and the proxy.process.traffic_server.memory.rss gauge by switching from peak RSS reporting to a portable current-RSS helper, ensuring the metric is refreshed continuously, and re-wiring the throttle flag into the accept path.
Changes:
- Rework `MemoryLimit` to publish current RSS every 10s and drive the memory throttle based on current RSS in bytes.
- Restore the memory-based connection throttle by honoring `net_memory_throttle` for inbound accepts in `check_net_throttle(ACCEPT)`.
- Add `ink_get_current_rss()` (Linux + macOS) plus a unit test, and update monitoring docs to reflect current-vs-peak semantics.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `src/tscore/unit_tests/test_ink_sys_control.cc` | Adds a unit test for `ink_get_current_rss()` behavior. |
| `src/tscore/ink_sys_control.cc` | Implements `ink_get_current_rss()` using `/proc/self/statm` (Linux) and `task_info` (macOS). |
| `include/tscore/ink_sys_control.h` | Declares and documents the `ink_get_current_rss()` API. |
| `src/traffic_server/traffic_server.cc` | Updates `MemoryLimit` to publish current RSS continuously and apply/release memory throttling. |
| `src/iocore/net/P_UnixNet.h` | Restores accept-path throttling by checking `net_memory_throttle` for inbound accepts. |
| `src/tscore/CMakeLists.txt` | Adds the new unit test to the `test_tscore` build. |
| `doc/admin-guide/monitoring/statistics/core/general.en.rst` | Updates metric documentation to reflect current RSS and refresh behavior. |
The proxy.config.memory.max_usage throttle and the proxy.process.traffic_server.memory.rss gauge are both driven by the MemoryLimit continuation, and both were broken.

The throttle was silently dead. MemoryLimit sets the global net_memory_throttle flag, but the only reader in the accept path was removed in 2018 (4bfaf36), so for years the flag has been written and never consulted. Restore it by honoring net_memory_throttle in check_net_throttle(ACCEPT); gating on ACCEPT keeps outbound CONNECTs flowing so in-flight transactions can complete and release memory. The flag is now std::atomic<bool> (relaxed) since it is written on ET_TASK and read on the ET_NET accept path.

The RSS reading was wrong. MemoryLimit stored getrusage().ru_maxrss << 10, which is PEAK RSS, not current -- wrong for a gauge named .rss, and it made the throttle latch on once peak crossed the limit and never release. ru_maxrss is also KiB on Linux but bytes on macOS/BSD, so the shift was wrong off-Linux. Read current RSS portably via a new ink_get_current_rss() helper (Linux /proc/self/statm, macOS task_info, FreeBSD sysctl KERN_PROC_PID) and compare current (not peak) RSS in bytes against the limit, so the throttle engages and releases correctly.

The metric is tied to the feature. MemoryLimit is now scheduled only when max_usage > 0, and it samples and publishes RSS only while enabled, so the cost of reading RSS and the memory.rss gauge are incurred only when the memory-limit feature is turned on. max_usage requires a restart to change, so this gate is stable for the life of the process.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
111ef74 to 038b42e
Comment on lines +125 to +132

    unsigned long total_pages = 0;
    unsigned long resident_pages = 0;
    int matched = fscanf(fp, "%lu %lu", &total_pages, &resident_pages);
    fclose(fp);

    if (matched != 2) {
      return 0;
    }

cmcfarlen (Contributor) approved these changes on Jun 3, 2026
Nice fix. I don't think we set the warning flag that Copilot is complaining about, so it's up to you whether to address that.
Summary
The `proxy.config.memory.max_usage` connection throttle and the `proxy.process.traffic_server.memory.rss` gauge are both driven by the `MemoryLimit` continuation in `src/traffic_server/traffic_server.cc`, and both were broken. This restores the throttle, fixes the RSS reading, and ties the metric's cost to the feature.
1. The throttle was silently dead (~7 years)
`MemoryLimit` sets the global `net_memory_throttle` flag when RSS exceeds `max_usage`, but the only reader was in the accept path and it was removed in 2018 (`4bfaf3645`, "Make throttling feature more useful."). Since then the flag has been written and never consulted: `check_net_throttle()` only honored the connection-count throttle. So configuring `proxy.config.memory.max_usage` had no effect on connection acceptance at all.
Restored by honoring `net_memory_throttle` in `check_net_throttle(ACCEPT)` (one spot, so every accept site picks it up). Gated on `ACCEPT` only: outbound `CONNECT`s keep flowing so in-flight transactions can complete and release memory.
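
The accept-only gating described above can be sketched roughly as follows. This is a minimal illustration, not the code in `P_UnixNet.h`: `should_throttle()`, `connection_limit_reached()`, and the explicit connection-count parameters are hypothetical stand-ins. Only the `net_memory_throttle` flag, its `std::atomic<bool>` type with relaxed ordering, and the `ACCEPT`/`CONNECT` distinction come from the PR text.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the real throttle-type enum.
enum class ThrottleType { ACCEPT, CONNECT };

// Written by the memory monitor (ET_TASK), read on the accept path (ET_NET).
// A relaxed atomic is enough because the flag carries no other data with it.
std::atomic<bool> net_memory_throttle{false};

// Hypothetical stand-in for the existing connection-count check.
bool
connection_limit_reached(int open_connections, int max_connections)
{
  assert(open_connections >= 0);
  return max_connections > 0 && open_connections >= max_connections;
}

// Returns true when the caller should back off.
bool
should_throttle(ThrottleType type, int open_connections, int max_connections)
{
  // Memory pressure blocks inbound accepts only. Outbound connects continue
  // so in-flight transactions can finish and release memory.
  if (type == ThrottleType::ACCEPT && net_memory_throttle.load(std::memory_order_relaxed)) {
    return true;
  }
  return connection_limit_reached(open_connections, max_connections);
}
```

With the flag set, `should_throttle(ThrottleType::ACCEPT, ...)` returns true regardless of the connection count, while `CONNECT` is still governed by the connection-count check alone.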
2. Wrong RSS source / units (peak vs current, platform bug)
`MemoryLimit` stored `getrusage(RUSAGE_SELF).ru_maxrss << 10`:
- `ru_maxrss` is peak RSS (never decreases), which is wrong for a gauge named `.rss`, and it made the throttle latch on once peak crossed the limit and never release.
- `ru_maxrss` is KiB on Linux (so `<<10` is correct there) but bytes on macOS, so the shift was wrong there. (FreeBSD's `getrusage(2)` documents kilobytes, like Linux.)
The fix reads current RSS portably via a new `ink_get_current_rss()` helper in `tscore/ink_sys_control`: Linux `/proc/self/statm` (resident pages * `sysconf(_SC_PAGESIZE)`), macOS `task_info(... MACH_TASK_BASIC_INFO ...)`. The throttle now compares current (not peak) RSS in bytes against `max_usage` (which is documented in bytes), so it both engages and releases correctly.
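
For readers unfamiliar with `/proc/self/statm`, the Linux branch can be sketched as below. This is an illustration under assumptions, not the PR's `ink_get_current_rss()`: the names `current_rss_bytes()` and `pages_to_bytes()` are hypothetical, the macOS `task_info` path is omitted, and returning 0 on failure mirrors only what the reviewed snippet shows.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <unistd.h>

// Convert a resident page count into bytes.
uint64_t
pages_to_bytes(unsigned long pages, long page_size)
{
  assert(page_size > 0);
  return static_cast<uint64_t>(pages) * static_cast<uint64_t>(page_size);
}

// Hypothetical helper: current resident set size in bytes, or 0 if it
// cannot be determined on this platform.
uint64_t
current_rss_bytes()
{
#if defined(__linux__)
  std::FILE *fp = std::fopen("/proc/self/statm", "r");
  if (fp == nullptr) {
    return 0;
  }

  // The first two fields are total program size and resident set size,
  // both measured in pages.
  unsigned long total_pages    = 0;
  unsigned long resident_pages = 0;
  int           matched        = std::fscanf(fp, "%lu %lu", &total_pages, &resident_pages);
  std::fclose(fp);

  long page_size = sysconf(_SC_PAGESIZE);
  if (matched != 2 || page_size <= 0) {
    return 0;
  }
  return pages_to_bytes(resident_pages, page_size);
#else
  return 0; // Other platforms are intentionally left out of this sketch.
#endif
}
```

Because the value is already in bytes, it can be compared directly against a byte-denominated limit without any platform-specific shift.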
3. Metric cost is tied to the feature
`MemoryLimit` is now scheduled only when `max_usage > 0`, and it samples and publishes RSS only while enabled. The cost of reading RSS and the `memory.rss` gauge are therefore incurred only when the memory-limit feature is turned on; when it is disabled (the default) the process does not sample RSS and the metric is not reported. `max_usage` requires a restart to change, so this gate is stable for the life of the process.
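
One way to picture the startup gate is a factory that creates the monitor only when the limit is positive. The sketch below is hypothetical throughout: `MemoryMonitor` and `make_memory_monitor()` are illustrative names, not the `MemoryLimit` continuation itself, and the 10-second period is taken from the review overview.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for the state a memory monitor would carry.
struct MemoryMonitor {
  explicit MemoryMonitor(uint64_t limit) : max_usage_bytes(limit), period(10) {}

  // True when the sampled RSS exceeds the configured limit.
  bool
  over_limit(uint64_t current_rss_bytes) const
  {
    assert(max_usage_bytes > 0);
    return current_rss_bytes > max_usage_bytes;
  }

  uint64_t             max_usage_bytes;
  std::chrono::seconds period; // sampling interval
};

// Returns nullptr when the feature is disabled, so nothing is scheduled and
// no RSS sampling cost is paid.
std::unique_ptr<MemoryMonitor>
make_memory_monitor(int64_t max_usage)
{
  if (max_usage <= 0) {
    return nullptr;
  }
  return std::unique_ptr<MemoryMonitor>(new MemoryMonitor(static_cast<uint64_t>(max_usage)));
}
```

Since the limit cannot change without a restart, the decision made here at startup never needs to be revisited.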
Why the throttle + current-RSS fixes are coupled
Reviving the throttle requires the current-RSS fix: with peak RSS the flag would
latch on forever and the server would refuse all new connections permanently.
Current RSS is what lets the throttle clear when memory actually drops.
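
The latching behavior can be reproduced with a small simulation. The function below is a hypothetical model, not code from the PR: it replays a series of RSS samples and records the throttle decision after each one, using either the running peak (the old `ru_maxrss`-style reading) or the current sample.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Replays RSS samples and returns the throttle decision after each one.
// use_peak = true models the old behavior (compare the running maximum);
// use_peak = false models the fix (compare the current sample).
std::vector<bool>
throttle_trace(const std::vector<uint64_t> &rss_samples, uint64_t max_usage, bool use_peak)
{
  assert(max_usage > 0);
  std::vector<bool> trace;
  uint64_t          peak = 0;
  for (uint64_t rss : rss_samples) {
    peak              = std::max(peak, rss);
    uint64_t observed = use_peak ? peak : rss;
    trace.push_back(observed > max_usage);
  }
  return trace;
}
```

For samples `{50, 120, 80, 60}` and a limit of 100, the peak-based trace is `{false, true, true, true}`: once the limit is crossed the throttle stays on. The current-based trace is `{false, true, false, false}`, releasing as soon as memory drops.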
Testing
- `src/tscore/unit_tests/test_ink_sys_control.cc`: asserts `ink_get_current_rss()` is plausible and grows after touching a 64 MiB allocation.
- `traffic_server`, `inknet`, `test_tscore`); pre-commit format hooks (clang-format / cmake-format / whitespace) pass.
is enabled).
🤖 Generated with Claude Code