Skip to content

Fix the memory-pressure throttle and its RSS metric#13219

Merged
moonchen merged 1 commit into
apache:masterfrom
moonchen:worktree-fix-memory-rss-metric
Jun 5, 2026
Merged

Fix the memory-pressure throttle and its RSS metric#13219
moonchen merged 1 commit into
apache:masterfrom
moonchen:worktree-fix-memory-rss-metric

Conversation

@moonchen
Copy link
Copy Markdown
Contributor

@moonchen moonchen commented Jun 1, 2026

Summary

The proxy.config.memory.max_usage connection throttle and the
proxy.process.traffic_server.memory.rss gauge are both driven by the
MemoryLimit continuation in src/traffic_server/traffic_server.cc, and both
were broken. This restores the throttle, fixes the RSS reading, and ties the
metric's cost to the feature.

1. The throttle was silently dead (~7 years)

MemoryLimit sets the global net_memory_throttle flag when RSS exceeds
max_usage, but the only reader was in the accept path and it was removed in
2018
(4bfaf3645, "Make throttling feature more useful."). Since then the
flag has been written and never consulted — check_net_throttle() only honored
the connection-count throttle. So configuring proxy.config.memory.max_usage
had no effect on connection acceptance at all.

Restored by honoring net_memory_throttle in check_net_throttle(ACCEPT) (one
spot, so every accept site picks it up). Gated on ACCEPT only: outbound
CONNECTs keep flowing so in-flight transactions can complete and release
memory.

2. Wrong RSS source / units (peak vs current, platform bug)

It stored getrusage(RUSAGE_SELF).ru_maxrss << 10:

  • ru_maxrss is peak RSS (never decreases) — wrong for a gauge named
    .rss, and it made the throttle latch on once peak crossed the limit and
    never release.
  • ru_maxrss is KiB on Linux (so <<10 is correct there) but bytes on
    macOS/BSD
    , so the shift was wrong off-Linux.

Reads current RSS portably via a new ink_get_current_rss() helper in
tscore/ink_sys_control — Linux /proc/self/statm (resident pages * sysconf(_SC_PAGESIZE)), macOS task_info(... MACH_TASK_BASIC_INFO ...). The
throttle now compares current (not peak) RSS in bytes against max_usage (which
is documented in bytes), so it both engages and releases correctly.

3. Metric cost is tied to the feature

MemoryLimit is now scheduled only when max_usage > 0, and it samples and
publishes RSS only while enabled. The cost of reading RSS and the memory.rss
gauge are therefore incurred only when the memory-limit feature is turned on;
when it is disabled (the default) the process does not sample RSS and the metric
is not reported. max_usage requires a restart to change, so this gate is
stable for the life of the process.

Why the throttle + current-RSS fixes are coupled

Reviving the throttle requires the current-RSS fix: with peak RSS the flag would
latch on forever and the server would refuse all new connections permanently.
Current RSS is what lets the throttle clear when memory actually drops.

Testing

  • New unit test src/tscore/unit_tests/test_ink_sys_control.cc: asserts
    ink_get_current_rss() is plausible and grows after touching a 64 MiB
    allocation.
  • Builds clean (traffic_server, inknet, test_tscore); pre-commit format
    hooks (clang-format / cmake-format / whitespace) pass.
  • Docs updated for the metric (current vs peak, only published when the feature
    is enabled).

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings June 1, 2026 19:43
@moonchen moonchen force-pushed the worktree-fix-memory-rss-metric branch from 9992445 to 111ef74 Compare June 1, 2026 19:48
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR repairs the memory-pressure connection throttle (proxy.config.memory.max_usage) and the proxy.process.traffic_server.memory.rss gauge by switching from peak RSS reporting to a portable current-RSS helper, ensuring the metric is refreshed continuously, and re-wiring the throttle flag into the accept path.

Changes:

  • Rework MemoryLimit to publish current RSS every 10s and drive the memory throttle based on current RSS in bytes.
  • Restore the memory-based connection throttle by honoring net_memory_throttle for inbound accepts in check_net_throttle(ACCEPT).
  • Add ink_get_current_rss() (Linux + macOS) plus a unit test, and update monitoring docs to reflect current-vs-peak semantics.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/tscore/unit_tests/test_ink_sys_control.cc Adds a unit test for ink_get_current_rss() behavior.
src/tscore/ink_sys_control.cc Implements ink_get_current_rss() using /proc/self/statm (Linux) and task_info (macOS).
include/tscore/ink_sys_control.h Declares and documents ink_get_current_rss() API.
src/traffic_server/traffic_server.cc Updates MemoryLimit to publish current RSS continuously and apply/release memory throttling.
src/iocore/net/P_UnixNet.h Restores accept-path throttling by checking net_memory_throttle for inbound accepts.
src/tscore/CMakeLists.txt Adds the new unit test to the test_tscore build.
doc/admin-guide/monitoring/statistics/core/general.en.rst Updates metric documentation to reflect current RSS and refresh behavior.

Comment thread src/iocore/net/P_UnixNet.h
Comment thread src/traffic_server/traffic_server.cc
Comment thread src/tscore/unit_tests/test_ink_sys_control.cc Outdated
Comment thread src/tscore/unit_tests/test_ink_sys_control.cc Outdated
Comment thread src/tscore/ink_sys_control.cc
The proxy.config.memory.max_usage throttle and the
proxy.process.traffic_server.memory.rss gauge are both driven by the
MemoryLimit continuation, and both were broken.

The throttle was silently dead. MemoryLimit sets the global
net_memory_throttle flag, but the only reader in the accept path was
removed in 2018 (4bfaf36), so for years the flag has been written and
never consulted. Restore it by honoring net_memory_throttle in
check_net_throttle(ACCEPT); gating on ACCEPT keeps outbound CONNECTs
flowing so in-flight transactions can complete and release memory. The
flag is now std::atomic<bool> (relaxed) since it is written on ET_TASK
and read on the ET_NET accept path.

The RSS reading was wrong. MemoryLimit stored getrusage().ru_maxrss
<< 10, which is PEAK RSS, not current -- wrong for a gauge named .rss,
and it made the throttle latch on once peak crossed the limit and never
release. ru_maxrss is also KiB on Linux but bytes on macOS/BSD, so the
shift was wrong off-Linux. Read current RSS portably via a new
ink_get_current_rss() helper (Linux /proc/self/statm, macOS task_info,
FreeBSD sysctl KERN_PROC_PID) and compare current (not peak) RSS in
bytes against the limit, so the throttle engages and releases correctly.

The metric is tied to the feature. MemoryLimit is now scheduled only
when max_usage > 0, and it samples and publishes RSS only while enabled,
so the cost of reading RSS and the memory.rss gauge are incurred only
when the memory-limit feature is turned on. max_usage requires a restart
to change, so this gate is stable for the life of the process.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@moonchen moonchen force-pushed the worktree-fix-memory-rss-metric branch from 111ef74 to 038b42e Compare June 1, 2026 20:00
@moonchen moonchen marked this pull request as draft June 1, 2026 21:01
@moonchen moonchen marked this pull request as ready for review June 1, 2026 21:02
Copilot AI review requested due to automatic review settings June 1, 2026 21:02
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Comment on lines +125 to +132
unsigned long total_pages = 0;
unsigned long resident_pages = 0;
int matched = fscanf(fp, "%lu %lu", &total_pages, &resident_pages);
fclose(fp);

if (matched != 2) {
return 0;
}
@bryancall bryancall added this to the 11.0.0 milestone Jun 1, 2026
@bryancall bryancall added the Bug label Jun 1, 2026
@bryancall bryancall requested a review from cmcfarlen June 1, 2026 21:43
Copy link
Copy Markdown
Contributor

@cmcfarlen cmcfarlen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix. I don't think we set that warning flag that copilot is complaining about, so its up to you if you want to address that.

@moonchen moonchen merged commit bf4cf9b into apache:master Jun 5, 2026
15 checks passed
@github-project-automation github-project-automation Bot moved this to For v10.2.0 in ATS v10.2.x Jun 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: For v10.2.0

Development

Successfully merging this pull request may close these issues.

4 participants