
feat(shutdown): graceful shutdown timeout with force exit #227

Open
mechanic-Q wants to merge 1 commit into rohitg00:main from mechanic-Q:feature/graceful-shutdown-timeout


@mechanic-Q mechanic-Q commented May 2, 2026

Summary

Add a shutdown timeout that force-exits the process if graceful shutdown hangs, preventing hung processes that require a manual kill.

Add a shutdown timeout mechanism so the process does not hang when the sdk connection gets stuck.

Motivation

When the iii-engine WebSocket connection is stuck or the viewer server refuses to close, the shutdown handler hangs indefinitely. Under systemd, the service manager sends SIGTERM, waits out its stop timeout (90s by default), and only then sends SIGKILL; for those 90s the process is unresponsive. A configurable force-exit timeout guarantees termination in bounded time.

When the iii-engine connection is stuck or the viewer server refuses to close, the shutdown flow hangs forever and leaves a hung process behind.

Changes

  • Added AGENTMEMORY_SHUTDOWN_TIMEOUT_MS env var (default: 10000ms)
  • On SIGINT/SIGTERM, starts a timer that calls process.exit(0) if shutdown exceeds the timeout
  • Timer uses unref() so it doesn't prevent normal exit if shutdown completes quickly
  • Calls clearTimeout when shutdown completes normally
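The four changes above can be sketched as one small helper. This is an illustration only, not the code in src/index.ts: the name `shutdownWithDeadline`, the `cleanup` callback, and the injectable `exit` parameter are assumptions introduced so the flow can be exercised without terminating the host process.

```typescript
// Hypothetical helper, not the PR's actual code. `exit` is injectable
// purely so the sketch can be tested; it defaults to process.exit.
type Exit = (code: number) => void;

async function shutdownWithDeadline(
  cleanup: () => Promise<void>,
  timeoutMs: number,
  exit: Exit = (code) => process.exit(code),
): Promise<void> {
  // Arm the force-exit timer before any cleanup starts.
  const forceExit = setTimeout(() => {
    console.warn(`Shutdown timed out after ${timeoutMs}ms, forcing exit`);
    exit(0);
  }, timeoutMs);
  // unref() stops this timer from keeping the event loop alive by itself.
  forceExit.unref();

  await cleanup();

  // Cleanup finished in time: disarm the timer, then exit normally.
  clearTimeout(forceExit);
  exit(0);
}
```

One consequence of `unref()`: the timer only fires while something else keeps the event loop alive. In the hang described under Motivation that is presumably the case, since a stuck socket or an open server is itself an active handle.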

Backwards Compatibility

Default timeout is 10s. Completely transparent when shutdown works normally — the timer is cleared before firing.

Summary by CodeRabbit

  • Bug Fixes
    • Improved shutdown robustness by implementing a configurable timeout mechanism (default 10 seconds) that ensures the application exits cleanly rather than hanging indefinitely during shutdown.

Add AGENTMEMORY_SHUTDOWN_TIMEOUT_MS env var (default 10000ms).
If shutdown takes longer than the timeout (e.g. stuck sdk connection),
the process force-exits instead of hanging indefinitely.

The timeout uses timer.unref() so it doesn't prevent the process from
exiting naturally if shutdown completes within the timeout window.

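Add a graceful shutdown timeout so a stuck sdk connection cannot leave the process hanging.
The timeout defaults to 10 seconds and is configurable via AGENTMEMORY_SHUTDOWN_TIMEOUT_MS.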

vercel Bot commented May 2, 2026

@mechanic-Q is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.


coderabbitai Bot commented May 2, 2026

📝 Walkthrough


Added a shutdown timeout mechanism to the shutdown routine that enforces an overall timeout via the AGENTMEMORY_SHUTDOWN_TIMEOUT_MS environment variable (default 10000 ms). A timer forces process exit if shutdown exceeds the timeout, and the timer is cleared after normal shutdown completes.

Changes

Shutdown Timeout Mechanism

| Layer / File(s) | Summary |
|---|---|
| Timeout Setup: src/index.ts (lines 417–424) | Initialize a forceExit timer that logs a warning and forces process.exit(0) if shutdown exceeds the configured timeout. Timer is unref()d to avoid blocking process termination. |
| Timeout Cleanup: src/index.ts (lines 433–434) | Clear the forceExit timer after normal shutdown steps complete, then perform the final process.exit(0). |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 A rabbit's ode to patience lost,
No hanging shutdowns! Here's the cost—
A timeout timer, swift and bright,
Ensures the process says goodnight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding a graceful shutdown timeout with force exit mechanism. It is concise, specific, and clearly summarizes the primary feature being added. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/index.ts (1)

417-437: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make shutdown idempotent.

SIGINT and SIGTERM can both reach this handler while the first invocation is still awaiting viewerServer.close(), indexPersistence.save(), or sdk.shutdown(). A second pass would start duplicate cleanup and a second force-exit timer concurrently.

🔁 Suggested fix
+  let shuttingDown = false;
+
   const shutdown = async () => {
+    if (shuttingDown) return;
+    shuttingDown = true;
     console.log(`\n[agentmemory] Shutting down...`);
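The guard in the suggested fix can be tried in isolation. The sketch below is a minimal illustration under assumed names (`makeShutdown` and `cleanup` are hypothetical and do not appear in src/index.ts). It shows why the flag has to be set synchronously, before the first `await`.

```typescript
// Hypothetical wrapper: turns any async cleanup routine into a handler
// that runs at most once, no matter how many signals arrive.
function makeShutdown(cleanup: () => Promise<void>): () => Promise<void> {
  let shuttingDown = false;
  return async () => {
    // Check and set the flag before any await, so a second signal that
    // arrives while cleanup is still in flight returns immediately.
    if (shuttingDown) return;
    shuttingDown = true;
    await cleanup();
  };
}

// Assumed wiring: both signals share one guarded handler.
// const shutdown = makeShutdown(async () => { /* close servers, flush state */ });
// process.on("SIGINT", shutdown);
// process.on("SIGTERM", shutdown);
```

Because the flag lives in the closure, a force-exit timer created inside `cleanup` is also created only once.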
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/index.ts` around lines 417 - 437, Make the shutdown handler idempotent by
guarding the async shutdown() function with a single-invocation flag (e.g., let
isShuttingDown = false) so repeated SIGINT/SIGTERM calls return immediately;
only create the forceExit timer and call forceExit.unref() on the first
invocation, run healthMonitor.stop(), dedupMap.stop(), indexPersistence.stop(),
await viewerServer.close(), await indexPersistence.save().catch(...), await
sdk.shutdown(), then clearTimeout(forceExit) and exit; ensure the guard is
checked at the top of shutdown() and set true before starting async work so
concurrent signals don't start duplicate cleanup or timers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ad7038a5-c09a-4995-9427-57d11f9d7e07

📥 Commits

Reviewing files that changed from the base of the PR and between 94fc119 and b359e61.

📒 Files selected for processing (1)
  • src/index.ts

Comment thread src/index.ts
Comment on lines +419 to +424
const timeoutMs = parseInt(process.env["AGENTMEMORY_SHUTDOWN_TIMEOUT_MS"] || "10000", 10);
const forceExit = setTimeout(() => {
  console.warn(`[agentmemory] Shutdown timed out after ${timeoutMs}ms, forcing exit`);
  process.exit(0);
}, timeoutMs);
forceExit.unref();

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n src/index.ts | sed -n '410,440p'

Repository: rohitg00/agentmemory

Length of output: 1405


🏁 Script executed:

# Verify parseInt behavior with NaN
node <<'EOF'
const timeoutMs = parseInt(process.env["AGENTMEMORY_SHUTDOWN_TIMEOUT_MS"] || "10000", 10);
console.log("timeoutMs with default:", timeoutMs);

// Test NaN case
const testNaN = parseInt(undefined || "10000", 10);
console.log("setTimeout with valid default:", testNaN);

// Test actual NaN scenario
const envUndefined = parseInt(undefined, 10);
console.log("parseInt(undefined, 10):", envUndefined);

// Test setTimeout behavior with NaN
console.log("Testing setTimeout with NaN:");
const timer = setTimeout(() => {
  console.log("Timeout fired");
}, NaN);
console.log("Timer object created:", typeof timer);
clearTimeout(timer);
EOF

Repository: rohitg00/agentmemory

Length of output: 388


🏁 Script executed:

# Check package.json for module type
cat package.json | grep -A2 -B2 '"type"'

Repository: rohitg00/agentmemory

Length of output: 391


Validate the shutdown timeout to prevent malformed env vars from breaking the graceful-shutdown window.

parseInt() on invalid input returns NaN. When process.env["AGENTMEMORY_SHUTDOWN_TIMEOUT_MS"] is set to a non-numeric value (e.g., "abc"), parseInt() returns NaN, and setTimeout(..., NaN) triggers a Node.js warning and sets the timeout to 1ms instead of the intended graceful-shutdown duration. The || "10000" fallback only covers a missing or empty variable; it does not validate any other value, including zero and negative numbers.

🔧 Suggested fix
-    const timeoutMs = parseInt(process.env["AGENTMEMORY_SHUTDOWN_TIMEOUT_MS"] || "10000", 10);
+    const parsedTimeoutMs = Number.parseInt(process.env.AGENTMEMORY_SHUTDOWN_TIMEOUT_MS ?? "", 10);
+    const timeoutMs =
+      Number.isFinite(parsedTimeoutMs) && parsedTimeoutMs > 0 ? parsedTimeoutMs : 10_000;
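The validation rule in the suggested fix can be pulled into a small pure function, which makes the edge cases easy to check. The function name `resolveShutdownTimeoutMs` and the constant `DEFAULT_SHUTDOWN_TIMEOUT_MS` are hypothetical names for this sketch. The logic mirrors the suggestion above.

```typescript
// Hypothetical names; the fallback value matches the PR's 10000ms default.
const DEFAULT_SHUTDOWN_TIMEOUT_MS = 10_000;

function resolveShutdownTimeoutMs(raw: string | undefined): number {
  // parseInt yields NaN for missing, empty, or non-numeric input.
  const parsed = Number.parseInt(raw ?? "", 10);
  // Accept only finite, strictly positive values; otherwise use the default.
  return Number.isFinite(parsed) && parsed > 0
    ? parsed
    : DEFAULT_SHUTDOWN_TIMEOUT_MS;
}

// Assumed call site:
// const timeoutMs = resolveShutdownTimeoutMs(process.env.AGENTMEMORY_SHUTDOWN_TIMEOUT_MS);
```

One caveat that applies to any `parseInt`-based check: `parseInt` accepts a numeric prefix, so a value such as "15s" is read as 15 (milliseconds) rather than rejected. If that matters, a stricter parse (for example a digits-only regular expression) would be needed.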
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    const timeoutMs = parseInt(process.env["AGENTMEMORY_SHUTDOWN_TIMEOUT_MS"] || "10000", 10);
-    const forceExit = setTimeout(() => {
-      console.warn(`[agentmemory] Shutdown timed out after ${timeoutMs}ms, forcing exit`);
-      process.exit(0);
-    }, timeoutMs);
-    forceExit.unref();
+    const parsedTimeoutMs = Number.parseInt(process.env.AGENTMEMORY_SHUTDOWN_TIMEOUT_MS ?? "", 10);
+    const timeoutMs =
+      Number.isFinite(parsedTimeoutMs) && parsedTimeoutMs > 0 ? parsedTimeoutMs : 10_000;
+    const forceExit = setTimeout(() => {
+      console.warn(`[agentmemory] Shutdown timed out after ${timeoutMs}ms, forcing exit`);
+      process.exit(0);
+    }, timeoutMs);
+    forceExit.unref();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/index.ts` around lines 419 - 424, The code uses parseInt on
process.env["AGENTMEMORY_SHUTDOWN_TIMEOUT_MS"] to compute timeoutMs but doesn't
validate NaN, which causes setTimeout(NaN) to behave incorrectly; update the
initialization of timeoutMs to parse the env var then validate it's a positive
finite integer (fallback to 10000 on missing, non-numeric, negative, zero, or
NaN), then use that validated timeoutMs when calling setTimeout for forceExit
(referencing timeoutMs, AGENTMEMORY_SHUTDOWN_TIMEOUT_MS, forceExit, and
setTimeout in the change).

Owner

rohitg00 commented May 8, 2026

Thanks @mechanic-Q — the timeout itself is a real improvement and the unref() is a nice touch so the timer never holds the loop open after a clean exit.

Two things to fix before merge, both flagged in CodeRabbit's review (one inline, one outside the diff) and worth addressing while you're here:

1. Make the shutdown handler idempotent.
SIGINT and SIGTERM can both reach shutdown() while the first invocation is still mid-flight (viewerServer.close(), indexPersistence.save(), sdk.shutdown()). A second pass spawns a duplicate force-exit timer and tries to double-close everything. Easiest fix:

let shuttingDown = false;
const shutdown = async () => {
  if (shuttingDown) return;
  shuttingDown = true;
  // ... existing body
};

2. Validate the env var.
parseInt("abc", 10) returns NaN, which setTimeout(..., NaN) collapses to 1ms — i.e. you'd force-exit before any cleanup runs. Fall back to the literal default when invalid:

const raw = parseInt(process.env["AGENTMEMORY_SHUTDOWN_TIMEOUT_MS"] || "10000", 10);
const timeoutMs = Number.isFinite(raw) && raw > 0 ? raw : 10000;

Optional but appreciated: add AGENTMEMORY_SHUTDOWN_TIMEOUT_MS to the env block in README.md so users know it exists.

Happy to merge once those land.
