fix: remove in-memory cache to fix multi-instance status inconsistency #11

Merged: scuciatto merged 1 commit into develop from fix/multi-instance-cache-inconsistency on Jun 25, 2026

@alfredodelfabro

Member

Problem

In a multi-instance Rocket.Chat deployment, the TimeOff app reports stale "out of office" status. After a user runs /timeoff end (or start), other instances keep showing the old status for up to an hour — forcing users to re-run the slash command repeatedly until it "sticks".

Root cause

TimeOffCache was an in-memory singleton Map with a 1-hour TTL. Each Rocket.Chat instance runs its own Node process, so each had its own isolated copy of this cache. MongoDB (the apps-engine persistence layer) is the only store actually shared between instances.

  1. User runs /timeoff end on instance A; saveTimeOff() writes to MongoDB and refreshes instance A's cache.
  2. Instance B's cache still holds the stale ON_TIME_OFF entry (TTL up to 1h).
  3. A DM handled by instance B reads the stale cached value and wrongly notifies the sender that the user is OOO.
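
The sequence above can be reproduced in a small simulation. This is only a sketch: `CachedInstance`, `sharedStore`, and the `AVAILABLE` status are hypothetical names chosen for illustration, not taken from the app, and the 1-hour TTL is left out because the stale read happens well inside it.

```typescript
// Hypothetical stand-ins; none of these names come from the TimeOff app's source.
type Status = "ON_TIME_OFF" | "AVAILABLE";

// Plays the role of MongoDB: the only state shared between instances.
const sharedStore = new Map<string, Status>();

// One instance corresponds to one Node process with its own private cache.
class CachedInstance {
    private cache = new Map<string, Status>();

    save(userId: string, status: Status): void {
        sharedStore.set(userId, status); // visible to every instance
        this.cache.set(userId, status);  // visible to this instance only
    }

    get(userId: string): Status | undefined {
        const cached = this.cache.get(userId);
        if (cached !== undefined) {
            return cached; // answered without consulting the shared store
        }
        const stored = sharedStore.get(userId);
        if (stored !== undefined) {
            this.cache.set(userId, stored);
        }
        return stored;
    }
}

const instanceA = new CachedInstance();
const instanceB = new CachedInstance();

instanceA.save("user-1", "ON_TIME_OFF");   // /timeoff start handled by A
const beforeEnd = instanceB.get("user-1"); // B reads and caches ON_TIME_OFF
instanceA.save("user-1", "AVAILABLE");     // /timeoff end handled by A
const staleOnB = instanceB.get("user-1");  // B still answers from its own cache
const freshOnA = instanceA.get("user-1");  // A sees the new status
```

Instance B keeps returning `ON_TIME_OFF` even though the shared store already holds the new value, which matches the behavior users reported.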

Fix

Remove the in-memory cache entirely and read/write straight through the persistence repository. MongoDB is already the shared source of truth, and lookups use the existing indexed association query (MISC:'timeoff' + USER:coreUserId) — a cheap point lookup called at most once per DM/command. So every instance always sees the current status on the first try, with no meaningful performance cost.

A cache layered on top of the shared persistence store provided no real benefit here and was the direct cause of the bug.
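
A minimal sketch of the read/write-through shape, assuming a repository interface with `save` and `findByUserId`. Those two method names, the `TimeOffRecord` shape, and the in-memory `SharedRepository` are hypothetical; the real repository is backed by the apps-engine persistence layer and its signatures are not shown in this PR.

```typescript
// Hypothetical shapes for illustration only.
interface TimeOffRecord {
    userId: string;
    status: "ON_TIME_OFF" | "AVAILABLE";
}

interface TimeOffRepository {
    save(record: TimeOffRecord): Promise<void>;
    findByUserId(userId: string): Promise<TimeOffRecord | undefined>;
}

// In-memory stand-in for the shared persistence store (MongoDB in production).
class SharedRepository implements TimeOffRepository {
    private records = new Map<string, TimeOffRecord>();

    async save(record: TimeOffRecord): Promise<void> {
        this.records.set(record.userId, record);
    }

    async findByUserId(userId: string): Promise<TimeOffRecord | undefined> {
        return this.records.get(userId);
    }
}

// The service holds no state of its own; every call goes to the repository.
class TimeOffService {
    private readonly repository: TimeOffRepository;

    constructor(repository: TimeOffRepository) {
        this.repository = repository;
    }

    async saveTimeOff(record: TimeOffRecord): Promise<void> {
        await this.repository.save(record);
    }

    async getTimeOffByUserId(userId: string): Promise<TimeOffRecord | undefined> {
        return this.repository.findByUserId(userId);
    }
}

// Two services sharing one repository model two instances sharing one database.
async function demo(): Promise<string[]> {
    const shared = new SharedRepository();
    const serviceA = new TimeOffService(shared);
    const serviceB = new TimeOffService(shared);

    await serviceA.saveTimeOff({ userId: "user-1", status: "ON_TIME_OFF" });
    const first = await serviceB.getTimeOffByUserId("user-1");
    await serviceA.saveTimeOff({ userId: "user-1", status: "AVAILABLE" });
    const second = await serviceB.getTimeOffByUserId("user-1");

    return [first?.status ?? "none", second?.status ?? "none"];
}
```

Because the service keeps no per-process state, the second read on `serviceB` reflects the write made through `serviceA` immediately.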

Changes

  • TimeOffCache.ts: deleted
  • services/TimeOffService.ts: saveTimeOff / getTimeOffByUserId go straight through the repository
  • TimeOffApp.ts: removed the cache import and the invalidateCache() call in onEnable

The public ITimeOffService interface is unchanged, so callers needed no edits.

Testing

  • npm run typecheck and npm run lint pass clean
  • Verified locally on a Rocket.Chat instance — status now reflects the current persisted state on the first try

🤖 Generated with Claude Code

The TimeOffCache was an in-memory singleton Map per Node process. In a
multi-instance Rocket.Chat deployment each instance held its own isolated
cache with a 1-hour TTL, while MongoDB was the only shared store. After a
user ended their time off on one instance, other instances kept serving the
stale ON_TIME_OFF entry until their cache expired, wrongly notifying senders
that the user was still OOO and forcing repeated slash-command retries.

Remove the cache entirely and read/write straight through the persistence
repository. Lookups use the existing indexed association query, so the app
stays performant while every instance always sees the current status.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@scuciatto merged commit defadb5 into develop on Jun 25, 2026
3 checks passed