feat(templates): add conversation history persistence to HTTP agent templates#794
feat(templates): add conversation history persistence to HTTP agent templates#794aidandaly24 wants to merge 2 commits into
Conversation
Package TarballHow to installgh release download pr-794-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.14.1.tgz |
notgitika
left a comment
There was a problem hiding this comment.
2 comments but LGTM otherwise
1e4f23a to
ea2f0cc
Compare
|
Claude Security Review: no high-confidence findings. (run) |
…emplates Closes aws#808, aws#809. All four HTTP agent templates were stateless per-invocation. Each call started with zero conversation context, so multi-turn recall within a session did not work. Strands additionally leaked history across sessions because all invocations shared a single global Agent instance. Each template now uses its framework's idiomatic per-session mechanism: - Strands: replace global _agent with @lru_cache(128) get_or_create_agent(session_id). The Agent accumulates messages internally across stream_async() calls, so reusing the per-session instance gives multi-turn memory and isolates sessions. Bound at 128 entries to cap process memory. Only applied to the no-memory, no-config-bundle branch — hasMemory already keys per (session_id, user_id) and hasConfigBundle intentionally recreates the agent each invoke for hooks. - OpenAI Agents: @lru_cache(128) get_session(session_id) returning SQLiteSession, threaded via session= on Runner.run(). Auto-loads/saves prior history in-process. - Google ADK: module-level _session_service + _runner with a get_or_create_session() helper that calls get_session() then create_session() if missing. Avoids AlreadyExistsError on repeat calls. - LangGraph: module-level _checkpointer = InMemorySaver() passed to create_react_agent(). thread_id mapped to context.session_id via configurable on graph.ainvoke. All state is in-memory and best-effort: persists across invocations within the runtime process, resets on cold starts. Users who need durability can swap to each framework's persistent backend. End-to-end deploy + invoke testing covered in the original PR description.
ea2f0cc to
5ad1381
Compare
|
Claude Security Review: no high-confidence findings. (run) |
|
@notgitika Pushed a fresh rebase onto main with both of your review comments addressed:
Also dropped #810 from scope since both LangGraph and OpenAI templates already have system prompts on main from other PRs that landed in the meantime. Scope is now #808 + #809. All checks green; ready for re-review when you have a chance. |
| tools=tools | ||
| ) | ||
| return _agent | ||
| @lru_cache(maxsize=128) |
There was a problem hiding this comment.
what happens for turn 129? they would lose the whole history?
if so, can we add some logging so the user is aware?
There was a problem hiding this comment.
When the 129th session arrives, the session #1's agent will be evicted and its message history lost. The next call to session #1, will get a new Agent with no prior memory without any error/warning. lru_cache doesn't support eviction callbacks, so this cannot be logged. Maybe add a comment in the template, "Caches up to 128 active sessions; LRU eviction silently resets history for the oldest session. For production use, replace with a durable session store."
There was a problem hiding this comment.
Yep — turn 129 evicts session #1, and the next call to session #1 starts fresh with no warning. functools.lru_cache doesn't expose an eviction callback so we can't log on it directly. Took @padmak30's suggestion below and added a documentation comment above the cache in both Strands and OpenAI templates explaining the cap and pointing to durable session stores (Strands FileSessionManager, SQLiteSession with file path) for production use.
There was a problem hiding this comment.
Used your suggested wording (lightly adapted). Added the comment above the @lru_cache in both Strands and OpenAI Agents templates pointing users to FileSessionManager / SQLiteSession with a file path for production use. Thanks for the clean phrasing.
| {{#if hasConfigBundle}} | ||
| def create_agent(): | ||
| return Agent( | ||
| model=load_model(), | ||
| system_prompt=DEFAULT_SYSTEM_PROMPT, | ||
| tools=tools, | ||
| hooks=[ConfigBundleHook()], | ||
| ) |
There was a problem hiding this comment.
The config bundle path should also get conversation persistence across sessions
There was a problem hiding this comment.
Good catch — fixed in the latest commit. The hasConfigBundle branch now uses the same @lru_cache(128) get_or_create_agent(session_id) pattern as the non-bundle branch, with hooks=[ConfigBundleHook()] attached at construction. Verified safe to cache: ConfigBundleHook reads bundle state inside its callbacks (_inject_system_prompt, _override_tool_desc) at invocation time, not at hook registration, so reusing the same Agent across invocations still picks up bundle changes. The hasMemory branch already attaches the hook the same way to a cached agent, so this matches existing precedent in the template.
| tools=tools | ||
| ) | ||
| return _agent | ||
| @lru_cache(maxsize=128) |
There was a problem hiding this comment.
When the 129th session arrives, the session #1's agent will be evicted and its message history lost. The next call to session #1, will get a new Agent with no prior memory without any error/warning. lru_cache doesn't support eviction callbacks, so this cannot be logged. Maybe add a comment in the template, "Caches up to 128 active sessions; LRU eviction silently resets history for the oldest session. For production use, replace with a durable session store."
… document LRU cap Addresses follow-up review comments on aws#794: - Strands hasConfigBundle branch now caches Agent per session_id via the same @lru_cache(128) get_or_create_agent() pattern as the non-bundle branch, with hooks=[ConfigBundleHook()] attached at construction. ConfigBundleHook reads bundle state inside its callbacks (not at registration), so caching the agent is safe and consistent with how the hasMemory branch already attaches the hook. Drops the create_agent() per-invoke pattern, which had no conversation persistence. - Document the LRU cache cap in both Strands and OpenAI Agents templates: callers of an evicted session start fresh on the next invoke. Comment also points to durable session stores for production use. Snapshot regenerated.
| content = types.Content(role="user", parts=[types.Part(text=query)]) | ||
| session, runner = await setup_session_and_runner(user_id, session_id) | ||
| runner = get_or_create_runner() | ||
| session = await get_or_create_session(user_id, session_id) |
There was a problem hiding this comment.
Cross-user conversation access via client-supplied user_id.
user_id here flows from the entrypoint, where it is read from payload (caller-controlled): user_id = payload.get("user_id", "default_user"). Before this change that was harmless — every invocation built a fresh InMemorySessionService — but the new module-level _session_service now persists history keyed by (app_name, user_id, session_id) across invocations.
That means any caller can now set user_id to another user's identifier and, with a known/guessable session_id, attach to that user's stored conversation: reading prior messages on the next turn and appending new ones. The Strands template avoids this by sourcing user_id from context (getattr(context, 'user_id', 'default-user')); this template should do the same — read user_id from context, not payload, before it is used to scope persistent session state.
| session_service = InMemorySessionService() | ||
| session = await session_service.create_session( | ||
| # Module-level session service and runner (preserves history across invocations) | ||
| _session_service = InMemorySessionService() |
There was a problem hiding this comment.
Unbounded in-memory session growth (DoS / OOM in long-lived containers).
InMemorySessionService is a plain in-memory store with no built-in eviction. With this PR it is now process-lifetime scoped, and every distinct (user_id, session_id) invocation creates a new entry that is never freed. An attacker (or a noisy client) can rotate session_ids and grow the process's heap until it OOMs — the runtime container typically has a fixed memory budget, so this is a denial-of-service vector.
The sibling Strands and OpenAIAgents templates in this same PR explicitly cap their session caches with @lru_cache(maxsize=128) and call out LRU eviction in a comment. The googleadk template (and the langchain template, with InMemorySaver()) should apply the same mitigation, or call out durable-store guidance just as prominently. As written, these two templates are noticeably less safe to ship into production than the other two.
| tools = [add_numbers] | ||
|
|
||
| # Module-level checkpointer preserves conversation history across invocations | ||
| _checkpointer = InMemorySaver() |
There was a problem hiding this comment.
Unbounded in-memory checkpoint growth (DoS / OOM in long-lived containers).
InMemorySaver is a process-lifetime, unbounded dict-style store: every distinct thread_id (which this PR derives from context.session_id) creates checkpoint entries that are never evicted. In a long-running runtime container, a client that rotates session_id (or any caller producing many sessions over time) can grow the heap until the process OOMs — a denial-of-service vector against the runtime.
The sibling Strands and OpenAIAgents templates in this same PR explicitly cap their session caches with @lru_cache(maxsize=128) and document the trade-off in a comment. This template should either bound _checkpointer similarly (or wrap session→checkpointer mapping in an LRU), or include the same prominent guidance pointing users at a durable backend (e.g. SqliteSaver/PostgresSaver) before deploying. As written, this template is noticeably less safe than its peers in this PR.
|
Claude Security Review: no high-confidence findings. (run) |
Description
All four HTTP agent templates (Strands, OpenAI Agents, Google ADK, LangGraph) were stateless per-invocation — each call started with zero conversation context, so multi-turn recall within a session didn't work. Strands additionally had a session-bleed risk because all invocations sharing a Python process used a single global
_agentwhoseAgent.messagesaccumulated across calls.Note on Strands and Runtime session isolation
AgentCore Runtime isolates each
session_idto its own dedicated microVM (terminated and memory-sanitized at session end), so the Strands cross-session bleed cannot manifest in a deployed AgentCore Runtime agent. The bleed is real, however, in:agentcore dev(one local process serving multiple sessions — the exact reproduction in Strands HTTP template leaks conversation history across sessions and users #809)Keying the agent by
session_idmakes the template correct independent of the runtime's isolation guarantee, costs the same memory profile as the previous global, and aligns Strands with the per-session pattern the other three templates use.Per-template fixes
_agentwith@lru_cache(maxsize=128) get_or_create_agent(session_id). StrandsAgentaccumulates messages internally acrossstream_async()calls, so reusing the per-session instance gives natural multi-turn memory and isolates sessions. Bound at 128 entries to cap process memory; evicted sessions silently start fresh on their next invoke (documented in a code comment, suggesting durable session stores for production). Applied to both the no-memory/no-config-bundle branch and the no-memory/config-bundle branch —hasMemoryalready keys per(session_id, user_id). TheConfigBundleHookcallbacks re-read bundle state at invocation time, so caching the agent is safe and consistent with howhasMemoryalready attaches the hook.@lru_cache(maxsize=128) get_session(session_id)returningSQLiteSession, threaded viasession=onRunner.run(). Auto-loads/saves prior history in-process. Same 128-session cap and documentation comment as Strands._session_service+_runnerwith aget_or_create_session()helper that callsget_session()thencreate_session()if missing — avoidsAlreadyExistsErroron repeat calls. ADK'sInMemorySessionServiceis itself a single instance retained across invocations, so no per-session cache cap is needed._checkpointer = InMemorySaver()passed tocreate_react_agent().thread_idmapped tocontext.session_idviaconfigurableongraph.ainvoke().InMemorySaverretains all checkpoints in process memory.All conversation state is in-memory (best-effort) — persists across invocations within the runtime process and resets on cold starts. This is the correct default for starter templates; users can swap to durable backends (Strands
FileSessionManager, OpenAISQLiteSessionwith file path, ADKDatabaseSessionService, LangGraph persistent checkpointers) when they need durability.Note: #810 (missing system prompts on OpenAI Agents and LangGraph) was already addressed by other PRs that landed on
mainafter this PR was opened — both templates now shipINSTRUCTIONS/DEFAULT_SYSTEM_PROMPTconstants. This PR's scope is therefore #808 + #809.Related Issue
Closes #808
Closes #809
Type of Change
Testing
npm run typechecknpm run lintnpm run test:unit(275 test files passing)npm run test:update-snapshotsEnd-to-end deploy + invoke testing (from earlier iteration of this branch):
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.