Feat: split agent_context into submodules and add offload/reload for oversized step content by liudfgoo · Pull Request #3243 · ModelEngine-Group/nexent

liudfgoo · 2026-06-16T06:49:35Z

Split the monolithic agent_context.py (1,409 lines) into the agent_context/ package, and implement an Offload/Reload
mechanism: when compression summarizes old steps, oversized rendered content is archived to an in-memory store and replaced with an [[OFFLOAD]] marker. The agent can retrieve archived content on demand via the reload_original_context_messages tool.

Changes (37 files)

`agent_context/` package (10 new modules)

Module	Responsibility
`manager.py`	ContextManager orchestration: compression flow, cache management, token estimation
`step_renderer.py`	Step rendering with per-step offload trigger + `compress_history_offline`
`offload_store.py`	In-memory store: UUID handles, FIFO eviction, reload inventory
`budget.py`	Pair/action extraction, fingerprints, cache validation, token-budget trimming
`llm_summary.py`	LLM summary calls with structured JSON parsing
`previous_compression.py`	Previous-run compression (incremental / fresh / fallback)
`current_compression.py`	Current-run compression (incremental / fresh / fallback)
`stats_export.py`	Compression statistics: call count, cache hits, token counts
`summary_step.py`	SummaryTaskStep type

Offload configuration (`summary_config.py`)

enable_reload = False              # Create OffloadStore + inject reload tool
per_step_render_limit = 0          # Trigger threshold (0 = disabled); suggest 3000–10000
max_offload_entries = 200          # Max entry count (FIFO eviction)
max_offload_entry_chars = 30000    # Per-entry character cap
max_offload_total_chars = 2_000_000  # Total character budget (FIFO eviction)

Offload requires both enable_reload=True and per_step_render_limit > 0. All parameters have English docstrings
explaining usage and the boundary with max_observation_length (irreversible per-step sanitation at source vs. reversible
archiving during compression).
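To make the interaction of these limits concrete, here is a minimal sketch of an in-memory store with UUID handles and FIFO eviction. It is not the PR's `offload_store.py`: the class name `SketchOffloadStore`, the helper `maybe_offload`, and the `[[OFFLOAD:<handle>]]` marker layout are assumptions for illustration. Only the parameter names and defaults come from the configuration above.

```python
import uuid
from collections import OrderedDict
from typing import Optional


class SketchOffloadStore:
    """Hypothetical in-memory archive; illustrates the limits, not the real class."""

    def __init__(self, max_entries=200, max_entry_chars=30000,
                 max_total_chars=2_000_000):
        self.max_entries = max_entries
        self.max_entry_chars = max_entry_chars
        self.max_total_chars = max_total_chars
        self._entries = OrderedDict()  # handle -> content, oldest first
        self._total_chars = 0

    def offload(self, content: str) -> str:
        """Archive content and return the marker that replaces it."""
        content = content[: self.max_entry_chars]  # per-entry character cap
        handle = uuid.uuid4().hex
        self._entries[handle] = content
        self._total_chars += len(content)
        # FIFO eviction: drop the oldest entries until both budgets hold.
        while (len(self._entries) > self.max_entries
               or self._total_chars > self.max_total_chars):
            _, evicted = self._entries.popitem(last=False)
            self._total_chars -= len(evicted)
        return f"[[OFFLOAD:{handle}]]"  # marker layout is an assumption

    def reload(self, handle: str) -> Optional[str]:
        """Return the archived content, or None if it was evicted."""
        return self._entries.get(handle)


def maybe_offload(rendered, store, enable_reload, per_step_render_limit):
    """Offload only when both switches are on and the segment is oversized."""
    if not (enable_reload and per_step_render_limit > 0 and store is not None):
        return rendered
    if len(rendered) <= per_step_render_limit:
        return rendered
    return store.offload(rendered)
```

With `enable_reload=False` or `per_step_render_limit=0` the helper returns the rendered text unchanged, which matches the "requires both" rule stated above.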

Core logic changes

core_agent.py: Save _raw_observation (pre-truncation) for offload; restore max_observation_length truncation
nexent_agent.py: Build ContextManager before agent so the reload tool enters tool_list (sandboxed); fix duplicate
ContextManager creation
run_agent.py: Sync reload tool's offload_store when an external context_manager is injected

Other

tools/reload_original_context_tool.py: Reload tool implementation
utils/code_analysis.py: Tool-call signature extraction
Removed old monolithic agent_context.py

Testing

227 unit tests (14 test files) — all passing

Out of Scope

backend/ — no changes (Context Components system kept on parent branch feat/benchmark-on-refactor)
temp_scripts/ — debug scripts preserved on feat/opt-agent-context-temp-scripts

How to Enable Offload

from nexent.core.agents import ContextManagerConfig

config = ContextManagerConfig(
    enabled=True,
    enable_reload=True,
    per_step_render_limit=5000,     # offload when rendered segment > 5000 chars
    max_observation_length=2000,    # preserve raw observation before truncation
)

Purpose: Consolidate ALL context building related code into a single ContextManager class, making it the single source of truth for system prompt assembly, component injection, memory management, and token-aware compression. This refactoring introduces a component-based architecture where system prompt parts (tools, skills, memory, knowledge base, agent definitions) are assembled from registered ContextComponent instances using pluggable selection strategies.

Changes Summary:
- Added ContextComponent hierarchy with 7 subclasses
- Added ContextStrategy implementations with 4 algorithms
- Extended ContextManagerConfig with strategy and injection flags
- Added component management methods to ContextManager
- Integrated component registration in NexentAgent
- Modified CoreAgent to use component-based system prompt assembly
- Created backend helper for component building
- Updated tests with comprehensive coverage (260 tests)

File Changes:

sdk/nexent/core/agents/agent_model.py:
- Added ContextComponent abstract base class with to_messages() pattern
- Added 7 component subclasses: SystemPromptComponent, ToolsComponent, SkillsComponent, MemoryComponent, KnowledgeBaseComponent, ManagedAgentsComponent, ExternalAgentsComponent
- Added ContextStrategy abstract class with 4 implementations: FullStrategy, TokenBudgetStrategy, BufferedStrategy, PriorityWeightedStrategy
- Added context_components field to AgentConfig

sdk/nexent/core/agents/summary_config.py:
- Added StrategyType literal for strategy selection
- Extended ContextManagerConfig with strategy field
- Added inject_* flags for each component type (7 flags)
- Added component_budgets dict for per-component token limits
- Added buffer_size_per_component for buffered strategy

sdk/nexent/core/agents/agent_context.py:
- Added _components registry in __init__
- Added register_component() for component accumulation
- Added clear_components() and get_registered_components()
- Added _get_strategy() for strategy selection
- Added build_system_prompt() for component-based assembly
- Added _calculate_component_budget() for budget allocation
- Added _message_already_present() for deduplication

sdk/nexent/core/agents/__init__.py:
- Exported all new ContextComponent subclasses
- Exported all ContextStrategy implementations
- Exported StrategyType

sdk/nexent/core/agents/nexent_agent.py:
- Added component registration after ContextManager mount
- Iterates context_components from AgentConfig and registers each

sdk/nexent/core/agents/core_agent.py:
- Modified SystemPromptStep creation to use component-based assembly
- Added fallback logic: components -> original system_prompt

backend/utils/context_utils.py (NEW):
- Added build_context_components() main function
- Added build_*_component() helpers for each type
- Added _format_*_description() helpers for text formatting
- Added build_app_context_string() for app metadata

backend/agents/create_agent_info.py:
- Imported build_context_components from context_utils
- Called build_context_components() with agent configuration
- Passed context_components to AgentConfig

test/sdk/core/agents/test_context_component.py (NEW):
- Added 66 tests for ContextComponent hierarchy
- Tests for each subclass creation, validation, to_messages()
- Tests for ContextStrategy implementations

test/sdk/core/agents/test_agent_context/unit/test_component_management.py (NEW):
- Added 29 tests for ContextManager component methods
- Tests for register, clear, get_registered, build_system_prompt
- Tests for strategy selection and budget calculation

test/sdk/core/agents/test_nexent_agent_component_integration.py (NEW):
- Added 9 integration tests for NexentAgent and CoreAgent
- Tests for component registration flow
- Tests for system prompt assembly with fallback
- Tests for backward compatibility

test/backend/utils/test_context_utils.py (NEW):
- Added 27 tests for context_utils helper functions
- Tests for format helpers and build helpers
- Tests for build_context_components main function

test/sdk/core/agents/test_agent_context/loader.py:
- Added _load_agent_model() for loading agent_model.py
- Exported all ContextComponent and ContextStrategy classes

test/sdk/core/agents/test_agent_context/stubs.py:
- Fixed register_smolagents_mocks() to always overwrite sys.modules

test/backend/agents/test_create_agent_info.py:
- Added _create_stub_component_class() helper
- Added component class stubs to agent_model mock setup

test/common/test_mocks.py:
- Removed unnecessary nexent.core.agents.agent_model mocking

Without the terminator, psql parses the CREATE INDEX and the following CREATE TABLE as a single statement and aborts with a syntax error at line 415, leaving 28 tables uncreated (including ag_prompt_template_t and user_tenant_t). This breaks every backend service on a fresh DB. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…nents build_context_components() now accepts an optional pre-rendered ``system_prompt``. When provided it returns a single SystemPromptComponent carrying the Jinja2 output verbatim so ContextManager.build_system_prompt() emits a system prompt byte-identical to what CoreAgent saw before the context-management refactor. When omitted the function keeps its piece-wise behaviour for future incremental componentization. create_agent_info.py forwards the rendered string, completing the behavior-preserving migration: ContextManager is now the single assembly point without changing what the agent actually sees. Splitting the rendered prompt into real semantic components (Tools / Skills / Memory / KB) is deferred to a follow-up. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Two golden tests guard the behavior-preserving migration of system prompt assembly into ContextManager: * test_system_prompt_component_roundtrip - wraps a Jinja2-rendered prompt in a single SystemPromptComponent, registers it, runs build_system_prompt() through ContextManager, joins role=system messages and asserts byte-identical content. Verifies the component machinery (register / strategy / dedup / role filter / join) is loss-less. Parametrised over language (zh/en) and is_manager flag. * test_full_build_context_components_matches_jinja2 - mirrors what backend/agents/create_agent_info.py does in production: render Jinja2 then hand the rendered string to build_context_components(system_prompt=...). Asserts the full end-to-end output stays byte-identical to the Jinja2 baseline. If this turns red the migration path itself has regressed. Both run only on branches that have backend.utils.context_utils and the new ContextComponent classes; the module-level skip keeps it inert on older branches. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

TokenBudgetStrategy / BufferedStrategy / PriorityWeightedStrategy used to silently drop components that did not fit their selection criteria, which made truncated system prompts impossible to diagnose at runtime. Each strategy now emits a logger.warning identifying the component type, priority, and which constraint tripped (total_budget vs type_budget vs buffer overflow vs relevance threshold vs per-component token fit). Behaviour for callers is unchanged - the selection results are identical; only observability improves. FullStrategy is unaffected since it cannot drop. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
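As an illustration of "same selection, better observability", the sketch below drops components that do not fit a total token budget and logs a warning for each drop. The names `SketchComponent` and `select_within_budget` are hypothetical, and the real strategies check more constraints than this one.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("context_strategy_sketch")


@dataclass
class SketchComponent:
    """Hypothetical stand-in for a ContextComponent."""
    component_type: str
    priority: int
    tokens: int


def select_within_budget(components, total_budget):
    """Greedy selection by priority; warn instead of dropping silently."""
    selected, used = [], 0
    for comp in sorted(components, key=lambda c: c.priority, reverse=True):
        if used + comp.tokens > total_budget:
            logger.warning(
                "Dropping component type=%s priority=%d: total_budget exceeded "
                "(%d used + %d needed > %d)",
                comp.component_type, comp.priority,
                used, comp.tokens, total_budget,
            )
            continue
        selected.append(comp)
        used += comp.tokens
    return selected
```

The returned list is the same with or without the warning call, so callers see no behaviour change.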

ModelConfig.extra_body / OpenAIModel(extra_body=...) / NexentAgent.create_model() forwarding is reintroduced so callers can supply provider-specific knobs to every chat.completions.create payload, most commonly Qwen3's chat_template_kwargs={"enable_thinking": false} to suppress reasoning preludes. Production safety: the field defaults to None on ModelConfig and the runtime guard "if self.extra_body" gates the kwargs injection, so any caller that does not opt in sees byte-identical request payloads. Originally added on feature/agent-context-improvement-eval to support the benchmark's thinking-off flag; its absence on refactor/context-management caused the flag to be silently dropped (Pydantic v2 default extra='ignore' absorbed the kwarg, leaving </think> residues in benchmark model output). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
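The opt-in guard can be sketched as follows. `build_completion_kwargs` is a hypothetical helper, not the method in openai_llm.py. It only shows that a falsy `extra_body` (None or an empty dict) leaves the payload untouched.

```python
from typing import Any, Dict, List, Optional


def build_completion_kwargs(model: str,
                            messages: List[Dict[str, Any]],
                            extra_body: Optional[Dict[str, Any]] = None
                            ) -> Dict[str, Any]:
    """Assemble request kwargs; extra_body is added only when the caller opts in."""
    kwargs: Dict[str, Any] = {"model": model, "messages": messages}
    if extra_body:  # None and {} both skip injection
        kwargs["extra_body"] = extra_body
    return kwargs


# Example: the Qwen3 thinking-off knob mentioned above.
thinking_off = {"chat_template_kwargs": {"enable_thinking": False}}
payload = build_completion_kwargs("some-model", [], thinking_off)
```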

Adds ContextManagerConfig.max_observation_length (chars; 0 = disabled, the production default) and a corresponding head+tail truncation in CoreAgent.execute_action that fires only when ContextManager is enabled AND the observation exceeds the configured limit. A short marker tells the agent the elision is recoverable via search/read tools. Originally added on feature/agent-context-improvement-eval as commit c992b48 "truncate model/tool output" by liudongfei. That commit also accidentally enabled `import pdb; pdb.set_trace()` at line 294; this back-port carries only the intended truncation logic. Production safety: - New field defaults to 0 (disabled). Existing callers see no behaviour change unless they explicitly set it. - Guarded by self.context_manager.config.enabled, so even setting the field has no effect when ContextManager is off. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
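A head+tail truncation of this kind might look like the sketch below. The even head/tail split and the marker wording are assumptions, and `truncate_head_tail` is a hypothetical name. The commit only states that both ends are kept and that a short marker points the agent at search/read tools.

```python
def truncate_head_tail(observation: str, max_len: int, enabled: bool = True) -> str:
    """Keep the first and last parts of an oversized observation.

    max_len == 0 means disabled, mirroring the production default.
    """
    if not enabled or max_len <= 0 or len(observation) <= max_len:
        return observation
    head_len = max_len // 2
    tail_len = max_len - head_len
    elided = len(observation) - max_len
    # Marker wording is illustrative only.
    marker = f"\n...[{elided} chars elided; use search/read tools to recover]...\n"
    return observation[:head_len] + marker + observation[-tail_len:]
```

Here `enabled` stands in for the `self.context_manager.config.enabled` guard, so setting a limit has no effect while ContextManager is off.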

Back-ports three small additions from feature/agent-context-improvement-eval that benchmarks depend on but were not part of the refactor: * incremental_summary_system_prompt (commit 9aecd0f on feature): Dedicated system prompt for incremental summary updates ("previous summary + new turns -> updated summary"). compress_*_with_cache's incremental paths now select this prompt via prompt_type='incremental'. Falls back to summary_system_prompt when the field is empty so callers that have not customised it still see a sensible default. * get_token_counts() (commit 824e737): Returns {last_uncompressed, last_compressed} from the most recent compress_if_needed pass. Required for benchmark token_reduction = 1 - last_compressed/last_uncompressed accounting. Recording sites added at all three return paths: under-threshold short-circuit, stable_bypass, and deep-compression finalisation. * export_summary() (commit 824e737): Returns cached summary texts + compression_boundary metadata (covered_pairs / end_steps / retained config). Benchmarks use the boundary to validate probe design - probes should only target content actually inside the compressed section. Production safety: * _generate_summary / _do_generate_summary gain prompt_type with default "initial". Existing callers that do not pass it keep the unchanged fresh-compaction prompt. * incremental prompt is only activated at the two call sites already labelled "*_incremental" in compress_previous_with_cache / compress_current_with_cache - other production callers are untouched. * Token-count instance vars default to None until the first compress_if_needed call, matching the feature-branch behaviour. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
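The token_reduction accounting above reduces to one line of arithmetic. The sketch assumes only the dictionary shape described for get_token_counts(). The helper name `token_reduction` is hypothetical, and returning None before the first compression pass is an assumption that follows from the counters defaulting to None.

```python
from typing import Dict, Optional


def token_reduction(counts: Dict[str, Optional[int]]) -> Optional[float]:
    """Compute 1 - last_compressed / last_uncompressed, or None if unavailable."""
    uncompressed = counts.get("last_uncompressed")
    compressed = counts.get("last_compressed")
    if not uncompressed or compressed is None:
        return None  # no compress_if_needed pass recorded yet
    return 1 - compressed / uncompressed


# Example: 1000 tokens before compression, 365 after.
example = token_reduction({"last_uncompressed": 1000, "last_compressed": 365})
```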

Lifts the benchmark + ctx_debugger source trees verbatim from feature/agent-context-improvement-eval. The benchmark is independent from production agent runtime and exists purely as an evaluation / diagnostic harness, so it lives on this integration branch (off refactor/context-management + feat/sysprompt-component-bringback) without changing any SDK or backend code. Subsequent commits on this branch will adapt the benchmark to the refactor's new ContextManager API surface (removing references to methods/fields that only existed on feature, e.g. export_summary, get_token_counts, OffloadStore, ModelConfig.extra_body, incremental_summary_system_prompt, max_observation_length, enable_reload). All such adaptations stay within sdk/benchmark/ and sdk/ctx_debugger/. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

smoke.py runs one trivial question through agent_runner.build_agent_run_info on the refactor SDK. Smoke passes today: imports clean, agent loop completes one step, final_answer is non-empty and correct. Two supporting tweaks live in this commit since the smoke does not run without them: - paths.py: accept .git as either a directory or a file. Git worktrees store .git as a pointer file at the repo root; the original isdir check failed to locate the project root from inside a worktree. - .gitignore: keep sdk/benchmark/.env out of version control (LLM credentials should remain local). Known limitations exposed by this smoke (work for follow-up commits): - LLM_ENABLE_THINKING / LLM_EXTRA_BODY are silently dropped because refactor's ModelConfig has no extra_body field (Pydantic v2 default is extra='ignore'). Visible as residual </think> tokens in model output. Will be fixed benchmark-side by subclassing OpenAIModel. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
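The paths.py tweak can be illustrated with a root finder that tests for existence rather than for a directory. This is a sketch, not the file's actual code, and `find_project_root` is a hypothetical name. In a linked worktree `.git` is a small text file containing a `gitdir:` pointer, so an `os.path.isdir` check misses it.

```python
import os
import tempfile
from typing import Optional


def find_project_root(start: str) -> Optional[str]:
    """Walk upward until a directory contains .git (as a dir or a file)."""
    current = os.path.abspath(start)
    while True:
        if os.path.exists(os.path.join(current, ".git")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent


# Simulate a worktree: .git is a pointer file, not a directory.
demo_root = tempfile.mkdtemp()
with open(os.path.join(demo_root, ".git"), "w") as fh:
    fh.write("gitdir: /path/to/main/.git/worktrees/demo\n")
demo_nested = os.path.join(demo_root, "sdk", "benchmark")
os.makedirs(demo_nested)
found = find_project_root(demo_nested)
```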

test_benchmark.py:501 was r.get["baseline_failed"] (typo, would raise TypeError at runtime). Should be r.get("baseline_failed") to mirror the correct usage on line 502. The fix already exists on feature/agent-context-improvement-eval HEAD, but the import commit (3e42ff3) picked up an older snapshot of manual_cases/test_benchmark.py that pre-dates the fix. Spot-fixing here; a wider refresh of the benchmark dir against feature HEAD can be a follow-up if/when it is worth it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
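The failure mode is easy to reproduce in isolation. Subscripting the bound method `r.get` raises TypeError, while calling it returns the value. The dictionary below is made up for the demonstration.

```python
r = {"baseline_failed": True}

# Typo form: square brackets subscript the method object itself.
try:
    r.get["baseline_failed"]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Corrected form: call the method.
value = r.get("baseline_failed")
```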

P3/C4 incremental_llm_none_falls_through_to_fresh in test_compress_with_cache_extra.py used side_effect signatures that only accepted (text, model_, call_type=...). After da9d58a added prompt_type="incremental" at the two incremental call sites, these mocks raised TypeError. Extend the signatures with prompt_type="initial" so they tolerate either call shape. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
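A side_effect that tolerates both call shapes can be sketched with unittest.mock. The function `fake_summary` and its return value are hypothetical. Only the parameter list (text, model_, call_type, prompt_type="initial") follows the commit message.

```python
from unittest.mock import MagicMock


def fake_summary(text, model_, call_type=None, prompt_type="initial"):
    """Stand-in for the mocked summary call; echoes which prompt was requested."""
    return f"{prompt_type}:{text}"


mock_generate = MagicMock(side_effect=fake_summary)

# Old call shape: no prompt_type, falls back to the default.
old_style = mock_generate("turns", "model", call_type="previous_fresh")

# New call shape used by the incremental paths.
new_style = mock_generate("turns", "model", call_type="previous_incremental",
                          prompt_type="incremental")
```

Without the `prompt_type` parameter, the second call would raise TypeError for an unexpected keyword argument, which is the failure described above.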

Two test modules previously installed heavy mock module trees into sys.modules at import time and never rolled them back: - test/sdk/core/agents/test_agent_context/loader.py — registers mock smolagents (memory/models/agents) to sandbox-load agent_context.py via importlib. - test/sdk/core/agents/test_context_component.py — registers ~25 mocks (smolagents, rich, jinja2, langchain_core, exa_py, openai, mem0, paramiko, boto3, tiktoken, aiohttp, botocore, ...) to sandbox-load agent_model.py. When pytest collected these in the same session as test/backend/utils/test_context_utils.py, the latter resolved its "from nexent.core.agents.agent_model import ..." chain through the bare mock ModuleType entries left in sys.modules and failed with ImportError("unknown location") on AgentMemory, Console, etc. Fix: after each sandbox finishes loading the module it needs (so its target captures the mock classes as module-level attributes), restore real packages back into sys.modules via importlib.import_module. The already-loaded test target keeps its mock references; sibling test trees see real packages. Cross-validated: pytest test/sdk/core/agents/test_agent_context/ test/sdk/core/agents/test_context_component.py test/sdk/core/agents/test_nexent_agent_component_integration.py test/backend/utils/test_context_utils.py → 221 passed (was 206 passed, 15 failed). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
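The restore step can be sketched as a small helper: install bare mock modules, load the target, then put real packages back through importlib.import_module. `load_with_mocks` is a hypothetical name, and the example mocks the stdlib module `colorsys` only because it is always importable. The actual loaders mock smolagents and friends.

```python
import importlib
import sys
import types


def load_with_mocks(mock_names, loader):
    """Run loader() with bare mock modules installed, then restore real ones."""
    for name in mock_names:
        sys.modules[name] = types.ModuleType(name)
    try:
        return loader()  # the loaded target keeps its references to the mocks
    finally:
        for name in mock_names:
            sys.modules.pop(name, None)
            try:
                importlib.import_module(name)  # real package, if installed
            except ImportError:
                pass  # nothing real to restore; leave the entry absent


# The loader sees the mock; code that runs afterwards sees the real module.
captured = load_with_mocks(["colorsys"], lambda: sys.modules["colorsys"])
```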

…riants Goal 3: Replace a330d81 short-circuit with semantic component architecture - Expand _format_* functions with full Jinja2 long text (memory guidelines, skill process, tool file guide, agent calling specs) for zh/en variants - Add build_skeleton_*_component() for header/duty/execution_flow/constraint/ code_norms/footer sections - Rewrite build_context_components() to accept raw params and emit piecewise + skeleton components in correct order (priority-encoded 100→10) - Remove 'if system_prompt: return [single]' short-circuit from build_context_components - Modify create_agent_info.py to pass raw params instead of Jinja2-rendered string - Rewrite test_prompt_equivalence.py as semantic assertions (section presence, order, memory ordering, key content) - 12 tests pass - Update test_context_utils.py for new piecewise behavior - 105 tests pass Verified: task_success_retention=1.0 for example_infra benchmark case

Back-port of feature/agent-context-improvement-eval's compress_history_offline, required by benchmark's static compression inspector (sdk/benchmark/.../ summary_inspector.py). Same prompts and schema as the in-agent compression path, but operates on plain (user, assistant) text pairs without any ContextManager state — no cache, no offload store, no agent runtime. Also brings back two module-level helpers it depends on: - format_summary_output: strips markdown fences, attempts JSON parse, falls back to plain text. Logic identical to ContextManager._format_summary but reusable from module scope. - _is_context_length_error: matches known context/token-limit error strings. Logic identical to ContextManager._is_context_length_error. Production safety: this is purely additive — three new module-level symbols (compress_history_offline, format_summary_output, _is_context_length_error). No existing ContextManager method body is touched. backend/ never imports or calls any of these. The benchmark inspector is the only consumer. Verified: pytest test/sdk/core/agents/test_agent_context/ \ test/sdk/core/agents/test_context_component.py \ test/sdk/core/agents/test_nexent_agent_component_integration.py \ test/backend/utils/test_context_utils.py → 221 passed (unchanged). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
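The three steps described for format_summary_output (strip fences, try JSON, fall back to text) can be sketched as below. This is not the back-ported function. The name `format_summary_output_sketch`, the regular expression, and the choice to return the parsed object are assumptions.

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence, built indirectly to keep this block readable
_FENCED = re.compile(
    r"^" + FENCE + r"[\w-]*\s*\n(.*?)\n?" + FENCE + r"$", re.DOTALL
)


def format_summary_output_sketch(raw: str):
    """Strip an enclosing markdown fence, then parse JSON or return plain text."""
    text = raw.strip()
    match = _FENCED.match(text)
    if match:
        text = match.group(1).strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text  # not JSON: fall back to the plain text


fenced_reply = FENCE + 'json\n{"progress": "done"}\n' + FENCE
parsed = format_summary_output_sketch(fenced_reply)
```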

Mirrors feature/agent-context-improvement-eval commit a7d472e, adapted for refactor: the field is added to ModelConfig and forwarded through NexentAgent.create_model into OpenAIModel, but defaulted to None so that production behaviour is unchanged. Rationale: benchmarks (eventqa, longmemeval) need a per-call completion output cap to bound pathological generation loops where a model regurgitates context. Feature picked 4096 as the default; backport chooses None instead so production keeps the provider's own default (typically the model's max output). Benchmarks that want the cap set it explicitly via ModelConfig(..., max_tokens=4096). - agent_model.py: ModelConfig.max_tokens: Optional[int] = None - nexent_agent.py: create_model() forwards model_config.max_tokens - openai_llm.py: OpenAIModel(max_tokens=None) stored on instance; injected into completion_kwargs only when set AND when caller did not pass an explicit override via **kwargs Verified: pytest test/sdk/core/agents/test_agent_context/ \ test/sdk/core/agents/test_context_component.py \ test/sdk/core/agents/test_nexent_agent_component_integration.py \ test/backend/utils/test_context_utils.py → 221 passed (unchanged). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
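The precedence rule in the last bullet (inject only when set, and only when the caller did not override) can be sketched as a hypothetical helper. `merge_max_tokens` is not the code in openai_llm.py.

```python
from typing import Any, Dict, Optional


def merge_max_tokens(instance_max_tokens: Optional[int],
                     **kwargs: Any) -> Dict[str, Any]:
    """Apply the instance-level cap unless the call site passed its own."""
    completion_kwargs = dict(kwargs)
    if instance_max_tokens is not None and "max_tokens" not in completion_kwargs:
        completion_kwargs["max_tokens"] = instance_max_tokens
    return completion_kwargs
```

With the default of None, no `max_tokens` key is sent, so the provider's own default applies.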

…ctx_debugger Translate all documentation files from Chinese to English to comply with CONTRIBUTING.md code standards: - sdk/benchmark/README.md - sdk/benchmark/manual_cases/README.md - sdk/benchmark/manual_cases/note_benchmark.md - sdk/benchmark/eventqa_eval/README.md - sdk/benchmark/eventqa_eval/RUNBOOK.md - sdk/benchmark/longmemeval_eval/README.md - sdk/ctx_debugger/README.md - sdk/ctx_debugger/langfuse_eval_assessment.md These translations ensure the project documentation is accessible to international contributors and follows the English-only convention established in CONTRIBUTING.md.

Sync sdk/benchmark/agent_runner.py and sdk/benchmark/eventqa_eval/ run_eventqa.py from feature/agent-context-improvement-eval HEAD to freeze the version this branch will validate against. Per the user's instruction, the eventqa stack will not be updated again after this. agent_runner.py changes: - Comments translated zh -> en (commit 657d0b8 on feature) - extra_body handling upgraded: now reads LLM_EXTRA_BODY (raw JSON) and LLM_ENABLE_THINKING (truthy flag) env vars instead of the hardcoded THINKING_OFF_EXTRA_BODY dict. Both vendor dialects (Qwen3 chat_template_kwargs and Anthropic thinking.type) coexist in one payload so the same runner works against either backend. - max_tokens parameter forwarded to OpenAIModel (default 4096 at benchmark layer; SDK back-port keeps default=None so production is unaffected — see faf74dc). - AgentRunResult.total_input_tokens / total_output_tokens fields added, step callback accumulates them. Needed by run_eventqa.py. - sys.path setup simplified (drop redundant dirname layer). run_eventqa.py changes: - ingest_main_input_tokens / ingest_main_output_tokens accumulated across ingest chunks; surfaced in summary.json under "cost". - run_probes() turned into bounded-concurrent (asyncio.gather + Semaphore sized by --probe_concurrency). Signature changed from list[dict] to tuple[list[dict], dict] — the second element is the token totals. - Probes forward args.probe_max_tokens to build_agent_run_info. Verification: 1 book x 3 questions x 100k ingest x baseline_context=100k baseline_acc = 1.000 compressed acc = 0.667 retention = 0.667 token_reduction = 0.359 ratio (compressed_total / baseline_total) = 2.076 Compression triggered (token_reduction>0) and the agent answered with compressed context, confirming the full eventqa_eval pipeline works on refactor end-to-end. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
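The bounded-concurrency pattern named for run_probes() (asyncio.gather plus a Semaphore) looks roughly like this. The function `run_probes_sketch`, the `run_one` callback, and the token field names are assumptions. Only the (results, token totals) return shape follows the description above.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Tuple


async def run_probes_sketch(
    probes: List[Any],
    run_one: Callable[[Any], Awaitable[Dict[str, Any]]],
    concurrency: int = 4,
) -> Tuple[List[Dict[str, Any]], Dict[str, int]]:
    """Run probes concurrently, never more than `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(probe: Any) -> Dict[str, Any]:
        async with semaphore:
            return await run_one(probe)

    # gather preserves input order regardless of completion order.
    results = await asyncio.gather(*(guarded(p) for p in probes))
    totals = {
        "input_tokens": sum(r["input_tokens"] for r in results),
        "output_tokens": sum(r["output_tokens"] for r in results),
    }
    return list(results), totals
```

Sizing the semaphore from a command-line option such as --probe_concurrency keeps the number of simultaneous LLM calls bounded.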

…hemas from feature

Sync test_benchmark.py + calibrate_thresholds.py + the updated example_infra/case.json from feature/agent-context-improvement-eval HEAD. The 22 additional case dirs that feature added are NOT pulled — they're specific evaluation datasets the user keeps under their own bookkeeping; this branch only validates the harness with the original example_infra case. README.md / note_benchmark.md kept at worktree (English) versions. test_benchmark.py changes: - Replace build_agent_run_info_with_custom_prompt with build_agent_run_info (current SDK surface). - Introduce BENCHMARK_SYSTEM_PROMPT — a lean, generic system prompt that strips the verbose platform scaffolding (file URL guide, reference marks, safety principles) to minimize token overhead during benchmark runs while keeping the core execution-loop instructions. - Introduce BENCHMARK_SUMMARY_SYSTEM_PROMPT + custom 6-field summary schema (~620 word budget) for incremental compression. Replaces the default 10-field Hermes schema by merging completed_work + resolved_questions into "progress" and restricting key_facts to values NOT already stated in progress. Eliminates the 3-field redundancy that caused output bloat in incremental updates and held output size stable instead of growing past token_threshold. - Add net_token_reduction / compression_cost_tokens reporting in the summary so downstream analysis can subtract compression LLM cost from the input savings. calibrate_thresholds.py: new tool for tuning token_threshold per case. example_infra/case.json: 2 new probes added (generate_env.sh sed usage; ES port-mapping cross-container access), bringing the probe count to a more useful smoke level. Verification on refactor: test_benchmark.py --cases example_infra task_success_retention = 1.0 probe_retention = 0.722 token_reduction = 0.635 net_token_reduction = 0.408 compression_cost_tokens = 8341 summary_score = 0.5 Pipeline works end-to-end with the synced harness and expanded probes. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…None Match the SDK back-port faf74dc (ModelConfig.max_tokens default=None for production safety). The benchmark harness previously inherited feature's 4096 default, but every concrete benchmark caller already passes max_tokens explicitly (e.g. run_eventqa.py forwards args.probe_max_tokens=4096 into probes), so flipping the default to None has no behaviour change for current callers while making the abstraction symmetric with the SDK: "default unbounded; opt in to a cap". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

# Conflicts: # .gitignore # backend/agents/create_agent_info.py # docker/init.sql # sdk/nexent/core/agents/agent_model.py # sdk/nexent/core/agents/nexent_agent.py # sdk/nexent/core/models/openai_llm.py

…level

- Revert backend/agents/create_agent_info.py to develop (Context Components system belongs to parent branch feat/benchmark-on-refactor, not offload PR) - Revert backend/utils/context_utils.py to develop (same reason) - Revert test/backend/utils/test_context_utils.py to develop - Exclude sdk/nexent/core/agents/temp_scripts/ via .gitignore - Debug scripts preserved on feat/opt-agent-context-temp-scripts Co-Authored-By: Claude <noreply@anthropic.com>

…ext/ package) The old single-file agent_context.py has been fully replaced by the agent_context/ package directory. Verified that unit tests use the package via loader.py importlib isolation — the old .py file is never referenced at runtime. Co-Authored-By: Claude <noreply@anthropic.com>

- Trim verbose block comments and docstrings (~57% reduction) - Eliminate cross-reference redundancy between per_step_render_limit and max_observation_length - Clarify boundary: max_observation_length = irreversible sanitation at source; per_step_render_limit = reversible archiving during compression, only for old steps outside keep_recent window Co-Authored-By: Claude <noreply@anthropic.com>

- Keep both ephemeral_messages methods (our branch) and verification methods (upstream) - Keep both raw observation save for offload (our branch) and verification check (upstream) Co-Authored-By: Claude <noreply@anthropic.com>

JasonW404 · 2026-06-24T04:03:25Z

 from .memory import *
 from .storage import *
 from .vector_database import *
+from .container import *


Wildcard import risk: from .container import * may pull in unexpected symbols and cause naming conflicts. Consider importing the required classes/functions explicitly, e.g. from .container import ContainerClient, ContainerConfig.

JasonW404 · 2026-06-24T04:03:28Z


-.claude/skills/python-import-triage
+.claude/skills/python-import-triage
+


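.gitignore now lists sdk/nexent/core/agents/temp_scripts/, which suggests debug scripts were committed to the branch. Even when gitignored, these files should not exist on the PR branch. Please clean them up before merging.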

YehongPan · 2026-06-24T05:11:19Z

 from .memory import *
 from .storage import *
 from .vector_database import *
+from .container import *


[Code style] import * pollutes the namespace; consider importing the required modules/classes explicitly.

YehongPan · 2026-06-24T05:11:20Z

+        """
+        try:
+            return self._do_generate_summary(text, model, call_type, prompt_type)
+        except Exception as e:


[Code style] except Exception: is too broad; consider catching more specific exception types to avoid masking potential errors.

YehongPan · 2026-06-24T05:11:23Z

+                )
+                try:
+                    return self._do_generate_summary(shrunk, model, call_type + "_retry", prompt_type)
+                except Exception as e2:


[Code style] except Exception: is too broad; consider catching more specific exception types to avoid masking potential errors.

WMC001 · 2026-06-24T07:20:53Z

Context splitting is a significant refactor (37 files, +5276/-3003). The change from a single context window to a segmented architecture deserves thorough review. Please ensure backward compatibility for existing sessions and that the default chunk sizes are well-tested.

Jason and others added 30 commits May 20, 2026 16:54

sync(benchmark): pick run_longmemeval + run_with_debugger+ summary_sc…

9e65f2c

…hemas from feature

update system and summary prompt

f199c7a

readme and eval res

776f4fd

add fallback and build_compressed_snapshot

1c041a2

add acon output res

731503b

add default summary prompt argument

46dd020

add cases

47f6b91

add reports results

7cb5fa5

add static summary inspections

7b8cf05

Merge branch 'develop' into feat/benchmark-on-refactor

dff5114

# Conflicts: # .gitignore # backend/agents/create_agent_info.py # docker/init.sql # sdk/nexent/core/agents/agent_model.py # sdk/nexent/core/agents/nexent_agent.py # sdk/nexent/core/models/openai_llm.py

liudfgoo and others added 22 commits June 4, 2026 10:51

revert to original summary

b7ff90f

mount offloadstore on the context manager, exist at the same session …

a5729fe

…level

update: score and filter

4252520

construct handle and desc for offloadstore for large obs or model_output

cae5f9d

build _ephemeral_system_messages bearing offloaded content

6cf6308

Ensure the injection of ephemeral messages across run scenarios.

563e590

remove effective_* summary for offload

27855f7

Add reload module mounting to achieve session-level reuse.

23c44e0

Deduplicate to avoid repeated offloading of large outputs

0eb6f31

update docstring

d515148

update test files for agent_context

bad9b32

update test_cm_writing

bdc3e43

delete irrelevant files

fb3d544

offload test results

b38b13b

delete .claude

ab8d8b0

remove outdated doc

aae8be2

update note for feat/opt-agent-context-refactor

d22a6c8

solving conflict

63be55d

Resolve some merge conflict files.

e037c0e

liudfgoo requested review from Dallas98 and WMC001 as code owners June 16, 2026 06:49

liudfgoo wants to merge 88 commits into ModelEngine-Group:develop from liudfgoo:feat/opt-agent-context-refactor
