Feat: split agent_context into submodules and add offload/reload for oversized step content#3243
Feat: split agent_context into submodules and add offload/reload for oversized step content#3243liudfgoo wants to merge 88 commits into
Conversation
Purpose:
Consolidate ALL context building related code into a single ContextManager
class, making it the single source of truth for system prompt assembly,
component injection, memory management, and token-aware compression.
This refactoring introduces a component-based architecture where system
prompt parts (tools, skills, memory, knowledge base, agent definitions)
are assembled from registered ContextComponent instances using pluggable
selection strategies.
Changes Summary:
- Added ContextComponent hierarchy with 7 subclasses
- Added ContextStrategy implementations with 4 algorithms
- Extended ContextManagerConfig with strategy and injection flags
- Added component management methods to ContextManager
- Integrated component registration in NexentAgent
- Modified CoreAgent to use component-based system prompt assembly
- Created backend helper for component building
- Updated tests with comprehensive coverage (260 tests)
File Changes:
sdk/nexent/core/agents/agent_model.py:
- Added ContextComponent abstract base class with to_messages() pattern
- Added 7 component subclasses: SystemPromptComponent, ToolsComponent,
SkillsComponent, MemoryComponent, KnowledgeBaseComponent,
ManagedAgentsComponent, ExternalAgentsComponent
- Added ContextStrategy abstract class with 4 implementations:
FullStrategy, TokenBudgetStrategy, BufferedStrategy, PriorityWeightedStrategy
- Added context_components field to AgentConfig
sdk/nexent/core/agents/summary_config.py:
- Added StrategyType literal for strategy selection
- Extended ContextManagerConfig with strategy field
- Added inject_* flags for each component type (7 flags)
- Added component_budgets dict for per-component token limits
- Added buffer_size_per_component for buffered strategy
sdk/nexent/core/agents/agent_context.py:
- Added _components registry in __init__
- Added register_component() for component accumulation
- Added clear_components() and get_registered_components()
- Added _get_strategy() for strategy selection
- Added build_system_prompt() for component-based assembly
- Added _calculate_component_budget() for budget allocation
- Added _message_already_present() for deduplication
sdk/nexent/core/agents/__init__.py:
- Exported all new ContextComponent subclasses
- Exported all ContextStrategy implementations
- Exported StrategyType
sdk/nexent/core/agents/nexent_agent.py:
- Added component registration after ContextManager mount
- Iterates context_components from AgentConfig and registers each
sdk/nexent/core/agents/core_agent.py:
- Modified SystemPromptStep creation to use component-based assembly
- Added fallback logic: components -> original system_prompt
backend/utils/context_utils.py (NEW):
- Added build_context_components() main function
- Added build_*_component() helpers for each type
- Added _format_*_description() helpers for text formatting
- Added build_app_context_string() for app metadata
backend/agents/create_agent_info.py:
- Imported build_context_components from context_utils
- Called build_context_components() with agent configuration
- Passed context_components to AgentConfig
test/sdk/core/agents/test_context_component.py (NEW):
- Added 66 tests for ContextComponent hierarchy
- Tests for each subclass creation, validation, to_messages()
- Tests for ContextStrategy implementations
test/sdk/core/agents/test_agent_context/unit/test_component_management.py (NEW):
- Added 29 tests for ContextManager component methods
- Tests for register, clear, get_registered, build_system_prompt
- Tests for strategy selection and budget calculation
test/sdk/core/agents/test_nexent_agent_component_integration.py (NEW):
- Added 9 integration tests for NexentAgent and CoreAgent
- Tests for component registration flow
- Tests for system prompt assembly with fallback
- Tests for backward compatibility
test/backend/utils/test_context_utils.py (NEW):
- Added 27 tests for context_utils helper functions
- Tests for format helpers and build helpers
- Tests for build_context_components main function
test/sdk/core/agents/test_agent_context/loader.py:
- Added _load_agent_model() for loading agent_model.py
- Exported all ContextComponent and ContextStrategy classes
test/sdk/core/agents/test_agent_context/stubs.py:
- Fixed register_smolagents_mocks() to always overwrite sys.modules
test/backend/agents/test_create_agent_info.py:
- Added _create_stub_component_class() helper
- Added component class stubs to agent_model mock setup
test/common/test_mocks.py:
- Removed unnecessary nexent.core.agents.agent_model mocking
Without the terminator, psql parses the CREATE INDEX and the following CREATE TABLE as a single statement and aborts with a syntax error at line 415, leaving 28 tables uncreated (including ag_prompt_template_t and user_tenant_t). This breaks every backend service on a fresh DB. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…nents build_context_components() now accepts an optional pre-rendered ``system_prompt``. When provided it returns a single SystemPromptComponent carrying the Jinja2 output verbatim so ContextManager.build_system_prompt() emits a system prompt byte-identical to what CoreAgent saw before the context-management refactor. When omitted the function keeps its piece-wise behaviour for future incremental componentization. create_agent_info.py forwards the rendered string, completing the behavior-preserving migration: ContextManager is now the single assembly point without changing what the agent actually sees. Splitting the rendered prompt into real semantic components (Tools / Skills / Memory / KB) is deferred to a follow-up. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two golden tests guard the behavior-preserving migration of system prompt assembly into ContextManager: * test_system_prompt_component_roundtrip - wraps a Jinja2-rendered prompt in a single SystemPromptComponent, registers it, runs build_system_prompt() through ContextManager, joins role=system messages and asserts byte-identical content. Verifies the component machinery (register / strategy / dedup / role filter / join) is loss-less. Parametrised over language (zh/en) and is_manager flag. * test_full_build_context_components_matches_jinja2 - mirrors what backend/agents/create_agent_info.py does in production: render Jinja2 then hand the rendered string to build_context_components(system_prompt=...). Asserts the full end-to-end output stays byte-identical to the Jinja2 baseline. If this turns red the migration path itself has regressed. Both run only on branches that have backend.utils.context_utils and the new ContextComponent classes; the module-level skip keeps it inert on older branches. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
TokenBudgetStrategy / BufferedStrategy / PriorityWeightedStrategy used to silently drop components that did not fit their selection criteria, which made truncated system prompts impossible to diagnose at runtime. Each strategy now emits a logger.warning identifying the component type, priority, and which constraint tripped (total_budget vs type_budget vs buffer overflow vs relevance threshold vs per-component token fit). Behaviour for callers is unchanged - the selection results are identical; only observability improves. FullStrategy is unaffected since it cannot drop. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ModelConfig.extra_body / OpenAIModel(extra_body=...) / NexentAgent.
create_model() forwarding is reintroduced so callers can supply
provider-specific knobs to every chat.completions.create payload — most
commonly Qwen3's chat_template_kwargs={"enable_thinking": false} to
suppress reasoning preludes.
Production safety: the field defaults to None on ModelConfig and the
runtime guard "if self.extra_body" gates the kwargs injection, so any
caller that does not opt in sees byte-identical request payloads.
Originally added on feature/agent-context-improvement-eval to support
the benchmark's thinking-off flag; missing on refactor/context-management
caused the flag to be silently dropped (Pydantic v2 default extra='ignore'
absorbed the kwarg, leaving </think> residues in benchmark model output).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds ContextManagerConfig.max_observation_length (chars; 0 = disabled, the production default) and a corresponding head+tail truncation in CoreAgent.execute_action that fires only when ContextManager is enabled AND the observation exceeds the configured limit. A short marker tells the agent the elision is recoverable via search/read tools. Originally added on feature/agent-context-improvement-eval as commit c992b48 "truncate model/tool output" by liudongfei. That commit also accidentally enabled `import pdb; pdb.set_trace()` at line 294; this back-port carries only the intended truncation logic. Production safety: - New field defaults to 0 (disabled). Existing callers see no behaviour change unless they explicitly set it. - Guarded by self.context_manager.config.enabled, so even setting the field has no effect when ContextManager is off. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Back-ports three small additions from feature/agent-context-improvement-eval that benchmarks depend on but were not part of the refactor: * incremental_summary_system_prompt (commit 9aecd0f on feature): Dedicated system prompt for incremental summary updates ("previous summary + new turns -> updated summary"). compress_*_with_cache's incremental paths now select this prompt via prompt_type='incremental'. Falls back to summary_system_prompt when the field is empty so callers that have not customised it still see a sensible default. * get_token_counts() (commit 824e737): Returns {last_uncompressed, last_compressed} from the most recent compress_if_needed pass. Required for benchmark token_reduction = 1 - last_compressed/last_uncompressed accounting. Recording sites added at all three return paths: under-threshold short-circuit, stable_bypass, and deep-compression finalisation. * export_summary() (commit 824e737): Returns cached summary texts + compression_boundary metadata (covered_pairs / end_steps / retained config). Benchmarks use the boundary to validate probe design - probes should only target content actually inside the compressed section. Production safety: * _generate_summary / _do_generate_summary gain prompt_type with default "initial". Existing callers that do not pass it keep the unchanged fresh-compaction prompt. * incremental prompt is only activated at the two call sites already labelled "*_incremental" in compress_previous_with_cache / compress_current_with_cache - other production callers are untouched. * Token-count instance vars default to None until the first compress_if_needed call, matching the feature-branch behaviour. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Lifts the benchmark + ctx_debugger source trees verbatim from feature/agent-context-improvement-eval. The benchmark is independent from production agent runtime and exists purely as an evaluation / diagnostic harness, so it lives on this integration branch (off refactor/context-management + feat/sysprompt-component-bringback) without changing any SDK or backend code. Subsequent commits on this branch will adapt the benchmark to the refactor's new ContextManager API surface (removing references to methods/fields that only existed on feature, e.g. export_summary, get_token_counts, OffloadStore, ModelConfig.extra_body, incremental_summary_system_prompt, max_observation_length, enable_reload). All such adaptations stay within sdk/benchmark/ and sdk/ctx_debugger/. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
smoke.py runs one trivial question through agent_runner.build_agent_run_info on the refactor SDK. Smoke passes today: imports clean, agent loop completes one step, final_answer is non-empty and correct. Two supporting tweaks live in this commit since the smoke does not run without them: - paths.py: accept .git as either a directory or a file. Git worktrees store .git as a pointer file at the repo root; the original isdir check failed to locate the project root from inside a worktree. - .gitignore: keep sdk/benchmark/.env out of version control (LLM credentials should remain local). Known limitations exposed by this smoke (work for follow-up commits): - LLM_ENABLE_THINKING / LLM_EXTRA_BODY are silently dropped because refactor's ModelConfig has no extra_body field (Pydantic v2 default is extra='ignore'). Visible as residual </think> tokens in model output. Will be fixed benchmark-side by subclassing OpenAIModel. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
test_benchmark.py:501 was r.get["baseline_failed"] (typo, would raise
TypeError at runtime). Should be r.get("baseline_failed") to mirror
the correct usage on line 502.
The fix already exists on feature/agent-context-improvement-eval HEAD,
but the import commit (3e42ff3) picked up an older snapshot of
manual_cases/test_benchmark.py that pre-dates the fix. Spot-fixing here;
a wider refresh of the benchmark dir against feature HEAD can be a
follow-up if/when it is worth it.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
P3/C4 incremental_llm_none_falls_through_to_fresh in test_compress_with_cache_extra.py used side_effect signatures that only accepted (text, model_, call_type=...). After da9d58a added prompt_type="incremental" at the two incremental call sites, these mocks raised TypeError. Extend the signatures with prompt_type="initial" so they tolerate either call shape. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two test modules previously installed heavy mock module trees into
sys.modules at import time and never rolled them back:
- test/sdk/core/agents/test_agent_context/loader.py — registers mock
smolagents (memory/models/agents) to sandbox-load agent_context.py via
importlib.
- test/sdk/core/agents/test_context_component.py — registers ~25 mocks
(smolagents, rich, jinja2, langchain_core, exa_py, openai, mem0,
paramiko, boto3, tiktoken, aiohttp, botocore, ...) to sandbox-load
agent_model.py.
When pytest collected these in the same session as
test/backend/utils/test_context_utils.py, the latter resolved its
"from nexent.core.agents.agent_model import ..." chain through the bare
mock ModuleType entries left in sys.modules and failed with
ImportError("unknown location") on AgentMemory, Console, etc.
Fix: after each sandbox finishes loading the module it needs (so its
target captures the mock classes as module-level attributes), restore
real packages back into sys.modules via importlib.import_module. The
already-loaded test target keeps its mock references; sibling test
trees see real packages.
Cross-validated: pytest test/sdk/core/agents/test_agent_context/
test/sdk/core/agents/test_context_component.py
test/sdk/core/agents/test_nexent_agent_component_integration.py
test/backend/utils/test_context_utils.py → 221 passed (was 206 passed,
15 failed).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…riants Goal 3: Replace a330d81 short-circuit with semantic component architecture - Expand _format_* functions with full Jinja2 long text (memory guidelines, skill process, tool file guide, agent calling specs) for zh/en variants - Add build_skeleton_*_component() for header/duty/execution_flow/constraint/ code_norms/footer sections - Rewrite build_context_components() to accept raw params and emit piecewise + skeleton components in correct order (priority-encoded 100→10) - Remove 'if system_prompt: return [single]' short-circuit from build_context_components - Modify create_agent_info.py to pass raw params instead of Jinja2-rendered string - Rewrite test_prompt_equivalence.py as semantic assertions (section presence, order, memory ordering, key content) - 12 tests pass - Update test_context_utils.py for new piecewise behavior - 105 tests pass Verified: task_success_retention=1.0 for example_infra benchmark case
Back-port of feature/agent-context-improvement-eval's compress_history_offline, required by benchmark's static compression inspector (sdk/benchmark/.../ summary_inspector.py). Same prompts and schema as the in-agent compression path, but operates on plain (user, assistant) text pairs without any ContextManager state — no cache, no offload store, no agent runtime. Also brings back two module-level helpers it depends on: - format_summary_output: strips markdown fences, attempts JSON parse, falls back to plain text. Logic identical to ContextManager._format_summary but reusable from module scope. - _is_context_length_error: matches known context/token-limit error strings. Logic identical to ContextManager._is_context_length_error. Production safety: this is purely additive — three new module-level symbols (compress_history_offline, format_summary_output, _is_context_length_error). No existing ContextManager method body is touched. backend/ never imports or calls any of these. The benchmark inspector is the only consumer. Verified: pytest test/sdk/core/agents/test_agent_context/ \ test/sdk/core/agents/test_context_component.py \ test/sdk/core/agents/test_nexent_agent_component_integration.py \ test/backend/utils/test_context_utils.py → 221 passed (unchanged). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mirrors feature/agent-context-improvement-eval commit a7d472e, adapted for refactor: the field is added to ModelConfig and forwarded through NexentAgent.create_model into OpenAIModel, but defaulted to None so that production behaviour is unchanged. Rationale: benchmarks (eventqa, longmemeval) need a per-call completion output cap to bound pathological generation loops where a model regurgitates context. Feature picked 4096 as the default; backport chooses None instead so production keeps the provider's own default (typically the model's max output). Benchmarks that want the cap set it explicitly via ModelConfig(..., max_tokens=4096). - agent_model.py: ModelConfig.max_tokens: Optional[int] = None - nexent_agent.py: create_model() forwards model_config.max_tokens - openai_llm.py: OpenAIModel(max_tokens=None) stored on instance; injected into completion_kwargs only when set AND when caller did not pass an explicit override via **kwargs Verified: pytest test/sdk/core/agents/test_agent_context/ \ test/sdk/core/agents/test_context_component.py \ test/sdk/core/agents/test_nexent_agent_component_integration.py \ test/backend/utils/test_context_utils.py → 221 passed (unchanged). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ctx_debugger Translate all documentation files from Chinese to English to comply with CONTRIBUTING.md code standards: - sdk/benchmark/README.md - sdk/benchmark/manual_cases/README.md - sdk/benchmark/manual_cases/note_benchmark.md - sdk/benchmark/eventqa_eval/README.md - sdk/benchmark/eventqa_eval/RUNBOOK.md - sdk/benchmark/longmemeval_eval/README.md - sdk/ctx_debugger/README.md - sdk/ctx_debugger/langfuse_eval_assessment.md These translations ensure the project documentation is accessible to international contributors and follows the English-only convention established in CONTRIBUTING.md.
Sync sdk/benchmark/agent_runner.py and sdk/benchmark/eventqa_eval/ run_eventqa.py from feature/agent-context-improvement-eval HEAD to freeze the version this branch will validate against. Per the user's instruction, the eventqa stack will not be updated again after this. agent_runner.py changes: - Comments translated zh -> en (commit 657d0b8 on feature) - extra_body handling upgraded: now reads LLM_EXTRA_BODY (raw JSON) and LLM_ENABLE_THINKING (truthy flag) env vars instead of the hardcoded THINKING_OFF_EXTRA_BODY dict. Both vendor dialects (Qwen3 chat_template_kwargs and Anthropic thinking.type) coexist in one payload so the same runner works against either backend. - max_tokens parameter forwarded to OpenAIModel (default 4096 at benchmark layer; SDK back-port keeps default=None so production is unaffected — see faf74dc). - AgentRunResult.total_input_tokens / total_output_tokens fields added, step callback accumulates them. Needed by run_eventqa.py. - sys.path setup simplified (drop redundant dirname layer). run_eventqa.py changes: - ingest_main_input_tokens / ingest_main_output_tokens accumulated across ingest chunks; surfaced in summary.json under "cost". - run_probes() turned into bounded-concurrent (asyncio.gather + Semaphore sized by --probe_concurrency). Signature changed from list[dict] to tuple[list[dict], dict] — the second element is the token totals. - Probes forward args.probe_max_tokens to build_agent_run_info. Verification: 1 book x 3 questions x 100k ingest x baseline_context=100k baseline_acc = 1.000 compressed acc = 0.667 retention = 0.667 token_reduction = 0.359 ratio (compressed_total / baseline_total) = 2.076 Compression triggered (token_reduction>0) and the agent answered with compressed context, confirming the full eventqa_eval pipeline works on refactor end-to-end. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…hemas from feature
Sync test_benchmark.py + calibrate_thresholds.py + the updated example_infra/case.json from feature/agent-context-improvement-eval HEAD. The 22 additional case dirs that feature added are NOT pulled — they're specific evaluation datasets the user keeps under their own bookkeeping; this branch only validates the harness with the original example_infra case. README.md / note_benchmark.md kept at worktree (English) versions. test_benchmark.py changes: - Replace build_agent_run_info_with_custom_prompt with build_agent_run_info (current SDK surface). - Introduce BENCHMARK_SYSTEM_PROMPT — a lean, generic system prompt that strips the verbose platform scaffolding (file URL guide, reference marks, safety principles) to minimize token overhead during benchmark runs while keeping the core execution-loop instructions. - Introduce BENCHMARK_SUMMARY_SYSTEM_PROMPT + custom 6-field summary schema (~620 word budget) for incremental compression. Replaces the default 10-field Hermes schema by merging completed_work + resolved_questions into "progress" and restricting key_facts to values NOT already stated in progress. Eliminates the 3-field redundancy that caused output bloat in incremental updates and held output size stable instead of growing past token_threshold. - Add net_token_reduction / compression_cost_tokens reporting in the summary so downstream analysis can subtract compression LLM cost from the input savings. calibrate_thresholds.py: new tool for tuning token_threshold per case. example_infra/case.json: 2 new probes added (generate_env.sh sed usage; ES port-mapping cross-container access), bringing the probe count to a more useful smoke level. Verification on refactor: test_benchmark.py --cases example_infra task_success_retention = 1.0 probe_retention = 0.722 token_reduction = 0.635 net_token_reduction = 0.408 compression_cost_tokens = 8341 summary_score = 0.5 Pipeline works end-to-end with the synced harness and expanded probes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…None Match the SDK back-port faf74dc (ModelConfig.max_tokens default=None for production safety). The benchmark harness previously inherited feature's 4096 default, but every concrete benchmark caller already passes max_tokens explicitly (e.g. run_eventqa.py forwards args.probe_max_tokens=4096 into probes), so flipping the default to None has no behaviour change for current callers while making the abstraction symmetric with the SDK: "default unbounded; opt in to a cap". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
# Conflicts: # .gitignore # backend/agents/create_agent_info.py # docker/init.sql # sdk/nexent/core/agents/agent_model.py # sdk/nexent/core/agents/nexent_agent.py # sdk/nexent/core/models/openai_llm.py
- Revert backend/agents/create_agent_info.py to develop (Context Components system belongs to parent branch feat/benchmark-on-refactor, not offload PR) - Revert backend/utils/context_utils.py to develop (same reason) - Revert test/backend/utils/test_context_utils.py to develop - Exclude sdk/nexent/core/agents/temp_scripts/ via .gitignore - Debug scripts preserved on feat/opt-agent-context-temp-scripts Co-Authored-By: Claude <noreply@anthropic.com>
…ext/ package) The old single-file agent_context.py has been fully replaced by the agent_context/ package directory. Verified that unit tests use the package via loader.py importlib isolation — the old .py file is never referenced at runtime. Co-Authored-By: Claude <noreply@anthropic.com>
- Trim verbose block comments and docstrings (~57% reduction) - Eliminate cross-reference redundancy between per_step_render_limit and max_observation_length - Clarify boundary: max_observation_length = irreversible sanitation at source; per_step_render_limit = reversible archiving during compression, only for old steps outside keep_recent window Co-Authored-By: Claude <noreply@anthropic.com>
- Keep both ephemeral_messages methods (our branch) and verification methods (upstream) - Keep both raw observation save for offload (our branch) and verification check (upstream) Co-Authored-By: Claude <noreply@anthropic.com>
| from .memory import * | ||
| from .storage import * | ||
| from .vector_database import * | ||
| from .container import * |
There was a problem hiding this comment.
Wildcard import 风险:from .container import * 可能引入意外的符号,导致命名冲突。建议显式导入需要的类/函数,例如 from .container import ContainerClient, ContainerConfig。
|
|
||
| .claude/skills/python-import-triage No newline at end of file | ||
| .claude/skills/python-import-triage | ||
|
|
There was a problem hiding this comment.
.gitignore 中添加了 sdk/nexent/core/agents/temp_scripts/,说明有调试脚本被提交到分支。即使被 gitignore,这些文件也不应存在于 PR 分支中。请在合并前清理。
| from .memory import * | ||
| from .storage import * | ||
| from .vector_database import * | ||
| from .container import * |
There was a problem hiding this comment.
[代码规范] import * 会污染命名空间,建议显式导入需要的模块/类。
| """ | ||
| try: | ||
| return self._do_generate_summary(text, model, call_type, prompt_type) | ||
| except Exception as e: |
There was a problem hiding this comment.
[代码规范] except Exception: 过于宽泛,建议捕获更具体的异常类型,避免掩盖潜在错误。
| ) | ||
| try: | ||
| return self._do_generate_summary(shrunk, model, call_type + "_retry", prompt_type) | ||
| except Exception as e2: |
There was a problem hiding this comment.
[代码规范] except Exception: 过于宽泛,建议捕获更具体的异常类型,避免掩盖潜在错误。
|
Context splitting is a significant refactor (37 files, +5276/-3003). The change from a single context window to a segmented architecture deserves thorough review. Please ensure backward compatibility for existing sessions and that the default chunk sizes are well-tested. |
Split the monolithic
agent_context.py(1,409 lines) into theagent_context/package, and implement an Offload/Reloadmechanism: when compression summarizes old steps, oversized rendered content is archived to an in-memory store and replaced with an
[[OFFLOAD]]marker. The agent can retrieve archived content on demand via thereload_original_context_messagestool.Changes (37 files)
agent_context/package (10 new modules)manager.pystep_renderer.pycompress_history_offlineoffload_store.pybudget.pyllm_summary.pyprevious_compression.pycurrent_compression.pystats_export.pysummary_step.pyOffload configuration (
summary_config.py)Offload requires both
enable_reload=Trueandper_step_render_limit > 0. All parameters have English docstringsexplaining usage and the boundary with
max_observation_length(irreversible per-step sanitation at source vs. reversiblearchiving during compression).
Core logic changes
core_agent.py: Save_raw_observation(pre-truncation) for offload; restoremax_observation_lengthtruncationnexent_agent.py: Build ContextManager before agent so the reload tool enterstool_list(sandboxed); fix duplicateContextManager creation
run_agent.py: Sync reload tool'soffload_storewhen an external context_manager is injectedOther
tools/reload_original_context_tool.py: Reload tool implementationutils/code_analysis.py: Tool-call signature extractionagent_context.pyTesting
Out of Scope
backend/— no changes (Context Components system kept on parent branchfeat/benchmark-on-refactor)temp_scripts/— debug scripts preserved onfeat/opt-agent-context-temp-scriptsHow to Enable Offload