feat: add generic sandbox provider framework #8093
Sorry @zouyonghe, your pull request is larger than the review limit of 150000 diff characters
Code Review
This pull request transitions the sandbox management system to a provider-based architecture, removing hardcoded implementations for Shipyard and CUA from the core. It introduces a SandboxManager, SandboxRegistry, and a SandboxProvider protocol to handle sandbox lifecycles, leases, and persistence generically. The dashboard is updated with new sandbox routes, and tool registration is refactored to be dynamically driven by providers. Review feedback highlights the need to restore provider-specific system prompts lost during extraction, improve resilience when sandboxes are busy, ensure compatibility with different booter implementations of the available check, and avoid blocking the event loop with synchronous file I/O in the registry.
    provider_info = get_sandbox_provider_info(booter)
    if provider_info:
        for tool_name in provider_info.get("tool_names", []):
            tool = tool_mgr.get_func(tool_name)
            if tool and getattr(tool, "active", True):
                req.func_tool.add_tool(tool)
The generic sandbox tool mounting logic should also include provider-specific system prompt instructions if available. The extraction of runtimes removed important guidance (e.g., file path rules for Shipyard Neo or GUI instructions for CUA) that the LLM needs to use the tools effectively. The SandboxProvider protocol should be updated to provide these instructions. Please ensure this new functionality is accompanied by corresponding unit tests and that the logic for extracting provider info is shared to avoid duplication.
Suggested change:
    provider_info = get_sandbox_provider_info(booter)
    if provider_info:
        if prompt := provider_info.get("system_prompt"):
            req.system_prompt = f"{req.system_prompt or ''}\n{prompt}\n"
        for tool_name in provider_info.get("tool_names", []):
            tool = tool_mgr.get_func(tool_name)
            if tool and getattr(tool, "active", True):
                req.func_tool.add_tool(tool)
References
- When implementing similar functionality for different cases (e.g., direct vs. quoted attachments), refactor the logic into a shared helper function to avoid code duplication.
- New functionality, such as handling attachments, should be accompanied by corresponding unit tests.
    def _sandbox_provider_info(provider_id: str, provider: SandboxProvider) -> dict:
        return {
            "provider_id": provider_id,
            "capabilities": sorted(getattr(provider, "capabilities", set())),
            "tool_names": sorted(getattr(provider, "tool_names", set())),
        }
Include the system_prompt from the provider in the info dictionary so it can be utilized by the agent during tool registration. Refactor the logic into a shared helper function to avoid code duplication. Also, ensure this new functionality is accompanied by unit tests.
Suggested change:
    def _sandbox_provider_info(provider_id: str, provider: SandboxProvider) -> dict:
        return {
            "provider_id": provider_id,
            "capabilities": sorted(getattr(provider, "capabilities", set())),
            "tool_names": sorted(getattr(provider, "tool_names", set())),
            "system_prompt": getattr(provider, "system_prompt", ""),
        }
    available = getattr(booter, "available", None)
    if available is None:
        return True
    return await available()

    if not self.acquire_lease(current_sandbox_id, session_id):
        raise RuntimeError(f"Sandbox {current_sandbox_id} is busy")
    booter = self.session_booter[current_sandbox_id]
    if await self.booter_available(booter):
        self.registry.touch_sandbox(current_sandbox_id)
        self.save_registry()
        self.schedule_idle_cleanup(current_sandbox_id, idle_timeout)
        return booter
    self.session_booter.pop(current_sandbox_id, None)
If the current sandbox is busy (e.g., taken over by another session), raising a RuntimeError here will cause the agent to fail. Instead of crashing, it would be more resilient to clear the current sandbox assignment for this session and fall through to the logic that finds the default sandbox or creates a new one.
Suggested change:
    if self.acquire_lease(current_sandbox_id, session_id):
        booter = self.session_booter[current_sandbox_id]
        if await self.booter_available(booter):
            self.registry.touch_sandbox(current_sandbox_id)
            self.save_registry()
            self.schedule_idle_cleanup(current_sandbox_id, idle_timeout)
            return booter
        self.session_booter.pop(current_sandbox_id, None)
    else:
        self.registry.set_current_sandbox_id(session_id, None)
        self.save_registry()
    class SandboxProvider(Protocol):
        provider_id: str
        capabilities: set[str]
        tool_names: set[str]
The SandboxProvider protocol should include a system_prompt field to allow runtimes to provide necessary instructions to the LLM (e.g., path conventions or specific tool usage workflows). New functionality should be accompanied by corresponding unit tests.
Suggested change:
    class SandboxProvider(Protocol):
        provider_id: str
        capabilities: set[str]
        tool_names: set[str]
        system_prompt: str = ""
    def save(self) -> None:
        self.storage_path.parent.mkdir(parents=True, exist_ok=True)
        self.storage_path.write_text(
Use sandbox_manager as the source of active sandbox booters so provider sandboxes receive skill updates again, and write the sandbox registry atomically to avoid corrupting persisted state on crashes.
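The atomic registry write mentioned above could look roughly like the following. This is a sketch under stated assumptions, not the PR's code: save_json_atomically and its JSON payload shape are hypothetical, and the technique shown is the common "write a temp file in the same directory, then os.replace" pattern.

```python
import json
import os
import tempfile
from pathlib import Path


def save_json_atomically(storage_path: Path, payload: dict) -> None:
    """Write payload so a crash leaves either the old file or the new one."""
    storage_path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives next to the target so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=storage_path.parent, prefix=storage_path.name, suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(payload, tmp, ensure_ascii=False, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, storage_path)  # rename over the previous file
    except BaseException:
        # Best-effort cleanup of the temp file when the write or rename failed.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

Because the function is synchronous, an async caller could hand it to asyncio.to_thread(...) so the file I/O does not block the event loop, which is the other registry concern raised in the review summary.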
- Use sandbox_manager.booter_available() in sync_skills_to_active_sandboxes to avoid TypeError when provider available is a sync method or property.
- Release lease and save registry when current booter becomes unavailable in get_or_create_booter, preventing lease leaks.
- Reschedule idle cleanup on destroy_booter failure instead of leaking the sandbox without any future cleanup task.
- Wrap CopyFileBetweenSandboxesTool temp file cleanup in try/finally to prevent leaks on upload/download errors.
- Update session_current in takeover_sandbox for consistent get_current_sandbox semantics.
- Force-cleanup session_booter, idle_state and registry in destroy_sandbox even if destroy_booter raises an exception.
- Add boot_lock around create_sandbox_uncontrolled to prevent concurrent booter creation for the same sandbox_id.
- Remove unused session_id parameter from _apply_sandbox_tools.
- Remove unreachable break in get_or_create_booter.
- Update tests to match adjusted _apply_sandbox_tools signature.
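The booter_available() fix in the commit message above has to cope with available being an async method, a sync method, or a plain property. One way to sketch that tolerance is shown below; this is an illustration, not the PR's implementation, and the three demo booter classes are hypothetical.

```python
import asyncio
import inspect
from typing import Any


async def booter_available(booter: Any) -> bool:
    """Return the booter's health, whatever shape its `available` member has."""
    available = getattr(booter, "available", None)
    if available is None:
        return True  # no health check exposed: treated as usable, as in the reviewed code
    # A method must be called; a property or attribute already holds the value.
    result = available() if callable(available) else available
    if inspect.isawaitable(result):
        result = await result  # async method (or a property returning an awaitable)
    return bool(result)


class AsyncBooter:  # hypothetical: async method
    async def available(self) -> bool:
        return True


class SyncBooter:  # hypothetical: sync method
    def available(self) -> bool:
        return False


class PropertyBooter:  # hypothetical: property
    @property
    def available(self) -> bool:
        return True


if __name__ == "__main__":
    for demo in (AsyncBooter(), SyncBooter(), PropertyBooter(), object()):
        print(type(demo).__name__, asyncio.run(booter_available(demo)))
```

Awaiting only when the result is awaitable avoids the TypeError that a bare `await available()` raises for sync methods and properties.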
…ill sync
- Upgrade SandboxProvider Protocol with optional plugin_config, provider_api_version, system_prompt, auto_sync_skills and lifecycle hooks (on_sandbox_created / on_sandbox_destroyed).
- register_sandbox_provider now accepts a tools=list parameter. Core automatically appends tools to llm_tools and removes them on unregister.
- unregister_sandbox_provider with force=True now synchronously cleans up registry records and spawns best-effort async destroy_booter tasks.
- cleanup_managed_sandboxes no longer skips sandboxes whose provider has been unregistered; it always frees booters and registry state.
- get_or_create_booter automatically syncs skills after boot unless provider sets auto_sync_skills=False.
- Invoke optional lifecycle hooks after create and destroy.
- Remove manual skill sync calls from all provider create_booter methods.
- Update tests for new protocol shape and auto-sync behaviour.
…e, best-effort skill sync
- _cleanup_provider_sandboxes_sync now calls registry.save() after deleting records so force-unregister cleanup is persisted to disk.
- Extract _finalize_created_booter helper and call it from both get_or_create_booter and create_sandbox_uncontrolled so explicit sandbox creation (e.g. astrbot_create_sandbox) also runs skill sync and on_sandbox_created hooks.
- Wrap auto skill sync in try/except so a sync failure logs a warning instead of leaving a live but orphaned sandbox registered in state.
- Extract _invoke_sandbox_created_hook helper.
- Move hook invocation out of _finalize_created_booter so it fires only after the sandbox is leased, matching the Protocol docstring guarantee.
- get_or_create_booter calls hook after _finalize_created_booter (lease already held during creation).
- create_sandbox calls hook after acquire_lease, eliminating the race where hook/skill sync could delay leasing and let idle cleanup expire the sandbox.
…aths
- Make takeover_sandbox async so it can await the hook
- Trigger hook in switch_current_sandbox_checked after lease cleanup
- Trigger hook in get_observer_booter_by_id for dashboard-created sandboxes
- Only mark created_hook_fired on success; transient failures are retried
- Reset idle cleanup timer in create_sandbox after lease acquisition
- Update callers (dashboard route, TakeoverSandboxTool, tests) to await
Closes: on_sandbox_created skipped for dashboard/uncontrolled sandboxes
…nitoring
- Remove on_sandbox_created from get_observer_booter_by_id; observer access does not acquire a lease and must not trigger creation-side initialization. The hook still fires on first lease via switch/takeover/create_sandbox.
- Protect created_hook_fired check-and-set with _sandbox_boot_lock to prevent duplicate triggers under concurrent lease operations.
- Allow Dashboard screenshot/shell to bypass lease check (session_id=None) so admins can monitor any sandbox regardless of current controller.
… semantics
- get_observer_booter_by_id now accepts require_lease=True (default). When False, it allows read-only observer access to sandboxes controlled by other sessions without raising, but skips touch_sandbox and idle_cleanup reset so observer polling does not extend another session's sandbox lifetime.
- Dashboard shell restores session_id lease check (require_lease=True default); only screenshot bypasses lease with require_lease=False for monitoring.
- Update test_sandbox_dashboard_screenshot_respects_session_ownership to test_sandbox_dashboard_screenshot_bypasses_lease_for_monitoring, verifying that read-only observer access succeeds without a lease.
Dashboard is an administrative interface; both shell and screenshot should be usable on any sandbox regardless of current lease holder.
- run_shell: pass require_lease=False to get_observer_booter_by_id
- Update test to verify admin shell access bypasses session ownership
Add canonical sandbox lifecycle states to replace vague 'not running' errors:
- creating: boot in progress
- running: active and available
- error: creation failed or booter health check failed
- stopping: destroy in progress
- stopped: destroyed but persistent record kept
- unknown: reconciled on startup
Changes:
- SandboxStatus enum gains CREATING, ERROR, STOPPING states
- create_sandbox_uncontrolled sets CREATING before boot, ERROR on failure
- _finalize_created_booter sets RUNNING after successful boot
- destroy_sandbox sets STOPPING before teardown, STOPPED for persistent
- get_observer_booter_by_id returns state-aware error messages: 'still being created', 'being destroyed', 'has been destroyed', 'encountered an error', 'unavailable (health check failed)'
- booter_available failure now updates status to ERROR instead of unknown
5e11841 to 48a1900
…box-core
# Conflicts:
# dashboard/src/assets/mdi-subset/materialdesignicons-subset.css
# dashboard/src/assets/mdi-subset/materialdesignicons-webfont-subset.woff
# dashboard/src/assets/mdi-subset/materialdesignicons-webfont-subset.woff2
…aster
# Conflicts:
# dashboard/pnpm-workspace.yaml
# dashboard/src/i18n/locales/ru-RU/core/navigation.json
…feat/generic-sandbox-core
Overall review
I reviewed this PR as a full sandbox-runtime architecture extraction rather than as an isolated file change. Although the diff is large, the implementation is structurally coherent and the risk appears relatively low.
Architecture
The PR moves concrete sandbox runtime implementations behind a provider abstraction and keeps core responsibilities focused on orchestration, registry state, lifecycle management, tool exposure, and dashboard integration. This is a good direction: runtime-specific behavior is no longer hard-coded into core, while the core still owns the common sandbox contract.
Lifecycle and state management
The sandbox manager/registry model is consistent with the new provider-based design. Creation, switching, leasing, takeover, cleanup, persistence, and runtime-state tracking are handled through centralized manager paths instead of being scattered across individual runtime implementations. This reduces coupling and should make future sandbox providers easier to add and maintain.
Dashboard and API surface
The dashboard changes follow the same abstraction boundary as the backend: providers are listed generically, sandboxes are managed through common records, and runtime operations are routed through the manager/provider contract. The UI/API additions are broad, but they are aligned with the new model rather than introducing a parallel control path.
Extensibility and compatibility
The provider registration model, sandbox tool binding, skill synchronization, and runtime selection changes fit together cleanly. The design gives provider plugins clear ownership of runtime-specific behavior while preserving a generic core interface for agents, tools, and the dashboard. That makes this PR a solid foundation for extracting and evolving sandbox drivers outside the core repository.
Verification
I ran the focused regression suite locally:
    uv run pytest tests/unit/test_sandbox_manager.py tests/test_dashboard.py tests/unit/test_astr_main_agent.py tests/unit/test_astr_agent_tool_exec.py -q
Result:
The covered areas include sandbox manager behavior, dashboard sandbox APIs, main agent runtime/tool wiring, and tool execution integration.
Risk assessment
The main risk is the size of the refactor, not a specific unresolved correctness issue. The new boundaries are clear, the provider abstraction is consistently applied, and the focused tests cover the most important integration points. Based on the code structure and regression results, I would classify this as a large but low-to-moderate risk architectural cleanup.
Recommendation
Looks good to merge. The PR is a significant internal restructuring, but it improves modularity and maintainability, and I did not find a blocking issue in the final reviewed state.
Summary
This PR turns sandbox computer-use runtimes into plugin-provided providers instead of shipping CUA, Shipyard, Shipyard Neo, Boxlite, and Bay implementations directly in core. Core now owns the provider-agnostic sandbox lifecycle, registry, lease/occupancy model, generic agent tools, dashboard management APIs, and plugin contract; runtime-specific code moves to external sandbox driver plugins.
The agent-facing sandbox API is consolidated into three generic tools:
- astrbot_sandbox_query: list_sandboxes, get_current, list_providers
- astrbot_sandbox_lifecycle: create, switch, release, renew_lease, set_retention, takeover, destroy
- astrbot_sandbox_operation: capture_screenshot, copy_file
Provider-specific tools remain possible for capabilities that are genuinely provider-specific, such as CUA mouse/keyboard actions or Shipyard Neo skill/browser operations. Screenshot is now a common sandbox operation through astrbot_sandbox_operation(action="capture_screenshot"), not a CUA-specific tool.
Architecture
Core now exposes a provider framework under astrbot.core.computer:
- SandboxProvider defines the runtime adapter protocol.
- SandboxManager owns lifecycle operations, leases, cleanup scheduling, provider selection, persistent reconnect, and runtime booter state.
- SandboxRegistry persists managed sandbox records, current sandbox bindings, default sandbox ids, lease holders, retention policy, status, capabilities, and tool metadata.
- SandboxRecord / SandboxStatus normalize persisted and API-facing sandbox state.
- sandbox_timeouts centralizes lease, idle, and TTL timeout parsing/formatting.
- sandbox_tool_binding marks provider-specific tools with provider metadata and exposes their config tags to the dashboard.
The old embedded runtime modules were removed from core:
Runtime implementations are expected to live in sandbox driver plugins and register themselves with register_sandbox_provider(...).
Lifecycle Design
Sandbox lifecycle is now explicit and stateful:
- creating: the sandbox record exists and background boot is in progress.
- running: a booter is available and the sandbox can be used.
- error: create/reconnect/destroy health checks failed and the record is preserved for visibility.
- stopping: destroy is in progress.
- stopped: a persistent sandbox was shut down but its record remains.
- unknown: persisted state exists, but runtime availability must be reconciled.
Dashboard-created sandboxes return immediately as creating and boot in the background. The dashboard polls status instead of blocking the request. Action buttons are guarded when a sandbox is not runnable, and backend routes return state-aware errors for creating/stopping/stopped/error sandboxes.
Persistent sandboxes can be reconnected on startup, switch, takeover, or observer access when the provider supports it. Temporary sandbox records are pruned on startup. Persistent records are preserved across restart and can be reconciled without eagerly destroying user state.
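The six lifecycle states and the state-aware errors can be pictured with a small enum plus a lookup table. This is a sketch only: the string values mirror the state names in the description, the message wording is borrowed from the commit history of this PR, and explain_if_not_runnable is a hypothetical helper rather than a function from the codebase.

```python
from enum import Enum
from typing import Optional


class SandboxStatus(str, Enum):
    # Values mirror the state names used in the PR description.
    CREATING = "creating"
    RUNNING = "running"
    ERROR = "error"
    STOPPING = "stopping"
    STOPPED = "stopped"
    UNKNOWN = "unknown"


# Hypothetical message table; the real backend wording may differ.
_NOT_RUNNABLE_REASON = {
    SandboxStatus.CREATING: "is still being created",
    SandboxStatus.STOPPING: "is being destroyed",
    SandboxStatus.STOPPED: "has been destroyed",
    SandboxStatus.ERROR: "encountered an error",
}


def explain_if_not_runnable(sandbox_id: str, status: SandboxStatus) -> Optional[str]:
    """Return a state-aware error message, or None when the sandbox is usable."""
    if status is SandboxStatus.RUNNING:
        return None
    # Anything without a table entry (here: UNKNOWN) still needs reconciliation.
    reason = _NOT_RUNNABLE_REASON.get(status, "is in an unknown state and must be reconciled")
    return f"Sandbox {sandbox_id} {reason}"
```

A dashboard route could return this message directly instead of a generic "not running" error, and the same check could drive which action buttons are enabled.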
Occupancy, Leases, and Renewal
Sandbox access is guarded by a session-scoped occupancy lease:
- create, switch, and takeover acquire a lease for the requesting session.
- A lease expires at now + sandbox_lease_timeout.
- The default timeout is 600 seconds and is configurable through provider_settings.sandbox.sandbox_lease_timeout.
- 0 disables automatic expiry and requires manual release.
- After a lease expires, get_current returns no current sandbox, and renew_lease cannot resurrect the expired lease.
- Use list_sandboxes and then explicitly switch, takeover, or create before continuing sandbox work.
- lease_expires_at, lease_expires_in_seconds, and auto_renew_interval_seconds.
- release clears the current session occupancy.
- takeover lets a session explicitly take control of a sandbox, with state and ownership checks.
Expired leases are released when records are listed or accessed, so another session can reuse or take over the sandbox after the timeout. Existing leases are never shortened by renewal; renewal extends an active lease to at least now + configured timeout.
Dashboard monitoring can use observer access where appropriate. Read-only monitoring paths such as screenshot do not extend another session's lifetime, while mutating shell/action paths still go through explicit backend guards.
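The renewal rules above (expiry at now plus the timeout, 0 meaning no automatic expiry, renewal never shortening a lease, and expired leases staying expired) can be expressed compactly. The sketch below is illustrative: Lease, new_lease, is_expired, and renew are hypothetical names, and the behavior of renewing with a timeout of 0 is an assumption, since the description does not cover it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:  # hypothetical record; the real registry fields may differ
    holder: str
    expires_at: Optional[float]  # None means the lease never expires automatically


def new_lease(holder: str, timeout: float, now: float) -> Lease:
    # A timeout of 0 disables automatic expiry and requires manual release.
    return Lease(holder, None if timeout == 0 else now + timeout)


def is_expired(lease: Lease, now: float) -> bool:
    return lease.expires_at is not None and now >= lease.expires_at


def renew(lease: Lease, timeout: float, now: float) -> bool:
    """Extend an active lease; never shorten it, never revive an expired one."""
    if is_expired(lease, now):
        return False
    # Assumption: renewing with timeout 0 or a non-expiring lease changes nothing.
    if lease.expires_at is not None and timeout != 0:
        lease.expires_at = max(lease.expires_at, now + timeout)
    return True
```

For example, a lease created at t=1000 with a 600 second timeout expires at 1600; renewing at t=1100 moves expiry to 1700, while a later renewal with a shorter timeout leaves 1700 in place.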
Idle and TTL Cleanup
The manager handles two cleanup dimensions:
Cleanup is resilient to provider failures:
- error, the booter is retained for inspection, and idle cleanup state is cleared so the manager does not keep retrying forever.
Active leases pause idle cleanup by rescheduling the idle deadline. Observer access that does not require a lease does not extend idle lifetime.
Agent Tooling
Core no longer injects one tool per sandbox action. Instead, it injects three grouped sandbox tools and documents their action parameters in the prompt. This keeps the tool surface smaller while preserving all management functionality.
Generic tools cover:
General computer tools such as shell, IPython, upload/download, and file operations remain separate. They act on the current sandbox when sandbox runtime is selected, but they are not sandbox management tools.
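Grouping actions under three tools means each call carries a tool name plus an action parameter that has to be checked. A minimal sketch of that check follows, using the tool and action names listed in the Summary; the table name and validate_sandbox_call are hypothetical, and the real dispatch code may be organized differently.

```python
# Tool and action names come from the PR summary; the table itself is illustrative.
SANDBOX_TOOL_ACTIONS: dict[str, frozenset[str]] = {
    "astrbot_sandbox_query": frozenset(
        {"list_sandboxes", "get_current", "list_providers"}
    ),
    "astrbot_sandbox_lifecycle": frozenset(
        {"create", "switch", "release", "renew_lease", "set_retention", "takeover", "destroy"}
    ),
    "astrbot_sandbox_operation": frozenset({"capture_screenshot", "copy_file"}),
}


def validate_sandbox_call(tool_name: str, action: str) -> None:
    """Raise ValueError for an unknown tool or an action that tool does not offer."""
    actions = SANDBOX_TOOL_ACTIONS.get(tool_name)
    if actions is None:
        raise ValueError(f"Unknown sandbox tool: {tool_name}")
    if action not in actions:
        allowed = ", ".join(sorted(actions))
        raise ValueError(
            f"{tool_name} does not support action {action!r}; expected one of: {allowed}"
        )
```

Listing the allowed actions in the error gives the model a way to correct a bad call without another prompt round trip.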
Provider-specific tools are marked as sandbox provider tools rather than ordinary plugin tools. They are shown in the dashboard as sandbox origin with provider-specific names and are read-only from generic plugin enable/disable controls.
Dashboard
This PR adds a sandbox management page and backend API support:
The dashboard now loads sandbox providers dynamically rather than hardcoding core runtime-specific config/options.
Plugin Adapter Contract
Sandbox driver plugins should implement the provider layer instead of adding runtime code to core.
Required provider responsibilities:
- provider_id.
- capabilities such as shell, python, filesystem, screenshot, mouse, or keyboard.
- tool_names only for provider-specific tools contributed by the plugin.
- build_create_config(context, session_id) for merging plugin config and user sandbox settings.
- build_connect_info(sandbox_name, config) for persisted reconnect metadata.
- create_booter(...) and destroy_booter(...).
Optional provider capabilities:
- plugin_config constructor input for plugin-level config.
- provider_api_version for future compatibility.
- system_prompt for provider-specific agent guidance.
- auto_sync_skills=False when the provider manages skill sync itself.
- supports_persistent_reconnect plus reconnect helpers for persistent sandbox restore.
- on_sandbox_created and on_sandbox_destroyed.
Provider-specific tools should be registered through register_sandbox_provider(provider, tools=[...]) and marked with @sandbox_provider_tool(provider_id, config=...). Core handles registration, unregister, dashboard classification, and runtime cleanup. Plugins should not register provider-specific sandbox tools as ordinary builtin/core tools.
Migration Notes
Core now expects sandbox runtimes to be installed as plugins. Existing CUA, Shipyard, Shipyard Neo, and Boxlite runtime behavior should be supplied by their corresponding sandbox driver plugins.
Runtime-specific config keys were removed from core defaults and moved to plugin-owned config schemas. Core keeps only provider-agnostic sandbox settings such as selected provider, idle timeout, TTL, lease timeout, and default retention behavior.
For CUA specifically, screenshots should be provided through the generic astrbot_sandbox_operation capture_screenshot action. CUA provider-specific tools should only expose mouse/keyboard actions.
Verification
- uv run ruff check astrbot/core/computer/sandbox_manager.py tests/unit/test_sandbox_manager.py
- uv run pytest tests/unit/test_sandbox_manager.py -q -> 110 passed
- uv run ruff check tests/unit/test_astr_main_agent.py tests/test_sandbox_plugin_schema_contract.py
- uv run pytest tests/test_sandbox_plugin_schema_contract.py::test_cua_adapter_uses_core_screenshot_operation tests/unit/test_astr_main_agent.py::TestPluginToolFix::test_plugin_tool_fix_keeps_provider_specific_tools_in_sandbox_runtime tests/unit/test_astr_main_agent.py::TestPluginToolFix::test_plugin_tool_fix_hides_provider_specific_tools_outside_sandbox_runtime -q -> 3 passed
- uv run pytest data/plugins/astrbot_sandbox_cua/test_persistence.py -q -> 29 passed
- uv run pytest tests/unit/test_sandbox_tools_permissions.py tests/unit/test_sandbox_tool_consolidation.py tests/unit/test_astr_main_agent.py tests/unit/test_astr_agent_tool_exec.py tests/unit/test_tool_permission.py tests/unit/test_sandbox_tool_binding.py tests/test_dashboard.py tests/test_sandbox_frontend_contract.py tests/test_sandbox_plugin_schema_contract.py -q -> 292 passed, 1 warning
- pnpm build in dashboard completed successfully; existing MDI CSS warnings were present.