docker · dgageot · Jun 23, 2026 · Jun 24, 2026
@@ -2038,6 +2038,14 @@
           "type": "boolean",
           "description": "Opt in to dialling non-public IP addresses (valid for type 'fetch', 'api', 'openapi', 'a2a', and remote MCP toolsets). By default protected HTTP clients refuse connections \u2014 after DNS resolution, so DNS rebinding is also blocked \u2014 to loopback, RFC1918 private ranges, link-local (including the cloud metadata endpoint at 169.254.169.254), multicast and the unspecified address. Set this to true when an agent legitimately needs to call internal services. For fetch, 'allowed_domains' / 'blocked_domains' are evaluated independently and still apply."
         },
+        "safer": {
+          "type": "boolean",
+          "description": "Enable destructive command detection for the shell toolset (only valid for type 'shell'). When enabled, every shell command requires explicit user approval regardless of permissions or --yolo. Commands matching docker-agent's embedded safety-pattern taxonomy use the matched blast-radius level; unmatched commands still warn with an unknown blast radius. Default false."
+        },
+        "safer_judge_model": {
+          "type": "string",
+          "description": "Opt in to a residual LLM judge for safer-mode pattern misses (only valid for type 'shell' and requires safer: true). Format is 'provider/model' (e.g. 'anthropic/claude-haiku-4-5'). When set and a shell command contains a destructive lexical signal (drop, wipe, destroy, purge, ...) without matching the embedded pattern set, the runtime asks this model to classify the command and uses its refined blast-radius verdict. Fail-closed: timeout, error, or an uncertain verdict falls back to the default unknown-blast-radius confirmation. Unset disables the LLM path."
+        },
         "sudo_askpass": {
           "type": "boolean",
           "description": "Opt in to a sudo privilege escalation flow for the shell toolset (only valid for type 'shell'). When enabled, sudo commands prompt the user for their password through the host UI via SUDO_ASKPASS; in non-interactive runs the prompt is declined automatically. Only a bare 'sudo ...' invocation in a POSIX shell is handled. No effect on Windows. Default false."

@@ -15,7 +15,7 @@ Built-in tools are included with docker-agent and require no external dependenci
 | Type | Description | Page |
 | --- | --- | --- |
 | `filesystem` | Read, write, list, search, navigate | [Filesystem]({{ '/tools/filesystem/' | relative_url }}) |
-| `shell` | Execute shell commands (sync + background jobs) | [Shell]({{ '/tools/shell/' | relative_url }}) |
+| `shell` | Execute shell commands (sync + background jobs). Supports `safer: true` to require confirmation for every command, with a blast-radius warning for known destructive ones. | [Shell]({{ '/tools/shell/' | relative_url }}) |
 | `think` | Reasoning scratchpad | [Think]({{ '/tools/think/' | relative_url }}) |
 | `todo` | Task list management | [Todo]({{ '/tools/todo/' | relative_url }}) |
 | `tasks` | Persistent task database shared across sessions | [Tasks]({{ '/tools/tasks/' | relative_url }}) |
@@ -40,6 +40,7 @@ Built-in tools are included with docker-agent and require no external dependenci
 toolsets:
   - type: filesystem
   - type: shell
+    safer: true
   - type: think
   - type: todo
   - type: memory

@@ -26,6 +26,7 @@ toolsets:
 | Property       | Type    | Description                                                                                          |
 | -------------- | ------- | --------------------------------------------------------------------------------------------------- |
 | `env`          | object  | Environment variables to set for all shell commands                                                 |
+| `safer`        | boolean | Always ask for confirmation before running a shell command, showing the blast-radius level of known destructive commands (`unknown` otherwise). Default `false`. |
 | `sudo_askpass` | boolean | Opt in to prompting for a `sudo` password (see [Sudo support](#sudo-support)). Default `false`.     |
 
 ### Custom Environment Variables
@@ -38,6 +39,22 @@ toolsets:
       PATH: "${PATH}:/custom/bin"
 ```
 
+### Safer mode
+
+Set `safer: true` to enable destructive command detection for the `shell` tool:
+
+```yaml
+toolsets:
+  - type: shell
+    safer: true
+```
+
+When enabled, docker-agent checks each `shell` tool call before the normal approval flow and always asks for explicit user approval, even when `--yolo` or a permissions rule would otherwise auto-approve the call. If the command matches a known destructive operation, the confirmation shows the blast-radius level from the taxonomy; otherwise it warns with an `unknown` blast radius.
+
+See [`examples/shell_safer.yaml`](https://github.com/docker/docker-agent/blob/main/examples/shell_safer.yaml) for a complete example.
+
+Destructive command patterns are loaded from docker-agent's embedded `safety_patterns.json` taxonomy. The taxonomy covers filesystem deletion and overwrite commands, Docker cleanup commands, and a selection of common destructive commands outside those two areas, such as Git history rewrites. Each match carries a blast-radius level (`low`, `medium`, `high`, or `unknown`).
+
 ### Sudo support
 
 By default a shell command has no controlling terminal, so a `sudo` command that needs a password hangs until it times out (the agent usually gives up and falls back to printing manual instructions).

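The "Safer mode" section in this hunk describes a first-match lookup against a pattern taxonomy, with `unknown` as the fallback level. The Go sketch below illustrates that lookup under stated assumptions: the three regular expressions and their levels are invented for the example and are not the contents of `safety_patterns.json`, and `classify` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// BlastRadius mirrors the four levels listed in the docs.
type BlastRadius string

const (
	Low     BlastRadius = "low"
	Medium  BlastRadius = "medium"
	High    BlastRadius = "high"
	Unknown BlastRadius = "unknown"
)

// pattern is a hypothetical stand-in for one taxonomy entry.
type pattern struct {
	re    *regexp.Regexp
	level BlastRadius
}

// Illustrative entries only; the real taxonomy is embedded in docker-agent.
var patterns = []pattern{
	{regexp.MustCompile(`^\s*rm\s+-[a-zA-Z]*r[a-zA-Z]*\s`), High},
	{regexp.MustCompile(`^\s*docker\s+system\s+prune\b`), Medium},
	{regexp.MustCompile(`^\s*git\s+push\s+.*--force\b`), Medium},
}

// classify returns the level of the first matching pattern. A miss yields
// Unknown, which still leads to a confirmation prompt in safer mode.
func classify(cmd string) (level BlastRadius, matched bool) {
	for _, p := range patterns {
		if p.re.MatchString(cmd) {
			return p.level, true
		}
	}
	return Unknown, false
}

func main() {
	for _, cmd := range []string{"rm -rf build/", "docker system prune -af", "ls -la"} {
		level, matched := classify(cmd)
		fmt.Printf("%q matched=%t blast radius=%s\n", cmd, matched, level)
	}
}
```

Note that the match result only selects the warning level. Per the docs above, the confirmation itself is shown for every command regardless of whether a pattern matched.
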
@@ -65,6 +65,7 @@ Examples that wire up one of the toolsets shipped with docker-agent
 | File | What it shows |
 |------|---------------|
 | [`shell.yaml`](shell.yaml) | Plain `shell` toolset. |
+| [`shell_safer.yaml`](shell_safer.yaml) | Shell toolset with `safer: true`: every command requires confirmation, and known destructive commands show their blast radius. |
 | [`filesystem.yaml`](filesystem.yaml) | Plain `filesystem` toolset. |
 | [`filesystem_allow_deny.yaml`](filesystem_allow_deny.yaml) | Restricting the filesystem tool with allow/deny path lists. |
 | [`script_shell.yaml`](script_shell.yaml) | Defining custom shell commands as named tools via `type: script`. |
@@ -210,6 +211,7 @@ remote MCP endpoints.
 | File | What it shows |
 |------|---------------|
 | [`permissions.yaml`](permissions.yaml) | Top-level `permissions` block with `allow`/`deny` patterns for tool calls. |
+| [`shell_safer.yaml`](shell_safer.yaml) | Shell `safer: true` mode that always asks before running a command and shows the blast radius of known destructive ones. |
 | [`llm_judge.yaml`](llm_judge.yaml) | Layered defense: deterministic permissions + an LLM-as-judge `pre_tool_use` hook + user prompts. |
 | [`redact_secrets.yaml`](redact_secrets.yaml) | Single-flag (`redact_secrets: true`) scrubbing of detected secrets in args, chat content, and tool output. |
 | [`redact_secrets_hooks.yaml`](redact_secrets_hooks.yaml) | The same scrubbing wired manually as three hooks. |

@@ -0,0 +1,18 @@
+agents:
+  root:
+    model: anthropic/claude-haiku-4-5
+    description: Shell agent with safer mode; enable snapshots in user config
+    welcome_message: |
+      Shell safer mode is enabled. To capture snapshots for /undo, enable `settings.snapshot: true` in ~/.config/cagent/config.yaml.
+    instruction: Use the shell tool to run the command the user asks for.
+    toolsets:
+      - type: shell
+        safer: true
+        # Optional: opt in to a residual LLM judge for safer-mode
+        # pattern misses. When set, commands containing a destructive
+        # lexical signal (drop / wipe / destroy / purge / nuke / ...)
+        # without matching the embedded pattern set are classified by
+        # this model. Fail-closed: a timeout, error, or uncertain verdict
+        # falls back to the default unknown-blast-radius confirmation.
+        # Unset disables the LLM path entirely.
+        safer_judge_model: anthropic/claude-haiku-4-5
@@ -24,6 +24,7 @@ import (
 	"github.com/docker/docker-agent/pkg/team"
 	"github.com/docker/docker-agent/pkg/teamloader"
 	loaderdefaults "github.com/docker/docker-agent/pkg/teamloader/defaults"
+	"github.com/docker/docker-agent/pkg/tools"
 	"github.com/docker/docker-agent/pkg/version"
 )
 
@@ -707,28 +708,12 @@ func (a *Agent) runAgent(ctx context.Context, acpSess *Session) error {
 
 // handleToolCallConfirmation handles tool call permission requests.
 func (a *Agent) handleToolCallConfirmation(ctx context.Context, acpSess *Session, e *runtime.ToolCallConfirmationEvent) error {
-	toolCallUpdate := buildToolCallUpdate(e.ToolCall, e.ToolDefinition, acp.ToolCallStatusPending)
+	toolCallUpdate := buildToolCallUpdate(e.ToolCall, e.ToolDefinition, e.Safety, acp.ToolCallStatusPending)
 
 	permResp, err := a.conn.RequestPermission(ctx, acp.RequestPermissionRequest{
 		SessionId: acp.SessionId(acpSess.id),
 		ToolCall:  toolCallUpdate,
-		Options: []acp.PermissionOption{
-			{
-				Kind:     acp.PermissionOptionKindAllowOnce,
-				Name:     "Allow this action",
-				OptionId: "allow",
-			},
-			{
-				Kind:     acp.PermissionOptionKindAllowAlways,
-				Name:     "Allow and remember my choice",
-				OptionId: "allow-always",
-			},
-			{
-				Kind:     acp.PermissionOptionKindRejectOnce,
-				Name:     "Skip this action",
-				OptionId: "reject",
-			},
-		},
+		Options:   permissionOptions(e.Safety),
 	})
 	if err != nil {
 		return err
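@@ -757,6 +742,37 @@ func (a *Agent) handleToolCallConfirmation(ctx context.Context, acpSess *Session
 	return nil
 }
 
+// permissionOptions builds the ACP permission choices for a tool call. When
+// the call is flagged destructive, the allow-once label names the blast
+// radius, defaulting to unknown when no level is set.
+func permissionOptions(safety *tools.ToolCallSafety) []acp.PermissionOption {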
+	allowName := "Allow this action"
+	if safety != nil && safety.Destructive {
+		level := safety.BlastRadius
+		if level == "" {
+			level = tools.BlastRadiusUnknown
+		}
+		allowName = fmt.Sprintf("Allow destructive tool (blast radius: %s)", level)
+	}
+	return []acp.PermissionOption{
+		{
+			Kind:     acp.PermissionOptionKindAllowOnce,
+			Name:     allowName,
+			OptionId: "allow",
+		},
+		{
+			Kind:     acp.PermissionOptionKindAllowAlways,
+			Name:     "Allow and remember my choice",
+			OptionId: "allow-always",
+		},
+		{
+			Kind:     acp.PermissionOptionKindRejectOnce,
+			Name:     "Skip this action",
+			OptionId: "reject",
+		},
+	}
+}
+
 // handleMaxIterationsReached handles max iterations events.
 func (a *Agent) handleMaxIterationsReached(ctx context.Context, acpSess *Session, e *runtime.MaxIterationsReachedEvent) error {
 	title := fmt.Sprintf("Maximum iterations (%d) reached", e.MaxIterations)

@@ -67,11 +67,18 @@ func buildToolCallComplete(arguments string, event *runtime.ToolCallResponseEven
 }
 
 // buildToolCallUpdate creates a tool call update for permission requests.
-func buildToolCallUpdate(toolCall tools.ToolCall, tool tools.Tool, status acp.ToolCallStatus) acp.ToolCallUpdate {
+func buildToolCallUpdate(toolCall tools.ToolCall, tool tools.Tool, safety *tools.ToolCallSafety, status acp.ToolCallStatus) acp.ToolCallUpdate {
 	kind := acp.ToolKindExecute
 	title := cmp.Or(tool.Annotations.Title, toolCall.Function.Name)
 
-	if tool.Annotations.ReadOnlyHint {
+	if safety != nil && safety.Destructive {
+		kind = acp.ToolKindDelete
+		level := safety.BlastRadius
+		if level == "" {
+			level = tools.BlastRadiusUnknown
+		}
+		title = fmt.Sprintf("Destructive tool: %s (blast radius: %s)", title, level)
+	} else if tool.Annotations.ReadOnlyHint {
 		kind = acp.ToolKindRead
 	}
 

@@ -15,6 +15,7 @@ import (
 
 	"github.com/docker/docker-agent/pkg/input"
 	"github.com/docker/docker-agent/pkg/tools"
+	"github.com/docker/docker-agent/pkg/tui/components/toolconfirm"
 )
 
 // ConfirmationResult represents the result of a user confirmation prompt
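@@ -80,9 +81,37 @@ func (p *Printer) PrintToolCall(toolCall tools.ToolCall) {
 	p.Printf("\nCalling %s%s\n", bold(toolCall.Function.Name), formatToolCallArguments(toolCall.Function.Arguments))
 }
 
+// destructiveWarningPrinter returns the style used for the destructive-command warning title.
+func destructiveWarningPrinter() *color.Color {
+	return color.New(color.FgHiYellow, color.Bold)
+}
+
+// blastRadiusPrinter maps a blast-radius level to a colour: green for low, yellow for medium, red for high, and white for anything else.
+func blastRadiusPrinter(level tools.BlastRadiusLevel) *color.Color {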
+	switch level {
+	case tools.BlastRadiusLow:
+		return color.New(color.FgGreen, color.Bold)
+	case tools.BlastRadiusMedium:
+		return color.New(color.FgYellow, color.Bold)
+	case tools.BlastRadiusHigh:
+		return color.New(color.FgRed, color.Bold)
+	default:
+		return color.New(color.FgWhite, color.Bold)
+	}
+}
+
 // PrintToolCallWithConfirmation prints a tool call and prompts for confirmation
-func (p *Printer) PrintToolCallWithConfirmation(ctx context.Context, toolCall tools.ToolCall, rd io.Reader) ConfirmationResult {
-	p.Printf("\n%s\n", bold("🛠️ Tool call requires confirmation 🛠️"))
+func (p *Printer) PrintToolCallWithConfirmation(ctx context.Context, toolCall tools.ToolCall, safety *tools.ToolCallSafety, rd io.Reader) ConfirmationResult {
+	if safety != nil && safety.Destructive {
+		level := safety.BlastRadius
+		if level == "" {
+			level = tools.BlastRadiusUnknown
+		}
+		p.Printf("\n%s\n", destructiveWarningPrinter().Sprint(toolconfirm.DestructiveWarningTitle))
+		p.Printf("Blast radius level: %s\n", blastRadiusPrinter(level).Sprint(string(level)))
+	} else {
+		p.Printf("\n%s\n", bold("🛠️ Tool call requires confirmation 🛠️"))
+	}
 	p.PrintToolCall(toolCall)
 	p.Printf("\n%s", bold("Can I run this tool? ([y]es/[a]ll/[n]o): "))
 

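The same nil check and empty-level fallback now appears in `permissionOptions`, `buildToolCallUpdate`, and `PrintToolCallWithConfirmation`. One possible way to keep the three call sites in step is a small shared helper, sketched below. `effectiveBlastRadius` and `allowName` are hypothetical names, the types are local stand-ins for `tools.ToolCallSafety` and `tools.BlastRadiusLevel`, and the string value `"unknown"` for `BlastRadiusUnknown` is an assumption based on the levels listed in the docs.

```go
package main

import "fmt"

// BlastRadiusLevel and ToolCallSafety are local stand-ins for the types in
// pkg/tools; only the fields used in the diff are reproduced here.
type BlastRadiusLevel string

// Assumed value; the diff does not show the constant's definition.
const BlastRadiusUnknown BlastRadiusLevel = "unknown"

type ToolCallSafety struct {
	Destructive bool
	BlastRadius BlastRadiusLevel
}

// effectiveBlastRadius reports whether the call is destructive and, if so,
// the level to display, substituting unknown for an empty level.
func effectiveBlastRadius(s *ToolCallSafety) (BlastRadiusLevel, bool) {
	if s == nil || !s.Destructive {
		return "", false
	}
	if s.BlastRadius == "" {
		return BlastRadiusUnknown, true
	}
	return s.BlastRadius, true
}

// allowName shows how the allow-once label from the diff could use the helper.
func allowName(s *ToolCallSafety) string {
	if level, ok := effectiveBlastRadius(s); ok {
		return fmt.Sprintf("Allow destructive tool (blast radius: %s)", level)
	}
	return "Allow this action"
}

func main() {
	fmt.Println(allowName(nil))
	fmt.Println(allowName(&ToolCallSafety{Destructive: true}))
	fmt.Println(allowName(&ToolCallSafety{Destructive: true, BlastRadius: "high"}))
}
```

This is a refactoring suggestion only; the behaviour is intended to match the three inline copies in the diff.
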
@@ -164,7 +164,7 @@ func Run(ctx context.Context, out *Printer, cfg Config, rt runtime.Runtime, sess
 			case *runtime.AgentChoiceReasoningEvent:
 				out.Print(e.Content)
 			case *runtime.ToolCallConfirmationEvent:
-				result := out.PrintToolCallWithConfirmation(ctx, e.ToolCall, rd)
+				result := out.PrintToolCallWithConfirmation(ctx, e.ToolCall, e.Safety, rd)
 				// If interrupted, skip resuming; the runtime will notice context cancellation and stop
 				if ctx.Err() != nil {
 					continue

@@ -1110,6 +1110,25 @@ type Toolset struct {
 	// nil means the field was omitted and may inherit from a referenced definition.
 	AllowPrivateIPs *bool `json:"allow_private_ips,omitempty" yaml:"allow_private_ips,omitempty"`
 
+	// For the `shell` toolset — enable destructive command detection for the
+	// shell tool. When enabled, the runtime asks the user before every
+	// shell call, regardless of permissions or --yolo, and includes the
+	// blast-radius level (unknown if no pattern matches) in the confirmation.
+	Safer *bool `json:"safer,omitempty" yaml:"safer,omitempty"`
+
+	// For the `shell` toolset — opt in to a residual LLM judge that
+	// classifies commands that match none of safer's embedded patterns
+	// but contain a destructive lexical signal (drop, wipe, destroy,
+	// ...). The format is "provider/model"
+	// (e.g. "anthropic/claude-haiku-4-5"). When set, the runtime
+	// constructs a provider from this string and wires it into the
+	// shell toolset's residual classifier; nil keeps the default
+	// behaviour (BlastRadiusUnknown for every pattern miss).
+	//
+	// Requires Safer: true; validation rejects this field on non-shell
+	// toolsets, when Safer is unset or false, or when it is empty.
+	SaferJudgeModel *string `json:"safer_judge_model,omitempty" yaml:"safer_judge_model,omitempty"`
+
 	// For the `shell` toolset — opt in to a sudo privilege escalation flow.
 	// When enabled, sudo commands prompt the user for their password (masked)
 	// through the host UI via SUDO_ASKPASS; in non-interactive runs the prompt

@@ -226,6 +226,20 @@ func (t *Toolset) validate() error {
 	if t.AllowPrivateIPsEnabled() && t.Type != "fetch" && t.Type != "mcp" && t.Type != "api" && t.Type != "openapi" && t.Type != "a2a" {
 		return errors.New("allow_private_ips can only be used with type 'fetch', 'api', 'openapi', 'a2a' or remote MCP toolsets")
 	}
+	if t.Safer != nil && t.Type != "shell" {
+		return errors.New("safer can only be used with type 'shell'")
+	}
+	if t.SaferJudgeModel != nil {
+		if t.Type != "shell" {
+			return errors.New("safer_judge_model can only be used with type 'shell'")
+		}
+		if t.Safer == nil || !*t.Safer {
+			return errors.New("safer_judge_model requires safer: true")
+		}
+		if *t.SaferJudgeModel == "" {
+			return errors.New("safer_judge_model must not be empty when set")
+		}
+	}
 	if t.SudoAskpass != nil && t.Type != "shell" {
 		return errors.New("sudo_askpass can only be used with type 'shell'")
 	}
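
The validation rules added in this hunk can be exercised in isolation. The sketch below copies the `safer` and `safer_judge_model` checks onto a trimmed stand-in struct; `validateSafer` and `ptr` are hypothetical names for the example, and the real checks live inside `(*Toolset).validate` alongside the other fields.

```go
package main

import (
	"errors"
	"fmt"
)

// Toolset is a trimmed stand-in for the config struct in the diff.
type Toolset struct {
	Type            string
	Safer           *bool
	SaferJudgeModel *string
}

// validateSafer mirrors the checks added to (*Toolset).validate.
func (t *Toolset) validateSafer() error {
	if t.Safer != nil && t.Type != "shell" {
		return errors.New("safer can only be used with type 'shell'")
	}
	if t.SaferJudgeModel != nil {
		if t.Type != "shell" {
			return errors.New("safer_judge_model can only be used with type 'shell'")
		}
		if t.Safer == nil || !*t.Safer {
			return errors.New("safer_judge_model requires safer: true")
		}
		if *t.SaferJudgeModel == "" {
			return errors.New("safer_judge_model must not be empty when set")
		}
	}
	return nil
}

// ptr is a hypothetical helper for building optional fields.
func ptr[T any](v T) *T { return &v }

func main() {
	cases := []struct {
		name string
		ts   Toolset
	}{
		{"shell with safer", Toolset{Type: "shell", Safer: ptr(true)}},
		{"safer: false on fetch", Toolset{Type: "fetch", Safer: ptr(false)}},
		{"judge without safer", Toolset{Type: "shell", SaferJudgeModel: ptr("anthropic/claude-haiku-4-5")}},
		{"judge with safer: false", Toolset{Type: "shell", Safer: ptr(false), SaferJudgeModel: ptr("anthropic/claude-haiku-4-5")}},
		{"empty judge model", Toolset{Type: "shell", Safer: ptr(true), SaferJudgeModel: ptr("")}},
	}
	for _, c := range cases {
		fmt.Printf("%-26s -> %v\n", c.name, c.ts.validateSafer())
	}
}
```

One consequence of checking `t.Safer != nil` rather than the boolean value is that an explicit `safer: false` on a non-shell toolset is rejected as well, which the second case above demonstrates.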