[aw-failures] Smoke CI hard-red at startup — EACCES mkdir /tmp/gh-aw/sandbox/firewall/logs, agent never invoked (rootless left
… #42398
Reclaim the rootless /tmp/gh-aw/sandbox tree before writeConfigs() — a leftover root-owned dir makes mkdir /tmp/gh-aw/sandbox/firewall/logs fail EACCES and kills Smoke CI at startup before the agent is ever invoked.
This is a NEW, untracked P1 hard-red. It is distinct from #41455 (firewall startup via DNS EAI_AGAIN), #41636 (Copilot CLI exit-1 after safe-outputs succeed), and #41885 (Claude parse step on empty logEntries). Those are DNS races or post-completion false-reds; this one fails pre-flight — the agent never runs, so the run is 100% lost.
Problem statement
Make the AWF sandbox bootstrap resilient to a pre-existing root-owned /tmp/gh-aw/sandbox left by a prior rootless container on the same runner. Today, the very first config-generation step dies:
```text
[INFO] Network-isolation mode: enforcing egress via Docker network topology (no host iptables, no sudo).
[INFO] Generating configuration files...
[ERROR] Fatal error: Error: EACCES: permission denied, mkdir '/tmp/gh-aw/sandbox/firewall/logs'
    at Object.mkdirSync (node:fs:1370:26)
    ... at Object.RM [as writeConfigs] (/home/runner/.local/lib/awf/awf-bundle.js:786:1941)
[WARN] Could not fix squid log permissions: Error: Command failed ... chmod -R a+rX /tmp/gh-aw/sandbox/firewall/logs
chmod: cannot access '/tmp/gh-aw/sandbox/firewall/logs': Permission denied
Process exiting with code: 1
##[error]Process completed with exit code 1.
```
The recovery path (chmod -R a+rX) also fails with Permission denied, so there is no escape hatch and the run aborts hard.
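The failure mode can be reproduced in isolation. The sketch below is a hypothetical Node.js reproduction, not code from awf-bundle.js: it uses a parent directory with write permission removed as a stand-in for the root-owned /tmp/gh-aw/sandbox residue, then calls fs.mkdirSync on a firewall/logs path, as writeConfigs() does in the stack trace (the recursive option is an assumption). It models only the mkdir failure: unlike the real residue, the owner here can still chmod the directory back, which is what the cleanup relies on.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical reproduction: a parent this uid cannot write to makes
// fs.mkdirSync fail with EACCES. Mode bits stand in for root ownership.
function reproduceMkdirEacces() {
  const parent = fs.mkdtempSync(path.join(os.tmpdir(), "awf-repro-"));
  // Remove write permission on the parent for everyone, including the owner.
  fs.chmodSync(parent, 0o555);
  try {
    fs.mkdirSync(path.join(parent, "firewall", "logs"), { recursive: true });
    return "created"; // typically only when running as root, which bypasses mode bits
  } catch (err) {
    return err.code; // "EACCES" for an unprivileged user
  } finally {
    // We own the stand-in directory, so we can restore it and clean up.
    fs.chmodSync(parent, 0o755);
    fs.rmSync(parent, { recursive: true, force: true });
  }
}

console.log(reproduceMkdirEacces());
```

Run as root, mode bits are usually bypassed and the call succeeds, so the reproduction is only meaningful for a non-root user such as runner.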
Probable root cause
writeConfigs() calls mkdirSync('/tmp/gh-aw/sandbox/firewall/logs') without first ensuring that the tree is owned by, or writable for, the current uid, so the call fails with EACCES.
The chmod -R a+rX fallback cannot modify the root-owned directory either, so the bootstrap exits 1 instead of repairing ownership or relocating the sandbox.
This is the pre-flight twin of #41885's post-teardown rootless-ownership failure: same root cause (a rootless container leaves a root-owned /tmp/gh-aw/sandbox), opposite end of the run lifecycle.
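Before any remediation runs, the bootstrap needs a way to detect the residue. A minimal sketch, assuming a Node.js pre-flight step: findForeignOwnedPath (a hypothetical name, not an existing AWF function) walks the sandbox tree with fs.lstatSync and reports the first entry whose owner is not the current uid, or a directory this process cannot write into or traverse. It does not change anything on disk.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical pre-flight check: returns the first path under `root` that is
// not owned by `uid` (or a directory this process cannot write and traverse),
// or null when the tree is absent or entirely ours.
function findForeignOwnedPath(root, uid = process.getuid()) {
  let stat;
  try {
    stat = fs.lstatSync(root); // lstat: do not follow symlinks out of the tree
  } catch (err) {
    if (err.code === "ENOENT") return null; // nothing left behind
    if (err.code === "EACCES") return root; // cannot even stat it: residue
    throw err;
  }
  if (stat.uid !== uid) return root;
  if (!stat.isDirectory()) return null;

  let entries;
  try {
    // Owned by us but still unusable (e.g. mode 0555) would also block mkdir.
    fs.accessSync(root, fs.constants.W_OK | fs.constants.X_OK);
    entries = fs.readdirSync(root);
  } catch (err) {
    if (err.code === "EACCES") return root;
    throw err;
  }

  for (const name of entries) {
    const hit = findForeignOwnedPath(path.join(root, name), uid);
    if (hit !== null) return hit;
  }
  return null;
}
```

A non-null result is the signal for the reclaim step; logging the returned path would also make the residue visible in the job log instead of surfacing only as a bare EACCES.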
Proposed remediation
Primary: before writeConfigs(), reclaim the sandbox tree for the current uid — rm -rf /tmp/gh-aw/sandbox (or rootless chown via podman unshare) when a pre-existing root-owned residue is detected — then mkdir. A fresh, uid-owned tree eliminates the race.
Resilience: on mkdir EACCES under /tmp/gh-aw/sandbox, attempt ownership repair or fall back to a fresh uid-scoped temp dir instead of exiting fatally.
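Both remediation steps can be sketched together in Node.js. This is a hypothetical illustration, not the AWF implementation: prepareLogsDir, the gh-aw-sandbox-&lt;uid&gt;- prefix and the injectable fsImpl parameter are assumptions. The primary path clears the tree unconditionally with fs.rmSync (the equivalent of rm -rf), which simplifies "when residue is detected"; the podman unshare chown alternative is not shown. Because the removal itself can fail with EACCES or EPERM on a root-owned tree, both error codes route to the resilience path, which creates a fresh directory with fs.mkdtempSync.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical pre-flight step run before writeConfigs(). Returns the
// firewall/logs directory that later steps should use.
function prepareLogsDir(sandboxRoot, { fsImpl = fs, uid = process.getuid() } = {}) {
  const logsDir = path.join(sandboxRoot, "firewall", "logs");
  try {
    // Primary: equivalent of `rm -rf <sandboxRoot>` followed by `mkdir -p`,
    // so the new tree is created (and owned) by this uid.
    // force: true only ignores a missing path; EACCES/EPERM still throw.
    fsImpl.rmSync(sandboxRoot, { recursive: true, force: true });
    fsImpl.mkdirSync(logsDir, { recursive: true });
    return logsDir;
  } catch (err) {
    if (err.code !== "EACCES" && err.code !== "EPERM") throw err;
    // Resilience: relocate to a fresh uid-scoped temp dir instead of exit 1.
    const fresh = fsImpl.mkdtempSync(path.join(os.tmpdir(), `gh-aw-sandbox-${uid}-`));
    const fallbackLogs = path.join(fresh, "firewall", "logs");
    fsImpl.mkdirSync(fallbackLogs, { recursive: true });
    return fallbackLogs;
  }
}
```

In this sketch the caller would still have to pass the returned directory to everything that currently hard-codes /tmp/gh-aw/sandbox/firewall/logs (presumably including the squid log location), which is the main cost of relocating rather than repairing.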
Affected workflows and run IDs
… .github/workflows/smoke-ci.lock.yml)
A nearly identical Smoke CI run succeeded about 1 minute later on the same config, so this is a per-runner ownership race, not a configuration error.
Evidence
audit-diff: failed §28413001230 vs success §28413042897
```json
{
  "firewall_diff": { "summary": { "has_anomalies": false, "anomaly_count": 0 } },
  "run_metrics_diff": {
    "run1_token_usage": 0,
    "run2_token_usage": 0,
    "github_rate_limit_details": { "run1_total_api_calls": 11, "run1_core_consumed": 65 }
  }
}
```
run1_token_usage: 0 means the agent was never invoked; the job died during "Generating configuration files".
has_anomalies: false means there was no firewall/egress divergence; the firewall config was never written because mkdir failed first. The discriminator is purely the pre-flight mkdir EACCES.
Ownership residue: /tmp/gh-aw/sandbox (the parent of firewall/logs) is owned by a uid that the current runner user cannot write to. The same residue surfaced as "Rootless artifact permission repair failed (exit 1)" in #41636 (Copilot CLI false-red: runs marked failure after safe-outputs succeed) and #41885 (Claude false-red: log_parser_bootstrap fails completed runs on empty logEntries).
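The discriminator can be encoded for triage. Note that run2_token_usage is also 0 for the successful comparator in this excerpt, so token usage alone does not separate the two runs; the sketch below therefore keys on the pre-flight log line and uses has_anomalies only as a guard. classifySmokeFailure and its labels are hypothetical, not part of any gh-aw or audit-diff API.

```javascript
// Hypothetical triage helper. `auditDiff` mirrors the audit-diff excerpt;
// `jobLog` is the raw job log text. Labels are illustrative only.
const PREFLIGHT_EACCES = /EACCES: permission denied, mkdir '\/tmp\/gh-aw\/sandbox\//;

function classifySmokeFailure(auditDiff, jobLog) {
  const firewallAnomalies = auditDiff?.firewall_diff?.summary?.has_anomalies === true;
  // This issue: mkdir under the sandbox fails before any firewall traffic exists.
  if (PREFLIGHT_EACCES.test(jobLog) && !firewallAnomalies) {
    return "preflight-sandbox-eacces";
  }
  // Firewall startup DNS failure family (#41455).
  if (/EAI_AGAIN/.test(jobLog)) {
    return "firewall-dns-eai-again";
  }
  return "unclassified";
}
```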
Success criteria / verification
… EACCES: mkdir '/tmp/gh-aw/sandbox/firewall/logs'.
… /tmp/gh-aw/sandbox from a prior rootless run still starts the firewall and invokes the agent.
… /tmp/gh-aw/sandbox is fully removed (no root-owned residue) after each run.
Existing-issue correlation
… EAI_AGAIN), #41636 (Copilot exit-1 after outputs succeed), or #41885 (Claude empty logEntries parse). Shares the rootless-ownership root-cause family with #41885 remediation #2, but fires pre-flight (agent never invoked) rather than post-teardown.
Analyzed run IDs: 28413001230 (representative), comparator 28413042897.
References: §28413001230 · §28413042897