fix(project): resolve audit findings, review findings, and code-scanning alerts#332
Conversation
…t, docs Follow-up to #324 (high-severity batch) closing out the remaining findings from the 2026-06-11 codebase audit, plus review feedback from that PR. Security hardening: - Mirror the empty/whitespace-secret HMAC guard from the GitHub and Linear webhook verifiers into slack-verify.ts and webhook-create-task.ts so all four verifiers fail closed on a misconfigured empty signing secret (PR #324 review follow-up) - Sanitize GitHub issue/comment content and wrap it in untrusted delimiters on the agent's local/dry-run prompt path (context.py) - Validate repoFullName/sha shape before URL and S3-key interpolation in github-deployment-status.ts Correctness and resilience: - Post-once marker for Linear final-status comments so partial-batch stream retries cannot post duplicates (Linear has no comment edit) - CLI: memoize in-flight Cognito refresh (concurrent-refresh race), guard corrupt credentials.json with a friendly CliError, bound waitForTask with transient-error retry and a wall-clock ceiling, events --all pagination, validated --limit - Agent: explicit fail-closed deny for non-dict tool_input, default branch detection falls back to main on OSError, push_resolve push failures surface as a PR comment instead of silent success - validateMaxBudgetUsd rejects NaN/Infinity Observability: - CloudWatch alarm on the fan-out DLQ depth (silent notification outages were invisible) and an async-invoke DLQ for the screenshot webhook processor - bgagent watch renders structured approval-milestone metadata Contracts and tests: - constants-parity test pins CLI literals to contracts/constants.json - New hermetic tests for repo.py/post_hooks.py git/gh argv, wait.ts, debug redaction, and the four hardened webhook verifiers - Drop two unused @aws-sdk agentcore deps from the CLI (yarn.lock pruned); add files/engines to cli and docs package manifests Docs: workflow-model vocabulary in agent/README and AGENTS.md, ADR-014 marked accepted, curly-quote and mise-command fixes in guides, .DS_Store gitignore case fix; Starlight mirrors regenerated.
Follow-up to the medium/low audit batch (ea4c94a), resolving the confirmed findings from the high-effort plugin review of that commit plus the three open CodeQL clear-text-logging alerts. Review findings: - Restore the original transient-backoff curve (2**attempt): the extraction to cli/src/retry.ts had silently halved every retry delay, doubling pressure on a degraded backend; a new test pins the exact per-attempt jitter window so the curve can't drift again - events --all --limit N now caps TOTAL events client-side instead of forwarding limit as the server page size (which returned the whole stream in N-event pages) - Only Cognito auth-rejection errors map to "Session expired"; a transient network blip during the (now shared) refresh tells the user to retry instead of re-login - waitForTask timeout/transient-exhaustion exits with code 2 (CliError now carries exitCode) so scripts can tell "CLI gave up waiting" from a genuinely FAILED task (exit 1); ceiling check moved to loop top to cover the transient branch - Gate verbose-log redaction behind isVerbose() — no more deep-copy of every request/response body on the non-verbose hot path - One isUsableHmacSecret() chokepoint (shared/hmac-secret.ts) replaces the eight hand-copied empty-secret guards across the four webhook verifiers; one saveDispatchMarker() helper owns the never-throw post-once marker semantics in the fan-out plane - Agent: GitHub issue content is sanitized at fetch_github_issue (the source) so the GitHubIssue model never carries raw untrusted strings; shared FakeRunCmd/make_task_config test helpers moved to conftest.py; vestigial watch.ts re-export removed Code-scanning alerts (py/clear-text-logging, js/clear-text-logging): - shell.log() and server._warn_cw() emit redacted lines via a shared os.write sink (same pattern as _debug_cw); warn messages were previously printed unredacted — tests switched capsys → capfd - bgagent admin invite-user writes the credential share-block to a 0600 file under ~/.bgagent/invites/ instead of printing the password to stdout (scrollback/CI capture outlive "share once")
isadeks
left a comment
There was a problem hiding this comment.
Thanks for the thorough hardening pass — the empty-HMAC-secret consolidation, the fail-closed hooks, and the new resilience around waitForTask/token-refresh are all solid, well-documented work. I ran a high-effort multi-angle review plus verification. Below: blockers first, then nits, then what's done well.
🔴 Blockers
1. events --all --limit N returns stale pagination → silent data loss
cli/src/commands/events.ts:96
drainAllEvents slices the events array to limit but returns the raw last-page pagination (set at line 94 before the slice). So --output json emits has_more: true and a next_token that points past the events that were just dropped.
events.push(...result.data);
pagination = result.pagination; // full last page's cursor
if (limit !== undefined && events.length >= limit) {
return { events: events.slice(0, limit), pagination }; // events sliced, pagination is NOT
}Trigger: bgagent events <id> --all --limit 75 --output json on a task with ≥101 events → returns events 1–75 plus next_token pointing after event 100. A script that consumes the 75 and follows the token resumes at event 101, silently skipping 76–100.
The regression test --all --limit N caps the TOTAL asserts only parsed.data and never inspects parsed.pagination, so the bug ships green. Fix: when slicing, clear/recompute the returned cursor (e.g. pagination: { has_more: false, next_token: null } once the client-side cap is hit), and extend the test to assert on pagination.
2. status.test.ts leaks process.exitCode = 1 → CLI test command exits non-zero despite green assertions
cli/test/commands/status.test.ts:47,158
The last test (--wait with --output json …) asserts expect(process.exitCode).toBe(1) but the suite's beforeEach/afterEach never reset it — afterEach only does consoleSpy.mockRestore(). Jest reports all 344 CLI tests passing, but the process exits 1, so the CLI test step can fail CI even with green assertions. (Confirmed by a second reviewer's run: assertions pass, command exits 1.)
watch.test.ts:244 already does this correctly — process.exitCode = undefined; in beforeEach. Mirror that here (add process.exitCode = undefined to beforeEach, or reset in afterEach).
3. _note_unpushed_commits silently swallows a failed reviewer notification
agent/src/post_hooks.py:235
The push-failure path runs run_cmd(['gh','pr','comment', …], check=False). With check=False, run_cmd logs but does not raise on a non-zero exit, and the return code is never inspected — so the surrounding except Exception is dead code for the gh-failure case. A failed gh pr comment (missing scope, rate limit, not-a-PR) reports success while posting nothing.
This compounds the situation: the caller is already returning a stale PR URL as success because the push failed, and the one mechanism meant to warn the reviewer is itself a silent no-op. Inspect the return code and emit a distinct WARN when the comment doesn't post (mirror ensure_pushed's return push_result.returncode == 0 pattern).
4. fetch_github_issue crashes on comments by deleted accounts (local/dry-run paths)
agent/src/context.py:62
author=sanitize_external_content(c["user"]["login"]), # c["user"] unguarded
body=sanitize_external_content(c["body"] or ""), # body IS guardedGitHub returns "user": null for comments authored by since-deleted accounts → c["user"] is None → None["login"] raises TypeError: 'NoneType' object is not subscriptable, aborting issue hydration. The body is defended with or "" but the author is not. The sanitize wrapper this PR added sits on this exact line and didn't pick up the guard.
Reachable on local batch (pipeline.py:731) and DRY_RUN=1 (pipeline.py:1194) only — the AgentCore production path uses pre-hydrated context and is safe. Guard with (c.get("user") or {}).get("login", "") or similar.
🟡 Nits / discuss
-
events --limitmeans two different things (cli/src/commands/events.ts:35). Help text says "Max total number of events to return", but on the non---allpathopts.limitis forwarded as the server's per-pagelimit(which the server caps at 100), while--alltreats it as a true total. Sobgagent events <id> --limit 500(no--all) silently returns ≤100. Align the help text with the behavior, or make both paths enforce a total. -
events --allMAX_PAGES=100 cap is silent in text mode (events.ts:104). When the drain stops at the page cap withhas_morestill true, the--alltext branch never prints the(More events available)hint that the non---allpath does — a capped drain looks identical to a complete one. Print a truncation notice when the cap trips. -
waitForTasknow has a non-overridable 24h ceiling (cli/src/wait.ts:70). Pre-PR this was an unboundedwhile(true)poll; it now throwsCliError(exit 2) at 24h with no--timeoutflag or env var to restore the old behavior. Likely intentional and recoverable (re-run resumes), but it's a default-behavior flip — worth a CHANGELOG line and ideally a--timeout/--max-waitflag. -
isTransientErroris brittle (cli/src/retry.ts:56). Network transients are matched only viaerr instanceof TypeError && /fetch failed|network/i.test(err.message), with noerr.cause.codecheck. The common ECONNRESET/ENOTFOUND cases happen to wrap as'fetch failed'(so they match), and the 5xx-via-ApiErrorpath is robust — but an undici'terminated'(mid-stream socket close) or any non-TypeErrortransient is misclassified as fatal, defeating the retry budget the PR is trying to strengthen. Consider checkingerr.cause?.code. -
isExpiredNaN propagation (cli/src/auth.ts:116). A corrupt-but-valid-JSONcredentials.jsonwith a non-datetoken_expirymakesnew Date(...).getTime()→ NaN, andDate.now() >= NaN - BUFFERis always false → token treated as never-expiring → opaque 401 instead of a refresh. Not introduced by this PR and unreachable on the happy path (token_expiryis always written via.toISOString()), but a one-lineNumber.isFiniteguard (treat NaN as expired) would harden it. -
0o600mode not re-applied on overwrite (cli/src/config.ts:96,cli/src/commands/admin.ts).fs.writeFileSync(path, data, { mode: 0o600 })only honorsmodewhen creating a file; on overwrite of an existing file the permission bits are left untouched. Low severity (the happy path creates0o700dirs +0o600files), but a re-login / re-invite into a pre-existing loose-perms file won't re-tighten it. A trailingfs.chmodSync(path, 0o600)makes the intent durable. -
isValidReporegex allowsowner/..(cdk/src/handlers/shared/github-deployment-status.ts:114).^[A-Za-z0-9._-]+\/[A-Za-z0-9._-]+$permits.in segments, soowner/..passes; the new "path traversal" test only exercises the double-slash case (trivially rejected by the single-slash rule), giving false confidence. Impact is bounded (HMAC-gated, URL normalizes,replaceAll('/','_')neutralizes the S3 key), but the regex doesn't enforce theowner/reposhape the comment claims. -
renderMilestoneSuffixdrops unrecognized metadata (cli/src/commands/watch.ts:160). It returns as soon as any known field (severity/request_id/scope/timeout_s/matching_rule_ids) is present, so the JSON-dump fallback for unknown keys (guarded byparts.length === 0) never runs — contradicting the docstring's "so nothing is lost". Text-mode only; JSON mode serializes verbatim. -
Screenshot DLQ has no alarm (
cdk/src/constructs/github-screenshot-integration.ts:154). The newWebhookProcessorDlqis a localconstwith no CloudWatch alarm and isn't exposed, while the sibling fan-out DLQ in this same PR got a publicdlqDepthAlarm(threshold 1). The diff comment promises "operator inspection," but nothing notifies an operator — the exact silent-outage gap the fan-out alarm closes. Consider mirroring the alarm here. -
wait.tsoff-by-one in the abort message (cli/src/wait.ts:89). The guard isconsecutiveTransientFailures > MAX_TRANSIENT_FAILURES(=5) after a pre-increment, so it aborts on the 6th failure, but the error message says "after 5 consecutive transient failures." Cosmetic — use>=or say "after 6."
✅ Done well
- HMAC-secret consolidation is the right altitude. Collapsing eight hand-copied empty-secret checks into one
isUsableHmacSecret()type guard, applied at both fetch and verify points, turns "a future webhook source forgets the invariant" into a structural impossibility. All 8 call sites route correctly and Slack's cache-delete path is preserved. - Linear idempotency marker correctly fixes a real pre-PR double-post and brings Linear in line with Slack's collective-terminal dedup and GitHub's edit-in-place. The
saveDispatchMarkernever-throw / benign-CCF classification is exactly the right contract (a marker write must never turn a successful post into a retry). - Retry-curve regression caught and pinned. Noticing the extraction had silently halved the backoff (
2**(attempt-1)vs2**attempt) and then pinning the exact per-attempt jitter window with a test so it can't drift again is excellent defensive work. - Verbose-redaction gating behind
isVerbose()correctly short-circuits the deep-copy + stringify on thewatchpoll hot path. - Fail-closed hooks: non-dict
tool_inputnow denies explicitly instead of relying on anAttributeErrorswallowed by the engine's broad except — the right instinct. - CodeQL alerts handled at the source (os.write sink after redaction; password written to a
0600file instead of stdout) rather than suppressed. - Test investment: hermetic tests for the highest-side-effect agent functions (
repo.py/post_hooks.py) that were previously only ever mocked, theconstants-paritydrift guard, and the newwait.test.ts/debug.test.tssuites are real coverage wins.
Verification summary
- Linear / HMAC / webhook / fanout targeted tests pass; no PR-diff blocker in the Linear changes.
- Agent:
1051 passed. CDK targeted changed tests:303 passed(--coverage=false; full CDK needsyarn install --frozen-lockfilefirst —node_moduleswas stale/missing@aws-sdk/s3-presigned-post). CLI: 344 assertions pass but the command exits 1 due to blocker #2.
…s; enforce model-level sanitization Addresses the three PR #332 review findings: - Linear final-status comments are no longer dropped on transient failures: postIssueComment/addIssueReaction now return a classified LinearPostResult ({ ok } | { ok: false, retryable }) — network errors, timeouts, 5xx and 429 are retryable; auth, GraphQL errors and token-resolution failures stay terminal. dispatchToLinear throws on retryable failures so routeEvent records an infra rejection and the record lands in batchItemFailures for a Lambda retry. Safe by construction: the post-once marker is only persisted after a successful post, so the retry posts the missing comment or short-circuits on the marker. - Construct-test gaps closed: fanout-consumer.test.ts pins the FanOutDlqDepthAlarm (metric binding, threshold 1, notBreaching) so a refactor can't silently drop the only persistent signal of a fan-out outage; new github-screenshot-integration.test.ts asserts the WebhookProcessorDlq exists (14-day retention, enforceSSL) and is wired as the processor Lambda's async-invoke DeadLetterConfig. - GitHubIssue/IssueComment sanitization is now structural: field validators run sanitize_external_content at construction, so every construction path (fetch_github_issue, model_validate from cache, tests, future fetchers) yields a sanitized instance — the docstring promise "consumers must not re-sanitize" is enforced by the type rather than one caller's discipline. fetch_github_issue drops its now-redundant call sites; idempotency pinned by tests.
The chmod calls added for the durable-permissions fix tripped
@typescript-eslint/no-magic-numbers (the rule exempts object-literal
properties like { mode: 0o600 } but not bare call arguments). One named
constant in config.ts now owns the secret-file mode for credentials and
invite files.
|
Thanks @isadeks for the thorough multi-angle review — every one of the four blockers was real, and the verification detail (especially confirming the exit-1-with-green-assertions behavior empirically) made them fast to act on. All four blockers and all nits are now fixed and pushed. Fixed in response to your review (
|
|
Also deployed and tested live the changes |
Summary
Follow-up to #324 (high-severity audit batch). This PR closes out the remaining medium/low findings from the 2026-06-11 full-codebase audit, the confirmed findings from a high-effort multi-agent code review of that work, the PR #324 review follow-up (whitespace-secret guards), and all three open CodeQL code-scanning alerts.
Two commits:
ea4c94a(audit findings) andbc9d593(review findings + code-scanning alerts).Security hardening
slack-verify.tsandwebhook-create-task.ts(PR fix(project): fix multiple issues identified #324 review follow-up), then consolidates all eight hand-copied checks into one shared chokepoint —isUsableHmacSecret()incdk/src/handlers/shared/hmac-secret.ts— so a future webhook source can't forget the invariant.HMAC('', body)is computable by anyone; an empty secret must never verify. Forgery tests added per verifier.agent/src/context.py):fetch_github_issuenow routes title, body, and every comment throughsanitize_external_contentas theGitHubIssuemodel is built, andassemble_promptwraps the block in explicit untrusted-content delimiters. Previously the local/dry-run path injected raw issue text into the prompt (the orchestrator only sanitizes the deployed path), and the model carried unsanitized strings any future consumer would trust.github-deployment-status.tsvalidates shape, not just presence:repoFullNamemust matchowner/repoandshamust be hex before either is interpolated into the GitHub API URL / S3 key (defense-in-depth; payload is already HMAC-verified).tool_inputis now an explicit fail-closed deny inagent/src/hooks.pyinstead of an accidentalAttributeErrorcaught by the engine's broad except.Code-scanning alerts (CodeQL, all 3 open alerts)
agent/src/shell.py(py/clear-text-logging, high):log()now emits via anos.write(1, ...)sink afterredact_secrets, matching the patternserver._debug_cwalready used. Output-observing tests switchedcapsys→capfd.agent/src/server.py(py/clear-text-logging, high):_warn_cwpreviouslyprinted messages unredacted; it now routes through_redact_cached_credentialsplus a shared_emit_stdout_linehelper (extracted from_debug_cw, removing that duplication).cli/src/commands/admin.ts(js/clear-text-logging, high):admin invite-userno longer prints the Cognito password to stdout. The share-block is written to a0600file under~/.bgagent/invites/and the command prints the path with instructions to share over a secure channel and delete. Scrollback and CI capture outlive the "share once" intent.Correctness & resilience
linear_final_comment_event_idpost-once marker onTaskRecord(conditional write) prevents duplicate comments when a sibling channel's infra rejection re-runs all dispatchers — Linear has no comment-edit API. A sharedsaveDispatchMarker()helper owns the never-throw / benign-conditional-failure semantics for current and future channels.cli/src/retry.tshad silently halved every delay (2**(attempt-1)vs the original2**attempt), doubling retry pressure on a degraded backend. Restored, and a new test pins the exact per-attempt jitter window so the curve can't drift undetected again.events --all --limit Ncaps total events, not page size — it previously returned the entire stream in N-event pages.NotAuthorizedException/UserNotFoundException) map to "Session expired — run bgagent login"; transient network blips now say "Token refresh failed — retry."waitForTaskbounded and script-friendly: tolerates up to 5 consecutive transient failures with jittered backoff, enforces a 24h wall-clock ceiling checked at loop top, and exits with code 2 on timeout/exhaustion (CliErrornow carriesexitCode) so wrappers can distinguish "CLI gave up waiting" from a genuinely FAILED task (exit 1).~/.bgagent/credentials.jsonnow throws a friendly "run bgagent login"CliErrorinstead of a rawSyntaxError.validateMaxBudgetUsdrejects NaN/Infinity (was accepted as a valid budget); full test block added for a previously untested validator.detect_default_branchfalls back tomainonOSError/SubprocessError(e.g. missinggh), as its docstring always promised;push_resolvepush failures now surface as a PR comment instead of silently returning the stale PR URL as success.Observability
bgagent watchrenders structured approval-milestone metadata (severity, request_id, timeout, matching rules) in text mode instead of a bare★ approval_requested.isVerbose().Contracts, tests, hygiene
constants-parity.test.tspins the CLI's mirrored limits tocontracts/constants.json, turning silent drift into a CI failure.repo.pyclone/branch/credential-helper argv,post_hooks.pypush/PR flows) — these were previously only ever mocked out; sharedFakeRunCmd/make_task_configfixtures inconftest.py.wait.test.ts(timer-mocked poll/retry/ceiling/exit codes),debug.test.ts(redaction), plus extended auth/events/watch coverage.@aws-sdk/*-agentcoreruntime deps from the CLI (yarn.lock pruned); addedfiles/enginesto cli and docs manifests.task_typevocabulary, ADR-014 marked accepted, curly-quote andmise run //command-form fixes,.gitignore.DS_Storecase fix; Starlight mirrors regenerated.Verification
Full monorepo build (
mise run build) green: agent 1051 tests (78.8% coverage), cdk 2029 tests (111 suites), cli 344 tests (28 suites), types-sync drift check OK,cdk synth+ cdk-nag clean. The three code-scanning alerts should auto-close on the post-merge default-branch scan.