Skip to content

fix(security): drop symlink entries before visitor sees them#12

Open
aaronjmars wants to merge 1 commit into
perplexityai:mainfrom
aaronjmars:security/walker-skip-file-symlinks
Open

fix(security): drop symlink entries before visitor sees them#12
aaronjmars wants to merge 1 commit into
perplexityai:mainfrom
aaronjmars:security/walker-skip-file-symlinks

Conversation

@aaronjmars
Copy link
Copy Markdown

Summary

The walker surfaces file-typed symlinks to its visitor, and the per-ecosystem scanners open files through os.Open which follows symbolic links. A symlink planted inside a scan root therefore makes the scanner read and parse an arbitrary file outside the configured scope, then emit a package record whose package_name / version come from the link target. SECURITY.md states the threat model as "a read-only filesystem walker" that "does not parse source code"; in practice it parses whatever JSON/RFC-822 file the symlink points at, which can be a config file the operator never asked to scan.

This PR extends the existing directory-symlink defense (walk.go:212-218 comment: "the walker never crosses into an unrelated subtree by indirection") to file-typed entries: any fs.DirEntry whose Type() carries os.ModeSymlink is dropped before the visitor sees it.

Impact

Concrete chain:

  1. A malicious package's postinstall hook drops a symlink at node_modules/<pkg>/package.json pointing at, say, ~/.config/<other-app>/state.json (or any JSON file readable by the invoking user). bumblebee never executes that hook, but the symlink is already on disk by the time the next scheduled scan runs.
  2. The walker enters the project tree and surfaces the symlink to the npm dispatcher.
  3. npm.ScanNodeModulesPackageJSON calls os.Open(path) which follows the link; f.Stat().IsRegular() is true because the target is a regular file; json.Unmarshal succeeds whenever the target has parseable JSON.
  4. A package record is emitted with ecosystem=npm, source_file=<symlink path inside the scan root>, and package_name/version lifted verbatim from the target file's name/version fields. Anything the JSON target exposes through those two keys is now part of the NDJSON batch the operator's receiver sees.

Verified end-to-end against cmd/bumblebee built from main (v0.1.2-0.20260523091658-611dc7920847): a package.json symlink at proj/node_modules/evil/package.jsonproj/../elsewhere/oracle-target.json produces a clean exit-0 NDJSON record carrying the target's field values; nothing in the parser path filters or even notices the indirection.

Adjacency notes (these still work as designed; they're not regressed by this PR):

  • The MCP scanner doesn't put env values onto the record schema, so MCP env-block secrets stay out even if the symlink retargets at a Claude / Gemini host config.
  • The PEP-566 parser only reads Name and Version headers, so *.dist-info/METADATA symlinks leak only those two fields, not Author-email etc.
  • Per-scanner readBounded already rejects non-regular files via info.Mode().IsRegular(), so directory symlinks fail closed at the open. The hole is specifically file-typed symlinks that resolve to regular files.

Severity: medium. The blast radius is bounded by what the invoking user can already read (no privilege escalation), but the exfiltration path is silent (the records sink is the operator's receiver, not the user's terminal), and the trigger is one symlink in any of the parsed file names. CWE-59 (Improper Link Resolution Before File Access — "Link Following").

Location

  • internal/walk/walk.go:208-217 — walker visitor branch that surfaces every entry, including symlinks.
  • internal/ecosystem/*/*.go:readBounded()os.Open follows the link; the subsequent IsRegular() check is true when the target is a regular file.

Fix

internal/walk/walk.go: skip any fs.DirEntry whose Type() & os.ModeSymlink is non-zero before calling the visitor. The existing directory-symlink Lstat inside the d.IsDir() branch was unreachable in practice — filepath.WalkDir reports directory symlinks as symlink-typed (d.IsDir() == false), so the directory case never entered that block. The single early-return now covers both directory- and file-typed symlinks at the right layer, mirroring the symmetric handling already present in exposure.go:161 for --exposure-catalog directory loading.

The package doc comment is updated to call out the new invariant ("symlink entries are never surfaced to the visitor (neither directory-typed nor file-typed)") so the next reader of walk.go doesn't reintroduce the gap.

Detected by

Aeon manual review — drove this scan from the perpendicular-axis read on SECURITY.md (the explicit "Env values are never captured" and "does not parse source code" claims invited closer reading of the read-only-walker invariant; the directory-symlink defense was the existing positive control). Scanner triage was supporting context: semgrep (p/security-audit + p/owasp-top-ten + p/secrets across Go), trufflehog filesystem + git, and osv-scanner --recursive all came back clean (0 ERROR/WARNING, 0 verified secrets, 0 advisories) on the v0.1.1 tag, so the finding is purely from reading the walker + per-scanner readBounded together.

  • Severity: medium
  • CWE-59 (Improper Link Resolution Before File Access)

Verification

  • go test ./...19/19 packages pass, including:

    • new TestWalkSkipsFileSymlinks in internal/walk/walk_test.go (asserts the walker does not surface a file-typed symlink while still surfacing the sibling real file).
    • new TestFileSymlinkDoesNotExfiltrateOutOfScope in internal/scanner/scanner_test.go (the end-to-end shape: plant node_modules/evil/package.json -> elsewhere/stolen.json, run Run(Config{...Profile: project, Roots: ...}), assert no record carries the target's name/version and the symlink path never appears as source_file).
    • the existing TestSymlinkLoopSafety (directory-symlink-loop termination guarantee) still passes — the loop guard via seen[dev+inode] is preserved.
  • Reproducer (out-of-tree, attached in the agent's scratch dir) builds cmd/bumblebee from this branch and re-runs the attack shape from the Impact section above:

    • Before this PR: files_considered=1, records=1 — one package record with package_name=SECRET-FIELD-FROM-OUTSIDE-SCAN-ROOT, version=v-leaked-1.2.3, ecosystem=npm, source_file=.../node_modules/evil-pkg/package.json.
    • After this PR: files_considered=0, records=0 — the symlink is dropped at the walker layer, the npm dispatcher is never called, and the scan_summary closes clean.

No new runtime dependencies. No behavior change for non-symlink entries. Pnpm/yarn workspaces — which use OS symlinks at the directory level (node_modules/<top-level-pkg>node_modules/.pnpm/<pkg>@<v>/...) — are unaffected because filepath.WalkDir already declined to descend into directory symlinks; the real package metadata lives under the hard-linked / copy-on-write .pnpm store path, which the walker reaches via the normal traversal, not via the symlink.


Filed by Aeon.

A file-typed symlink planted inside a scan root makes the walker
surface the entry to its visitor; the per-ecosystem scanners then open
through os.Open which follows the link, parse the unrelated target as
if it were the expected metadata file, and emit a package record whose
package_name/version come from the target. With one symlink at e.g.
node_modules/<pkg>/package.json -> ~/.config/<other-app>/state.json an
attacker can have arbitrary readable JSON fields ride out through the
configured records sink.

Extend the existing directory-symlink defense (walk.go comment 'the
walker never crosses into an unrelated subtree by indirection') to
file entries: drop every fs.DirEntry whose Type carries ModeSymlink
before the visitor is called. Mirrors exposure.go:161 which already
takes this stance for --exposure-catalog directory loading.

Detected by Aeon manual review.
Severity: medium
CWE-59 (Improper Link Resolution Before File Access)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant