Symex: fork isolation, concrete-detection hardening, bulk-memory hook surface, explore_next #590

Merged: kumarak merged 1 commit into main from feature/symex-fork-isolation-and-hardening on May 6, 2026.


@pgoodman (Collaborator) commented May 6, 2026

Summary

  • Fork isolation: _fork_child was not propagating three per-path fields (findings, _region_at_suspension, _lazy_regions_used) to child paths, so sibling paths shared the parent's mutable state. Fixed by propagating all three fields on fork, matching what Path.clone() already did. The same fix is applied to _init_seeded_path.

  • Per-path CoW for concrete writes: _make_default_mem_write previously wrote concrete values only to the shared ConcreteMemory, so any fork that wrote through the default handler made its write visible to sibling paths. Concrete writes now also place per-byte BitVecVal entries in the per-path shadow, giving copy-on-write isolation. Shadow reads use a new _shadow_range_is_concrete helper to distinguish fully concrete shadow ranges (unwrapped to an int) from symbolic ones (returned as a z3 expression), preserving the symbolic shape that analysts see.

  • Concrete-detection hardening: the narrow isinstance(v, int) checks at terminal-use sites (address coercion, branch resolution, is_true) are replaced with _concrete_int / _concrete_bool helpers that call z3.simplify, so that a structurally non-trivial but provably constant z3 expression (e.g. Concat(BVV, BVV) + 0 from a multi-byte shadow load) no longer forces a spurious MemAddrSuspension or branch fork.

  • Endian enum: all "little" / "big" string literals and the cargo-cult str(self._engine.endian) wrappers are replaced with Endian.LITTLE / Endian.BIG from _types.py.

  • _memop_name helper: bulk-op names are now derived from mx.ir.MemOp(int(op)).name, so the name list stays in sync with the C++ enum automatically instead of relying on a hand-maintained dict.

  • Bulk-memory hook surface: a new intercept.bulk_memory(op="memcpy") hook axis, which accepts the other supported op names as well. The C++ substrate calls mem_bulk_op; Python dispatches to any registered analyst handlers and then to a per-op default decomposer that re-fires the existing memory_read / memory_write hooks per byte. Supported ops: memcpy, memmove, memset, bzero, memcmp, memchr, strlen, strnlen, strcmp, strncmp, strchr, strrchr, strcpy, stpcpy, strncpy, stpncpy, strcat, strncat.

  • explore_next / from_path: engine.explore(..., from_path=P) clones the prior path's per-path state into a fresh initial path for the new target function; path.explore_next(target) is syntactic sugar for this.

  • Phase 15 test correction: a C switch with no "default:" label still has an implicit default IR edge (three successors total), so the original assertion for that case was wrong. The test now expects three completed paths with return values {10, 20, 0}.
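A minimal sketch of the fork-isolation bug and fix from the first bullet above. Path here is a hypothetical dataclass carrying only the three fields the PR names, and fork_child_buggy / fork_child_fixed are illustrative helpers, not the real _fork_child. Shallow container copies are assumed to be enough because the sketch only appends to the containers; whether the real code needs deeper copies is not stated in the PR.

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    """Hypothetical path object holding only the three per-path fields from the PR."""
    findings: list = field(default_factory=list)
    _region_at_suspension: dict = field(default_factory=dict)
    _lazy_regions_used: set = field(default_factory=set)


def fork_child_buggy(parent: Path) -> Path:
    # Bug pattern: the child aliases the parent's mutable containers, so every
    # sibling forked from the same parent sees the others' mutations.
    child = Path()
    child.findings = parent.findings
    child._region_at_suspension = parent._region_at_suspension
    child._lazy_regions_used = parent._lazy_regions_used
    return child


def fork_child_fixed(parent: Path) -> Path:
    # Fix pattern: propagate all three fields as fresh copies, as the PR says
    # Path.clone() already did, so each child owns its own containers.
    return Path(
        findings=list(parent.findings),
        _region_at_suspension=dict(parent._region_at_suspension),
        _lazy_regions_used=set(parent._lazy_regions_used),
    )


def sibling_sees_write(fork) -> bool:
    # Fork two siblings from a fresh parent, mutate one, inspect the other.
    parent = Path(findings=["f0"])
    a, b = fork(parent), fork(parent)
    a.findings.append("from-a")
    a._lazy_regions_used.add(0x1000)
    return "from-a" in b.findings or 0x1000 in b._lazy_regions_used


print(sibling_sees_write(fork_child_buggy))  # True: state leaks across siblings
print(sibling_sees_write(fork_child_fixed))  # False: siblings are isolated
```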
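The per-path shadow and the concrete-versus-symbolic read distinction from the CoW bullet above can be sketched in plain Python. This is a stand-in model, not the engine's code: ints play the role of per-byte BitVecVal entries, a small hypothetical Sym class plays the role of a symbolic z3 byte, and a read-only bytes object stands in for the shared ConcreteMemory. For brevity the sketch writes only to the shadow; the PR's handler also writes the value to the shared ConcreteMemory, which is not modeled here. shadow_range_is_concrete is modeled loosely on the PR's _shadow_range_is_concrete.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class Sym:
    """Stand-in for a symbolic z3 byte expression (hypothetical, not z3)."""
    name: str


Byte = Union[int, Sym]


@dataclass
class PathMemory:
    """Hypothetical per-path view: a private shadow over a shared baseline."""
    baseline: bytes                             # stand-in for shared ConcreteMemory
    shadow: dict = field(default_factory=dict)  # addr -> Byte, owned by this path

    def fork(self) -> "PathMemory":
        # The child shares the baseline but gets its own copy of the shadow.
        return PathMemory(self.baseline, dict(self.shadow))

    def write_concrete(self, addr: int, value: int, size: int) -> None:
        # Split the value into little-endian bytes, one shadow entry per byte.
        for i in range(size):
            self.shadow[addr + i] = (value >> (8 * i)) & 0xFF

    def write_symbolic(self, addr: int, byte: Sym) -> None:
        self.shadow[addr] = byte

    def _byte(self, addr: int) -> Byte:
        # The shadow wins; untouched addresses fall back to the baseline.
        return self.shadow.get(addr, self.baseline[addr])

    def shadow_range_is_concrete(self, addr: int, size: int) -> bool:
        return all(isinstance(self._byte(addr + i), int) for i in range(size))

    def read(self, addr: int, size: int):
        if self.shadow_range_is_concrete(addr, size):
            # Fully concrete range: unwrap to a plain little-endian int.
            return sum(self._byte(addr + i) << (8 * i) for i in range(size))
        # Otherwise keep the per-byte symbolic shape visible to the caller.
        return [self._byte(addr + i) for i in range(size)]


root = PathMemory(baseline=bytes(8))
root.write_concrete(0, 0x1234, 2)
a, b = root.fork(), root.fork()
a.write_concrete(0, 0xBEEF, 2)
b.write_symbolic(1, Sym("x"))
print(hex(a.read(0, 2)))     # 0xbeef
print(b.read(0, 2))          # [52, Sym(name='x')]
print(hex(root.read(0, 2)))  # 0x1234
```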
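A sketch of why an enum is safer than bare "little" / "big" strings. The member values below are an assumption about _types.py (the PR only names Endian.LITTLE and Endian.BIG); they are chosen so that .value can be passed straight to int.from_bytes / int.to_bytes, whose byteorder argument takes "little" or "big".

```python
from enum import Enum


class Endian(Enum):
    """Assumed shape of the Endian enum; the real member values may differ."""
    LITTLE = "little"
    BIG = "big"


def load(data: bytes, endian: Endian) -> int:
    # A typo such as Endian.LITLE raises AttributeError at the use site, while a
    # misspelled string such as "litle" compared with == silently takes the
    # wrong branch.
    return int.from_bytes(data, endian.value)


def store(value: int, size: int, endian: Endian) -> bytes:
    return value.to_bytes(size, endian.value)


print(hex(load(b"\x34\x12", Endian.LITTLE)))  # 0x1234
print(hex(load(b"\x34\x12", Endian.BIG)))     # 0x3412
print(store(0x1234, 2, Endian.BIG).hex())     # 1234
```

Endian("little") returns Endian.LITTLE, which gives one place to convert any legacy string value at a boundary.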
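The dispatch order described in the bulk-memory hook bullet (analyst handlers first, then a per-op default decomposer that re-fires per-byte hooks) can be sketched as follows. BulkDispatcher, its register method, the argument order of each op, and the rule that a handler returning a non-None value claims the operation are all hypothetical; only the name mem_bulk_op and the overall order come from the PR. Only three of the eighteen supported ops are decomposed here.

```python
from typing import Callable, Dict, List

ReadHook = Callable[[int], int]         # addr -> byte value
WriteHook = Callable[[int, int], None]  # (addr, byte value) -> None


class BulkDispatcher:
    """Hypothetical bulk-op dispatcher: analyst handlers, then per-op defaults."""

    def __init__(self, read: ReadHook, write: WriteHook) -> None:
        self.read, self.write = read, write
        self.handlers: Dict[str, List[Callable]] = {}
        self.defaults = {"memcpy": self._memcpy, "memset": self._memset,
                         "strlen": self._strlen}

    def register(self, op: str, handler: Callable) -> None:
        self.handlers.setdefault(op, []).append(handler)

    def mem_bulk_op(self, op: str, *args):
        # Assumed convention: the first handler returning non-None claims the op.
        for handler in self.handlers.get(op, []):
            result = handler(*args)
            if result is not None:
                return result
        return self.defaults[op](*args)

    # Default decomposers: each byte goes through the per-byte hooks again.
    def _memcpy(self, dst: int, src: int, n: int) -> int:
        for i in range(n):
            self.write(dst + i, self.read(src + i))
        return dst

    def _memset(self, dst: int, value: int, n: int) -> int:
        for i in range(n):
            self.write(dst + i, value & 0xFF)
        return dst

    def _strlen(self, s: int) -> int:
        n = 0
        while self.read(s + n) != 0:
            n += 1
        return n


mem = bytearray(b"hi\x00" + bytes(5))
trace = []


def read_hook(addr: int) -> int:
    trace.append(("read", addr))
    return mem[addr]


def write_hook(addr: int, value: int) -> None:
    trace.append(("write", addr))
    mem[addr] = value


d = BulkDispatcher(read_hook, write_hook)
d.mem_bulk_op("memcpy", 4, 0, 3)    # copies b"hi\0" to offset 4
print(bytes(mem[4:7]))              # b'hi\x00'
print(trace[:2])                    # [('read', 0), ('write', 4)]
print(d.mem_bulk_op("strlen", 4))   # 2, via three per-byte reads
d.register("strlen", lambda s: 42)  # an analyst model overrides the default
print(d.mem_bulk_op("strlen", 4))   # 42
```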
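A sketch of the from_path idea from the explore_next bullet, under simplifying assumptions: Engine, Path, and the state fields are hypothetical stand-ins, and explore here only constructs the initial path for the new target instead of running an exploration. What it shows is that the new path starts in a different function but inherits a copy, not a reference, of the prior path's per-path state.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Path:
    """Hypothetical path: the function it starts in plus per-path state."""
    target: str
    engine: "Engine"
    shadow: dict = field(default_factory=dict)
    findings: list = field(default_factory=list)

    def explore_next(self, target: str) -> "Path":
        # Sugar over engine.explore(target, from_path=self).
        return self.engine.explore(target, from_path=self)


class Engine:
    def explore(self, target: str, from_path: Optional[Path] = None) -> Path:
        if from_path is None:
            return Path(target, self)
        # Clone the prior path's state so the new run cannot mutate it.
        return Path(target, self,
                    shadow=copy.deepcopy(from_path.shadow),
                    findings=copy.deepcopy(from_path.findings))


engine = Engine()
p = engine.explore("init_config")       # hypothetical first target
p.shadow[0x1000] = 0x41                 # state left behind by the first run
q = p.explore_next("parse_packet")      # hypothetical follow-up target
q.shadow[0x1000] = 0x42
print(q.target, hex(p.shadow[0x1000]))  # parse_packet 0x41
```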
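The corrected Phase 15 expectation can be checked with a small Python mirror of a switch without a default label. The case labels 1 and 2 and the trailing return 0 are assumptions chosen to reproduce the return set {10, 20, 0}; the PR does not show the test's source.

```python
def no_default_switch(x: int) -> int:
    # Python mirror of a hypothetical C function:
    #   switch (x) { case 1: return 10; case 2: return 20; }
    #   return 0;  /* reached through the implicit default edge */
    if x == 1:
        return 10
    if x == 2:
        return 20
    return 0


# Three successors of the switch give three distinct completed-path results.
print(sorted({no_default_switch(x) for x in range(-2, 5)}))  # [0, 10, 20]
```

An assertion expecting only {10, 20} drops the third path, the one that takes the implicit default edge.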

🤖 Generated with Claude Code

… surface, explore_next

- _fork_child was not propagating findings / _region_at_suspension /
  _lazy_regions_used to child paths; all three now copied on fork,
  matching Path.clone(). Same fix in _init_seeded_path.

- Concrete writes in _make_default_mem_write now land in both shared
  ConcreteMemory and the per-path shadow (as BitVecVal bytes), giving
  copy-on-write isolation across forks. _shadow_range_is_concrete
  distinguishes fully-concrete shadow ranges (unwrap to int) from
  symbolic ones (return z3 expr), preserving analyst-visible shape.

- _concrete_int / _concrete_bool helpers call z3.simplify so a
  structurally non-trivial but provably-constant z3 expression
  (e.g. Concat(BVV, BVV) + 0 from a multi-byte shadow load) does
  not force a spurious MemAddrSuspension or branch fork. Used in
  extract_addr, _default_branch, is_true, and resolve_branch.

- All "little" / "big" string literals and str(engine.endian)
  wrappers replaced with Endian.LITTLE / Endian.BIG.

- _memop_name derives bulk-op names from mx.ir.MemOp(int(op)).name
  so the mapping stays in sync with the C++ enum automatically.

- New intercept.bulk_memory(op=...) hook axis. The C++ substrate
  calls mem_bulk_op; Python dispatches to registered handlers then
  to per-op default decomposers that re-fire memory_read /
  memory_write per byte. Supported: memcpy/memmove/memset/bzero/
  memcmp/memchr/strlen/strnlen/strcmp/strncmp/strchr/strrchr/
  strcpy/stpcpy/strncpy/stpncpy/strcat/strncat.

- engine.explore(..., from_path=P) clones the prior path's per-path
  state into a fresh initial path for the new target function.
  path.explore_next(target) is the sugar. Tests in test_explore_next.

- Phase 15 no-default switch test corrected: a switch with no
  default: label still has an implicit default IR edge (three
  successors); expected return set is {10, 20, 0}, not {10, 20}.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@pgoodman requested a review from kumarak on May 6, 2026 01:35
@kumarak merged commit 3d07cd5 into main on May 6, 2026
2 checks passed