
BFS Atomic Counter Task #342

Open
QuantamHD wants to merge 1 commit into The-OpenROAD-Project:master from QuantamHD:atomic_task

Conversation

@QuantamHD
Contributor

No description provided.

Signed-off-by: Ethan Mahintorabi <ethanmoon@google.com>
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@dsengupta0628
Contributor

Some initial feedback (more to come):

  • No tests or benchmarks. A change to the core BFS traversal should include regression tests (timing correctness) and performance benchmarks (runtime comparison on representative designs).
  • No PR description. The motivation, design rationale, and expected improvements are not documented.
  • Both the OpenSTA and OpenROAD regressions are failing. Please merge the latest OpenROAD/OpenSTA master and rerun the regressions.

@dsengupta0628
Contributor

The most critical issue here is an in-degree count/decrement mismatch (a topological-ordering violation).

computeInDegrees() deduplicates per unique successor vertex:

  std::set<Vertex*> counted_successors;
  // ...
  if (counted_successors.insert(to_vertex).second) {
      in_degrees_[to_vertex->objectIdx()].fetch_add(1, ...);
  }

But enqueueAdjacentVertices() decrements per edge (deduped via processed_edges_):

  inserted = processed_edges_.insert(edge).second;  // per edge, not per vertex
  if (inserted) {
      int old_deg = in_degrees_[to_vertex->objectIdx()].fetch_sub(1, ...);
      if (old_deg == 1) { /* process vertex */ }
  }

When multiple edges connect the same two vertices (e.g., different timing arc sets with different when conditions), the in-degree is incremented once but decremented multiple times. Concrete example:

  • Vertex A has edges e1, e2 to C. Vertex B has edge e3 to C.
  • computeInDegrees: C gets in-degree = 2 (A counted once, B once)
  • enqueueAdjacentVertices(A):
    • e1: decrement C: 2→1, old_deg=2, not ready
    • e2: decrement C: 1→0, old_deg=1, process C
  • But B hasn't been visited yet — C is processed before all predecessors are done.

This violates topological ordering. C's delay calculation uses stale/uninitialized data from B's contribution. Fix: either remove the counted_successors dedup in computeInDegrees (count per edge), or dedup per target vertex in enqueueAdjacentVertices.

One possible fix: remove the counted_successors set entirely, so computeInDegrees counts every edge passing the predicate once. The per-edge increment then matches the per-edge decrement in enqueueAdjacentVertices.
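The matched per-edge count/decrement can be sketched with a toy model (hypothetical names; vertices are ints and edges are (from, to) pairs, standing in for OpenSTA's Vertex/Edge objects; a real implementation would use std::atomic with fetch_add/fetch_sub):

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // (from, to)

// Count one unit of in-degree PER EDGE -- no counted_successors dedup --
// so parallel edges like e1 and e2 from A to C each contribute.
std::vector<int> computeInDegreesPerEdge(int num_vertices,
                                         const std::vector<Edge>& edges) {
  std::vector<int> in_degrees(num_vertices, 0);
  for (const Edge& e : edges)
    in_degrees[e.second] += 1;
  return in_degrees;
}

// Decrement per out-edge of `from`; returns successors whose in-degree
// hit zero, i.e. all predecessor edges have now been released.
std::vector<int> releaseEdges(std::vector<int>& in_degrees, int from,
                              const std::vector<Edge>& edges) {
  std::vector<int> ready;
  for (const Edge& e : edges)
    if (e.first == from && --in_degrees[e.second] == 0)
      ready.push_back(e.second);
  return ready;
}
```

With A=0, B=1, C=2 and edges e1=(A,C), e2=(A,C), e3=(B,C), C starts at in-degree 3; releasing A's two edges leaves 1, and C only becomes ready after B's edge is released, preserving topological order.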

@dsengupta0628
Contributor

One comment regarding the change: it bypasses the incremental delay tolerance.

Old behavior
With the old BfsFwdIterator, if a vertex's output slews didn't change, enqueueAdjacentVertices was not called — successors were never enqueued, and the entire downstream cone was skipped. This is the key incremental optimization: a small change that doesn't affect slews stops propagating immediately.

New behavior
In the new enqueueAdjacentVertices, when a successor's in-degree reaches 0, it dispatches this lambda:


  dispatch_queue_->dispatch([this, to_vertex](size_t tid) {
      current_thread_id = tid;
      visitors_[tid]->visit(to_vertex);     // → findVertexDelay (may or may not propagate)
      visit_count_->fetch_add(1, ...);
      enqueueAdjacentVertices(to_vertex);   // always propagates
  });

Two calls to enqueueAdjacentVertices(to_vertex) happen:

  1. Conditionally from findVertexDelay inside the visitor (only if slews changed)
  2. Unconditionally from the lambda itself

If slews didn't change, call 1 is skipped. But call 2 runs anyway, decrements all successors' in-degrees, and dispatches any that become ready. The processed_edges_ set is empty for those edges (since call 1 was skipped), so they all get processed.

This cascades: every successor is visited, every successor unconditionally propagates to its successors, and so on through the entire downstream cone.

But with the new in-degree counting, you probably need this. If a vertex C has two edges from A and B, and A's slews changed but B's didn't:

  • With the old BFS: A propagates, enqueues C. B doesn't propagate. C is still visited (reached via A). Works fine — any single predecessor can trigger a visit.
  • With in-degree counting: A decrements C (2→1). If B doesn't decrement (because its slews didn't change), C stays at in-degree 1 forever — it's never processed, even though A's change means C needs recomputation.

So the unconditional propagation is necessary to make the in-degree mechanism work. But it means every vertex in the downstream cone is visited during incremental updates, even when slew changes die out early. For a small ECO on a large design, the old code might recompute tens of vertices; the new code recomputes the entire fanout cone.
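The stuck-successor case can be shown with a tiny sketch (hypothetical helper, not OpenSTA code): if decrements were conditional on a predecessor's slews changing, a vertex whose unchanged predecessor never decrements can never reach zero.

```cpp
#include <cassert>
#include <vector>

// Model conditional decrements: each predecessor edge decrements the
// successor's in-degree only if that predecessor actually propagated
// (its slews changed). Returns the in-degree left over.
int remainingInDegree(int initial_in_degree,
                      const std::vector<bool>& predecessor_propagated) {
  int deg = initial_in_degree;
  for (bool propagated : predecessor_propagated)
    if (propagated)  // unchanged predecessors skip their decrement
      --deg;
  return deg;
}
```

With in-degree 2 and predecessors {A changed, B unchanged}, one decrement is skipped and the count is stuck at 1 forever, so the successor is never dispatched; only when every predecessor decrements does it reach 0. This is why the per-edge scheme forces unconditional propagation.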

If the parallel speedup outweighs the incremental efficiency loss, it's a valid tradeoff. Do you see this making a dent in runtime?
Or could you use the in-degree BFS only for the non-incremental first pass, and fall back to the old BfsFwdIterator for incremental updates, where selective propagation matters most?

@dsengupta0628
Contributor

Also note that the unconditional call was necessary because the in-degree mechanism requires every vertex to decrement its successors (otherwise successors get stuck), but it defeats the incremental optimization of stopping propagation when slews don't change. processed_edges_ was only added to keep the two call sites from double-decrementing the same edge.
If you keep just the single unconditional call (dropping the conditional one from findVertexDelay), then processed_edges_ and its mutex become unnecessary. No mutex, no std::set, no O(log n) tree operations: just an atomic decrement per edge, the same cost as the increment in computeInDegrees.
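A minimal sketch of that simplification, assuming hypothetical names (AtomicBfs, out_edges, enqueueAdjacent stand in for the PR's structures): with exactly one unconditional call site, each edge is released once, so a lock-free fetch_sub per edge suffices and no processed_edges_ set is needed.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

struct AtomicBfs {
  // out_edges[v] holds one entry PER EDGE, so parallel edges appear twice.
  std::vector<std::vector<int>> out_edges;
  std::vector<std::atomic<int>> in_degrees;

  explicit AtomicBfs(int n) : out_edges(n), in_degrees(n) {
    for (auto& d : in_degrees) d.store(0, std::memory_order_relaxed);
  }

  void addEdge(int from, int to) {
    out_edges[from].push_back(to);
    in_degrees[to].fetch_add(1, std::memory_order_relaxed);  // per edge
  }

  // Single, unconditional release step: one fetch_sub per out-edge,
  // mirroring the fetch_add above. Returns successors that became ready;
  // real code would dispatch them to the thread pool here.
  std::vector<int> enqueueAdjacent(int v) {
    std::vector<int> ready;
    for (int to : out_edges[v]) {
      // fetch_sub returns the previous value; 1 means this was the
      // last outstanding predecessor edge.
      if (in_degrees[to].fetch_sub(1, std::memory_order_acq_rel) == 1)
        ready.push_back(to);
    }
    return ready;
  }
};
```

Because every edge is touched exactly once, no mutex or dedup set guards the decrement, and the hot path is a single atomic RMW per edge.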
