
BFS Atomic Counter Task #342

Open
QuantamHD wants to merge 1 commit into The-OpenROAD-Project:master from QuantamHD:atomic_task

Conversation

@QuantamHD
Contributor

No description provided.

Signed-off-by: Ethan Mahintorabi <ethanmoon@google.com>
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@dsengupta0628
Contributor

Some initial feedback (more to come):

  • No tests or benchmarks. A change to the core BFS traversal should include regression tests (timing correctness) and performance benchmarks (runtime comparison on representative designs).
  • No PR description. The motivation, design rationale, and expected improvements are not documented.
  • Both the OpenSTA and OpenROAD regressions are failing. Please merge the latest OpenROAD/OpenSTA master and rerun the regressions.

@dsengupta0628
Contributor

The most critical issue here is an in-degree count/decrement mismatch (a topological-ordering violation).

computeInDegrees() deduplicates per unique successor vertex:

  std::set<Vertex*> counted_successors;
  // ...
  if (counted_successors.insert(to_vertex).second) {
      in_degrees_[to_vertex->objectIdx()].fetch_add(1, ...);
  }

But enqueueAdjacentVertices() decrements per edge (deduped via processed_edges_):

  inserted = processed_edges_.insert(edge).second;  // per edge, not per vertex
  if (inserted) {
      int old_deg = in_degrees_[to_vertex->objectIdx()].fetch_sub(1, ...);
      if (old_deg == 1) { /* process vertex */ }
  }

When multiple edges connect the same two vertices (e.g., different timing arc sets with different when conditions), the in-degree is incremented once but decremented multiple times. Concrete example:

  • Vertex A has edges e1, e2 to C. Vertex B has edge e3 to C.
  • computeInDegrees: C gets in-degree = 2 (A counted once, B once)
  • enqueueAdjacentVertices(A):
    • e1: decrement C: 2→1, old_deg=2, not ready
    • e2: decrement C: 1→0, old_deg=1, process C
  • But B hasn't been visited yet — C is processed before all predecessors are done.

This violates topological ordering. C's delay calculation uses stale/uninitialized data from B's contribution. Fix: either remove the counted_successors dedup in computeInDegrees (count per edge), or dedup per target vertex in enqueueAdjacentVertices.

One possible fix: remove the counted_successors set entirely, so computeInDegrees counts every edge passing the predicate once. The per-edge increment then matches the per-edge decrement in enqueueAdjacentVertices.
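The matched per-edge count/decrement can be sketched with a toy model (hypothetical names; vertices are ints and edges are (from, to) pairs, standing in for OpenSTA's Vertex/Edge objects; a real implementation would use std::atomic with fetch_add/fetch_sub):

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // (from, to)

// Count one unit of in-degree PER EDGE -- no counted_successors dedup --
// so parallel edges like e1 and e2 from A to C each contribute.
std::vector<int> computeInDegreesPerEdge(int num_vertices,
                                         const std::vector<Edge>& edges) {
  std::vector<int> in_degrees(num_vertices, 0);
  for (const Edge& e : edges)
    in_degrees[e.second] += 1;
  return in_degrees;
}

// Decrement per out-edge of `from`; returns successors whose in-degree
// hit zero, i.e. all predecessor edges have now been released.
std::vector<int> releaseEdges(std::vector<int>& in_degrees, int from,
                              const std::vector<Edge>& edges) {
  std::vector<int> ready;
  for (const Edge& e : edges)
    if (e.first == from && --in_degrees[e.second] == 0)
      ready.push_back(e.second);
  return ready;
}
```

With A=0, B=1, C=2 and edges e1=(A,C), e2=(A,C), e3=(B,C), C starts at in-degree 3; releasing A's two edges leaves 1, and C only becomes ready after B's edge is released, preserving topological order.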

@dsengupta0628
Contributor

One comment regarding the change: it bypasses the incremental delay tolerance.

Old behavior
With the old BfsFwdIterator, if a vertex's output slews didn't change, enqueueAdjacentVertices was not called — successors were never enqueued, and the entire downstream cone was skipped. This is the key incremental optimization: a small change that doesn't affect slews stops propagating immediately.

New behavior
In the new enqueueAdjacentVertices, when a successor's in-degree reaches 0, it dispatches this lambda:


  dispatch_queue_->dispatch([this, to_vertex](size_t tid) {
      current_thread_id = tid;
      visitors_[tid]->visit(to_vertex);     // → findVertexDelay (may or may not propagate)
      visit_count_->fetch_add(1, ...);
      enqueueAdjacentVertices(to_vertex);   // always propagates
  });

Two calls to enqueueAdjacentVertices(to_vertex) happen:

  1. Conditionally from findVertexDelay inside the visitor (only if slews changed)
  2. Unconditionally from the lambda itself

If slews didn't change, call 1 is skipped. But call 2 runs anyway, decrements all successors' in-degrees, and dispatches any that become ready. The processed_edges_ set is empty for those edges (since call 1 was skipped), so they all get processed.

This cascades: every successor is visited, every successor unconditionally propagates to its successors, and so on through the entire downstream cone.

But with the new in-degree counting, you probably need this. If a vertex C has two edges from A and B, and A's slews changed but B's didn't:

  • With the old BFS: A propagates, enqueues C. B doesn't propagate. C is still visited (reached via A). Works fine — any single predecessor can trigger a visit.
  • With in-degree counting: A decrements C (2→1). If B doesn't decrement (because its slews didn't change), C stays at in-degree 1 forever — it's never processed, even though A's change means C needs recomputation.

So the unconditional propagation is necessary to make the in-degree mechanism work. But it means every vertex in the downstream cone is visited during incremental updates, even when slew changes die out early. For a small ECO on a large design, the old code might recompute tens of vertices; the new code recomputes the entire fanout cone.
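The stuck-successor case can be shown with a tiny sketch (hypothetical helper, not OpenSTA code): if decrements were conditional on a predecessor's slews changing, a vertex whose unchanged predecessor never decrements can never reach zero.

```cpp
#include <cassert>
#include <vector>

// Model conditional decrements: each predecessor edge decrements the
// successor's in-degree only if that predecessor actually propagated
// (its slews changed). Returns the in-degree left over.
int remainingInDegree(int initial_in_degree,
                      const std::vector<bool>& predecessor_propagated) {
  int deg = initial_in_degree;
  for (bool propagated : predecessor_propagated)
    if (propagated)  // unchanged predecessors skip their decrement
      --deg;
  return deg;
}
```

With in-degree 2 and predecessors {A changed, B unchanged}, one decrement is skipped and the count is stuck at 1 forever, so the successor is never dispatched; only when every predecessor decrements does it reach 0. This is why the per-edge scheme forces unconditional propagation.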

If the parallel speedup outweighs the incremental efficiency loss, it's a valid tradeoff. Do you see this making a dent in runtime?
Or could you use the in-degree BFS only for the non-incremental first pass, and fall back to the old BfsFwdIterator for incremental updates, where selective propagation matters most?

@dsengupta0628
Contributor

Also note that the unconditional call was necessary because the in-degree mechanism requires every vertex to decrement its successors (otherwise successors get stuck), but it defeats the incremental optimization of stopping propagation when slews don't change. processed_edges_ was only added to keep the two call sites from double-decrementing the same edge.
If you keep just the single unconditional call (dropping the conditional one from findVertexDelay), then processed_edges_ and its mutex become unnecessary. No mutex, no std::set, no O(log n) tree operations: just an atomic decrement per edge, the same cost as the increment in computeInDegrees.
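A minimal sketch of that simplification, assuming hypothetical names (AtomicBfs, out_edges, enqueueAdjacent stand in for the PR's structures): with exactly one unconditional call site, each edge is released once, so a lock-free fetch_sub per edge suffices and no processed_edges_ set is needed.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

struct AtomicBfs {
  // out_edges[v] holds one entry PER EDGE, so parallel edges appear twice.
  std::vector<std::vector<int>> out_edges;
  std::vector<std::atomic<int>> in_degrees;

  explicit AtomicBfs(int n) : out_edges(n), in_degrees(n) {
    for (auto& d : in_degrees) d.store(0, std::memory_order_relaxed);
  }

  void addEdge(int from, int to) {
    out_edges[from].push_back(to);
    in_degrees[to].fetch_add(1, std::memory_order_relaxed);  // per edge
  }

  // Single, unconditional release step: one fetch_sub per out-edge,
  // mirroring the fetch_add above. Returns successors that became ready;
  // real code would dispatch them to the thread pool here.
  std::vector<int> enqueueAdjacent(int v) {
    std::vector<int> ready;
    for (int to : out_edges[v]) {
      // fetch_sub returns the previous value; 1 means this was the
      // last outstanding predecessor edge.
      if (in_degrees[to].fetch_sub(1, std::memory_order_acq_rel) == 1)
        ready.push_back(to);
    }
    return ready;
  }
};
```

Because every edge is touched exactly once, no mutex or dedup set guards the decrement, and the hot path is a single atomic RMW per edge.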
