Skip to content

Latest STA changes 06/29#379

Open
dsengupta0628 wants to merge 37 commits into
The-OpenROAD-Project:masterfrom
The-OpenROAD-Project-staging:sta_latest_0629
Open

Latest STA changes 06/29#379
dsengupta0628 wants to merge 37 commits into
The-OpenROAD-Project:masterfrom
The-OpenROAD-Project-staging:sta_latest_0629

Conversation

@dsengupta0628

Copy link
Copy Markdown
Contributor

Resolved merge conflict in .github/workflows/ci.yml: pinned actions/checkout to v7 (9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0), keeping SHA-pin style over upstream's @v7 tag.

Replaced deprecated LibertyCell::hasSequentials() with isSequential() (identical impl) at 9 call sites across 5 test files: TestPower.cc, TestLibertyClasses.cc, TestLibertyStaCallbacks.cc, TestLibertyStaBasicsB.cc, TestLibertyStaBasics.cc. Clears all -Wdeprecated-declarations warnings from these files.

calewis and others added 30 commits June 18, 2026 13:52
…lver (#450)

This commit refactors the Dartu-Menezes-Pileggi (DMP) effective capacitance
algorithm solver to use the Eigen library, and implements a Determinant Guarded
solver to handle singular and near-singular Jacobians safely and efficiently.

Key changes:
1. Refactored the DMP solver to use Eigen stack-based data structures,
   replacing the custom Crout LU decomposition and solver.
2. Extracted the linear solver logic into a dedicated helper function
   DmpAlg::solveNewtonStep.
3. Implemented a "Determinant Guarded" solver:
   - Manually checks the determinant of the Jacobian (safety guard).
   - Throws a DmpError if the determinant is dangerously close to zero (|det| < 1e-12),
     safely triggering the lumped capacitance fallback.
   - Otherwise, solves using Eigen's highly optimized analytical inverse (fast path).
4. Added documentation in the comments on how to easily swap back to LU
   decomposition with partial pivoting (PartialPivLU) if any numerical
   precision issues arise in the future.

Benchmark Results (Optimized Release Build, 5,288 DMP calls):

| Test Case | Calls | 1. Original (Crout LU) | 2. Eigen (PartialPivLU) | 3. Eigen (Determinant Guarded) | Speedup (3 vs 1) | Speedup (3 vs 2) |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| power | 2,608 | 1.8822 ms | 1.6791 ms | 1.2702 ms | +32.5% | +24.4% |
| power_vcd | 2,608 | 1.8729 ms | 1.6704 ms | 1.2768 ms | +31.8% | +23.6% |
| spef_parasitics | 24 | 0.0193 ms | 0.0187 ms | 0.0140 ms | +27.5% | +25.1% |
| mcmm3 | 48 | 0.0728 ms | 0.0756 ms | 0.0817 ms | -12.2% | -8.1% |
| Total DMP Time | 5,288 | 3.8472 ms | 3.4438 ms | 2.6427 ms | +31.3% | +23.3% |
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: James Cherry <cherry@CerezoBook.local>
James Cherry and others added 3 commits June 27, 2026 20:43
doc
Signed-off-by: James Cherry <cherry@CerezoBook.local>
Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
@dsengupta0628 dsengupta0628 self-assigned this Jun 29, 2026
@CLAassistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 4 committers have signed the CLA.

✅ dsengupta0628
❌ James Cherry
❌ calewis
❌ dependabot[bot]


James Cherry seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the Newton-Raphson solver in DmpCeff to use Eigen, optimizes generated clock and latch arrival propagation by postponing latch evaluations, introduces new graph traversal helpers, and ensures input delay reference pin edges survive graph rebuilds. The review feedback identifies several critical issues: a potential stale value bug in DmpCeff when nr_order is not 3, an uninitialized pointer and double-free risk in PathEnumFaninVisitor due to an empty copy constructor, numerical instability in the determinant-based singularity check, potential null pointer dereferences in findDelays and visitFanouts, and a division-by-zero risk in Power::findActivity if the clock period is zero.

Comment thread dcalc/DmpCeff.cc
Comment on lines +141 to +143
Eigen::Vector3d x = Eigen::Vector3d::Zero();
if (nr_order_ == 3)
x_[DmpParam::ceff] = ceff;
x[DmpParam::ceff] = ceff;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

When nr_order_ is not 3 (e.g., nr_order_ == 2 for DmpOnePole), ceff_ is never updated to the input ceff before newtonRaphson(x) is called. Since DmpOnePole::evalDmpEqns relies on ceff_, this leads to stale or uninitialized values being used during the Newton-Raphson iterations. Setting ceff_ = ceff; in the else branch resolves this issue.

Suggested change
Eigen::Vector3d x = Eigen::Vector3d::Zero();
if (nr_order_ == 3)
x_[DmpParam::ceff] = ceff;
x[DmpParam::ceff] = ceff;
Eigen::Vector3d x = Eigen::Vector3d::Zero();
if (nr_order_ == 3)
x[DmpParam::ceff] = ceff;
else
ceff_ = ceff;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a behavior change. In the base revision, ceff was likewise assigned only under if (nr_order_ == 3); the refactor preserves that exactly. For the order-2 path (DmpOnePole/DmpZeroC2), ceff_ is not a Newton-Raphson variable - it is a fixed value equal to c1_, set in init() (ceff_ = c1) and re-set in DmpZeroC2::gateDelaySlew() prior to use.

evalDmpEqns therefore reads a valid ceff_. The suggested else ceff_ = ceff; would only re-assign c1_ to itself. No stale/uninitialized read occurs.

Comment thread search/PathEnum.cc
Comment on lines 357 to +363
{
}

PathEnumFaninVisitor::~PathEnumFaninVisitor()
{
delete pred_;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The custom copy constructor PathEnumFaninVisitor::PathEnumFaninVisitor(const PathEnumFaninVisitor &visitor) is empty and does not call the base class copy constructor. This causes the base class to be default-constructed, leaving pred_ uninitialized or null in the copied visitor, which will cause crashes when copy() is called. Additionally, manually deleting pred_ in the destructor will cause a double-free if the base class copy constructor is called and copies the pointer. To fix this robustly, consider passing true for the delete_pred parameter in the PathVisitor constructor so PathVisitor manages the lifetime of pred_, removing the custom destructor entirely, and removing the custom copy constructor entirely to let the compiler generate a correct member-wise copy constructor.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are hallucinating. The copy constructor is not empty - it is a delegating constructor that forwards to the primary constructor, which executes PathVisitor(new EnumPred(path_enum), false, path_enum) and allocates a fresh EnumPred. Each copied visitor therefore owns its own distinct pred_.

There is no double-free: PathVisitor::~PathVisitor() deletes only tag_cache_, never pred_. pred_ is deleted exactly once, in ~PathEnumFaninVisitor(), per object.

Also note PathVisitor's constructor has no delete_pred parameter, so that part of the suggestion isn't applicable. The current ownership is correct.

Comment thread dcalc/DmpCeff.cc
Comment on lines +970 to 985
if (nr_order_ == 2) {
double det = fjac.topLeftCorner<2, 2>().determinant();
if (std::abs(det) < 1e-12) {
throw DmpError("Jacobian is singular (order 2)");
}
p.head<2>() = fjac.topLeftCorner<2, 2>().inverse() * -fvec.head<2>();
return p;
}
}

// Solves fjac_ * x = p_ for x, assuming fjac_ is LU form from luDecomp.
// Solution overwrites p_.
void
DmpAlg::luSolve()
{
const int size = nr_order_;

// Transform p_ allowing for leading zeros.
int non_zero = -1;
for (int i = 0; i < size; i++) {
int iperm = index_[i];
double sum = p_[iperm];
p_[iperm] = p_[i];
if (non_zero != -1) {
for (int j = non_zero; j <= i - 1; j++)
sum -= fjac_[i][j] * p_[j];
}
else {
if (sum != 0.0)
non_zero = i;
}
p_[i] = sum;
}
// Backsubstitution.
for (int i = size - 1; i >= 0; i--) {
double sum = p_[i];
for (int j = i + 1; j < size; j++)
sum -= fjac_[i][j] * p_[j];
p_[i] = sum / fjac_[i][i];
double det = fjac.determinant();
if (std::abs(det) < 1e-12) {
throw DmpError("Jacobian is singular (order 3)");
}
p = fjac.inverse() * -fvec;
return p;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The absolute determinant check std::abs(det) < 1e-12 is highly sensitive to the scaling of the matrix elements, which depend on physical units (e.g., Farads and seconds). If the units result in small values in fjac, the determinant can easily fall below 1e-12 even for a well-conditioned matrix, causing unexpected DmpError failures. Using a robust LU decomposition with partial pivoting (as documented in your comments) is much safer and numerically stable.

Suggested change
if (nr_order_ == 2) {
double det = fjac.topLeftCorner<2, 2>().determinant();
if (std::abs(det) < 1e-12) {
throw DmpError("Jacobian is singular (order 2)");
}
p.head<2>() = fjac.topLeftCorner<2, 2>().inverse() * -fvec.head<2>();
return p;
}
}
// Solves fjac_ * x = p_ for x, assuming fjac_ is LU form from luDecomp.
// Solution overwrites p_.
void
DmpAlg::luSolve()
{
const int size = nr_order_;
// Transform p_ allowing for leading zeros.
int non_zero = -1;
for (int i = 0; i < size; i++) {
int iperm = index_[i];
double sum = p_[iperm];
p_[iperm] = p_[i];
if (non_zero != -1) {
for (int j = non_zero; j <= i - 1; j++)
sum -= fjac_[i][j] * p_[j];
}
else {
if (sum != 0.0)
non_zero = i;
}
p_[i] = sum;
}
// Backsubstitution.
for (int i = size - 1; i >= 0; i--) {
double sum = p_[i];
for (int j = i + 1; j < size; j++)
sum -= fjac_[i][j] * p_[j];
p_[i] = sum / fjac_[i][i];
double det = fjac.determinant();
if (std::abs(det) < 1e-12) {
throw DmpError("Jacobian is singular (order 3)");
}
p = fjac.inverse() * -fvec;
return p;
}
if (nr_order_ == 2) {
auto lu = fjac.topLeftCorner<2, 2>().partialPivLu();
if (std::abs(lu.matrixLU().diagonal().prod()) < 1e-12) {
throw DmpError("Jacobian is singular (order 2)");
}
p.head<2>() = lu.solve(-fvec.head<2>());
return p;
}
auto lu = fjac.partialPivLu();
if (std::abs(lu.matrixLU().diagonal().prod()) < 1e-12) {
throw DmpError("Jacobian is singular (order 3)");
}
p = lu.solve(-fvec);
return p;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - you're right that std::abs(det) < 1e-12 is an absolute threshold and not scale-invariant, so it could in principle throw a spurious "singular" error on a well-conditioned but small-magnitude Jacobian. Note the documented LU fallback doesn't actually resolve this: lu.matrixLU().diagonal().prod() equals +/-det, so it uses the same threshold and has the same scale sensitivity - it only improves solve accuracy via pivoting. A real fix would need a relative test (normalize by matrix norm, or fullPivLu().isInvertible()). Will check with James before making any change here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing with the OpenSTA developer: this isn't a real issue. OpenSTA always works in fixed internal units (seconds), so the Jacobian elements don't rescale with unit choice. So the guard acts as a genuine singularity check and won't false-positive on well-conditioned matrices. So not an issue now

Comment thread dcalc/GraphDelayCalc.cc
iter_->ensureSize();
if (incremental_)
seedInvalidDelays();
Stats stats(debug_, report_);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Removing the if (arc_delay_calc_) guard in findDelays could lead to null pointer dereferences or unnecessary calculations if arc_delay_calc_ is null. It is safer to keep the guard at the beginning of the function.

  if (!arc_delay_calc_)
    return;
  Stats stats(debug_, report_);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This guard removal comes from upstream OpenSTA (it is already absent in the base revision of this PR; the diff is against upstream master). arc_delay_calc_ is established at Sta construction and is non-null whenever findDelays() runs. We're intentionally staying aligned with upstream here rather than re-adding a guard that cannot trigger.

Comment thread graph/Graph.cc
Comment on lines +578 to +583
void
Graph::visitFanouts(Vertex *vertex,
SearchPred *pred,
const VertexFn &fn)
{
if (pred->searchFrom(vertex)) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The pred pointer is dereferenced directly without checking if it is null. To prevent potential null pointer dereferences, consider adding a null check or an assertion.

void
Graph::visitFanouts(Vertex *vertex,
                    SearchPred *pred,
                    const VertexFn &fn)
{
  if (pred && pred->searchFrom(vertex)) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pred is non-null by contract, and the sibling functions in this file (visitFanoutEdges, visitFanins, visitFaninEdges) all dereference it the same way. The sole caller (Genclks.cc) passes the address of a stack-local predicate. Adding a null check to only this one function would be inconsistent with the established pattern and guards a case that does not occur.

Comment thread power/Power.cc
Comment on lines +1593 to +1596
if (clk) {
float duty = clockDuty(clk);
return PwrActivity(2.0 / clk->period(), duty, PwrActivityOrigin::clock);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If clk->period() is 0.0, the expression 2.0 / clk->period() will result in a division by zero (producing infinity or NaN). It is safer to guard against a zero period.

    if (clk && clk->period() != 0.0) {
      float duty = clockDuty(clk);
      return PwrActivity(2.0 / clk->period(), duty, PwrActivityOrigin::clock);
    }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A clock with a zero period is degenerate - create_clock enforces a positive period - so clk->period() is never 0.0 here, and the other period() divisions in this file (e.g. clockDuty) are likewise unguarded. The null-clk guard added in this PR addresses the real defect. A zero-period check would be redundant.

James Cherry and others added 4 commits June 30, 2026 09:10
Signed-off-by: James Cherry <cherry@CerezoBook.local>
* make mode/scene pass as tcl objects

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

* address review comments

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

* address review comments take 2

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

* revert to using cmd mode name

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

---------

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
@dsengupta0628

Copy link
Copy Markdown
Contributor Author

CI fails because we need OpenROAD side changes to accomodate these:

The-OpenROAD-Project/OpenROAD#10790

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants