Optimize Some Statement Evaluation and Fix Correctness Mismatches by mike-hunhoff · Pull Request #3083 · mandiant/capa

mike-hunhoff · 2026-05-19T20:03:18Z

This PR resolves a performance leak and a correctness mismatch inside capa's logical AST engine's Some statement class.

1. Overview of the Problems

A. Performance Bug (Unnecessary Child Evaluation in Optional Blocks)

In capa rules, optional: blocks (which are extremely common, used in over 199 rules in capa-rules) are compiled internally as a Some statement with a threshold count of 0:

Some(count=0, children=[Child1, Child2, ...])

During the fast matching pass (short_circuit=True), capa should stop evaluating children as soon as a statement's threshold is met. Since count=0 is met by default before evaluating any child (0 >= 0), the statement should instantly succeed.

However, because the threshold check was placed inside the loop after the first child's evaluation, capa was forced to evaluate at least the first child feature of every single optional block in the ruleset. If that child was a slow regular expression or substring search, it wasted significant CPU cycles on features that had zero impact on the rule matching outcome.

B. Correctness Bug (`Some(0, [])` Mismatch)

An empty optional statement Some(0, []) evaluated to False in short-circuit mode (because the loop didn't run and fell through), but evaluated to True in non-short-circuit mode (because 0 >= 0 is True). This created a silent correctness mismatch between fast and slow matching passes.

2. The Fix

We perform the threshold validation before entering the evaluation loop when short-circuiting is enabled:

        if short_circuit:
            results: list[Result] = []
            satisfied_children_count = 0
            if satisfied_children_count >= self.count:
                # short circuit immediately if threshold is already met (e.g. count <= 0)
                return Result(True, self, results)

            for child in self.children:
                # ... evaluate child ...

This guarantees:

Immediate Bypass: Some(0, [Children]) returns True immediately in microseconds without evaluating any child features during the fast pass.
Unified Semantics: Some(0, []) now correctly and consistently evaluates to True in both matching passes.

3. Verification & Performance Results

A. Unit Tests

Added two new targeted unit test cases in tests/test_engine.py:

Correctness: Asserts that Some(0, []) evaluates to True in both short-circuit modes.
Bypass: Asserts that Some(0, [Child]) does not evaluate the child feature under short_circuit=True.

All 19 engine unit tests passed cleanly:

Results (0.13s):
      19 passed

B. Real-World Performance Impact (`mimikatz.exe_`)

We profile-audited capa's execution on the large mimikatz.exe_ binary before and after this change to measure the exact real-world performance impact:

Total Some evaluations: 25,112 calls

Metric	Before Fix (Un-optimized)	After Fix (Optimized)	Direct Savings
Cumulative Time (`cumtime`)	`1.003` seconds	`0.956` seconds	-47 milliseconds (-5%)

Impact:

Shaves 47 milliseconds of execution time off a single analysis run on mimikatz.exe_, offering a 5% direct speedup for Some statement processing.
Scales even higher if users analyze files with custom rulesets containing slow optional features (like massive regex or substring scans).

Checklist

No CHANGELOG update needed

No new tests needed

No documentation update needed

This submission includes AI-generated code and I have provided details in the description.

github-actions

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

CHANGELOG updated or no update needed, thanks! 😄

gemini-code-assist

Code Review

This pull request introduces an optimization in the evaluate method of the Some class within the capa engine. It now checks if the satisfaction threshold is already met (e.g., when count is 0) before iterating through children, allowing for immediate short-circuiting. Corresponding unit tests were added to verify the correctness of Some(0, ...) and ensure that children are not evaluated unnecessarily. I have no feedback to provide.

williballenthin

nice catch on the performance optimization, though i'd politely push back on the implementation as-is today. feel free to forward my feedback to Gemini 😁

williballenthin · 2026-05-19T21:59:32Z

    assert bool(Some(0, [Number(1)]).evaluate({Number(0): {ADDR1}})) is True
    assert bool(Some(1, [Number(1)]).evaluate({Number(0): {ADDR1}})) is False

+    # Test Some(0, []) correctness (should evaluate to True in both short-circuit modes)


imho this isn't a reasonable structure for a rule, so this test is noise.

williballenthin · 2026-05-19T22:01:54Z

+    assert Some(0, []).evaluate({Number(0): {ADDR1}}, short_circuit=False).success is True
+
+    # Test Some(0, [Child]) optimization (child must not be evaluated when short-circuit is True)
+    class TrackingFeature(Number):


i find this to be an overly clever way to test the behavior.

i'm not sure that the behavior is worth the code maintenance investment for this test. i don't think the the behavior will likely regress. therefore i'd propose to remove these tests, too, but i'm open to discussion.

thoughts?

williballenthin · 2026-05-19T22:03:53Z

-            results = []
+            results: list[Result] = []
            satisfied_children_count = 0
+            if satisfied_children_count >= self.count:


it doesn't make sense for count to be less than 0, so this condition is better expressed as if self.count == 0. if we don't have validation that count is at least 0, then we should add that in the rule loading code, not try to handle the case here.

Optimize Some Statement Evaluation and Fix Correctness Mismatches

2e2e3c7

github-actions Bot previously requested changes May 19, 2026

View reviewed changes

update CHANGELOG

ffa7ecc

mike-hunhoff added the bug Something isn't working label May 19, 2026

mike-hunhoff requested review from a team and williballenthin May 19, 2026 20:05

gemini-code-assist Bot reviewed May 19, 2026

View reviewed changes

williballenthin requested changes May 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize Some Statement Evaluation and Fix Correctness Mismatches#3083

Optimize Some Statement Evaluation and Fix Correctness Mismatches#3083
mike-hunhoff wants to merge 2 commits into
masterfrom
fix/optimize-some-statement

mike-hunhoff commented May 19, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

williballenthin left a comment •

edited

Loading

Uh oh!

williballenthin May 19, 2026

Uh oh!

williballenthin May 19, 2026

Uh oh!

williballenthin May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mike-hunhoff commented May 19, 2026

1. Overview of the Problems

A. Performance Bug (Unnecessary Child Evaluation in Optional Blocks)

B. Correctness Bug (Some(0, []) Mismatch)

2. The Fix

3. Verification & Performance Results

A. Unit Tests

B. Real-World Performance Impact (mimikatz.exe_)

Impact:

Checklist

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

williballenthin left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

williballenthin May 19, 2026

Choose a reason for hiding this comment

Uh oh!

williballenthin May 19, 2026

Choose a reason for hiding this comment

Uh oh!

williballenthin May 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

B. Correctness Bug (`Some(0, [])` Mismatch)

B. Real-World Performance Impact (`mimikatz.exe_`)

williballenthin left a comment •

edited

Loading