Add "loop" feature to instructions in static extractors. by larchchen · Pull Request #3075 · mandiant/capa

larchchen · 2026-05-15T14:53:25Z

Detecting API happening in a loop is an effective approach for exploits leveraging racing conditions.

A classic example is DirtyCow (CVE-2016-5195)
By detecting madvise calls inside a loop with MADV_DONTNEED argument.

  scopes:
    static: basic block
  features:
    - and:
      - api: madvise
      - number: 4 # Constant for MADV_DONTNEED
      - characteristic: inside loop

A more recent example CVE-2024-50066 can be covered by

  - and:
    - api: madvise
    - number: 25 # MADV_COLLAPSE
    - characteristic: inside loop

Checklist

No CHANGELOG update needed

No new tests needed

No documentation update needed

This submission includes AI-generated code and I have provided details in the description.

github-actions

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

CHANGELOG updated or no update needed, thanks! 😄

github-actions

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

github-actions

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

github-actions

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

gemini-code-assist

Code Review

This pull request introduces the 'inside loop' characteristic for basic blocks in the BinExport2 extractor. It adds a new utility module, capa/features/extractors/loops.py, which uses networkx to identify vertices within cycles. Feedback identifies a structural issue in the new loops module where code precedes the license header and lacks necessary imports. Additionally, suggestions were made to move a local import to the top level for PEP 8 compliance and to simplify edge collection logic using a list comprehension.

CHANGELOG updated or no update needed, thanks! 😄

mike-hunhoff · 2026-05-15T17:55:39Z

Thank you for iterating on this! After reviewing the implementation and thinking about how to best align this with capa's architecture and performance constraints, I suggest a unified path forward that simplifies the rule authoring experience and minimizes performance overhead.

Here is the suggested implementation strategy:

1. Reuse the existing `loop` characteristic

Instead of introducing a new name like inside loop at the basic block scope, let's just yield the existing Characteristic("loop") at the instruction scope for instructions contained within a cycle.

Why: This avoids inflating the rule vocabulary and maintains perfect backward compatibility. By extracting it at the narrowest scope (instruction), it will automatically bubble up to basic block and function scopes properly.

2. Perform Graph Analysis Once Per Function (Performance)

To avoid the heavy performance penalty of running networkx graph analysis repeatedly, I suggest computing the loop vertices once at the function level and sharing them with the instruction extractor:

In the function extractor (where we already build edges for the current has_loop check), use your SCC logic to extract the set of basic block addresses that participate in cycles.
Store this set in the transient FunctionHandle.ctx dictionary (e.g., fh.ctx["cyclic_blocks"]).
In the instruction extractor, simply check if the instruction's parent basic block address is in that set. This is a fast $O(1)$ lookup with no heavy object creation at the instruction level, and the memory is freed as soon as capa moves to the next function.

3. Cross-Backend Consistency

Since capa rules strive to be backend-agnostic, this pattern should be applied across all static backends (Vivisect, Ghidra, IDA, etc.) that already support the function-level loop characteristic.

This approach fits perfectly into capa's style of using transient handle contexts for performance optimization while keeping the rule language intuitive. Let me know what you think!

Detecting API happening in a loop is an effective approach for exploits leveraging racing conditions. A classic example is DirtyCow (CVE-2016-5195) By detecting madvise calls inside a loop with MADV_DONTNEED argument. ``` scopes: static: basic block features: - and: - api: madvise - number: 4 # Constant for MADV_DONTNEED - characteristic: inside loop ``` A more recent example CVE-2024-50066 can be covered by ``` - and: - api: madvise - number: 25 # MADV_COLLAPSE - characteristic: inside loop ```

larchchen · 2026-05-18T09:17:40Z

Thanks for your comments @mike-hunhoff

I have made changes according to you suggestions, maybe you want to take a look again.

However, I have a few more questions:

If we are firing a "loop" characteristics for every instruction in a loop, wouldn't it resulting into too large feature spaces to match as well?
The instruction feature generation runs before the function feature generation inside static.py, so I need to calculate the cyclic_blocks at the time of FunctionContext __post_init__ instead of at the function iteration.

If it looks good to you, I'll make the similar changes to other extractors.

williballenthin

id be curious to see some examples of rules that use this new feature

williballenthin · 2026-05-18T10:48:21Z

    os: set[str]
    arch: set[str]

+    def __post_init__(self):


we haven't used post_init yet in this codebase so i wonder if you could do this in a way that's more consistent.

perhaps via a constructor or from_foo(...) classmethod.

maybe this could be placed into AnalysisContext (as a dict keyed by vertex)?

I don't think put it into AnalysisContext is a good idea for now, as AnalysisContext is more about global data can be repeated used, but this one is only used within a function scope. Depending on the program complexity, the data may not be very small if we keep the all loops information across the entire scan life-span.

Keeping it inside FunctionContext will just let the data in RAM within single function feature generation, and will be GC-ed right after.

Although I feel post_init is kinda standard approach for dataclass, I can also just put the code inside get_functions of BinExport2FeatureExtractor, make it part of the intialiser list.

How about that?

larchchen · 2026-05-18T11:48:58Z

id be curious to see some examples of rules that use this new feature

I gave out some snippets of rule examples in the first comment. How does it look to you?

github-actions Bot previously requested changes May 15, 2026

View reviewed changes

gemini-code-assist Bot reviewed May 15, 2026

View reviewed changes

Comment thread capa/features/extractors/loops.py Outdated

Comment thread capa/features/extractors/binexport2/__init__.py Outdated

Comment thread capa/features/extractors/binexport2/__init__.py Outdated

larchchen force-pushed the inside-loop-characteristics branch 2 times, most recently from 66d9ff9 to 9b38843 Compare May 15, 2026 15:17

larchchen force-pushed the inside-loop-characteristics branch from 9b38843 to 80384c8 Compare May 18, 2026 09:04

larchchen changed the title ~~Add "inside loop" feature to basic blocks in Binexport2 extractor.~~ Add "loop" feature to instructions in static extractors. May 18, 2026

larchchen force-pushed the inside-loop-characteristics branch from 80384c8 to a3c4cf1 Compare May 18, 2026 09:07

larchchen force-pushed the inside-loop-characteristics branch from a3c4cf1 to cbc7287 Compare May 18, 2026 09:10

williballenthin reviewed May 18, 2026

View reviewed changes

Conversation

larchchen commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Checklist

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mike-hunhoff commented May 15, 2026

1. Reuse the existing loop characteristic

2. Perform Graph Analysis Once Per Function (Performance)

3. Cross-Backend Consistency

Uh oh!

larchchen commented May 18, 2026

Uh oh!

williballenthin left a comment

Choose a reason for hiding this comment

Uh oh!

williballenthin May 18, 2026

Choose a reason for hiding this comment

Uh oh!

williballenthin May 18, 2026

Choose a reason for hiding this comment

Uh oh!

larchchen May 18, 2026

Choose a reason for hiding this comment

Uh oh!

larchchen commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

larchchen commented May 15, 2026 •

edited

Loading

1. Reuse the existing `loop` characteristic