Add "loop" feature to instructions in static extractors.#3075
Conversation
There was a problem hiding this comment.
Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed
CHANGELOG updated or no update needed, thanks! 😄
There was a problem hiding this comment.
Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed
There was a problem hiding this comment.
Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed
There was a problem hiding this comment.
Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed
There was a problem hiding this comment.
Code Review
This pull request introduces the 'inside loop' characteristic for basic blocks in the BinExport2 extractor. It adds a new utility module, capa/features/extractors/loops.py, which uses networkx to identify vertices within cycles. Feedback identifies a structural issue in the new loops module where code precedes the license header and lacks necessary imports. Additionally, suggestions were made to move a local import to the top level for PEP 8 compliance and to simplify edge collection logic using a list comprehension.
CHANGELOG updated or no update needed, thanks! 😄
66d9ff9 to
9b38843
Compare
|
Thank you for iterating on this! After reviewing the implementation and thinking about how to best align this with Here is the suggested implementation strategy: 1. Reuse the existing
|
9b38843 to
80384c8
Compare
80384c8 to
a3c4cf1
Compare
Detecting API happening in a loop is an effective approach for exploits leveraging racing conditions. A classic example is DirtyCow (CVE-2016-5195) By detecting madvise calls inside a loop with MADV_DONTNEED argument. ``` scopes: static: basic block features: - and: - api: madvise - number: 4 # Constant for MADV_DONTNEED - characteristic: inside loop ``` A more recent example CVE-2024-50066 can be covered by ``` - and: - api: madvise - number: 25 # MADV_COLLAPSE - characteristic: inside loop ```
a3c4cf1 to
cbc7287
Compare
|
Thanks for your comments @mike-hunhoff I have made changes according to you suggestions, maybe you want to take a look again. However, I have a few more questions:
If it looks good to you, I'll make the similar changes to other extractors. |
williballenthin
left a comment
There was a problem hiding this comment.
id be curious to see some examples of rules that use this new feature
| os: set[str] | ||
| arch: set[str] | ||
|
|
||
| def __post_init__(self): |
There was a problem hiding this comment.
we haven't used post_init yet in this codebase so i wonder if you could do this in a way that's more consistent.
perhaps via a constructor or from_foo(...) classmethod.
There was a problem hiding this comment.
maybe this could be placed into AnalysisContext (as a dict keyed by vertex)?
There was a problem hiding this comment.
I don't think put it into AnalysisContext is a good idea for now, as AnalysisContext is more about global data can be repeated used, but this one is only used within a function scope. Depending on the program complexity, the data may not be very small if we keep the all loops information across the entire scan life-span.
Keeping it inside FunctionContext will just let the data in RAM within single function feature generation, and will be GC-ed right after.
Although I feel post_init is kinda standard approach for dataclass, I can also just put the code inside get_functions of BinExport2FeatureExtractor, make it part of the intialiser list.
How about that?
I gave out some snippets of rule examples in the first comment. How does it look to you? |
Detecting API happening in a loop is an effective approach for exploits leveraging racing conditions.
A classic example is DirtyCow (CVE-2016-5195)
By detecting madvise calls inside a loop with MADV_DONTNEED argument.
A more recent example CVE-2024-50066 can be covered by
Checklist