Enhance ghidra backend with existing project feature #3087
saniyafatima07 wants to merge 12 commits into
Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed
Code Review
This pull request introduces support for Ghidra projects (.gpr files) by adding utilities to navigate Ghidra project structures and updating the loader to handle project-based analysis. The changes include automatic backend detection for Ghidra projects and updated CLI logic to skip generic file extraction when a project is provided. Review feedback highlights the need for better terminal formatting in error messages, improved resource management to prevent program leaks during exceptions, and the enablement of function filters for the Ghidra backend.
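For readers skimming the thread, the automatic backend detection mentioned above could look roughly like the sketch below. This is only an illustration: `detect_backend` and its `default_backend` parameter are hypothetical names, and the value of `BACKEND_GHIDRA` is assumed, not taken from the PR.

```python
from pathlib import Path

# Assumed constant value; the real definition lives in capa and may differ.
BACKEND_GHIDRA = "ghidra"


def detect_backend(input_path, default_backend="auto"):
    """Hypothetical helper: pick the Ghidra backend for `.gpr` project files."""
    # A Ghidra project is identified here purely by its file extension.
    if Path(input_path).suffix.lower() == ".gpr":
        return BACKEND_GHIDRA
    # Anything else falls through to whatever backend was otherwise selected.
    return default_backend
```

The check is case-insensitive on the extension, which is a design choice of this sketch rather than something the PR states.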
| project = project_cm.__enter__() | ||
| try: |
Initialize program and consumer to None before the try block. This allows for safe cleanup in the except block if an error occurs after the program is opened but before the context wrapper is created.
| project = project_cm.__enter__() | |
| try: | |
| project = project_cm.__enter__() | |
| program, consumer = None, None | |
| try: |
| except Exception: | ||
| project_cm.__exit__(None, None, None) | ||
| tmpdir.cleanup() | ||
| if tmpdir: | ||
| tmpdir.cleanup() | ||
| raise |
If an exception occurs after pyghidra.consume_program but before the GhidraContextWrapper is instantiated, the program resource is leaked. Adding a check to release the program in the except block ensures proper resource management.
| except Exception: | |
| project_cm.__exit__(None, None, None) | |
| tmpdir.cleanup() | |
| if tmpdir: | |
| tmpdir.cleanup() | |
| raise | |
| except Exception: | |
| if program is not None: | |
| program.release(consumer) | |
| project_cm.__exit__(None, None, None) | |
| if tmpdir: | |
| tmpdir.cleanup() | |
| raise |
Please address this, @saniyafatima07. This resource leak existed before but didn't matter, because capa created a temporary project each time. Now that we're using existing projects, we need to ensure capa doesn't leak the resource and lock the user out of their Ghidra database.
If you haven't already, another situation to test is how capa handles attempting to open a Ghidra database that is already locked, e.g. because the user has it open in Ghidra.
Yeah, I have looked into the lock handling using LockException. I will ensure that the resources are properly released and no leaks occur.
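To make the cleanup order discussed in this thread concrete, here is a minimal, self-contained sketch of the suggested pattern. It does not use pyghidra: `FakeProgram`, `FakeProjectCM`, `open_program`, and the `consume` and `wrap` callables are hypothetical stand-ins for the program handle, the project context manager, `pyghidra.consume_program`, and the `GhidraContextWrapper` constructor.

```python
class FakeProgram:
    """Hypothetical stand-in for a Ghidra program handle."""

    def __init__(self):
        self.released_by = None

    def release(self, consumer):
        # Record who released us so the cleanup path can be observed.
        self.released_by = consumer


class FakeProjectCM:
    """Hypothetical stand-in for the project context manager."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.closed = True
        return False


def open_program(project_cm, consume, wrap, tmpdir=None):
    project = project_cm.__enter__()
    # Initialize before the try block so the except block can test them safely.
    program, consumer = None, None
    try:
        program, consumer = consume(project)
        return wrap(project, program, consumer)
    except Exception:
        # Release the program first, then close the project, then drop tmpdir.
        if program is not None:
            program.release(consumer)
        project_cm.__exit__(None, None, None)
        if tmpdir:
            tmpdir.cleanup()
        raise
```

If `wrap` raises after `consume` has succeeded, the program is released and the project is closed before the exception propagates, which is the case the review comment is concerned about.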
| if backend == BACKEND_GHIDRA: | ||
| return {} | ||
|
|
||
| if input_format in STATIC_FORMATS: |
Disabling extractor filters for the Ghidra backend prevents users from using --restrict-to-functions. Since Ghidra is a static analysis backend, it should support these filters. Merging this with the static format check allows filters to be applied correctly for Ghidra regardless of the input format detection.
| if backend == BACKEND_GHIDRA: | |
| return {} | |
| if input_format in STATIC_FORMATS: | |
| if input_format in STATIC_FORMATS or backend == BACKEND_GHIDRA: |
@saniyafatima07 thoughts here? I'd expect the Ghidra backend to handle function restrictions set by the user. Is there a reason we're skipping them?
I agree, Mike. Since Ghidra is a static-analysis backend, it should support function restrictions too. I had a slight misinterpretation while implementing this logic. I’ll correct it.
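A rough sketch of the merged condition from the suggestion, for illustration only. `build_extractor_filters` is a hypothetical name, the constant values are assumed, and the shape of the returned dictionary is a guess rather than capa's actual filter structure.

```python
# Assumed values; the real constants are defined in capa and may differ.
BACKEND_GHIDRA = "ghidra"
STATIC_FORMATS = {"pe", "elf"}


def build_extractor_filters(input_format, backend, restrict_to_functions):
    """Hypothetical helper: apply function restrictions for static analysis."""
    # Ghidra is a static backend, so it takes the same path as static formats
    # even when the input format (e.g. a .gpr project) is not itself static.
    if input_format in STATIC_FORMATS or backend == BACKEND_GHIDRA:
        if restrict_to_functions:
            return {"functions": set(restrict_to_functions)}
    return {}
```

With this shape, a `.gpr` input analyzed through the Ghidra backend still honors `--restrict-to-functions`, while inputs that are neither a static format nor handled by Ghidra get no function filter.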
CHANGELOG updated or no update needed, thanks! 😄
@mike-hunhoff
mike-hunhoff left a comment
Great work @saniyafatima07 ! I left comments for your review. I'll do some more thinking on how to best handle the .gpr tests in the meantime.
Great work splitting up the code into helper functions to keep things concise.
| Please check out the [capa explorer documentation](/capa/ida/plugin/README.md). | ||
|
|
||
| ### Ghidra project support | ||
| Capa can analyze programs directly from Ghidra projects by specifying the project file path (`.gpr`). If the project contains multiple programs, set the `CAPA_GHIDRA_PROGRAM_PATH` environment variable to specify which program to analyze. For example: `CAPA_GHIDRA_PROGRAM_PATH=/myprogram capa /path/to/project.gpr`. |
| Capa can analyze programs directly from Ghidra projects by specifying the project file path (`.gpr`). If the project contains multiple programs, set the `CAPA_GHIDRA_PROGRAM_PATH` environment variable to specify which program to analyze. For example: `CAPA_GHIDRA_PROGRAM_PATH=/myprogram capa /path/to/project.gpr`. | |
| capa can analyze programs directly from Ghidra projects by specifying the project file path (`.gpr`). If the project contains multiple programs, set the `CAPA_GHIDRA_PROGRAM_PATH` environment variable to specify which program to analyze. For example: `CAPA_GHIDRA_PROGRAM_PATH=/myprogram capa /path/to/project.gpr`. |
Also, see my previous comment about the database already being locked when capa runs. Do we need to specify additional restrictions here? e.g. "you must close the database before running capa"?
> Also, see my previous comment about the database already being locked when capa runs. Do we need to specify additional restrictions here? e.g. "you must close the database before running capa"?
We can add a small log message when the exception occurs. Something like:
Unexpected exception raised: {exctype}.\n It looks like the Ghidra project database is locked. Please close the project in the Ghidra GUI (or other process) and try again. For details, run in debug mode (-d/--debug).
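As a sketch of how that log message might be wired up: the code below catches a lock error, logs a hint, and re-raises. `ProjectLockedError` is a hypothetical Python stand-in for Ghidra's `LockException`, and `format_lock_message`, `open_project_or_explain`, and the logger name are illustrative. The message is kept on a single line here, which is a choice of this sketch.

```python
import logging

logger = logging.getLogger("capa.sketch")  # illustrative logger name


class ProjectLockedError(Exception):
    """Hypothetical stand-in for Ghidra's LockException."""


def format_lock_message(exc):
    # Adjacent string literals are concatenated, so no stray quotes or
    # indentation end up in the final message.
    return (
        f"Unexpected exception raised: {type(exc).__name__}. "
        "It looks like the Ghidra project database is locked. "
        "Please close the project in the Ghidra GUI (or other process) and try again. "
        "For details, run in debug mode (-d/--debug)."
    )


def open_project_or_explain(open_project):
    """Call `open_project`; on a lock error, log a hint and re-raise."""
    try:
        return open_project()
    except ProjectLockedError as e:
        logger.error("%s", format_lock_message(e))
        raise
```

Re-raising keeps the original traceback available in debug mode while still giving the user an actionable message.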
|
|
||
| - ghidra: support PyGhidra @mike-hunhoff #2788 | ||
| - vmray: extract number features from whitelisted void_ptr parameters (hKey, hKeyRoot) @adeboyedn #2835 | ||
| - ghidra: support analyzing existing Ghidra projects via .gpr:program input syntax @saniyafatima07 #3087 |
The .gpr:program syntax is no longer valid.
Corrected it.
| f"""CAPA_GHIDRA_PROGRAM_PATH did not match any program in the Ghidra project\n | ||
| available programs:\n{available}""" |
Please don't use triple-quoted strings (""") in error messages; the embedded line breaks and indentation produce confusing log output at runtime.
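One way to build the same error without a triple-quoted string is sketched below. `select_program_path` is a hypothetical helper, and falling back to the only program when no path is requested is an assumption of this sketch, not behavior confirmed in the PR.

```python
def select_program_path(available, requested=None):
    """Hypothetical helper: choose the program to analyze from a project."""
    if requested is None:
        # Assumption: a single-program project needs no explicit selection.
        if len(available) == 1:
            return available[0]
        raise ValueError(
            "the Ghidra project contains multiple programs; "
            "set CAPA_GHIDRA_PROGRAM_PATH to choose one"
        )
    if requested in available:
        return requested
    # Build the listing explicitly so every line break is intentional.
    listing = "\n".join(f"  {path}" for path in available)
    raise ValueError(
        "CAPA_GHIDRA_PROGRAM_PATH did not match any program in the Ghidra project\n"
        f"available programs:\n{listing}"
    )
```

In a real caller, `requested` would presumably come from `os.environ.get("CAPA_GHIDRA_PROGRAM_PATH")`, and `available` from the paths discovered in the project.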
Thank you for the review, Mike.
This PR adds support for analyzing existing Ghidra projects directly using `.gpr` project input.
Users can now provide input in the format:
For multi-program projects:
Motivation & Context
Currently, the Ghidra backend always creates a temporary project and re-imports the binary. This:
This change enables reuse of existing analyzed Ghidra projects while keeping the implementation localized to the Ghidra backend with minimal architecture changes.
Implementation Details
- `.gpr` detection to select the Ghidra backend when a Ghidra project file is provided as input.
- `domain_file.getPathname()` to discover programs within the project.
- `CAPA_GHIDRA_PROGRAM_PATH` support for selecting the target program in multi-program projects.
- `create=False`
- `consume_program`
- `.gpr` input
- `.gpr` inputs.
Tests
Added tests for:
- `.gpr` input
Closes #3004
Checklist
Parts of this implementation were assisted using AI tools (GitHub Copilot, ChatGPT).
AI was used for:
All code was reviewed, modified and tested manually before submission.