Enhance ghidra backend with existing project feature #3087

Open
saniyafatima07 wants to merge 12 commits into mandiant:master from saniyafatima07:ghidra-feature-new

@saniyafatima07 saniyafatima07 commented May 25, 2026

This PR adds support for analyzing existing Ghidra projects directly using .gpr project input.

Users can now provide input in the format:

capa /path/to/project.gpr

For multi-program projects:

CAPA_GHIDRA_PROGRAM_PATH=/folder/program capa /path/to/project.gpr

Motivation & Context

Currently, the Ghidra backend always creates a temporary project and re-imports the binary. This:

  • increases analysis time
  • ignores previously analyzed projects and annotations
  • duplicates existing analysis work

This change enables reuse of existing analyzed Ghidra projects while keeping the implementation localized to the Ghidra backend with minimal architecture changes.

Implementation Details

  • Added automatic .gpr detection to select the Ghidra backend when a Ghidra project file is provided as input.
  • Added recursive Ghidra project file enumeration using domain_file.getPathname() to discover programs within the project.
  • Added automatic program selection for single-program projects.
  • Added CAPA_GHIDRA_PROGRAM_PATH support for selecting the target program in multi-program projects.
  • Added informative error handling that lists available project program paths when disambiguation is required.
  • Updated Ghidra loader flow to:
    • open existing projects using create=False
    • reuse already analyzed programs via consume_program
    • skip temporary project creation/import flow for .gpr input
  • Default behavior remains unchanged for non-.gpr inputs.
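
The enumeration and selection rules in the list above can be sketched roughly as follows. This is an illustration only, not the PR's actual code: `FakeFile` and `FakeFolder` are hypothetical stand-ins for Ghidra's domain objects so the example runs without Ghidra, the function names are invented, and the error wording is made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeFile:
    """Hypothetical stand-in for a Ghidra domain file; only getPathname() is modeled."""
    pathname: str

    def getPathname(self) -> str:
        return self.pathname


@dataclass
class FakeFolder:
    """Hypothetical stand-in for a Ghidra domain folder."""
    files: list = field(default_factory=list)
    folders: list = field(default_factory=list)

    def getFiles(self):
        return self.files

    def getFolders(self):
        return self.folders


def list_program_paths(folder) -> list:
    """Recursively collect the pathname of every file below `folder`."""
    paths = [f.getPathname() for f in folder.getFiles()]
    for sub in folder.getFolders():
        paths.extend(list_program_paths(sub))
    return sorted(paths)


def select_program_path(available: list, requested: Optional[str]) -> str:
    """Choose the program to analyze from the project's program paths."""
    listing = "\n".join(available)
    if requested:
        # An explicit request must match one of the project's programs.
        if requested in available:
            return requested
        raise ValueError("requested program not found; available programs:\n" + listing)
    if len(available) == 1:
        # Single-program projects need no disambiguation.
        return available[0]
    if not available:
        raise ValueError("the project contains no programs")
    raise ValueError("multiple programs found; choose one of:\n" + listing)
```

A caller would presumably pass `os.environ.get("CAPA_GHIDRA_PROGRAM_PATH")` as `requested`, so an unset variable falls through to the single-program case.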

Tests

Added tests for:

  • automatic Ghidra backend selection for .gpr input
  • skipping generic file extractor probing for Ghidra project input
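
A minimal sketch of the kind of check the first test exercises, assuming detection is based purely on the `.gpr` suffix. `resolve_backend`, `BACKEND_AUTO`, and the string values are illustrative assumptions; only the name `BACKEND_GHIDRA` appears in this PR's diff.

```python
from pathlib import Path

BACKEND_AUTO = "auto"      # assumed value, for illustration
BACKEND_GHIDRA = "ghidra"  # assumed value, for illustration


def is_ghidra_project(path: Path) -> bool:
    """Treat any path ending in .gpr as a Ghidra project file."""
    return path.suffix.lower() == ".gpr"


def resolve_backend(input_path: Path, requested: str = BACKEND_AUTO) -> str:
    """Pick the Ghidra backend for .gpr input unless a backend was chosen explicitly."""
    if requested == BACKEND_AUTO and is_ghidra_project(input_path):
        return BACKEND_GHIDRA
    return requested
```

How an explicitly requested non-Ghidra backend should behave with `.gpr` input is not stated in the PR; this sketch simply honors the explicit choice.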

Closes #3004

Checklist

  • CHANGELOG updated
  • Added a few tests
  • Documentation updated
  • This submission includes AI-generated code and I have provided details in the description.

Parts of this implementation were assisted using AI tools (GitHub Copilot, ChatGPT).
AI was used for:

  • refining implementation approach
  • improving edge case handling

All code was reviewed, modified and tested manually before submission.

@github-actions github-actions Bot left a comment

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces support for Ghidra projects (.gpr files) by adding utilities to navigate Ghidra project structures and updating the loader to handle project-based analysis. The changes include automatic backend detection for Ghidra projects and updated CLI logic to skip generic file extraction when a project is provided. Review feedback highlights the need for better terminal formatting in error messages, improved resource management to prevent program leaks during exceptions, and the enablement of function filters for the Ghidra backend.

Comment thread capa/ghidra/helpers.py
Comment thread capa/ghidra/helpers.py
Comment thread capa/loader.py
Comment on lines 384 to 385
project = project_cm.__enter__()
try:
Contributor

medium

Initialize program and consumer to None before the try block. This allows for safe cleanup in the except block if an error occurs after the program is opened but before the context wrapper is created.

Suggested change
- project = project_cm.__enter__()
- try:
+ project = project_cm.__enter__()
+ program, consumer = None, None
+ try:

Collaborator

Please address @saniyafatima07

Comment thread capa/loader.py
Comment on lines 432 to 436
  except Exception:
      project_cm.__exit__(None, None, None)
-     tmpdir.cleanup()
+     if tmpdir:
+         tmpdir.cleanup()
      raise
Contributor

medium

If an exception occurs after pyghidra.consume_program but before the GhidraContextWrapper is instantiated, the program resource is leaked. Adding a check to release the program in the except block ensures proper resource management.

Suggested change
- except Exception:
-     project_cm.__exit__(None, None, None)
-     if tmpdir:
-         tmpdir.cleanup()
-     raise
+ except Exception:
+     if program is not None:
+         program.release(consumer)
+     project_cm.__exit__(None, None, None)
+     if tmpdir:
+         tmpdir.cleanup()
+     raise

Collaborator

Please address @saniyafatima07 . This resource leak existed before, but didn't matter because capa created a temporary project each time. But now that we're using existing projects, we need to ensure capa doesn't leak the resource and lock the user out of their Ghidra database.

If you haven't already, another situation to test is how capa handles attempting to open a Ghidra database that is already locked, e.g. the user has it opened.
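
The cleanup ordering discussed in this thread can be illustrated with a small, Ghidra-free sketch. `FakeProgram` and `FakeProjectCM` are hypothetical stand-ins, and `open_program` and `build_wrapper` are placeholders for the real calls (`pyghidra.consume_program` and the `GhidraContextWrapper` constructor in the PR); only the try/except shape follows the suggested change.

```python
class FakeProgram:
    """Hypothetical stand-in for a consumed Ghidra program."""

    def __init__(self):
        self.released_by = None

    def release(self, consumer):
        self.released_by = consumer


class FakeProjectCM:
    """Hypothetical stand-in for the project context manager."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.closed = True
        return False


def load_program(project_cm, open_program, build_wrapper):
    project = project_cm.__enter__()
    # Initialize before the try block so the except branch can test them safely.
    program, consumer = None, None
    try:
        program, consumer = open_program(project)
        return build_wrapper(project, program, consumer)
    except Exception:
        # Release the program first, then close the project, then re-raise.
        if program is not None:
            program.release(consumer)
        project_cm.__exit__(None, None, None)
        raise
```

If `build_wrapper` raises after the program has been opened, the program is released and the project closed before the exception propagates, so the user's project is not left held by a failed run.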

Collaborator Author

Yeah, I have looked into the lock handling using LockException. I will ensure that the resources are properly released and no leaks occur.

Comment thread capa/main.py
Comment on lines +928 to 931
+ if backend == BACKEND_GHIDRA:
+     return {}

  if input_format in STATIC_FORMATS:
Contributor

medium

Disabling extractor filters for the Ghidra backend prevents users from using --restrict-to-functions. Since Ghidra is a static analysis backend, it should support these filters. Merging this with the static format check allows filters to be applied correctly for Ghidra regardless of the input format detection.

Suggested change
- if backend == BACKEND_GHIDRA:
-     return {}
- if input_format in STATIC_FORMATS:
+ if input_format in STATIC_FORMATS or backend == BACKEND_GHIDRA:

Collaborator

@saniyafatima07 thoughts here? I'd expect the Ghidra backend to handle function restrictions set by the user. Is there a reason we're skipping them?

Collaborator Author

I agree, Mike. Since Ghidra is a static-analysis backend, it should support function restrictions too. I had a slight misinterpretation while implementing this logic. I’ll correct it.
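
For reference, the behavior agreed on in this thread might look roughly like the sketch below. The function name, the contents of `STATIC_FORMATS`, and the shape of the returned dictionary are assumptions for illustration; only the combined condition comes from the suggested change.

```python
STATIC_FORMATS = {"pe", "elf", "sc32", "sc64"}  # illustrative subset, not capa's real list
BACKEND_GHIDRA = "ghidra"                       # assumed value


def extractor_filters(input_format, backend, restrict_to_functions):
    """Return function filters for static formats and for the Ghidra backend."""
    if input_format in STATIC_FORMATS or backend == BACKEND_GHIDRA:
        if restrict_to_functions:
            return {"functions": set(restrict_to_functions)}
    return {}
```

With this condition, a `.gpr` input whose format is not in `STATIC_FORMATS` still honors `--restrict-to-functions` because the backend check also applies.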

@github-actions github-actions Bot dismissed their stale review May 25, 2026 13:10

CHANGELOG updated or no update needed, thanks! 😄

@saniyafatima07
Collaborator Author

@mike-hunhoff
I have tried implementing this feature with the new approach as per the discussion in #3066 .
Could you please review it?
Thank you for your time!

@saniyafatima07 saniyafatima07 marked this pull request as ready for review May 25, 2026 13:42
Collaborator

@mike-hunhoff mike-hunhoff left a comment

Great work @saniyafatima07 ! I left comments for your review. I'll do some more thinking on how to best handle the .gpr tests in the meantime.

Comment thread capa/ghidra/helpers.py
Collaborator

Great work splitting up the code into helper functions to keep things concise.

Comment thread doc/usage.md Outdated
Please check out the [capa explorer documentation](/capa/ida/plugin/README.md).

### Ghidra project support
Capa can analyze programs directly from Ghidra projects by specifying the project file path (`.gpr`). If the project contains multiple programs, set the `CAPA_GHIDRA_PROGRAM_PATH` environment variable to specify which program to analyze. For example: `CAPA_GHIDRA_PROGRAM_PATH=/myprogram capa /path/to/project.gpr`.
Collaborator

Suggested change
- Capa can analyze programs directly from Ghidra projects by specifying the project file path (`.gpr`). If the project contains multiple programs, set the `CAPA_GHIDRA_PROGRAM_PATH` environment variable to specify which program to analyze. For example: `CAPA_GHIDRA_PROGRAM_PATH=/myprogram capa /path/to/project.gpr`.
+ capa can analyze programs directly from Ghidra projects by specifying the project file path (`.gpr`). If the project contains multiple programs, set the `CAPA_GHIDRA_PROGRAM_PATH` environment variable to specify which program to analyze. For example: `CAPA_GHIDRA_PROGRAM_PATH=/myprogram capa /path/to/project.gpr`.

Collaborator

Also, see my previous comment about the database already being locked when capa runs. Do we need to specify additional restrictions here? e.g. "you must close the database before running capa"?

Collaborator Author

> Also, see my previous comment about the database already being locked when capa runs. Do we need to specify additional restrictions here? e.g. "you must close the database before running capa"?

We can add a small log message when the exception occurs. Something like:
Unexpected exception raised: {exctype}.\nIt looks like the Ghidra project database is locked. Please close the project in the Ghidra GUI (or other process) and try again. For details, run in debug mode (-d/--debug).
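
One way the proposed message could be wired up is sketched below. `ProjectLockedError` is a hypothetical stand-in for the lock exception Ghidra raises (the thread mentions `LockException`), and `open_project` is a placeholder for whatever call opens the project. Neither name is taken from the PR's code.

```python
class ProjectLockedError(Exception):
    """Hypothetical stand-in for Ghidra's lock exception."""


def open_project_or_explain(open_project, gpr_path):
    """Open the project, converting a lock failure into an actionable error."""
    try:
        return open_project(gpr_path)
    except ProjectLockedError as e:
        # Adjacent string literals are concatenated, so no stray quotes or
        # indentation end up in the message.
        raise RuntimeError(
            f"Unexpected exception raised: {type(e).__name__}.\n"
            "It looks like the Ghidra project database is locked. "
            "Please close the project in the Ghidra GUI (or other process) and try again. "
            "For details, run in debug mode (-d/--debug)."
        ) from e
```

Chaining with `from e` keeps the original exception available for debug output while the user sees the short explanation.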

Comment thread CHANGELOG.md Outdated

- ghidra: support PyGhidra @mike-hunhoff #2788
- vmray: extract number features from whitelisted void_ptr parameters (hKey, hKeyRoot) @adeboyedn #2835
- ghidra: support analyzing existing Ghidra projects via .gpr:program input syntax @saniyafatima07 #3087
Collaborator

The .gpr:program syntax is no longer valid.

Collaborator Author

Corrected it.

Comment thread capa/ghidra/helpers.py Outdated
Comment on lines +82 to +83
f"""CAPA_GHIDRA_PROGRAM_PATH did not match any program in the Ghidra project\n
available programs:\n{available}"""
Collaborator

Please don't use triple " in error messages to avoid confusing log message formats at execution.
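
To illustrate the concern: a triple-quoted string keeps the source indentation and any literal line break inside it, so both leak into the log output. A sketch of one alternative, using adjacent string literals, follows. The function name is hypothetical, and the message text is copied from the snippet under review.

```python
def format_no_match_error(available):
    """Build the error text without triple quotes."""
    listing = "\n".join(available)
    return (
        "CAPA_GHIDRA_PROGRAM_PATH did not match any program in the Ghidra project\n"
        f"available programs:\n{listing}"
    )


# For comparison: the continuation line's leading spaces become part of the string.
triple_quoted = """did not match any program\n
    available programs:"""
```

Here `triple_quoted` contains a blank line and four spaces of indentation before "available programs:", whereas the first form contains exactly the characters written in the literals.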

@mike-hunhoff mike-hunhoff requested a review from a team May 28, 2026 16:27
@saniyafatima07
Collaborator Author

> Great work @saniyafatima07 ! I left comments for your review. I'll do some more thinking on how to best handle the .gpr tests in the meantime.

Thank you for the review Mike.
I will address all the comments.
Sure.

Development

Successfully merging this pull request may close these issues.

ghidra: enable feature extraction from existing Ghidra project binary

2 participants