
Changes to support Windows Server 2025 32KiB databases#79

Closed
takker-hero-se wants to merge 2 commits into libyal:main from takker-hero-se:fix-ws2025-32k-page-support

@takker-hero-se


Fixes #78

Summary

Windows Server 2025 introduced optional 32KiB ESE database pages for Active Directory (NTDS.dit). This PR fixes two issues that prevented libesedb from parsing such databases.

Changes

1. Page tag count mask (libesedb_page_header.c, +9 lines)

In the 32KiB page format, the available_page_tag field (uint16) uses a new layout:

  • Upper 4 bits: ctagReserved (reserved, must be masked out)
  • Lower 12 bits: actual number of page tags

Without masking, libesedb reads all 16 bits as the tag count, inflating the value and causing out-of-bounds page tag reads.

Fix: page_header->available_page_tag &= 0x0fff when io_handle->page_size >= 32768.
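A minimal C sketch of the size-gated mask described in this fix, together with the kind of bounds check that an inflated tag count would fail. The helper names (`sketch_page_tag_count`, `sketch_page_tags_fit`) are hypothetical and not part of libesedb. The bounds check assumes that each page tag occupies 4 bytes in an array at the end of the page and takes the page header size as a parameter; both are assumptions for illustration, not libesedb's exact logic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not libesedb code: applies the size-gated mask from
 * this PR to the raw 16-bit available_page_tag value. */
static uint16_t sketch_page_tag_count(uint16_t raw_available_page_tag, uint32_t page_size)
{
	if (page_size >= 32768) {
		/* Upper 4 bits are ctagReserved; only the lower 12 bits count tags. */
		return (uint16_t) (raw_available_page_tag & 0x0fff);
	}
	/* Smaller pages: all 16 bits are treated as the tag count. */
	return raw_available_page_tag;
}

/* Hypothetical bounds check. Assumption: each page tag is 4 bytes and the
 * tag array sits at the end of the page, after a header of header_size bytes.
 * Returns 1 if the tag array fits in the page, 0 otherwise. */
static int sketch_page_tags_fit(uint16_t number_of_page_tags, uint32_t page_size, uint32_t header_size)
{
	uint32_t tags_size = (uint32_t) number_of_page_tags * 4;

	if (header_size > page_size) {
		return 0;
	}
	return tags_size <= page_size - header_size;
}
```

For example, a raw value of 0x1023 on a 32 KiB page yields 35 tags, whereas the unmasked value 0x1023 (4131 tags, or 16524 bytes of tag array under the 4-byte assumption) cannot fit in an 8 KiB page, which is the kind of out-of-bounds read described above.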

2. Leaf page validation in B-tree walk (libesedb_page_tree.c, +45 lines)

The backward walk in libesedb_page_tree_get_get_first_leaf_page_number() and forward walk in libesedb_page_tree_get_number_of_leaf_values() did not check whether each page in the leaf chain is actually a leaf page. In 32KiB databases, some pages referenced in the chain may be zeroed or non-leaf pages, causing errors or incorrect record counts.

Fix: Check LIBESEDB_PAGE_FLAG_IS_LEAF before processing each page. Added proper libcerror_error_set() error handling for the libesedb_page_get_flags() calls.
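The forward walk with an IS_LEAF check can be sketched roughly as follows. The `sketch_page` struct, the in-memory page array and the function name are hypothetical stand-ins for libesedb's internals. `SKETCH_PAGE_FLAG_IS_LEAF` uses 0x0002, the bit decoded as "Is leaf" in the page flags dump (0x00012802) elsewhere in this thread. As in the PR, the walk stops at the first page without the leaf flag; the sketch additionally bounds the walk so that a looping chain is reported as an error.

```c
#include <stddef.h>
#include <stdint.h>

/* Same value as the "Is leaf" bit in the page flags dump in this thread. */
#define SKETCH_PAGE_FLAG_IS_LEAF 0x00000002U

/* Hypothetical in-memory page, not libesedb's page type. */
typedef struct {
	uint32_t flags;
	uint32_t next_page_number; /* 0 terminates the chain */
	int number_of_values;
} sketch_page;

/* Walks the leaf chain forward from first_page_number and sums the values of
 * each leaf page. Stops at the first page without the IS_LEAF flag. Returns -1
 * if a page number is out of range or the chain visits more pages than exist
 * (i.e. it loops). */
static int sketch_count_leaf_values(const sketch_page *pages, size_t number_of_pages, uint32_t first_page_number)
{
	uint32_t page_number = first_page_number;
	size_t pages_visited = 0;
	int total = 0;

	while (page_number != 0) {
		if (page_number >= number_of_pages || pages_visited >= number_of_pages) {
			return -1;
		}
		const sketch_page *page = &pages[page_number];

		if ((page->flags & SKETCH_PAGE_FLAG_IS_LEAF) == 0) {
			/* Non-leaf page in the chain: stop walking. */
			break;
		}
		total += page->number_of_values;
		page_number = page->next_page_number;
		pages_visited++;
	}
	return total;
}
```

With a chain of pages 1, 2, 3 and 4 in which only page 3 lacks the leaf flag, the walk counts the values of pages 1 and 2 only and never reaches page 4. This illustrates the review remark further down that such a walk can stop prematurely when a non-leaf page appears in the chain.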

Testing

Tested with real Active Directory NTDS.dit databases:

                     8KiB pages (WS2019)   32KiB pages (WS2025)
Tables               14                    14
datatable records    7,008                 7,029
link_table records   13,904                485
Errors/crashes       None                  None

No regression on 8KiB page databases.


Windows Server 2025 introduced optional 32KiB ESE database pages.
This fixes two issues that prevented libesedb from parsing such databases:

1. In the 32KiB page format the upper 4 bits of available_page_tag are
   reserved (ctagReserved) and only the lower 12 bits contain the actual
   number of page tags. Masked the reserved bits when page_size >= 32768.

2. The leaf page backward and forward walk functions did not validate the
   IS_LEAF page flag. In 32KiB databases some pages in the leaf chain may
   not be actual leaf pages. Added IS_LEAF check with proper error handling.
@takker-hero-se

Author

CI failure analysis

The 2 failed jobs (macOS x64 gcc python and mingw-w64-gcc-python) are unrelated to this PR's changes.

Both fail with:

import pyesedb
ImportError: No such file or directory

This is a shared library path issue in the CI environment, not an API incompatibility. This PR only modifies internal page parsing logic in libesedb_page_header.c and libesedb_page_tree.c; it makes no public API changes.

Additionally, mingw-w64-gcc-python also fails on the current main branch with the same error, confirming it is a pre-existing CI environment issue.

The remaining 21/23 jobs (including all C compilation and test jobs) pass successfully.

scudette added a commit to Velocidex/go-ese that referenced this pull request Mar 21, 2026
In Windows Server 2025 the ESE page size increased to 32 KiB. The
AvailablePageTag field is therefore reduced to only 12 bits, with the
remaining bits used for something else.

This broke page parsing in this library and caused crashes.

This PR also adds more resilience when parsing corrupted data, to
avoid some crashes.

Credit for this fix goes to https://github.com/takker-hero-se
libyal/libesedb#79
@joachimmetz joachimmetz force-pushed the main branch 2 times, most recently from eee0e90 to baed490 June 12, 2026 17:52
The WS2025 itagState reinterpretation (upper 4 bits = ctagReserved, lower
12 bits = tag count) is not specific to 32 KiB pages: it also affects
16 KiB DataStore.edb and 4 KiB SRUDB.dat on WS2025. Gating the & 0x0fff
mask on page_size >= 32768 therefore left those databases unparseable.

Gate on io_handle->format_revision >= 0x0122 instead. This:
  * covers all WS2025 page sizes (revision 0x0122 for NTDS.dit, 0x012C
    for DataStore.edb and SRUDB.dat), and
  * is a provable no-op on every pre-WS2025 database (WS2019 = 0x0014),
    including dense 32 KiB legacy pages whose tag count could exceed the
    12-bit field and be truncated by an unconditional/size-gated mask.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
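The revision-gated rule from the commit message above can be sketched in C as a small helper. The function and macro names are hypothetical; only the 0x0122 threshold and the example revisions (0x0122, 0x012C, 0x0014) come from the commit message.

```c
#include <stdint.h>

/* Hypothetical macro: first WS2025 format revision cited in the commit message. */
#define SKETCH_WS2025_FORMAT_REVISION 0x0122U

/* Hypothetical helper, not libesedb code: masks the reserved bits of the raw
 * available_page_tag value based on the format revision, regardless of page size. */
static uint16_t sketch_page_tag_count_by_revision(uint16_t raw_available_page_tag, uint32_t format_revision)
{
	if (format_revision >= SKETCH_WS2025_FORMAT_REVISION) {
		/* WS2025 layout: upper 4 bits ctagReserved, lower 12 bits tag count. */
		return (uint16_t) (raw_available_page_tag & 0x0fff);
	}
	/* Pre-WS2025 revisions (e.g. WS2019 = 0x0014): all 16 bits are the tag count. */
	return raw_available_page_tag;
}
```

Unlike a page-size gate, this leaves every pre-0x0122 database untouched, including dense legacy 32 KiB pages. The Windows 11 specimen discussed later in this thread reports (0x620,300); 300 is 0x012C, so under this rule its tag field would be masked too.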
@codecov

codecov Bot commented Jun 17, 2026


Codecov Report

❌ Patch coverage is 20.00000% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 33.30%. Comparing base (6620ab0) to head (35f8478).
⚠️ Report is 6 commits behind head on main.

Files with missing lines           Patch %   Lines
libesedb/libesedb_page_tree.c      16.66%    6 Missing and 4 partials ⚠️
libesedb/libesedb_page_header.c    33.33%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main      #79       +/-   ##
===========================================
+ Coverage   22.09%   33.30%   +11.20%     
===========================================
  Files          52       54        +2     
  Lines       12277    12399      +122     
  Branches     2836     2890       +54     
===========================================
+ Hits         2713     4129     +1416     
+ Misses       9140     7145     -1995     
- Partials      424     1125      +701     


@joachimmetz

Member

Thanks for the proposed changes; I will take a look when time permits.

@joachimmetz

joachimmetz commented Jun 18, 2026

Member

No need to address the failing tests. I'll make comparable changes in a separate change list.

In the 32KiB page format, the available_page_tag field (uint16) uses a new layout:

Note that this does not appear to be limited to the 32 KiB page size.

To reproduce, see https://github.com/dfirlabs/esedb-specimens (Windows 11, 0x620,300):

libesedb_page_read_file_io_handle: reading page: 14 at offset: 61440 (0x0000f000)
libesedb_page_header_read_data: page header:
00000000: 83 e2 41 96 65 0f 65 0f  bb 01 00 00 00 00 00 00   ..A.e.e. ........
00000010: 0d 00 00 00 00 00 00 00  02 00 00 00 21 07 00 00   ........ ....!...
00000020: 2b 08 23 10 02 28 01 00                            +.#..(..

libesedb_page_header_read_data: XOR checksum                                    : 0x9641e283
libesedb_page_header_read_data: ECC checksum                                    : 0x0f650f65
libesedb_page_header_read_data: database modification time:
00000000: bb 01 00 00 00 00 00 00                            ........

libesedb_page_header_read_data: previous page number                            : 13
libesedb_page_header_read_data: next page number                                : 0
libesedb_page_header_read_data: father data page (FDP) object identifier        : 2
libesedb_page_header_read_data: available data size                             : 1825
libesedb_page_header_read_data: available uncommitted data size                 : 0
libesedb_page_header_read_data: available data offset                           : 2091
libesedb_page_header_read_data: available page tag                              : 35 (0x1023)
libesedb_page_header_read_data: page flags                                      : 0x00012802
        Is leaf
        0x0800 (primary?)
        Is new record format
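Decoding the raw header bytes above by hand shows where the 35 comes from. The sketch below reads the little-endian fields at the byte offsets implied by the dump (for example, the available page tag at byte 34 and the page flags at byte 36); the struct and function names are hypothetical, not libesedb's. It also assumes that page N starts at file offset (N + 1) * page_size, under which offset 61440 for page 14 corresponds to 4 KiB pages.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct holding selected fields of the 40-byte page header. */
typedef struct {
	uint32_t previous_page_number;
	uint32_t next_page_number;
	uint32_t father_data_page_object_identifier;
	uint16_t available_data_size;
	uint16_t available_data_offset;
	uint16_t available_page_tag; /* raw 16-bit value, reserved bits included */
	uint32_t page_flags;
} sketch_page_header;

static uint16_t sketch_le16(const uint8_t *data)
{
	return (uint16_t) (data[0] | (data[1] << 8));
}

static uint32_t sketch_le32(const uint8_t *data)
{
	return (uint32_t) data[0] | ((uint32_t) data[1] << 8)
	     | ((uint32_t) data[2] << 16) | ((uint32_t) data[3] << 24);
}

/* Parses the fields at the offsets matching the dump; returns -1 if too short. */
static int sketch_parse_page_header(const uint8_t *data, size_t size, sketch_page_header *header)
{
	if (size < 40) {
		return -1;
	}
	header->previous_page_number = sketch_le32(&data[16]);
	header->next_page_number = sketch_le32(&data[20]);
	header->father_data_page_object_identifier = sketch_le32(&data[24]);
	header->available_data_size = sketch_le16(&data[28]);
	header->available_data_offset = sketch_le16(&data[32]);
	header->available_page_tag = sketch_le16(&data[34]);
	header->page_flags = sketch_le32(&data[36]);
	return 0;
}

/* Assumption: page N starts at (N + 1) * page_size. */
static uint64_t sketch_page_offset(uint32_t page_number, uint32_t page_size)
{
	return ((uint64_t) page_number + 1) * page_size;
}

/* The 40 header bytes of page 14 copied from the dump above. */
static const uint8_t sketch_page_14_header[40] = {
	0x83, 0xe2, 0x41, 0x96, 0x65, 0x0f, 0x65, 0x0f, 0xbb, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x0d, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x21, 0x07, 0x00, 0x00,
	0x2b, 0x08, 0x23, 0x10, 0x02, 0x28, 0x01, 0x00
};
```

The raw tag field is 0x1023; masking it with 0x0fff gives 0x023, i.e. 35, matching the value printed next to the raw field. Under the offset assumption this is a 4 KiB page, so the 12-bit tag count layout is already in use on pages well below 32 KiB.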

@joachimmetz

joachimmetz commented Jun 18, 2026

Member

In 32KiB databases, some pages referenced in the chain may be zeroed or non-leaf pages, causing errors or incorrect record counts.

Interesting. Do you have some examples that could be shared?

Note that with your changes, both libesedb_page_tree_get_get_first_leaf_page_number and libesedb_page_tree_get_number_of_leaf_values can now stop prematurely if there is a non-leaf page in the chain.

@joachimmetz joachimmetz changed the title from "Fixed parsing of 32KiB page ESE databases from Windows Server 2025" to "Changes to support Windows Server 2025 32KiB databases" Jun 18, 2026
@joachimmetz

joachimmetz commented Jun 18, 2026

Member

Closing in favor of 768a047


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unable to parse Windows Server 2025 NTDS.dit (0x620,290) with 32KiB ESE pages

2 participants