Commit a9646a2

committed
Initial commit
1 parent e97bf23 commit a9646a2

8 files changed

Lines changed: 8258 additions & 0 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
.vscode/

readme.md

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
# libbbf: Bound Book Format

![alt text](https://img.shields.io/badge/Format-BBF1-blue.svg)

![alt text](https://img.shields.io/badge/License-MIT-green.svg)

Bound Book Format (.bbf) is a high-performance, archival-grade binary container designed specifically for digital comic books and manga. Unlike CBR/CBZ, BBF is built for DirectStorage/mmap, easy integrity checks, and mixed-codec containerization.

## Technical Details
BBF is a footer-indexed binary format. This allows for rapid append-only creation and immediate random access to any page without scanning the entire file.

### Binary Layout
1. **Header (13 bytes)**: Magic `BBF1`, versioning, and initial padding.
2. **Page Data**: The raw image payloads (AVIF, PNG, etc.), each padded to **4096-byte boundaries**.
3. **String Pool**: A deduplicated pool of null-terminated strings for metadata and section titles.
4. **Asset Table**: A registry of physical data blobs with XXH3 hashes.
5. **Page Table**: The logical reading order, mapping logical pages to assets.
6. **Section Table**: Markers for chapters, volumes, or gallery sections.
7. **Metadata Table**: Key-Value pairs for archival data (Author, Scanlation team, etc.).
8. **Footer (76 bytes)**: Table offsets and a final integrity hash.

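As a rough illustration of the footer-indexed layout above, a reader can find the index by looking at the last bytes of the file instead of scanning it. The sketch below uses a hypothetical, simplified `DemoFooter`; the real `BBFFooter` is 76 bytes and is defined in `libbbf.h`, which is not shown here, so the fields and the `readDemoFooter` helper are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical, simplified footer (16 bytes). The real BBFFooter has more fields.
struct DemoFooter {
    std::uint64_t pageTableOffset;  // where the page table starts in the file
    std::uint32_t pageCount;        // number of logical pages
    char magic[4];                  // expected to be "BBF1"
};

// Reads the footer from the last sizeof(DemoFooter) bytes of an in-memory file.
// Uses the host byte order, which is an assumption of this sketch.
std::optional<DemoFooter> readDemoFooter(const std::vector<unsigned char>& file) {
    if (file.size() < sizeof(DemoFooter)) return std::nullopt;

    const std::size_t footerStart = file.size() - sizeof(DemoFooter);
    DemoFooter footer;
    std::memcpy(&footer, file.data() + footerStart, sizeof(DemoFooter));

    // Reject files whose footer magic or table offset is implausible.
    if (std::memcmp(footer.magic, "BBF1", 4) != 0) return std::nullopt;
    if (footer.pageTableOffset > footerStart) return std::nullopt;
    return footer;
}
```

Once the footer is validated, every table can be reached with a single seek to its recorded offset, which is what makes random page access cheap.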
### 4KB Alignment & DirectStorage
Every asset in a BBF file starts on a 4KB boundary. This alignment is critical for modern NVMe-based systems. It lets developers map page data with `mmap` or use **DirectStorage** to move image data from disk to GPU memory, bypassing the CPU-bottlenecked "copy and decompress" cycles found in Zip-based formats.

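The padding needed to reach the next 4096-byte boundary is a small piece of arithmetic. A minimal sketch, assuming a power-of-two alignment; `alignUp` and `paddingFor` are illustrative names and not part of libbbf:

```cpp
#include <cstdint>

constexpr std::uint64_t kAlignment = 4096;  // must be a power of two for the mask trick

// Rounds an offset up to the next multiple of kAlignment (unchanged if already aligned).
constexpr std::uint64_t alignUp(std::uint64_t offset) {
    return (offset + kAlignment - 1) & ~(kAlignment - 1);
}

// Number of padding bytes to write before the next asset can start.
constexpr std::uint64_t paddingFor(std::uint64_t offset) {
    return alignUp(offset) - offset;
}
```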
---

## Features

### Content Deduplication
BBF uses **[XXH3_64](https://github.com/Cyan4973/xxHash)** hashing to identify identical pages. If a book contains duplicate pages, the data is stored exactly once on disk while being referenced multiple times in the Page Table.

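The idea can be sketched as follows: hash each payload, store it only if no identical payload has been seen, and record an asset index per page. This is a hedged illustration, not the libbbf implementation. It uses `std::hash` as a dependency-free stand-in for XXH3_64 and adds a byte comparison to guard against hash collisions; whether libbbf does the same is not shown in this commit. `DedupResult` and `deduplicate` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Result of deduplicating a sequence of page payloads.
struct DedupResult {
    std::vector<std::string> assets;       // unique payloads, each stored once
    std::vector<std::uint32_t> pageTable;  // page i -> index into `assets`
};

DedupResult deduplicate(const std::vector<std::string>& pages) {
    DedupResult result;
    // hash -> indices of assets with that hash (more than one only on a collision)
    std::unordered_map<std::size_t, std::vector<std::uint32_t>> byHash;
    const std::hash<std::string> hasher{};  // stand-in for XXH3_64

    for (const std::string& payload : pages) {
        std::vector<std::uint32_t>& candidates = byHash[hasher(payload)];
        bool found = false;
        for (std::uint32_t idx : candidates) {
            if (result.assets[idx] == payload) {  // confirm the bytes match, not just the hash
                result.pageTable.push_back(idx);
                found = true;
                break;
            }
        }
        if (!found) {
            const auto idx = static_cast<std::uint32_t>(result.assets.size());
            result.assets.push_back(payload);
            candidates.push_back(idx);
            result.pageTable.push_back(idx);
        }
    }
    return result;
}
```

For five pages where the first and third are identical and the second and fifth are identical, the result holds three assets and a five-entry page table.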
### Archival Integrity
Traditional bit-rot is the enemy of the archivist. BBF stores a 64-bit hash for *every individual asset*. The `bbfmux --verify` command can pinpoint exactly which page in a 2GB file has been damaged, rather than simply failing to open the entire archive.

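Per-asset hashing is what allows damage to be localized. The sketch below shows the general shape of such a check over an in-memory file. To stay self-contained it uses FNV-1a as a stand-in checksum rather than XXH3_64, and `DemoAsset` and `findCorrupted` are hypothetical names rather than libbbf types.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// FNV-1a (64-bit): a simple stand-in checksum so the sketch needs no dependencies.
std::uint64_t fnv1a64(const unsigned char* data, std::size_t len) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}

// Hypothetical asset record: where the blob lives and the hash stored for it.
struct DemoAsset {
    std::uint64_t offset;
    std::uint64_t length;
    std::uint64_t hash;
};

// Returns the indices of assets that are out of bounds or whose hash no longer matches.
std::vector<std::size_t> findCorrupted(const std::vector<unsigned char>& file,
                                       const std::vector<DemoAsset>& assets) {
    std::vector<std::size_t> bad;
    for (std::size_t i = 0; i < assets.size(); ++i) {
        const DemoAsset& a = assets[i];
        const bool inBounds = a.offset <= file.size() && a.length <= file.size() - a.offset;
        if (!inBounds || fnv1a64(file.data() + a.offset, a.length) != a.hash) {
            bad.push_back(i);
        }
    }
    return bad;
}
```

Because each asset is checked independently, a single flipped bit is reported against one asset index while the rest of the archive stays usable.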
### Mixed-Codec Support
Preserve covers in **Lossless PNG** while encoding internal story pages in **AVIF** to save around 70% of the space. BBF explicitly flags the codec for every asset, allowing readers to initialize the correct decoder instantly without "guessing" the file type.

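A per-asset codec flag can be modelled as a small enum. In the sketch below the numeric values follow what the `bbfmux` source in this commit writes (1 for AVIF, 2 for everything else, which it later extracts as `.png`); the `DemoCodec` enum and both helpers are illustrative names. Unlike the muxer, which only matches `.avif` and `.AVIF`, this sketch lowercases the extension first.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <string>

// Type tags as used by the bbfmux source in this commit: 1 for AVIF, 2 otherwise.
enum class DemoCodec : std::uint8_t { Avif = 0x01, Png = 0x02 };

// Picks a codec tag from a file extension such as ".avif", ignoring case.
DemoCodec codecFromExtension(std::string ext) {
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return ext == ".avif" ? DemoCodec::Avif : DemoCodec::Png;
}

// Maps a codec tag back to the extension used when extracting pages.
std::string extensionFor(DemoCodec codec) {
    return codec == DemoCodec::Avif ? ".avif" : ".png";
}
```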
---

## CLI Usage: `bbfmux`

The included `bbfmux` tool is a reference implementation for creating and managing BBF files.

## CLI Features

The `bbfmux` utility provides a powerful interface for managing Bound Book files:

* **Flexible Ingestion**: Create books by passing individual files, entire directories, or a mix of both.
* **Logical Structuring**: Add named **Sections** (Chapters, Volumes, Galleries) to define the internal hierarchy of the book.
* **Custom Metadata**: Embed arbitrary Key:Value pairs into the global string pool for archival indexing.
* **Content-Aware Extraction**: Extract the entire book or target specific sections by name.

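Metadata keys, values, and section titles all live in one deduplicated pool of null-terminated strings and are referenced by byte offset. A minimal sketch of such a pool follows; `DemoStringPool` is a hypothetical class, not the libbbf one, and it assumes strings contain no embedded null bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

class DemoStringPool {
public:
    // Returns the byte offset of `s` in the pool, appending it only if it is new.
    std::uint32_t intern(const std::string& s) {
        auto it = offsets_.find(s);
        if (it != offsets_.end()) return it->second;
        const auto offset = static_cast<std::uint32_t>(bytes_.size());
        bytes_.insert(bytes_.end(), s.begin(), s.end());
        bytes_.push_back('\0');  // terminator marks the end of this entry
        offsets_.emplace(s, offset);
        return offset;
    }

    // Reads the null-terminated string at `offset`; empty if the offset is out of range.
    std::string get(std::uint32_t offset) const {
        if (offset >= bytes_.size()) return {};
        return std::string(bytes_.data() + offset);
    }

    std::size_t size() const { return bytes_.size(); }

private:
    std::vector<char> bytes_;                                 // the serialized pool
    std::unordered_map<std::string, std::uint32_t> offsets_;  // string -> offset
};
```

Interning the same key twice returns the same offset, so repeated strings cost no extra bytes on disk.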
## Usage Examples

### Create a new BBF
You can mix individual images and folders. `bbfmux` sorts the inputs alphabetically, deduplicates identical assets, and aligns data to 4KB boundaries.

NOTE: The CLI does not expose this yet, but the Page Table makes custom reading orders possible.

```bash
bbfmux cover.png ./chapter1/ endcard.png \
  --section="Cover":1 \
  --section="Chapter 1":2 \
  --section="Credits":24 \
  --meta=Title:"Akira" \
  --meta=Author:"Katsuhiro Otomo" \
  akira.bbf
```

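The `--section="Name":PageIdx` value uses a 1-based page number on the command line, while the section table stores a 0-based index. The sketch below shows one way to parse such a value defensively. It splits on the last colon, as the `bbfmux` source does, so names may themselves contain colons. `SectionRequest` and `parseSectionArg` are hypothetical names, and the validation shown is this sketch's own.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <system_error>

struct SectionRequest {
    std::string name;
    std::uint32_t startIndex;  // zero-based page index
};

// Parses the text after "--section=" (shell quotes already removed), e.g. "Chapter 1:2".
std::optional<SectionRequest> parseSectionArg(const std::string& value) {
    const std::size_t colon = value.find_last_of(':');
    if (colon == std::string::npos || colon == 0) return std::nullopt;

    std::uint32_t page = 0;
    const char* first = value.data() + colon + 1;
    const char* last = value.data() + value.size();
    const auto [ptr, ec] = std::from_chars(first, last, page);
    // Require a fully numeric, 1-based page number.
    if (ec != std::errc() || ptr != last || page == 0) return std::nullopt;

    return SectionRequest{value.substr(0, colon), page - 1};
}
```

With this helper, `Chapter 1:2` yields the name `Chapter 1` and start index 1, while `Cover:0` and `Cover:abc` are rejected.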
### Verify Integrity
Scan for bit-rot or data corruption. The command reports which assets are corrupted.
```bash
bbfmux input.bbf --verify
```

### Extract Data
Extract a specific section or the entire book. To extract a single section by name:
```bash
bbfmux input.bbf --extract --section="Chapter 1" --outdir="./chapter1"
```

To extract the entire book:
```bash
bbfmux input.bbf --extract --outdir="./unpacked_book"
```

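Extracting by section comes down to finding a half-open page range: a section runs from its own start index up to the start of the next section, or to the end of the book for the last one. The sketch below mirrors that logic under the assumption that sections are stored sorted by start index; `DemoSection` and `sectionRange` are illustrative names.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct DemoSection {
    std::string title;
    std::uint32_t startIndex;  // zero-based index of the section's first page
};

// Returns the half-open page range [first, last) for the named section, if it exists.
std::optional<std::pair<std::uint32_t, std::uint32_t>>
sectionRange(const std::vector<DemoSection>& sections, const std::string& title,
             std::uint32_t pageCount) {
    for (std::size_t i = 0; i < sections.size(); ++i) {
        if (sections[i].title != title) continue;
        const std::uint32_t first = sections[i].startIndex;
        const std::uint32_t last =
            (i + 1 < sections.size()) ? sections[i + 1].startIndex : pageCount;
        if (first > last || last > pageCount) return std::nullopt;  // malformed table
        return std::make_pair(first, last);
    }
    return std::nullopt;  // no section with that title
}
```

For the Akira example above (24 pages, sections starting at pages 1, 2, and 24), "Chapter 1" maps to the zero-based range [1, 23).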
### View Metadata
Print the format version, page and asset counts, sections, and metadata of a .bbf file.
```bash
bbfmux input.bbf --info
```

---

## Getting Started

### Prerequisites
- C++17 compliant compiler (GCC/Clang/MSVC)
- [xxHash](https://github.com/Cyan4973/xxHash) library

### Compilation
```bash
g++ -std=c++17 bbfmux.cpp libbbf.cpp xxhash.c -o bbfmux
```

## License
Distributed under the MIT License. See `LICENSE` for more information.

src/bbfenc.cpp

Lines changed: 260 additions & 0 deletions
@@ -0,0 +1,260 @@
#include "libbbf.h"
#include "xxhash.h"
#include <iostream>
#include <fstream>    // std::ifstream / std::ofstream are used directly below
#include <filesystem>
#include <string>
#include <vector>
#include <algorithm>
#include <iomanip>
#include <cstdint>

namespace fs = std::filesystem;

class BBFReader {
public:
    BBFFooter footer;
    BBFHeader header; // Stored so that --info can report the format version
    std::ifstream stream;
    std::vector<char> stringPool;

    bool open(const std::string& path) {
        stream.open(path, std::ios::binary | std::ios::ate);
        if (!stream.is_open()) return false;

        // The stream was opened at the end (std::ios::ate), so tellg() is the file size.
        const std::streamoff endPos = stream.tellg();
        if (endPos < 0) return false;
        const uint64_t fileSize = static_cast<uint64_t>(endPos);
        if (fileSize < sizeof(BBFHeader) + sizeof(BBFFooter)) return false;

        // Read and validate the header
        stream.seekg(0, std::ios::beg);
        if (!stream.read(reinterpret_cast<char*>(&header), sizeof(BBFHeader))) return false;
        if (std::string((char*)header.magic, 4) != "BBF1") return false;

        // Read and validate the footer, which occupies the last bytes of the file
        stream.seekg(static_cast<std::streamoff>(fileSize - sizeof(BBFFooter)));
        if (!stream.read(reinterpret_cast<char*>(&footer), sizeof(BBFFooter))) return false;
        if (std::string((char*)footer.magic, 4) != "BBF1") return false;

        // Load the string pool, which sits between its own offset and the asset table
        if (footer.stringPoolOffset > footer.assetTableOffset ||
            footer.assetTableOffset > fileSize) return false;
        stringPool.resize(footer.assetTableOffset - footer.stringPoolOffset);
        stream.seekg(footer.stringPoolOffset);
        stream.read(stringPool.data(), stringPool.size());
        return !stream.fail();
    }

    std::string getString(uint32_t offset) {
        if (offset >= stringPool.size()) return "OFFSET_ERR";
        // Stop at the terminator, or at the end of the pool if the terminator is missing.
        auto first = stringPool.begin() + offset;
        return std::string(first, std::find(first, stringPool.end(), '\0'));
    }

    std::vector<BBFAssetEntry> getAssets() {
        std::vector<BBFAssetEntry> assets(footer.assetCount);
        stream.seekg(footer.assetTableOffset);
        stream.read(reinterpret_cast<char*>(assets.data()), footer.assetCount * sizeof(BBFAssetEntry));
        return assets;
    }

    std::vector<BBFPageEntry> getPages() {
        std::vector<BBFPageEntry> pages(footer.pageCount);
        stream.seekg(footer.pageTableOffset);
        stream.read(reinterpret_cast<char*>(pages.data()), footer.pageCount * sizeof(BBFPageEntry));
        return pages;
    }

    std::vector<BBFSection> getSections() {
        std::vector<BBFSection> sections(footer.sectionCount);
        stream.seekg(footer.sectionTableOffset);
        stream.read(reinterpret_cast<char*>(sections.data()), footer.sectionCount * sizeof(BBFSection));
        return sections;
    }

    std::vector<BBFMetadata> getMetadata() {
        std::vector<BBFMetadata> meta(footer.keyCount);
        if (footer.keyCount > 0) {
            stream.seekg(footer.metaTableOffset);
            stream.read(reinterpret_cast<char*>(meta.data()), footer.keyCount * sizeof(BBFMetadata));
        }
        return meta;
    }
};

void printHelp() {
    std::cout << "Bound Book Format Muxer (bbfmux)\n\n"
              << "Usage (Creation):\n"
              << "  bbfmux <inputs...> [options] <output.bbf>\n"
              << "  Inputs can be individual images or directories.\n\n"
              << "Options:\n"
              << "  --section=\"Name\":PageIdx   Add a section marker (1-based index)\n"
              << "  --meta=Key:\"Value\"         Add metadata\n\n"
              << "Usage (Operations):\n"
              << "  bbfmux <input.bbf> --info\n"
              << "  bbfmux <input.bbf> --verify\n"
              << "  bbfmux <input.bbf> --extract [--outdir=path] [--section=\"Name\"]\n";
}

int main(int argc, char* argv[]) {
    if (argc < 2) { printHelp(); return 1; }

    std::vector<std::string> inputs;
    std::string outputBbf;
    bool modeInfo = false, modeVerify = false, modeExtract = false;
    std::string outDir = "./extracted";
    std::string targetSection = "";

    struct SecReq { std::string name; uint32_t page; };
    struct MetaReq { std::string k, v; };
    std::vector<SecReq> secReqs;
    std::vector<MetaReq> metaReqs;
    std::vector<std::string> sectionArgs; // Raw --section values, interpreted once the mode is known

    // Parse all of the arguments
    for (int i = 1; i < argc; ++i) {
        std::string arg = argv[i];
        if (arg == "--info") modeInfo = true;
        else if (arg == "--verify") modeVerify = true;
        else if (arg == "--extract") modeExtract = true;
        else if (arg.find("--outdir=") == 0) outDir = arg.substr(9);
        else if (arg.find("--section=") == 0) sectionArgs.push_back(arg.substr(10));
        else if (arg.find("--meta=") == 0) {
            std::string val = arg.substr(7);
            size_t colon = val.find(':');
            if (colon != std::string::npos)
                metaReqs.push_back({val.substr(0, colon), val.substr(colon + 1)});
        }
        else {
            inputs.push_back(arg);
        }
    }

    // Interpret --section after the loop so the result does not depend on flag order:
    // a section name when extracting, otherwise "Name":PageIdx for creation.
    for (const auto& val : sectionArgs) {
        size_t colon = val.find_last_of(':');
        if (modeExtract || colon == std::string::npos) {
            targetSection = val; // For extraction
            continue;
        }
        try {
            secReqs.push_back({val.substr(0, colon), (uint32_t)std::stoul(val.substr(colon + 1))});
        } catch (const std::exception&) {
            std::cerr << "Error: Invalid page index in --section=" << val << "\n";
            return 1;
        }
    }

    // Perform actions
    if (modeInfo || modeVerify || modeExtract) {
        if (inputs.empty()) { std::cerr << "Error: No .bbf input specified.\n"; return 1; }
        BBFReader reader;
        if (!reader.open(inputs[0])) { std::cerr << "Error: Failed to open BBF.\n"; return 1; }

        if (modeInfo) {
            std::cout << "Bound Book Format (.bbf) Info\n";
            std::cout << "------------------------------\n";
            std::cout << "BBF Version: " << (int)reader.header.version << "\n";
            std::cout << "Pages:       " << reader.footer.pageCount << "\n";
            std::cout << "Assets:      " << reader.footer.assetCount << " (Deduplicated)\n";

            // Print Sections
            std::cout << "\n[Sections]\n";
            auto sections = reader.getSections();
            if (sections.empty()) {
                std::cout << " No sections defined.\n";
            } else {
                for (auto& s : sections) {
                    std::cout << " - " << std::left << std::setw(20)
                              << reader.getString(s.sectionTitleOffset)
                              << " (Starts Page: " << s.sectionStartIndex + 1 << ")\n";
                }
            }

            // Print Metadata
            std::cout << "\n[Metadata]\n";
            auto metadata = reader.getMetadata();
            if (metadata.empty()) {
                std::cout << " No metadata found.\n";
            } else {
                for (auto& m : metadata) {
                    std::string key = reader.getString(m.keyOffset);
                    std::string val = reader.getString(m.valOffset);
                    std::cout << " - " << std::left << std::setw(15) << (key + ":") << val << "\n";
                }
            }
            std::cout << std::endl;
        }

        if (modeVerify) {
            std::cout << "Verifying asset integrity...\n";
            auto assets = reader.getAssets();
            bool clean = true;
            for (size_t i = 0; i < assets.size(); ++i) {
                std::vector<char> buf(assets[i].length);
                reader.stream.clear(); // A failed read on an earlier asset must not poison this one
                reader.stream.seekg(assets[i].offset);
                reader.stream.read(buf.data(), assets[i].length);
                // A short read (asset runs past the end of the file) also counts as damage
                if (!reader.stream || XXH3_64bits(buf.data(), buf.size()) != assets[i].xxh3Hash) {
                    std::cerr << "Mismatch in asset " << i << "\n";
                    clean = false;
                }
            }
            if (clean) std::cout << "Integrity Check Passed.\n";
            else { std::cerr << "Integrity Check Failed.\n"; return 1; } // Non-zero exit lets scripts detect damage
        }

        if (modeExtract) {
            fs::create_directories(outDir);
            auto pages = reader.getPages();
            auto assets = reader.getAssets();
            auto sections = reader.getSections();

            // Default to the whole book; narrow to [start, end) when a section is requested
            uint32_t start = 0, end = pages.size();
            if (!targetSection.empty()) {
                bool found = false;
                for (size_t i = 0; i < sections.size(); ++i) {
                    if (reader.getString(sections[i].sectionTitleOffset) == targetSection) {
                        start = sections[i].sectionStartIndex;
                        end = (i + 1 < sections.size()) ? sections[i+1].sectionStartIndex : pages.size();
                        found = true; break;
                    }
                }
                if (!found) { std::cerr << "Section not found.\n"; return 1; }
            }
            // Guard against a damaged section table pointing outside the page table
            if (start > end || end > pages.size()) { std::cerr << "Error: Section range is out of bounds.\n"; return 1; }

            for (uint32_t i = start; i < end; ++i) {
                if (pages[i].assetIndex >= assets.size()) {
                    std::cerr << "Error: Page " << (i + 1) << " references a missing asset.\n";
                    return 1;
                }
                auto& asset = assets[pages[i].assetIndex];
                std::string ext = (asset.type == 0x01) ? ".avif" : ".png";
                std::string outPath = outDir + "/page_" + std::to_string(i+1) + ext;
                std::vector<char> buf(asset.length);
                reader.stream.seekg(asset.offset);
                reader.stream.read(buf.data(), asset.length);
                std::ofstream ofs(outPath, std::ios::binary);
                ofs.write(buf.data(), asset.length);
            }
            std::cout << "Extracted " << (end - start) << " pages to " << outDir << "\n";
        }
    }
    else {
        // CREATE MODE
        if (inputs.size() < 2) { std::cerr << "Error: Provide inputs and an output filename.\n"; return 1; }
        outputBbf = inputs.back();
        inputs.pop_back();

        BBFBuilder builder(outputBbf);

        // Gather all image paths
        std::vector<std::string> imagePaths;
        for (const auto& path : inputs) {
            if (fs::is_directory(path)) {
                // Only regular files are pages; subdirectories are skipped, not recursed into
                for (const auto& entry : fs::directory_iterator(path))
                    if (entry.is_regular_file()) imagePaths.push_back(entry.path().string());
            } else {
                imagePaths.push_back(path);
            }
        }
        std::sort(imagePaths.begin(), imagePaths.end());

        for (const auto& p : imagePaths) {
            std::string ext = fs::path(p).extension().string();
            uint8_t type = (ext == ".avif" || ext == ".AVIF") ? 1 : 2; // 1 = AVIF, 2 = anything else
            if (!builder.addPage(p, type)) std::cerr << "Warning: Failed to add " << p << "\n";
        }

        for (auto& s : secReqs) {
            // Page indices are 1-based on the command line; 0 would underflow below
            if (s.page == 0) { std::cerr << "Warning: Skipping section \"" << s.name << "\" (page index must be >= 1)\n"; continue; }
            builder.addSection(s.name, s.page - 1);
        }
        for (auto& m : metaReqs) builder.addMetadata(m.k, m.v);

        if (builder.finalize()) std::cout << "Successfully created " << outputBbf << "\n";
    }

    return 0;
}
