```
BIP: ?
Layer: Peer Services
Title: P2P UTXO Set Sharing
Authors: Fabian Jahr <fjahr@protonmail.com>
Status: Draft
Type: Specification
Assigned: ?
Discussion: ?
License: BSD-2-Clause
```

## Abstract

This BIP defines a P2P protocol extension for sharing full UTXO sets between peers. It introduces
a new service bit `NODE_UTXO_SET`, four new P2P messages (`getutxosetinfo`, `utxosetinfo`, `getutxoset`,
`utxoset`), and a Merkle-tree-based integrity scheme that enables per-chunk verification. This allows
nodes using mechanisms such as assumeutxo to bootstrap from a recent height by obtaining the required
UTXO set directly from the P2P network.

## Motivation

The assumeutxo feature (implemented in Bitcoin Core) allows nodes to begin operating from a serialized
UTXO set while validating historical blocks in the background. However, there is currently no canonical
source for obtaining this data. Users must either generate a UTXO set themselves from a fully synced node
(using `dumptxoutset` in Bitcoin Core) or download one from a third party.

By enabling UTXO set sharing over the P2P network, new nodes can obtain the data directly from
peers, removing the dependency on external infrastructure.

## Specification

The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" in this document are to be
interpreted as described in RFC 2119.

### Service Bit

| Name | Bit | Description |
|------|-----|-------------|
| `NODE_UTXO_SET` | 12 (0x1000) | The node can serve complete UTXO set data for at least one height. |

A node MUST NOT set this bit unless it has at least one full UTXO set available to serve.
A node signaling `NODE_UTXO_SET` MUST respond to `getutxosetinfo` messages and MUST be capable of
serving all UTXO sets it advertises in its `utxosetinfo` response. A node that fails to meet these
obligations SHOULD be disconnected.
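
A peer announces its services as a bit field in its `version` message. The minimal sketch below tests that field for the new bit. The constant and helper names are hypothetical, and the bit value comes from the table above.

```python
# Hypothetical name; bit 12 is taken from the service bit table.
NODE_UTXO_SET = 1 << 12  # 0x1000


def advertises_utxo_set(services: int) -> bool:
    """True if a peer's advertised services include NODE_UTXO_SET."""
    return (services & NODE_UTXO_SET) != 0
```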

### Data Structures

#### Serialized UTXO Set

The serialized UTXO set uses the format established by the Bitcoin Core RPC `dumptxoutset` (as of PR #29612).

**Header (55 bytes):**

| Field | Type | Size | Description |
|-------|------|------|-------------|
| `magic` | `bytes` | 5 | `0x7574786fff` (ASCII `utxo` + `0xff`). |
| `version` | `uint16_t` | 2 | Format version. |
| `network_magic` | `bytes` | 4 | Network message start bytes. |
| `base_height` | `uint32_t` | 4 | Block height of the UTXO set. |
| `base_blockhash` | `uint256` | 32 | Block hash of the UTXO set. |
| `coins_count` | `uint64_t` | 8 | Total number of coins (UTXOs) in the set. |
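
The 55-byte header can be parsed with a fixed struct layout taken from the table. The sketch below assumes little-endian integers, which is Bitcoin's usual serialization convention. It treats `magic`, `network_magic`, and `base_blockhash` as raw bytes. `HEADER`, `SNAPSHOT_MAGIC`, and `parse_header` are hypothetical names.

```python
import struct

# magic(5) version(2) network_magic(4) base_height(4) base_blockhash(32) coins_count(8)
HEADER = struct.Struct("<5sH4sI32sQ")  # "<" = little-endian, no padding: 55 bytes
SNAPSHOT_MAGIC = b"utxo\xff"


def parse_header(buf: bytes) -> dict:
    """Parse and sanity-check the serialized UTXO set header."""
    if len(buf) < HEADER.size:
        raise ValueError("truncated header")
    magic, version, net_magic, height, blockhash, coins = HEADER.unpack_from(buf)
    if magic != SNAPSHOT_MAGIC:
        raise ValueError("bad magic")
    return {
        "version": version,
        "network_magic": net_magic,
        "base_height": height,
        "base_blockhash": blockhash,
        "coins_count": coins,
    }
```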

**Body (coin data):**

Coins are grouped by transaction hash. For each group:

| Field | Type | Size | Description |
|-------|------|------|-------------|
| `txid` | `uint256` | 32 | Transaction hash. |
| `num_coins` | `compact_size` | 1–9 | Number of outputs for this txid. |

For each coin in the group:

| Field | Type | Size | Description |
|-------|------|------|-------------|
| `vout_index` | `compact_size` | 1–9 | Output index. |
| `coin` | `Coin` | variable | Serialized coin (varint-encoded code for height/coinbase, then compressed txout). |

Coins are ordered lexicographically by outpoint (txid, then vout index), matching the LevelDB iteration
order of the coins database.
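
The grouping can be illustrated with the minimal Python sketch below. It makes two assumptions. First, each `coin` is an already-serialized `Coin` and is treated as an opaque byte string, so Bitcoin Core's varint and compressed-txout encodings are not reimplemented. Second, txids are raw 32-byte values in serialization order, and sorting on `(txid, vout)` stands in for the outpoint order described above. `compact_size` and `serialize_body` are hypothetical helpers.

```python
from itertools import groupby


def compact_size(n: int) -> bytes:
    """Bitcoin CompactSize encoding of a non-negative integer."""
    if n < 0xFD:
        return n.to_bytes(1, "little")
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def serialize_body(coins) -> bytes:
    """coins: iterable of (txid: bytes, vout: int, coin_bytes: bytes)."""
    out = bytearray()
    ordered = sorted(coins, key=lambda c: (c[0], c[1]))  # outpoint order
    for txid, group in groupby(ordered, key=lambda c: c[0]):
        group = list(group)
        out += txid                        # 32-byte txid, once per group
        out += compact_size(len(group))    # num_coins
        for _, vout, coin_bytes in group:
            out += compact_size(vout)      # vout_index
            out += coin_bytes              # opaque serialized Coin
    return bytes(out)
```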

#### Chunk Merkle Tree

The serialized UTXO set (header + body) is split into chunks of exactly 3,900,000 bytes (3.9 MB). The
last chunk contains the remaining bytes and may be smaller.

The leaf hash for each chunk is `SHA256d(chunk_data)`. The tree is built as a balanced binary tree. When
the number of nodes at a level is odd, the last node is duplicated before hashing the next level.
Interior nodes are computed as `SHA256d(left_child || right_child)`.
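
The minimal Python sketch below shows chunking and root computation as described. It makes one assumption beyond the text: if the set fits in a single chunk, the root is that chunk's leaf hash, as in Bitcoin's block Merkle tree. The specification does not state this case explicitly. The function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 3_900_000


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Fixed-size chunks; the last one holds the remainder."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def merkle_root(leaves: list[bytes]) -> bytes:
    """Root over leaf hashes, duplicating the last node on odd-sized levels."""
    if not leaves:
        raise ValueError("no leaves")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def chunk_merkle_root(serialized: bytes, size: int = CHUNK_SIZE) -> bytes:
    return merkle_root([sha256d(c) for c in split_chunks(serialized, size)])
```

The `size` parameter exists only so the behavior can be exercised on small inputs. The protocol value is fixed at 3,900,000.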

The Merkle proof for chunk `i` consists of sibling hashes along the path from leaf to root. The
verifier derives the path direction from the chunk index: at each level, if the current index is even
the proof hash is the right sibling; if odd, the left sibling. The index is then halved (integer
division) before moving up to the next level.

`SHA256d` denotes double-SHA256: `SHA256d(x) = SHA256(SHA256(x))`.
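
Both sides of the proof are shown in the minimal sketch below. `merkle_proof` is the serving side and collects siblings, where `index ^ 1` selects `index + 1` for even indices and `index - 1` for odd ones. `verify_chunk` is the receiving side and walks the path described above. The final `index == 0` check, which rejects indices beyond the tree, is an addition of this sketch. All names are hypothetical.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_proof(leaves: list[bytes], index: int) -> list[bytes]:
    """Sibling hashes from leaf `index` up to, but excluding, the root."""
    proof, level = [], list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])      # duplicate last node on odd levels
        proof.append(level[index ^ 1])   # sibling of the current node
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify_chunk(chunk: bytes, index: int, proof: list[bytes], root: bytes) -> bool:
    node = sha256d(chunk)
    for sibling in proof:
        if index % 2 == 0:
            node = sha256d(node + sibling)   # proof hash is the right sibling
        else:
            node = sha256d(sibling + node)   # proof hash is the left sibling
        index //= 2
    return node == root and index == 0
```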

#### Serialized Hash

The serialized hash is the value that must match a known hash of the UTXO set at the respective
height; in Bitcoin Core, for example, this is the `hash_serialized` field in the assumeutxo
parameters. It is computed by iterating over every coin in the set in lexicographic outpoint order and
feeding a serialized representation of each coin into a SHA256d hasher. The per-coin serialization is:

| Field | Type | Size | Description |
|-------|------|------|-------------|
| `outpoint` | `COutPoint` | 36 | Transaction hash (32 bytes) + output index (4 bytes, little-endian). |
| `code` | `uint32_t` | 4 | `(height << 1) \| coinbase_flag`, little-endian. `height` is the block height at which the coin was created. `coinbase_flag` is 1 if the coin originates from a coinbase transaction, 0 otherwise. |
| `txout` | `CTxOut` | variable | The transaction output: amount as `int64_t` (8 bytes, little-endian) followed by the scriptPubKey serialized with its `compact_size` length prefix. |

All coin serializations are fed sequentially into a single SHA256d hasher. The resulting 32-byte digest
is the serialized hash.
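
The per-coin serialization from the table can be fed into a hasher as in the minimal sketch below. It streams into one SHA256 and then hashes the digest again, which equals SHA256d of the concatenated bytes. The input is assumed to be already in outpoint order as plain tuples, and `serialized_hash` is a hypothetical name. One caveat: Bitcoin Core prints `uint256` values in reversed byte order, so comparing against a hex string copied from its parameters or RPC output may require reversing the digest. This sketch does not handle that.

```python
import hashlib
import struct


def compact_size(n: int) -> bytes:
    if n < 0xFD:
        return n.to_bytes(1, "little")
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def serialized_hash(coins) -> bytes:
    """coins: (txid, vout, height, is_coinbase, amount, script_pubkey) tuples,
    already in lexicographic outpoint order."""
    h = hashlib.sha256()
    for txid, vout, height, is_coinbase, amount, spk in coins:
        h.update(txid)                                              # 32-byte txid
        h.update(struct.pack("<I", vout))                           # output index
        h.update(struct.pack("<I", (height << 1) | int(is_coinbase)))  # code
        h.update(struct.pack("<q", amount))                         # int64 amount
        h.update(compact_size(len(spk)) + spk)                      # scriptPubKey
    return hashlib.sha256(h.digest()).digest()                      # second SHA256
```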

### Messages

#### `getutxosetinfo`

Sent to discover which UTXO sets a peer can serve. This message has an empty payload.

A node that has advertised `NODE_UTXO_SET` MUST respond with `utxosetinfo`. A node that has not
advertised the service bit SHOULD ignore this message.

#### `utxosetinfo`

Sent in response to `getutxosetinfo`. Lists available UTXO sets.

| Field | Type | Size | Description |
|-------|------|------|-------------|
| `count` | `compact_size` | 1–9 | Number of available UTXO sets. |

For each available UTXO set:

| Field | Type | Size | Description |
|-------|------|------|-------------|
| `height` | `uint32_t` | 4 | Block height. |
| `block_hash` | `uint256` | 32 | Block hash at that height. |
| `serialized_hash` | `uint256` | 32 | The UTXO set serialized hash. |
| `data_length` | `uint64_t` | 8 | Total size of the serialized UTXO set in bytes (header + body). |
| `merkle_root` | `uint256` | 32 | Root of the Merkle tree computed over chunk hashes. |

A requesting node MUST ignore entries whose `serialized_hash` does not match a known
UTXO set hash for the corresponding height.
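
The `utxosetinfo` payload is a `compact_size` count followed by fixed 108-byte entries. The minimal sketch below encodes and decodes that payload and applies the filtering rule above. `known_hashes` is a hypothetical mapping from height to expected serialized hash, for example populated from assumeutxo parameters. Little-endian integers are assumed, and all names are illustrative.

```python
import struct
from typing import NamedTuple

# height, block_hash, serialized_hash, data_length, merkle_root: 108 bytes
ENTRY = struct.Struct("<I32s32sQ32s")


class UtxoSetEntry(NamedTuple):
    height: int
    block_hash: bytes
    serialized_hash: bytes
    data_length: int
    merkle_root: bytes


def compact_size(n: int) -> bytes:
    if n < 0xFD:
        return n.to_bytes(1, "little")
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def read_compact_size(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (value, position just after the encoded integer)."""
    if pos >= len(buf):
        raise ValueError("truncated compact_size")
    first = buf[pos]
    if first < 0xFD:
        return first, pos + 1
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    if pos + 1 + width > len(buf):
        raise ValueError("truncated compact_size")
    return int.from_bytes(buf[pos + 1:pos + 1 + width], "little"), pos + 1 + width


def encode_utxosetinfo(entries: list[UtxoSetEntry]) -> bytes:
    return compact_size(len(entries)) + b"".join(ENTRY.pack(*e) for e in entries)


def decode_utxosetinfo(payload: bytes) -> list[UtxoSetEntry]:
    count, pos = read_compact_size(payload, 0)
    if len(payload) != pos + count * ENTRY.size:
        raise ValueError("utxosetinfo length does not match count")
    return [UtxoSetEntry(*ENTRY.unpack_from(payload, pos + i * ENTRY.size)) for i in range(count)]


def usable_entries(entries: list[UtxoSetEntry], known_hashes: dict[int, bytes]) -> list[UtxoSetEntry]:
    # Ignore entries whose serialized_hash does not match the known hash at that height.
    return [e for e in entries if known_hashes.get(e.height) == e.serialized_hash]
```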

#### `getutxoset`

Sent to request a single chunk of UTXO set data. The requesting node MUST have completed header sync
before sending this message.

| Field | Type | Size | Description |
|-------|------|------|-------------|
| `height` | `uint32_t` | 4 | Block height of the requested UTXO set. |
| `block_hash` | `uint256` | 32 | Block hash at the requested height. |
| `chunk_index` | `uint32_t` | 4 | Zero-based index of the requested chunk. |

If the serving node cannot fulfill the request, it MUST NOT respond. The requesting node SHOULD apply
a reasonable timeout and disconnect peers that fail to respond.
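
On the requesting side, the number of chunks follows from the advertised `data_length`, and each request is a fixed 40-byte payload. The minimal sketch below builds those payloads and tracks per-request deadlines so that silent peers can be found. The 60-second default is a hypothetical local policy, since the text only asks for "a reasonable timeout". Class and function names are illustrative.

```python
import struct
import time

CHUNK_SIZE = 3_900_000
GETUTXOSET = struct.Struct("<I32sI")  # height, block_hash, chunk_index: 40 bytes


def chunk_count(data_length: int, size: int = CHUNK_SIZE) -> int:
    return (data_length + size - 1) // size  # ceiling division


def getutxoset_requests(height: int, block_hash: bytes, data_length: int):
    """Yield one getutxoset payload per chunk, in index order."""
    for index in range(chunk_count(data_length)):
        yield GETUTXOSET.pack(height, block_hash, index)


class PendingRequests:
    """Outstanding requests per peer; the timeout value is a hypothetical policy."""

    def __init__(self, timeout_s: float = 60.0):
        self.timeout_s = timeout_s
        self.deadlines = {}  # (peer_id, chunk_index) -> deadline

    def sent(self, peer_id, chunk_index, now=None):
        now = time.monotonic() if now is None else now
        self.deadlines[(peer_id, chunk_index)] = now + self.timeout_s

    def answered(self, peer_id, chunk_index) -> bool:
        return self.deadlines.pop((peer_id, chunk_index), None) is not None

    def timed_out_peers(self, now=None) -> set:
        now = time.monotonic() if now is None else now
        return {peer for (peer, _), deadline in self.deadlines.items() if deadline <= now}
```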

#### `utxoset`

Sent in response to `getutxoset`, delivering one chunk with its Merkle proof.

| Field | Type | Size | Description |
|-------|------|------|-------------|
| `height` | `uint32_t` | 4 | Block height this data corresponds to. |
| `block_hash` | `uint256` | 32 | Block hash this data corresponds to. |
| `chunk_index` | `uint32_t` | 4 | Zero-based index of this chunk. |
| `proof_length` | `compact_size` | 1–9 | Number of hashes in the Merkle proof. |
| `proof_hashes` | `uint256[]` | 32 × `proof_length` | Sibling hashes from leaf to root. |
| `data` | `bytes` | variable | Chunk payload: exactly 3,900,000 bytes, except for the last chunk, which may be smaller. |
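
A decoder for this layout is sketched below. The table does not say whether `data` carries its own length prefix. This sketch assumes it does not and that `data` runs to the end of the payload. If the final protocol uses a `compact_size`-prefixed byte vector instead, the last field must be read accordingly. The function names are hypothetical.

```python
import struct

PREFIX = struct.Struct("<I32sI")  # height, block_hash, chunk_index: 40 bytes


def compact_size(n: int) -> bytes:
    if n < 0xFD:
        return n.to_bytes(1, "little")
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def read_compact_size(buf: bytes, pos: int) -> tuple[int, int]:
    if pos >= len(buf):
        raise ValueError("truncated compact_size")
    first = buf[pos]
    if first < 0xFD:
        return first, pos + 1
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    if pos + 1 + width > len(buf):
        raise ValueError("truncated compact_size")
    return int.from_bytes(buf[pos + 1:pos + 1 + width], "little"), pos + 1 + width


def encode_utxoset(height: int, block_hash: bytes, chunk_index: int,
                   proof: list[bytes], data: bytes) -> bytes:
    # Assumption: `data` is not length-prefixed and fills the rest of the payload.
    return (PREFIX.pack(height, block_hash, chunk_index)
            + compact_size(len(proof)) + b"".join(proof) + data)


def decode_utxoset(payload: bytes) -> dict:
    if len(payload) < PREFIX.size:
        raise ValueError("truncated utxoset")
    height, block_hash, chunk_index = PREFIX.unpack_from(payload, 0)
    proof_length, pos = read_compact_size(payload, PREFIX.size)
    end = pos + 32 * proof_length
    if end > len(payload):
        raise ValueError("truncated proof")
    proof = [payload[i:i + 32] for i in range(pos, end, 32)]
    return {"height": height, "block_hash": block_hash, "chunk_index": chunk_index,
            "proof": proof, "data": payload[end:]}
```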

The transfer is receiver-driven: the requesting node sends one `getutxoset` per chunk. Chunks MAY be
requested in any order and from different peers, provided those peers advertised the same `merkle_root`
for the same height and block hash.
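
The text leaves the download schedule to the requester. One possible strategy is shown in the minimal sketch below: missing chunk indices are assigned round-robin to peers that all advertised the same `(height, block_hash, merkle_root)`, with a per-peer cap on requests in flight. The strategy, the cap, and the function name are all hypothetical.

```python
from collections import deque


def assign_chunks(missing: set[int], peers: list[str], max_in_flight: int = 2) -> dict[str, list[int]]:
    """Round-robin assignment of missing chunk indices to agreeing peers."""
    assignment = {p: [] for p in peers}
    queue = deque(sorted(missing))
    while queue and any(len(v) < max_in_flight for v in assignment.values()):
        for p in peers:
            if queue and len(assignment[p]) < max_in_flight:
                assignment[p].append(queue.popleft())
    return assignment
```

Because every chunk is verified independently against the shared root, a chunk that times out or fails verification can simply be returned to `missing` and reassigned to a different peer.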

Upon receiving a `utxoset` message, the node MUST compute `SHA256d(data)` and verify it against the
`merkle_root` using the provided proof. If verification fails, the node MUST discard the chunk and
disconnect the peer. A node SHOULD also disconnect a peer that sends a `utxoset` message with fields
(`chunk_index`, `height`, `block_hash`) that do not match the outstanding request.

After all chunks have been received, the node MUST compute the serialized hash and compare it against a
known UTXO set hash. If this check fails, the node MUST discard all data and
SHOULD disconnect all peers that advertised the corresponding Merkle root.
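
The receive-side rules can be combined into a single decision function, sketched below. It works on a decoded message represented as a dict with `height`, `block_hash`, `chunk_index`, `proof`, and `data`. The chunk-length and proof-length checks derived from the advertised `data_length` are this sketch's own defensive assumption and are not required by the text above. They reject, for example, a short forged last chunk that happens to equal an interior node's preimage. Names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 3_900_000


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def expected_chunk_len(index: int, data_length: int, size: int = CHUNK_SIZE):
    """Length chunk `index` must have, or None if the index is out of range."""
    n_chunks = (data_length + size - 1) // size
    if not 0 <= index < n_chunks:
        return None
    return size if index < n_chunks - 1 else data_length - size * (n_chunks - 1)


def handle_utxoset(msg: dict, outstanding: tuple, merkle_root: bytes,
                   data_length: int, size: int = CHUNK_SIZE) -> str:
    """Return "accept" or "disconnect" for a decoded utxoset message.

    outstanding: (height, block_hash, chunk_index) of the pending request.
    """
    if (msg["height"], msg["block_hash"], msg["chunk_index"]) != outstanding:
        return "disconnect"  # fields do not match the outstanding request
    index = msg["chunk_index"]
    n_chunks = (data_length + size - 1) // size
    # Extra sanity checks derived from data_length (an assumption of this sketch).
    if len(msg["data"]) != expected_chunk_len(index, data_length, size):
        return "disconnect"
    if len(msg["proof"]) != (n_chunks - 1).bit_length():  # ceil(log2(n_chunks))
        return "disconnect"
    node = sha256d(msg["data"])
    for sibling in msg["proof"]:
        node = sha256d(node + sibling) if index % 2 == 0 else sha256d(sibling + node)
        index //= 2
    return "accept" if node == merkle_root else "disconnect"
```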

### Protocol Flow

1. The requesting node identifies peers advertising `NODE_UTXO_SET`.
2. The requesting node sends `getutxosetinfo` to one or more of these peers.
3. Each peer responds with `utxosetinfo`. The requesting node verifies that the advertised
   `serialized_hash` matches a known UTXO set hash, compares `merkle_root` values across peers,
   and selects a UTXO set whose Merkle root has agreement among multiple peers.
4. The requesting node downloads chunks via `getutxoset`/`utxoset` exchanges, verifying each chunk
   against the Merkle root on receipt. On verification failure the peer is disconnected and download
   continues from another peer without losing already-verified chunks.
5. After all chunks are received, the node computes the full serialized hash and verifies it against
   the known UTXO set hash.
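
Step 3 can be sketched as a vote over advertised entries. The minimal Python sketch below discards entries with an unknown `serialized_hash`, groups the rest by `(height, block_hash, data_length, merkle_root)`, and picks the candidate backed by the most distinct peers. Ties are broken toward the greater height. The threshold `min_peers=2` is a hypothetical reading of "multiple peers", and entries are plain 5-tuples in the `utxosetinfo` field order.

```python
from collections import defaultdict


def select_utxo_set(advertisements: dict, known_hashes: dict, min_peers: int = 2):
    """advertisements: peer_id -> list of (height, block_hash, serialized_hash,
    data_length, merkle_root). Returns (candidate_key, peer_set) or None."""
    support = defaultdict(set)
    for peer, entries in advertisements.items():
        for height, block_hash, ser_hash, data_length, root in entries:
            if known_hashes.get(height) != ser_hash:
                continue  # ignore sets that don't match a known serialized hash
            support[(height, block_hash, data_length, root)].add(peer)
    candidates = [(k, p) for k, p in support.items() if len(p) >= min_peers]
    if not candidates:
        return None
    # Most supporting peers first, then the greater height.
    return max(candidates, key=lambda kp: (len(kp[1]), kp[0][0]))
```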

Serving nodes are free to limit the number of concurrent and repeated transfers per peer at their own
discretion to manage resource consumption.

## Rationale

**Usage of service bit 12:** Service bits allow selective peer discovery through
DNS seeds and addr relay. Bit 12 is chosen as the next unassigned bit after `NODE_P2P_V2` (bit 11, BIP 324).

**Serialized hash in `utxosetinfo`:** The requesting node should have access to a known UTXO set hash
before initiating the process. Including the serialized hash in the advertisement lets the requester
immediately filter out peers claiming a different UTXO set state before downloading any data.

**Discovery before download:** The `getutxosetinfo`/`utxosetinfo` exchange lets the requesting node
confirm availability, verify the serialized hash, and learn the Merkle root before committing to a large
transfer.

**Per-chunk Merkle verification:** In the Bitcoin P2P protocol, every large piece of data received during
normal operation (blocks, transactions, compact block filters) can be verified independently before
requesting more. Without per-chunk verification, a UTXO set transfer would be an anomaly: ~10 GB (as of early 2026)
of data verifiable only after complete receipt. The Merkle tree enables incremental verification, allowing
immediate detection of corrupt data, peer switching without data loss, and parallel download from
multiple peers. The overhead is minimal (~384 bytes of proof per 3.9 MB chunk). The specified
serialization is deterministic, so all honest nodes produce byte-identical output, guaranteeing Merkle
root agreement.

**3.9 MB chunk size:** This size balances round trips (~2,560 for a ~10 GB set) against memory usage
for buffering and verifying a single chunk. Smaller chunks would increase protocol overhead; larger
chunks would increase memory pressure on the constrained devices commonly used to run Bitcoin nodes.
Including its other fields, a `utxoset` message carrying a full chunk also stays just below the
theoretical maximum block size (about 4 MB), so any implementation able to handle maximum-size blocks
should be able to handle messages of this size.
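
The figures in this rationale can be reproduced with a few lines of arithmetic. The sketch below assumes "10 GB" means 10,000,000,000 bytes and that `data` is not length-prefixed. It also excludes the 24-byte P2P message header and counts `proof_length` as a 1-byte `compact_size`, which holds for fewer than 253 proof hashes. Under these assumptions a ~10 GB set yields 2,565 chunks, 12 proof hashes (384 bytes), and a largest `utxoset` payload of 3,900,425 bytes. That is below the 4,000,000-byte `MAX_PROTOCOL_MESSAGE_LENGTH` used by Bitcoin Core.

```python
CHUNK_SIZE = 3_900_000
HASH_SIZE = 32


def transfer_stats(set_size: int, chunk_size: int = CHUNK_SIZE) -> dict:
    chunks = -(-set_size // chunk_size)                      # ceiling division
    depth = (chunks - 1).bit_length() if chunks > 0 else 0   # ceil(log2(chunks))
    proof_bytes = depth * HASH_SIZE
    # height(4) + block_hash(32) + chunk_index(4) + proof_length(1) + proof + full chunk
    max_payload = 4 + 32 + 4 + 1 + proof_bytes + chunk_size
    return {"chunks": chunks, "proof_hashes": depth,
            "proof_bytes": proof_bytes, "max_payload": max_payload}
```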

**Reusing the `dumptxoutset` format:** Avoids introducing a new serialization format and ensures
compatibility with UTXO sets already being generated and shared.

**Relationship to BIP 64:** BIP 64 defined a protocol for querying individual UTXOs by outpoint and is
now closed. This BIP addresses a different use case: bulk transfer of the entire UTXO set for node
bootstrapping.

## Reference Implementation

TBD

## Copyright

This BIP is made available under the terms of the 2-clause BSD license. See
https://opensource.org/license/BSD-2-Clause for more information.
