|
| 1 | +``` |
| 2 | + BIP: ? |
| 3 | + Layer: Peer Services |
| 4 | + Title: P2P UTXO Set Sharing |
| 5 | + Authors: Fabian Jahr <fjahr@protonmail.com> |
| 6 | + Status: Draft |
| 7 | + Type: Specification |
| 8 | + Assigned: ? |
| 9 | + Discussion: ? |
| 10 | + License: BSD-2-Clause |
| 11 | +``` |
| 12 | + |
| 13 | +## Abstract |
| 14 | + |
| 15 | +This BIP defines a P2P protocol extension for sharing full UTXO sets between peers. It introduces |
| 16 | +a new service bit `NODE_UTXO_SET`, four new P2P messages (`getutxosetinfo`, `utxosetinfo`, `getutxoset`, |
| 17 | +`utxoset`), and a Merkle-tree-based integrity scheme that enables per-chunk verification. Nodes that |
| 18 | +bootstrap from a recent height via mechanisms such as assumeutxo can thereby obtain the required UTXO |
| 19 | +set directly from the P2P network. |
| 20 | + |
| 21 | +## Motivation |
| 22 | + |
| 23 | +The assumeutxo feature (implemented in Bitcoin Core) allows nodes to begin operating from a |
| 24 | +serialized UTXO set while validating historical blocks in the background. However, there is |
| 25 | +currently no canonical source for obtaining this data. Users must either generate the set themselves |
| 26 | +from a fully synced node (using `dumptxoutset` in Bitcoin Core), or download one from a third |
| 27 | +party. |
| 28 | + |
| 29 | +By enabling UTXO set sharing over the P2P network, new nodes can obtain the data directly from |
| 30 | +peers, removing the dependency on external infrastructure. |
| 31 | + |
| 32 | +## Specification |
| 33 | + |
| 34 | +The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" in this document are to be |
| 35 | +interpreted as described in RFC 2119. |
| 36 | + |
| 37 | +### Service Bit |
| 38 | + |
| 39 | +| Name | Bit | Description | |
| 40 | +|------|-----|-------------| |
| 41 | +| `NODE_UTXO_SET` | 12 (0x1000) | The node can serve complete UTXO set data for at least one height. | |
| 42 | + |
| 43 | +A node MUST NOT set this bit unless it has at least one full UTXO set available to serve. |
| 44 | +A node signaling `NODE_UTXO_SET` MUST respond to `getutxosetinfo` messages and MUST be capable of |
| 45 | +serving all UTXO sets it advertises in its `utxosetinfo` response. A node that fails to meet these |
| 46 | +obligations SHOULD be disconnected. |
| 47 | + |
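As an illustration of the bit value in the table above, a peer's advertised service flags can be tested with a simple mask. This is a minimal sketch; the function name `advertises_utxo_set` is hypothetical and not part of this BIP.

```python
# Service bit 12 as proposed in this BIP (1 << 12 == 0x1000).
NODE_UTXO_SET = 1 << 12


def advertises_utxo_set(services: int) -> bool:
    """Return True if the 64-bit services field has the NODE_UTXO_SET bit set."""
    return bool(services & NODE_UTXO_SET)
```

A requesting node would apply such a check to the services field from the `version` message or from address relay before sending `getutxosetinfo`.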
| 48 | +### Data Structures |
| 49 | + |
| 50 | +#### Serialized UTXO Set |
| 51 | + |
| 52 | +The serialized UTXO set uses the format established by the Bitcoin Core RPC `dumptxoutset` (as of PR #29612). |
| 53 | + |
| 54 | +**Header (55 bytes):** |
| 55 | + |
| 56 | +| Field | Type | Size | Description | |
| 57 | +|-------|------|------|-------------| |
| 58 | +| `magic` | `bytes` | 5 | `0x7574786fff` (ASCII `utxo` + `0xff`). | |
| 59 | +| `version` | `uint16_t` | 2 | Format version. | |
| 60 | +| `network_magic` | `bytes` | 4 | Network message start bytes. | |
| 61 | +| `base_height` | `uint32_t` | 4 | Block height of the UTXO set. | |
| 62 | +| `base_blockhash` | `uint256` | 32 | Block hash of the UTXO set. | |
| 63 | +| `coins_count` | `uint64_t` | 8 | Total number of coins (UTXOs) in the set. | |
| 64 | + |
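The header layout above can be sketched as a small parser. This is an illustrative example only: it assumes all integers are little-endian (as is usual for Bitcoin serialization), and the names `SnapshotHeader` and `parse_header` are hypothetical.

```python
import struct
from dataclasses import dataclass

HEADER_SIZE = 55                      # 5 + 2 + 4 + 4 + 32 + 8 bytes
MAGIC = bytes.fromhex("7574786fff")   # ASCII "utxo" followed by 0xff


@dataclass
class SnapshotHeader:
    version: int
    network_magic: bytes
    base_height: int
    base_blockhash: bytes  # kept in serialized byte order
    coins_count: int


def parse_header(raw: bytes) -> SnapshotHeader:
    """Parse the 55-byte header; little-endian integers are an assumption."""
    if len(raw) < HEADER_SIZE:
        raise ValueError("header too short")
    if raw[:5] != MAGIC:
        raise ValueError("unexpected magic bytes")
    (version,) = struct.unpack_from("<H", raw, 5)
    network_magic = raw[7:11]
    (base_height,) = struct.unpack_from("<I", raw, 11)
    base_blockhash = raw[15:47]
    (coins_count,) = struct.unpack_from("<Q", raw, 47)
    return SnapshotHeader(version, network_magic, base_height,
                          base_blockhash, coins_count)
```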
| 65 | +**Body (coin data):** |
| 66 | + |
| 67 | +Coins are grouped by transaction hash. For each group: |
| 68 | + |
| 69 | +| Field | Type | Size | Description | |
| 70 | +|-------|------|------|-------------| |
| 71 | +| `txid` | `uint256` | 32 | Transaction hash. | |
| 72 | +| `num_coins` | `compact_size` | 1–9 | Number of outputs for this txid. | |
| 73 | + |
| 74 | +For each coin in the group: |
| 75 | + |
| 76 | +| Field | Type | Size | Description | |
| 77 | +|-------|------|------|-------------| |
| 78 | +| `vout_index` | `compact_size` | 1–9 | Output index. | |
| 79 | +| `coin` | `Coin` | variable | Serialized coin (varint-encoded code for height/coinbase, then compressed txout). | |
| 80 | + |
| 81 | +Coins are ordered lexicographically by outpoint (txid, then vout index), matching the LevelDB iteration |
| 82 | +order of the coins database. |
| 83 | + |
| 84 | +#### Chunk Merkle Tree |
| 85 | + |
| 86 | +The serialized UTXO set (header + body) is split into chunks of exactly 3,900,000 bytes (3.9 MB). The |
| 87 | +last chunk contains the remaining bytes and may be smaller. |
| 88 | + |
| 89 | +The leaf hash for each chunk is `SHA256d(chunk_data)`. The tree is built as a balanced binary tree. When |
| 90 | +the number of nodes at a level is odd, the last node is duplicated before hashing the next level. |
| 91 | +Interior nodes are computed as `SHA256d(left_child || right_child)`. |
| 92 | + |
| 93 | +The Merkle proof for chunk `i` consists of sibling hashes along the path from leaf to root. The |
| 94 | +verifier derives the path direction from the chunk index: at each level, if the current index is even |
| 95 | +the proof hash is the right sibling; if odd, the left sibling. |
| 96 | + |
| 97 | +`SHA256d` denotes double-SHA256: `SHA256d(x) = SHA256(SHA256(x))`. |
| 98 | + |
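The construction and verification rules above can be sketched as follows. This is a minimal, non-normative example: the helper names (`split_chunks`, `merkle_levels`, `merkle_proof`, `verify_chunk`) are hypothetical, and the `size` parameter exists only so the logic can be exercised with small inputs.

```python
import hashlib

CHUNK_SIZE = 3_900_000  # bytes per chunk, as specified above


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split the serialized set into fixed-size chunks; the last may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def merkle_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every tree level, leaves first, root level last."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2 == 1:
            cur = cur + [cur[-1]]  # duplicate the last node on odd levels
        levels.append([sha256d(cur[i] + cur[i + 1])
                       for i in range(0, len(cur), 2)])
    return levels


def merkle_proof(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Collect sibling hashes from leaf to root for the chunk at `index`."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        # On an odd-length level the last node is paired with itself.
        proof.append(level[sibling] if sibling < len(level) else level[index])
        index >>= 1
    return proof


def verify_chunk(chunk: bytes, index: int, proof: list[bytes], root: bytes) -> bool:
    """Recompute the root; an even index means the proof hash is the right sibling."""
    node = sha256d(chunk)
    for sibling in proof:
        node = sha256d(node + sibling) if index % 2 == 0 else sha256d(sibling + node)
        index >>= 1
    return node == root
```

A production verifier would likely also reject proofs whose length does not match the tree depth implied by the advertised `data_length`; that check is omitted here for brevity.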
| 99 | +#### Serialized Hash |
| 100 | + |
| 101 | +The serialized hash is the value that must match a known hash of the UTXO set at the respective |
| 102 | +height. In Bitcoin Core, for example, this is the `hash_serialized` field of the assumeutxo |
| 103 | +parameters. It is computed by iterating over every coin in the set in lexicographic outpoint order and |
| 104 | +feeding a serialized representation of each coin into a SHA256d hasher. The per-coin serialization is: |
| 105 | + |
| 106 | +| Field | Type | Size | Description | |
| 107 | +|-------|------|------|-------------| |
| 108 | +| `outpoint` | `COutPoint` | 36 | Transaction hash (32 bytes) + output index (4 bytes, little-endian). | |
| 109 | +| `code` | `uint32_t` | 4 | `(height << 1) \| coinbase_flag`, little-endian. `height` is the block height at which the coin was created. `coinbase_flag` is 1 if the coin originates from a coinbase transaction, 0 otherwise. | |
| 110 | +| `txout` | `CTxOut` | variable | The transaction output: amount as `int64_t` (8 bytes, little-endian) followed by the scriptPubKey serialized with its `compact_size` length prefix. | |
| 111 | + |
| 112 | +All coin serializations are fed sequentially into a single SHA256d hasher. The resulting 32-byte digest |
| 113 | +is the serialized hash. |
| 114 | + |
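The per-coin serialization and the final digest can be sketched as below. This is a hedged illustration, not the reference implementation: the function names are hypothetical, coins are modelled as plain tuples, and sorting by `(txid bytes, vout)` is an assumption standing in for the lexicographic outpoint order described above.

```python
import hashlib
import struct


def compact_size(n: int) -> bytes:
    """Bitcoin CompactSize encoding of a non-negative integer."""
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def serialize_coin(txid: bytes, vout: int, height: int, is_coinbase: bool,
                   amount: int, script_pubkey: bytes) -> bytes:
    """Serialize one coin as outpoint || code || txout, per the table above."""
    outpoint = txid + struct.pack("<I", vout)
    code = struct.pack("<I", (height << 1) | int(is_coinbase))
    txout = (struct.pack("<q", amount)
             + compact_size(len(script_pubkey)) + script_pubkey)
    return outpoint + code + txout


def serialized_hash(coins) -> bytes:
    """Feed all coins into one hasher and return the double-SHA256 digest.

    `coins` is an iterable of (txid, vout, height, is_coinbase, amount, script)
    tuples; the sort key is an assumption about the intended ordering.
    """
    hasher = hashlib.sha256()
    for coin in sorted(coins, key=lambda c: (c[0], c[1])):
        hasher.update(serialize_coin(*coin))
    return hashlib.sha256(hasher.digest()).digest()
```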
| 115 | +### Messages |
| 116 | + |
| 117 | +#### `getutxosetinfo` |
| 118 | + |
| 119 | +Sent to discover which UTXO sets a peer can serve. This message has an empty payload. |
| 120 | + |
| 121 | +A node that has advertised `NODE_UTXO_SET` MUST respond with `utxosetinfo`. A node that has not |
| 122 | +advertised the service bit SHOULD ignore this message. |
| 123 | + |
| 124 | +#### `utxosetinfo` |
| 125 | + |
| 126 | +Sent in response to `getutxosetinfo`. Lists available UTXO sets. |
| 127 | + |
| 128 | +| Field | Type | Size | Description | |
| 129 | +|-------|------|------|-------------| |
| 130 | +| `count` | `compact_size` | 1–9 | Number of available UTXO sets. | |
| 131 | + |
| 132 | +For each available UTXO set: |
| 133 | + |
| 134 | +| Field | Type | Size | Description | |
| 135 | +|-------|------|------|-------------| |
| 136 | +| `height` | `uint32_t` | 4 | Block height. | |
| 137 | +| `block_hash` | `uint256` | 32 | Block hash at that height. | |
| 138 | +| `serialized_hash` | `uint256` | 32 | The UTXO set serialized hash. | |
| 139 | +| `data_length` | `uint64_t` | 8 | Total size of the serialized UTXO set in bytes (header + body). | |
| 140 | +| `merkle_root` | `uint256` | 32 | Root of the Merkle tree computed over chunk hashes. | |
| 141 | + |
| 142 | +A requesting node MUST ignore entries whose `serialized_hash` does not match a known |
| 143 | +UTXO set hash for the corresponding height. |
| 144 | + |
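To illustrate the `utxosetinfo` layout and the filtering rule above, the following sketch parses a payload and keeps only entries whose `serialized_hash` is already known. Little-endian integers are assumed, and `parse_utxosetinfo`, `filter_known`, and the `known` mapping (height to expected hash) are hypothetical names.

```python
import struct
from typing import NamedTuple

ENTRY_SIZE = 4 + 32 + 32 + 8 + 32  # 108 bytes per advertised UTXO set


class UtxoSetInfoEntry(NamedTuple):
    height: int
    block_hash: bytes
    serialized_hash: bytes
    data_length: int
    merkle_root: bytes


def read_compact_size(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a CompactSize at `pos`; return (value, next position)."""
    first = buf[pos]
    if first < 0xFD:
        return first, pos + 1
    width, fmt = {0xFD: (2, "<H"), 0xFE: (4, "<I"), 0xFF: (8, "<Q")}[first]
    (value,) = struct.unpack_from(fmt, buf, pos + 1)
    return value, pos + 1 + width


def parse_utxosetinfo(payload: bytes) -> list[UtxoSetInfoEntry]:
    count, pos = read_compact_size(payload, 0)
    if len(payload) - pos != count * ENTRY_SIZE:
        raise ValueError("payload length does not match entry count")
    entries = []
    for _ in range(count):
        (height,) = struct.unpack_from("<I", payload, pos)
        (data_length,) = struct.unpack_from("<Q", payload, pos + 68)
        entries.append(UtxoSetInfoEntry(
            height,
            payload[pos + 4:pos + 36],
            payload[pos + 36:pos + 68],
            data_length,
            payload[pos + 76:pos + 108],
        ))
        pos += ENTRY_SIZE
    return entries


def filter_known(entries, known: dict[int, bytes]):
    """Ignore entries whose serialized_hash is not the known hash for that height."""
    return [e for e in entries if known.get(e.height) == e.serialized_hash]
```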
| 145 | +#### `getutxoset` |
| 146 | + |
| 147 | +Sent to request a single chunk of UTXO set data. The requesting node MUST have completed header sync |
| 148 | +before sending this message. |
| 149 | + |
| 150 | +| Field | Type | Size | Description | |
| 151 | +|-------|------|------|-------------| |
| 152 | +| `height` | `uint32_t` | 4 | Block height of the requested UTXO set. | |
| 153 | +| `block_hash` | `uint256` | 32 | Block hash at the requested height. | |
| 154 | +| `chunk_index` | `uint32_t` | 4 | Zero-based index of the requested chunk. | |
| 155 | + |
| 156 | +If the serving node cannot fulfill the request, it MUST NOT respond. The requesting node SHOULD apply |
| 157 | +a reasonable timeout and disconnect peers that fail to respond. |
| 158 | + |
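A request payload and the range of valid chunk indices can be sketched as follows. The example assumes little-endian integers and derives the chunk count from the `data_length` advertised in `utxosetinfo`; `build_getutxoset` and `chunk_count` are hypothetical helper names.

```python
import struct

CHUNK_SIZE = 3_900_000  # bytes per chunk, from the Chunk Merkle Tree section


def chunk_count(data_length: int) -> int:
    """Number of chunks needed to cover data_length bytes (ceiling division)."""
    return -(-data_length // CHUNK_SIZE)


def build_getutxoset(height: int, block_hash: bytes, chunk_index: int) -> bytes:
    """Serialize the 40-byte getutxoset payload: height, block hash, chunk index."""
    if len(block_hash) != 32:
        raise ValueError("block_hash must be 32 bytes")
    return struct.pack("<I32sI", height, block_hash, chunk_index)
```

Valid values of `chunk_index` are then `0` through `chunk_count(data_length) - 1`.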
| 159 | +#### `utxoset` |
| 160 | + |
| 161 | +Sent in response to `getutxoset`, delivering one chunk with its Merkle proof. |
| 162 | + |
| 163 | +| Field | Type | Size | Description | |
| 164 | +|-------|------|------|-------------| |
| 165 | +| `height` | `uint32_t` | 4 | Block height this data corresponds to. | |
| 166 | +| `block_hash` | `uint256` | 32 | Block hash this data corresponds to. | |
| 167 | +| `chunk_index` | `uint32_t` | 4 | Zero-based index of this chunk. | |
| 168 | +| `proof_length` | `compact_size` | 1–9 | Number of hashes in the Merkle proof. | |
| 169 | +| `proof_hashes` | `uint256[]` | 32 × `proof_length` | Sibling hashes from leaf to root. | |
| 170 | +| `data` | `bytes` | variable | Chunk payload, exactly 3,900,000 bytes except for the last chunk. | |
| 171 | + |
| 172 | +The transfer is receiver-driven: the requesting node sends one `getutxoset` per chunk. Chunks MAY be |
| 173 | +requested in any order and from different peers, provided those peers advertised the same `merkle_root` |
| 174 | +for the same height and block hash. |
| 175 | + |
| 176 | +Upon receiving a `utxoset` message, the node MUST compute `SHA256d(data)` and verify it against the |
| 177 | +`merkle_root` using the provided proof. If verification fails, the node MUST discard the chunk and |
| 178 | +disconnect the peer. A node SHOULD also disconnect a peer that sends a `utxoset` message with fields |
| 179 | +(`chunk_index`, `height`, `block_hash`) that do not match the outstanding request. |
| 180 | + |
| 181 | +After all chunks have been received, the node MUST compute the serialized hash and compare it against a |
| 182 | +known UTXO set hash. If this check fails, the node MUST discard all data and |
| 183 | +SHOULD disconnect all peers that advertised the corresponding Merkle root. |
| 184 | + |
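The structural checks on an incoming `utxoset` message can be sketched as below; Merkle verification itself follows the Chunk Merkle Tree rules and is not repeated here. This is an illustration under stated assumptions: integers are little-endian, `data` is taken to be the remainder of the payload with no length prefix, and all names (`parse_utxoset`, `expected_chunk_len`, `matches_request`) are hypothetical.

```python
import struct
from typing import NamedTuple

CHUNK_SIZE = 3_900_000


class UtxoSetChunk(NamedTuple):
    height: int
    block_hash: bytes
    chunk_index: int
    proof_hashes: list
    data: bytes


def read_compact_size(buf: bytes, pos: int):
    first = buf[pos]
    if first < 0xFD:
        return first, pos + 1
    width, fmt = {0xFD: (2, "<H"), 0xFE: (4, "<I"), 0xFF: (8, "<Q")}[first]
    (value,) = struct.unpack_from(fmt, buf, pos + 1)
    return value, pos + 1 + width


def parse_utxoset(payload: bytes) -> UtxoSetChunk:
    if len(payload) < 41:
        raise ValueError("payload too short")
    (height,) = struct.unpack_from("<I", payload, 0)
    block_hash = payload[4:36]
    (chunk_index,) = struct.unpack_from("<I", payload, 36)
    proof_length, pos = read_compact_size(payload, 40)
    end = pos + 32 * proof_length
    if end > len(payload):
        raise ValueError("truncated Merkle proof")
    proof = [payload[i:i + 32] for i in range(pos, end, 32)]
    # Assumption: the chunk data is whatever follows the proof hashes.
    return UtxoSetChunk(height, block_hash, chunk_index, proof, payload[end:])


def expected_chunk_len(data_length: int, chunk_index: int) -> int:
    """Full-size chunk unless this is the last one."""
    remaining = data_length - chunk_index * CHUNK_SIZE
    if remaining <= 0:
        raise ValueError("chunk index out of range")
    return min(CHUNK_SIZE, remaining)


def matches_request(msg: UtxoSetChunk, request: tuple, data_length: int) -> bool:
    """request is the (height, block_hash, chunk_index) that was sent."""
    return ((msg.height, msg.block_hash, msg.chunk_index) == request
            and len(msg.data) == expected_chunk_len(data_length, msg.chunk_index))
```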
| 185 | +### Protocol Flow |
| 186 | + |
| 187 | +1. The requesting node identifies peers advertising `NODE_UTXO_SET`. |
| 188 | +2. The requesting node sends `getutxosetinfo` to one or more of these peers. |
| 189 | +3. Each peer responds with `utxosetinfo`. The requesting node verifies that the advertised |
| 190 | + `serialized_hash` matches a known UTXO set hash, compares `merkle_root` values across peers, |
| 191 | + and selects a UTXO set whose Merkle root has agreement among multiple peers. |
| 192 | +4. The requesting node downloads chunks via `getutxoset`/`utxoset` exchanges, verifying each chunk |
| 193 | + against the Merkle root on receipt. On verification failure the peer is disconnected and download |
| 194 | + continues from another peer without losing already-verified chunks. |
| 195 | +5. After all chunks are received, the node computes the full serialized hash and verifies it against |
| 196 | + the known UTXO set hash. |
| 197 | + |
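Step 3 of the flow above, selecting a Merkle root that multiple peers agree on, might look like the following sketch. The threshold `min_agreement` and the function names are hypothetical; this BIP only asks for agreement among multiple peers and does not fix a number.

```python
from collections import Counter
from typing import Optional


def select_merkle_root(advertised: dict, min_agreement: int = 2) -> Optional[bytes]:
    """Pick the most frequently advertised Merkle root, if enough peers agree.

    `advertised` maps a peer identifier to the merkle_root that peer sent for
    an entry whose serialized_hash already matched a known UTXO set hash.
    """
    if not advertised:
        return None
    root, votes = Counter(advertised.values()).most_common(1)[0]
    return root if votes >= min_agreement else None


def peers_for_root(advertised: dict, root: bytes) -> list:
    """Peers from which chunks for the selected root can be requested."""
    return sorted(peer for peer, r in advertised.items() if r == root)
```

Chunks would then be requested only from the peers returned by `peers_for_root`, so that every proof is checked against the same root.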
| 198 | +Serving nodes are free to limit the number of concurrent and repeated transfers per peer at their own |
| 199 | +discretion to manage resource consumption. |
| 200 | + |
| 201 | +## Rationale |
| 202 | + |
| 203 | +**Usage of service bit 12:** Service bits allow selective peer discovery through |
| 204 | +DNS seeds and addr relay. Bit 12 is chosen as the next unassigned bit after `NODE_P2P_V2` (bit 11, BIP 324). |
| 205 | + |
| 206 | +**Serialized hash in `utxosetinfo`:** The requesting node should have access to a known UTXO set hash |
| 207 | +before initiating the process. Including the serialized hash in the advertisement lets the requester |
| 208 | +immediately filter out peers claiming a different UTXO set state before downloading any data. |
| 209 | + |
| 210 | +**Discovery before download:** The `getutxosetinfo`/`utxosetinfo` exchange lets the requesting node |
| 211 | +confirm availability, verify the serialized hash, and learn the Merkle root before committing to a large |
| 212 | +transfer. |
| 213 | + |
| 214 | +**Per-chunk Merkle verification:** In the Bitcoin P2P protocol, every larger piece of data received during |
| 215 | +normal operation (blocks, transactions, compact block filters) can be verified independently before |
| 216 | +requesting more. Without per-chunk verification, a UTXO set transfer would be an anomaly: ~10 GB (as of early 2026) |
| 217 | +of data verifiable only after complete receipt. The Merkle tree enables incremental verifiability, allowing for |
| 218 | +immediate detection of corrupt data, peer switching without data loss, and parallel download from |
| 219 | +multiple peers. The overhead is minimal (~384 bytes of proof per 3.9 MB chunk). The specified |
| 220 | +serialization is deterministic, so all honest nodes produce byte-identical output, guaranteeing Merkle |
| 221 | +root agreement. |
| 222 | + |
| 223 | +**3.9 MB chunk size:** The number balances round trips (~2,560 for a ~10 GB set) against memory usage |
| 224 | +for buffering and verifying a single chunk. Smaller chunks would increase protocol overhead; larger |
| 225 | +chunks would increase memory pressure on constrained devices commonly used to run Bitcoin nodes. |
| 226 | +Together with the additional message overhead, the `utxoset` message including the chunk data also |
| 227 | +sits just below the theoretical maximum block size, which means any implementation should be able to |
| 228 | +handle messages of this size. |
| 229 | + |
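The figures quoted in the two paragraphs above can be reproduced with a short calculation. The sketch assumes a set size of exactly 10,000,000,000 bytes as a stand-in for "~10 GB"; the helper names are hypothetical.

```python
CHUNK_SIZE = 3_900_000  # bytes per chunk
HASH_SIZE = 32          # bytes per Merkle proof hash


def chunk_count(data_length: int) -> int:
    """Ceiling division: number of chunks, and therefore of round trips."""
    return -(-data_length // CHUNK_SIZE)


def proof_depth(chunks: int) -> int:
    """Levels between leaf and root, i.e. ceil(log2(chunks)) for chunks >= 1."""
    return (chunks - 1).bit_length()


def proof_bytes(data_length: int) -> int:
    return HASH_SIZE * proof_depth(chunk_count(data_length))


set_size = 10_000_000_000  # assumed size, roughly 10 GB
print(chunk_count(set_size), proof_depth(chunk_count(set_size)), proof_bytes(set_size))
# prints "2565 12 384"
```

With about 2,565 chunks the tree has 12 levels below the root, so each proof carries 12 hashes of 32 bytes, which gives the 384 bytes mentioned above.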
| 230 | +**Reusing the `dumptxoutset` format:** Avoids introducing a new serialization format and ensures |
| 231 | +compatibility with UTXO sets already being generated and shared. |
| 232 | + |
| 233 | +**Relationship to BIP 64:** BIP 64 defined a protocol for querying individual UTXOs by outpoint and is |
| 234 | +now closed. This BIP addresses a different use case: bulk transfer of the entire UTXO set for node |
| 235 | +bootstrapping. |
| 236 | + |
| 237 | +## Reference Implementation |
| 238 | + |
| 239 | +TBD |
| 240 | + |
| 241 | +## Copyright |
| 242 | + |
| 243 | +This BIP is made available under the terms of the 2-clause BSD license. See |
| 244 | +https://opensource.org/license/BSD-2-Clause for more information. |