diff --git a/.github/ISSUE_TEMPLATE/DMP_2026.yml b/.github/ISSUE_TEMPLATE/DMP_2026.yml new file mode 100644 index 000000000..9d68f7f51 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/DMP_2026.yml @@ -0,0 +1,345 @@ +name: DMP 2026 Project Template +description: List a new project for Dedicated Mentoring Program (DMP) 2026 +title: "[DMP 2026]: " +labels: ["DMP 2026"] +body: + - type: textarea + id: ticket-description + validations: + required: true + attributes: + label: Ticket Contents + value: | + ## Description + [Provide a brief description of the feature, including why it is needed and what it will accomplish.] + + - type: textarea + id: ticket-goals + validations: + required: true + attributes: + label: Goals & Mid-Point Milestone + description: List the goals of the feature. Please add the goals that must be achieved by the mid-point check-in, i.e. 1.5 months into the coding period. + value: | + ## Goals + - [ ] [Goal 1] + - [ ] [Goal 2] + - [ ] [Goal 3] + - [ ] [Goal 4] + - [ ] [Goals Achieved By Mid-point Milestone] + + - type: textarea + id: ticket-setup + attributes: + label: Setup/Installation + description: Please list or link setup or installation guide (if any) + + - type: textarea + id: ticket-expected-outcome + attributes: + label: Expected Outcome + description: Describe in detail what the final product or result should look like and how it should behave. + + - type: textarea + id: ticket-acceptance-criteria + attributes: + label: Acceptance Criteria + description: List the acceptance criteria for this feature. + + - type: textarea + id: ticket-implementation-details + validations: + required: true + attributes: + label: Implementation Details + description: List any technical details about the proposed implementation, including any specific technologies that will be used.
+ + - type: textarea + id: ticket-mockups + attributes: + label: Mockups/Wireframes + description: Include links to any visual aids, mockups, wireframes, or diagrams that help illustrate what the final product should look like. This is not always necessary, but can be very helpful in many cases. + + - type: input + id: ticket-product + attributes: + label: Product Name + placeholder: Enter Product Name + validations: + required: true + + - type: dropdown + id: ticket-organisation + attributes: + label: Organisation Name + description: Enter Organisation Name + multiple: false + options: + - Agami + - Argusoft + - ARMMAN + - Avanti Fellows + - Bandhu + - Beckn + - Belongg + - Blockster Global (CREDBEL) + - Blockster Labs / AyanWorks + - CBoard + - CHAOSS + - CHAOSS Africa + GWU + - Civis + - ConveGenius + - Consul Democracy + - COSS + - CranberryFit + - Development Gateway + - DHIS2 + - Dhiway + - Dhwani + - Digital Green + - Digital India + - Dimagi + - Drupal + - Education Initiative + - eGov + - EkShop Marketplace + - FIDE + - FinternetLabs + - Flywheel + - GovDirectory + - Haqdarshak + - Healthsites.io + - IDinsight + - If Me + - IIIT Delhi + - IIT Bombay + - IIT Delhi + - Impactyaan + - Indus Action + - Intel Health + - Key Education Foundation + - Khushi Baby + - Learning Economy + - Linux Foundation + - Mecha Systems + - Medic Mobile + - Medtronic Labs + - MetaBrainz + - Mifos + - Mojaloop + - MOSIP + - NASSCOM Foundation + - NHA + - NIUA + - Norwegian Meteorological Institute + - NSUT x SEETA x AIC + - ONDC + - ONEST + - Open Healthcare Network + - OpenCRVS + - OpenFn + - OpenIMIS + - OpenMRS + - OpenSPP + - Piramal Swasthya + - Planet Read + - Policy Engine + - Pratham Books + - Project Second Chance + - Project Tech4Dev + - Protean + - RCTS-IIITH + - Reap Benefit + - Resolve to Save Lives + - Rocket Learning + - Rumsan + - Sahamati + - SamagraX + - Samanvay Foundation + - Sampatti Card + - Sanketika + - ShikshaLokam + - SimPPL + - Sugar Labs + - Swasth 
Alliance + - Swecha + - Tarento + - Tattle + - Tech4Dev + - Tekdi + - The Apprentice Project + - The Mifos Initiative + - Thoughtworks + - Tibil + - TinkerHub + - Trustin + - Tuner Labs + - TYCIA + - UNICEF + - United Nations + - Ushahidi + - Win Over Cancer + - WRI + - Zendalona + - Zenysis + - Arghyam + validations: + required: true + + - type: dropdown + id: ticket-governance-domain + attributes: + label: Domain + options: + - Healthcare + - Education + - Financial Inclusion + - Livelihoods + - Skilling + - Learning & Development + - Agriculture + - Service Delivery + - Open Source Library + - Water + validations: + required: true + + + - type: dropdown + id: ticket-technical-skills-required + attributes: + label: Tech Skills Needed + description: Select the technologies needed for this ticket (use Ctrl or Command to select multiple) + multiple: true + options: + - .NET + - Angular + - Artificial Intelligence + - ASP.NET + - AWS + - Babel + - Bootstrap + - C# + - Chart.js + - CI/CD + - Computer Vision + - CORS + - cURL + - Cypress + - D3.js + - Database + - Debugging + - Design + - DevOps + - Django + - Docker + - Electron + - ESLint + - Express.js + - Feature + - Flask + - Go + - GraphQL + - HTML + - Ionic + - Jest + - Java + - JavaScript + - Jenkins + - JWT + - Kubernetes + - Laravel + - Machine Learning + - Maintenance + - Markdown + - Material-UI + - Microservices + - MongoDB + - Mobile + - Mockups + - Mocha + - Natural Language Processing + - NestJS + - Node.js + - NUnit + - OAuth + - Performance Improvement + - Prettier + - Python + - Question + - React + - React Native + - Redux + - RESTful APIs + - Ruby + - Ruby on Rails + - Rust + - Scala + - Security + - Selenium + - SEO + - Serverless + - Solidity + - Spring Boot + - SQL + - Swagger + - Tailwind CSS + - Test + - Testing Library + - Three.js + - TypeScript + - UI/UX/Design + - Virtual Reality + - Vue.js + - WebSockets + - Webpack + - Other + validations: + required: true + + - type: textarea
id: ticket-mentors + attributes: + label: Mentor(s) + description: Please tag relevant mentors for the ticket + validations: + required: true + + - type: dropdown + id: ticket-category + attributes: + label: Category + description: Choose the categories that best describe your ticket + multiple: true + options: + - API + - Analytics + - Accessibility + - Backend + - Breaking Change + - Beginner Friendly + - Configuration + - CI/CD + - Database + - Data Science + - Deprecation + - Documentation + - Deployment + - Frontend + - Internationalization + - Localization + - Machine Learning + - Maintenance + - Mobile + - Performance Improvement + - Question + - Refactoring + - Research + - Needs Reproduction + - SEO + - Security + - Testing + - AI + - Other + validations: + required: true diff --git a/.github/workflows/DMP_2026.yml b/.github/workflows/DMP_2026.yml new file mode 100644 index 000000000..9d68f7f51 --- /dev/null +++ b/.github/workflows/DMP_2026.yml @@ -0,0 +1,345 @@ +name: DMP 2026 Project Template +description: List a new project for Dedicated Mentoring Program (DMP) 2026 +title: "[DMP 2026]: " +labels: ["DMP 2026"] +body: + - type: textarea + id: ticket-description + validations: + required: true + attributes: + label: Ticket Contents + value: | + ## Description + [Provide a brief description of the feature, including why it is needed and what it will accomplish.] + + - type: textarea + id: ticket-goals + validations: + required: true + attributes: + label: Goals & Mid-Point Milestone + description: List the goals of the feature. Please add the goals that must be achieved by the mid-point check-in, i.e. 1.5 months into the coding period.
+ value: | + ## Goals + - [ ] [Goal 1] + - [ ] [Goal 2] + - [ ] [Goal 3] + - [ ] [Goal 4] + - [ ] [Goals Achieved By Mid-point Milestone] + + - type: textarea + id: ticket-setup + attributes: + label: Setup/Installation + description: Please list or link setup or installation guide (if any) + + - type: textarea + id: ticket-expected-outcome + attributes: + label: Expected Outcome + description: Describe in detail what the final product or result should look like and how it should behave. + + - type: textarea + id: ticket-acceptance-criteria + attributes: + label: Acceptance Criteria + description: List the acceptance criteria for this feature. + + - type: textarea + id: ticket-implementation-details + validations: + required: true + attributes: + label: Implementation Details + description: List any technical details about the proposed implementation, including any specific technologies that will be used. + + - type: textarea + id: ticket-mockups + attributes: + label: Mockups/Wireframes + description: Include links to any visual aids, mockups, wireframes, or diagrams that help illustrate what the final product should look like. This is not always necessary, but can be very helpful in many cases. 
+ + - type: input + id: ticket-product + attributes: + label: Product Name + placeholder: Enter Product Name + validations: + required: true + + - type: dropdown + id: ticket-organisation + attributes: + label: Organisation Name + description: Enter Organisation Name + multiple: false + options: + - Agami + - Argusoft + - ARMMAN + - Avanti Fellows + - Bandhu + - Beckn + - Belongg + - Blockster Global (CREDBEL) + - Blockster Labs / AyanWorks + - CBoard + - CHAOSS + - CHAOSS Africa + GWU + - Civis + - ConveGenius + - Consul Democracy + - COSS + - CranberryFit + - Development Gateway + - DHIS2 + - Dhiway + - Dhwani + - Digital Green + - Digital India + - Dimagi + - Drupal + - Education Initiative + - eGov + - EkShop Marketplace + - FIDE + - FinternetLabs + - Flywheel + - GovDirectory + - Haqdarshak + - Healthsites.io + - IDinsight + - If Me + - IIIT Delhi + - IIT Bombay + - IIT Delhi + - Impactyaan + - Indus Action + - Intel Health + - Key Education Foundation + - Khushi Baby + - Learning Economy + - Linux Foundation + - Mecha Systems + - Medic Mobile + - Medtronic Labs + - MetaBrainz + - Mifos + - Mojaloop + - MOSIP + - NASSCOM Foundation + - NHA + - NIUA + - Norwegian Meteorological Institute + - NSUT x SEETA x AIC + - ONDC + - ONEST + - Open Healthcare Network + - OpenCRVS + - OpenFn + - OpenIMIS + - OpenMRS + - OpenSPP + - Piramal Swasthya + - Planet Read + - Policy Engine + - Pratham Books + - Project Second Chance + - Project Tech4Dev + - Protean + - RCTS-IIITH + - Reap Benefit + - Resolve to Save Lives + - Rocket Learning + - Rumsan + - Sahamati + - SamagraX + - Samanvay Foundation + - Sampatti Card + - Sanketika + - ShikshaLokam + - SimPPL + - Sugar Labs + - Swasth Alliance + - Swecha + - Tarento + - Tattle + - Tech4Dev + - Tekdi + - The Apprentice Project + - The Mifos Initiative + - Thoughtworks + - Tibil + - TinkerHub + - Trustin + - Tuner Labs + - TYCIA + - UNICEF + - United Nations + - Ushahidi + - Win Over Cancer + - WRI + - Zendalona + - Zenysis + - 
Arghyam + validations: + required: true + + - type: dropdown + id: ticket-governance-domain + attributes: + label: Domain + options: + - Healthcare + - Education + - Financial Inclusion + - Livelihoods + - Skilling + - Learning & Development + - Agriculture + - Service Delivery + - Open Source Library + - Water + validations: + required: true + + + - type: dropdown + id: ticket-technical-skills-required + attributes: + label: Tech Skills Needed + description: Select the technologies needed for this ticket (use Ctrl or Command to select multiple) + multiple: true + options: + - .NET + - Angular + - Artificial Intelligence + - ASP.NET + - AWS + - Babel + - Bootstrap + - C# + - Chart.js + - CI/CD + - Computer Vision + - CORS + - cURL + - Cypress + - D3.js + - Database + - Debugging + - Design + - DevOps + - Django + - Docker + - Electron + - ESLint + - Express.js + - Feature + - Flask + - Go + - GraphQL + - HTML + - Ionic + - Jest + - Java + - JavaScript + - Jenkins + - JWT + - Kubernetes + - Laravel + - Machine Learning + - Maintenance + - Markdown + - Material-UI + - Microservices + - MongoDB + - Mobile + - Mockups + - Mocha + - Natural Language Processing + - NestJS + - Node.js + - NUnit + - OAuth + - Performance Improvement + - Prettier + - Python + - Question + - React + - React Native + - Redux + - RESTful APIs + - Ruby + - Ruby on Rails + - Rust + - Scala + - Security + - Selenium + - SEO + - Serverless + - Solidity + - Spring Boot + - SQL + - Swagger + - Tailwind CSS + - Test + - Testing Library + - Three.js + - TypeScript + - UI/UX/Design + - Virtual Reality + - Vue.js + - WebSockets + - Webpack + - Other + validations: + required: true + + - type: textarea + id: ticket-mentors + attributes: + label: Mentor(s) + description: Please tag relevant mentors for the ticket + validations: + required: true + + - type: dropdown + id: ticket-category + attributes: + label: Category + description: Choose the categories that best describe your ticket + multiple:
true + options: + - API + - Analytics + - Accessibility + - Backend + - Breaking Change + - Beginner Friendly + - Configuration + - CI/CD + - Database + - Data Science + - Deprecation + - Documentation + - Deployment + - Frontend + - Internationalization + - Localization + - Machine Learning + - Maintenance + - Mobile + - Performance Improvement + - Question + - Refactoring + - Research + - Needs Reproduction + - SEO + - Security + - Testing + - AI + - Other + validations: + required: true diff --git a/pubsub/gossipsub/README.md b/pubsub/gossipsub/README.md index f119e3935..314c86e21 100644 --- a/pubsub/gossipsub/README.md +++ b/pubsub/gossipsub/README.md @@ -17,20 +17,23 @@ If you are new to Gossipsub and/or PubSub in general, we recommend you to first: - [gossipsub-v1.0](gossipsub-v1.0.md): v1.0 of the gossipsub protocol. This is a revised specification, to use a more normative language. The original v1.0 specification is [here](gossipsub-v1.0-old.md), still a good read. - [gossipsub-v1.1](gossipsub-v1.1.md): v1.1 of the gossipsub protocol. - [gossipsub-v1.2](gossipsub-v1.2.md): v1.2 of the gossipsub protocol. This includes the aggregation of the IDONTWANT control messages to the specs. +- [gossipsub-v1.3](gossipsub-v1.3.md): v1.3 of the gossipsub protocol. Introduces the Extensions Control Message framework. +- [gossipsub-v1.4](gossipsub-v1.4.md): v1.4 of the gossipsub protocol. Introduces large message propagation via fragmentation, staggering, PREAMBLE, and IMRECEIVING. + - [Design Document](design-document.md): Architectural rationale, prototype analysis (nim-libp2p, py-libp2p), and design decisions for the v1.4 specification. - [(not in use) episub](episub.md): a research note on a protocol building on top of gossipsub to implement [epidemic broadcast trees](https://www.gsd.inesc-id.pt/~ler/reports/srds07.pdf).
## Implementation status Legend: ✅ = complete, 🏗 = in progress, ❕ = not started yet -| Name | v1.0 | v1.1 | v1.2 | -|--------------------------------------------------------------------------------------------------|:-----:|:-----:|:----:| -| [go-libp2p-pubsub (Golang)](https://github.com/libp2p/go-libp2p-pubsub/blob/master/gossipsub.go) | ✅ | ✅ | ✅ | -| [js-libp2p-gossipsub (JavaScript)](https://github.com/ChainSafe/js-libp2p-gossipsub) | ✅ | ✅ | ✅ | -| [rust-libp2p (Rust)](https://github.com/libp2p/rust-libp2p/tree/master/protocols/gossipsub) | ✅ | ✅ | ❔ | -| [py-libp2p (Python)](https://github.com/libp2p/py-libp2p/tree/master/libp2p/pubsub) | ✅ | 🏗 | ❔ | -| [jvm-libp2p (Java/Kotlin)](https://github.com/libp2p/jvm-libp2p/tree/develop/src/main/kotlin/io/libp2p/pubsub) | ✅ | 🏗 | ✅ | -| [nim-libp2p (Nim)](https://github.com/status-im/nim-libp2p/blob/master/libp2p/protocols/pubsub/gossipsub.nim) | ✅ | 🏗 | ✅ | +| Name | v1.0 | v1.1 | v1.2 | v1.3 | v1.4 | +|--------------------------------------------------------------------------------------------------|:-----:|:-----:|:----:|:----:|:----:| +| [go-libp2p-pubsub (Golang)](https://github.com/libp2p/go-libp2p-pubsub/blob/master/gossipsub.go) | ✅ | ✅ | ✅ | 🏗 | ❕ | +| [js-libp2p-gossipsub (JavaScript)](https://github.com/ChainSafe/js-libp2p-gossipsub) | ✅ | ✅ | ✅ | ❕ | ❕ | +| [rust-libp2p (Rust)](https://github.com/libp2p/rust-libp2p/tree/master/protocols/gossipsub) | ✅ | ✅ | ❔ | ❕ | ❕ | +| [py-libp2p (Python)](https://github.com/libp2p/py-libp2p/tree/master/libp2p/pubsub) | ✅ | 🏗 | ❔ | ❕ | 🏗 | +| [jvm-libp2p (Java/Kotlin)](https://github.com/libp2p/jvm-libp2p/tree/develop/src/main/kotlin/io/libp2p/pubsub) | ✅ | 🏗 | ✅ | ❕ | ❕ | +| [nim-libp2p (Nim)](https://github.com/status-im/nim-libp2p/blob/master/libp2p/protocols/pubsub/gossipsub.nim) | ✅ | 🏗 | ✅ | ❕ | 🏗 | Additional tooling: diff --git a/pubsub/gossipsub/design-document.md b/pubsub/gossipsub/design-document.md new file mode 100644 index 000000000..fbe2f03d1 --- 
/dev/null +++ b/pubsub/gossipsub/design-document.md @@ -0,0 +1,542 @@ +# Gossipsub v1.4 Large Message Propagation — Design Document + +| Document Type | Design Document | +| --------------- | ---------------------------------------------------- | +| Specification | [gossipsub-v1.4.md](./gossipsub-v1.4.md) | +| Status | Living Document — tracks design rationale and prototype learnings | +| Author | [@NomzzNJS](https://github.com/NomzzNJS) | +| Created | 2026-05-16 | + +--- + +## 1. Executive Summary + +This design document captures the architectural rationale, prototype analysis, +and design decisions behind the Gossipsub v1.4 specification for large message +propagation. It serves as the companion reference to the normative specification +in [gossipsub-v1.4.md](./gossipsub-v1.4.md). + +The design synthesizes findings from two independent prototype implementations +(**nim-libp2p** by the Vac/Logos team and **py-libp2p**) and two peer-reviewed +research papers to define four complementary mechanisms: **Message +Fragmentation**, **Message Staggering**, **PREAMBLE**, and **IMRECEIVING**. + +--- + +## 2. Problem Analysis + +### 2.1 The Store-and-Forward Bottleneck + +Standard Gossipsub uses a store-and-forward relay model: each peer must fully +receive a message before forwarding it to mesh neighbors. For a message of size +`L` bytes traversing a path of `h` hops, each with link data rate `R`: + +``` +Total store-and-forward delay = h × (L / R) +``` + +For a 1 MB message across 6 hops on a 10 Mbps link: +- Per-hop transmission time: ~0.8 seconds +- Cumulative delay: ~4.8 seconds (store-and-forward alone) + +This multiplicative delay is the primary scalability bottleneck for large +messages. + +### 2.2 The IDONTWANT Timing Gap + +Gossipsub v1.2 introduced `IDONTWANT` — a control message sent *after* full +reception to suppress further sends. 
However, for large messages: + + ``` +Timeline (without v1.4): + t=0 Peer A starts sending 1MB message M to Peer B + t=0 Peer C simultaneously starts sending the same M to Peer B + t=0.8s Peer B finishes receiving M from A, sends IDONTWANT(M) to C + t=0.8s Peer C finishes its send to B (REDUNDANT, IDONTWANT arrived too late) +``` + +The fundamental issue: when several peers start sending the same message at +once, `IDONTWANT` is emitted only after the first copy has fully arrived, so it +comes too late to stop the others. The redundant copies have already been +transmitted in full. + +### 2.3 Bandwidth Amplification + +In a standard gossipsub mesh with degree `D` (spec defaults: target 6, +`D_low` = 4, `D_high` = 12), a peer forwards every message to all of its mesh +neighbors (except the one it received the message from) simultaneously. For a +1 MB message with `D=8`: + +- **Ideal bandwidth** (no redundancy): 1 MB per peer +- **Actual bandwidth** (with duplicates): up to 8 MB per peer outbound +- **Network-wide amplification**: O(D) redundancy factor per hop + +Research measurements (arXiv:2504.10365) show this results in bandwidth +utilization that scales poorly with message size, becoming the dominant cost +above ~64 KiB. + +### 2.4 Real-World Impact + +| System | Payload Size | Impact | +|--------|-------------|--------| +| Ethereum blocks (post-EIP-4844) | 128 KiB – 1 MB+ (with blobs) | Block propagation latency directly affects attestation timing and chain finality | +| Distributed AI model updates | 1 MB – 100 MB | Gradient aggregation round-trip time limits training throughput | +| Waku relay messages | Variable, growing | Store-and-forward delays compound across relay hops | +| State snapshots | 10 MB+ | Sync time for new nodes joining the network | + +--- + +## 3. Prototype Analysis + +### 3.1 nim-libp2p Prototype (Vac/Logos) + +**Repository**: [vacp2p/nim-libp2p](https://github.com/vacp2p/nim-libp2p) + +The Vac research team developed the primary proof-of-concept implementation in +nim-libp2p, testing via the Shadow discrete-event network simulator.
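The store-and-forward arithmetic from §2.1 can be checked with a short simulation. This is a simplified model, not code from either prototype: it assumes a single path of `h` identical links, no queuing, no per-fragment header overhead, and that each node forwards a fragment as soon as it has fully arrived and its outgoing link is idle. The function names are hypothetical.

```python
import math
from typing import List, Optional


def fragment_sizes(message_len: int, fragment_size: int) -> List[int]:
    """Split a message length into fragment lengths; the last may be shorter."""
    count = max(1, math.ceil(message_len / fragment_size))
    return [fragment_size] * (count - 1) + [message_len - fragment_size * (count - 1)]


def path_delivery_time(message_len: int, hops: int, rate_bps: float,
                       fragment_size: Optional[int] = None) -> float:
    """Seconds until the last byte reaches the node `hops` links from the publisher.

    fragment_size=None models store-and-forward (the whole message is one unit).
    Otherwise each node forwards a fragment as soon as it has fully arrived
    and the outgoing link is idle.
    """
    if fragment_size is None:
        sizes = [message_len]
    else:
        sizes = fragment_sizes(message_len, fragment_size)
    ready = [0.0] * len(sizes)  # the publisher holds every fragment at t = 0
    for _ in range(hops):
        link_free = 0.0  # time this hop's link finishes its previous fragment
        arrived = []
        for size, t_ready in zip(sizes, ready):
            start = max(t_ready, link_free)
            link_free = start + size * 8 / rate_bps
            arrived.append(link_free)
        ready = arrived  # arrival at the next node = readiness for the next hop
    return ready[-1]


L, h, R = 1_000_000, 6, 10_000_000  # 1 MB over 6 hops at 10 Mbps, as in §2.1
store_and_forward = path_delivery_time(L, h, R)               # about 4.8 s
pipelined = path_delivery_time(L, h, R, fragment_size=65536)  # about 1.06 s
print(f"{store_and_forward:.3f} s vs {pipelined:.3f} s")
```

In this idealized model the fragmented delay equals `L/R + (h - 1) * F/R` (with `F` the fragment size) instead of `h * L/R`, which is the effect lesson 3 below attributes to forwarding fragments before reassembly. Real meshes add queuing, per-fragment overhead, and fan-out to several peers, so the idealized gain is optimistic.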
+ +#### Key Implementation Details + +- **Stagger-send branches**: Experimental branches (e.g., `staggersend`) + implemented sequential peer forwarding with configurable group sizes (1, 2, 3, + 4 parallel sends). +- **Fragment relay**: Fragments are forwarded as individual protocol messages + without waiting for full reassembly. +- **PREAMBLE/IMRECEIVING**: Implemented as new `ControlMessage` fields, sent + inline with the existing gossipsub RPC framing. +- **Shadow simulator testing**: Evaluated across 2,000–12,000 node networks + with message sizes from 200 KB to 1 MB. + +#### Prototype Results (Shadow Simulator) + +| Configuration | Latency Reduction | Bandwidth Reduction | +|---------------|-------------------|---------------------| +| Staggering only (1 parallel) | ~20% | ~30% | +| Fragmentation only (64 KB) | ~56% | ~15% | +| Stagger + Fragment | ~64% | ~45% | +| Stagger + Fragment + PREAMBLE + IMRECEIVING | Up to 35% additional | Up to 61% total | + +#### Design Lessons from nim-libp2p + +1. **Stagger interval sensitivity**: Too short (~50 ms) provides insufficient + time for IDONTWANT propagation. Too long (~500 ms) adds excessive total + relay time. The 200 ms default was empirically determined to balance these + tradeoffs across typical mesh topologies. + +2. **Fragment size tradeoffs**: Smaller fragments (16 KB) increase protocol + overhead (more fragment headers). Larger fragments (256 KB) reduce the + pipeline parallelism benefit. The 64 KB default provides the best + latency/overhead balance. + +3. **Fragment forwarding is critical**: The key latency win comes from + forwarding fragments *before* full reassembly. Without this, fragmentation + only reduces individual transmission sizes but doesn't eliminate + store-and-forward delay accumulation. + +4. 
**IMRECEIVING fills the IDONTWANT gap**: In the nim-libp2p prototype, + IMRECEIVING reduced redundant transmissions by an additional 20–30% beyond + what IDONTWANT alone achieved, because it provides *immediate* suppression + at the start of reception rather than after completion. + +### 3.2 py-libp2p Prototype + +**Repository**: [libp2p/py-libp2p](https://github.com/libp2p/py-libp2p) + +The py-libp2p implementation focused on application-layer fragmentation and +reassembly patterns, demonstrating the feasibility of the approach in a +dynamically-typed, event-loop-based runtime. + +#### Key Implementation Characteristics + +- **asyncio-based fragment relay**: Fragments are handled as individual async + tasks, enabling natural pipeline parallelism via Python's event loop. +- **Reassembly buffer management**: Implemented with per-peer, per-message + dictionaries keyed by `(peer_id, message_id)` with timeout-based cleanup. +- **IDONTWANT integration**: Built on py-libp2p's existing v1.2 support, + extending `dont_send_message_ids` to handle IMRECEIVING signals. + +#### Design Lessons from py-libp2p + +1. **Memory management is critical**: Without `max_pending_fragments` limits, + a malicious peer can exhaust memory by sending fragments for many fake + message IDs. The prototype demonstrated the need for both per-peer and + global reassembly buffer limits. + +2. **Fragment ordering is not guaranteed**: Network conditions can deliver + fragments out of order. The reassembly buffer must handle arbitrary + insertion order, not assume sequential delivery. + +3. **Timeout tuning matters**: The 30-second `fragment_timeout` was chosen to + accommodate high-latency network conditions while preventing indefinite + resource consumption. + +--- + +## 4. Design Decisions and Rationale + +### 4.1 Four Mechanisms, Not One + +**Decision**: Include all four mechanisms (fragmentation, staggering, PREAMBLE, +IMRECEIVING) as a unified extension rather than four separate extensions. 
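Treating the four mechanisms as one extension implies a single configuration surface. The sketch below groups the parameters this document names, using the defaults it states (64 KiB `fragment_size` and `fragmentation_threshold`, 200 ms stagger interval, 16 pending messages per peer, 30 s `fragment_timeout`). The class and method names are hypothetical; the `stagger_threshold` and `preamble_threshold` defaults are placeholders because the document gives none, and the strict `>` comparison against each threshold is an assumption.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class LargeMessageConfig:
    """Hypothetical grouping of the v1.4 large-message parameters."""
    # Defaults stated in this document.
    fragment_size: int = 64 * 1024            # bytes per fragment
    fragmentation_threshold: int = 64 * 1024  # fragment messages larger than this
    stagger_interval_s: float = 0.2           # pause between staggered sends
    max_pending_fragments: int = 16           # incomplete messages per peer
    fragment_timeout_s: float = 30.0          # drop incomplete messages after this
    # Placeholders: the document gives no defaults for these two.
    stagger_threshold: int = 64 * 1024
    preamble_threshold: int = 64 * 1024

    def mechanisms_for(self, message_len: int) -> Set[str]:
        """Mechanisms that would activate for a message of this size (assumed strict '>')."""
        enabled: Set[str] = set()
        if message_len > self.fragmentation_threshold:
            enabled.add("fragmentation")
        if message_len > self.stagger_threshold:
            enabled.add("staggering")
        if message_len > self.preamble_threshold:
            enabled.add("preamble")  # receivers may then answer with IMRECEIVING
        return enabled

    def reassembly_memory_bounds(self, max_message_len: int) -> Tuple[int, int]:
        """Approximate per-peer buffer size with every pending slot full: (lower, upper)."""
        lower = self.max_pending_fragments * self.fragmentation_threshold
        upper = self.max_pending_fragments * max_message_len
        return lower, upper
```

An operator who only wants staggering for very large payloads (§4.4) could use `LargeMessageConfig(stagger_threshold=512 * 1024)`: a 200 KB message would then still be fragmented and announced with a PREAMBLE, but forwarded without staggering.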
+ + **Rationale**: The research demonstrates that these mechanisms are +*synergistic* — each one amplifies the effectiveness of the others: + +``` + PREAMBLE + (announces message) + │ + ▼ + ┌─────────────────────┐ + │ Receiver learns │ + │ message is coming │ + └─────────┬───────────┘ + │ + ┌─────────▼───────────┐ + │ Sends IMRECEIVING │──── Immediate suppression + │ to mesh peers │ (no waiting for full rx) + └─────────┬───────────┘ + │ + ┌─────────▼───────────┐ + │ Staggered sends │──── Gives time for IDONTWANT + │ to remaining peers│ + IMRECEIVING to propagate + └─────────┬───────────┘ + │ + ┌─────────▼───────────┐ + │ Fragmented relay │──── Pipeline parallelism + │ across hops │ eliminates store-and-forward + └─────────────────────┘ +``` + +Deploying any one of them alone captures only part of the benefit (see the +prototype results in §3.1). Combined, they achieve the full 61% bandwidth +reduction and 35% latency improvement observed in the Shadow simulations. + +### 4.2 Protocol Version Bump (v1.4) vs. Extension-Only + +**Decision**: Assign a new protocol version (`/meshsub/1.4.0`) rather than +registering only as a v1.3 extension. + +**Rationale**: +- The scope introduces **3 new protobuf message types** and **2 new control + message fields**, a larger change than v1.2, which received a version bump + for IDONTWANT alone. +- A version bump makes capability detection straightforward: peers can check + the protocol ID to know if large message handling is supported. +- The v1.3 extension mechanism is still used for fine-grained advertisement + (`largeMessageHandling = 11` in `ControlExtensions`). + +### 4.3 Fragment Size: 64 KiB Default + +**Decision**: Default `fragment_size` = 65536 bytes (64 KiB).
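With the 65536-byte default, splitting and order-independent reassembly can be sketched as follows. The `Fragment` record is a hypothetical in-memory shape carrying the header fields listed in §6.3 (message ID, indices, topic); it is not the `LargeMessageFragment` protobuf definition. The buffer follows the py-libp2p lessons in §3.2: entries are keyed by `(peer_id, message_id)`, fragments may arrive in any order, each peer may hold at most `max_pending_fragments` (16) incomplete messages, and stale entries are dropped after `fragment_timeout` (30 s). A global memory cap and peer scoring are omitted for brevity.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

FRAGMENT_SIZE = 64 * 1024  # default fragment_size: 65536 bytes


@dataclass(frozen=True)
class Fragment:
    """Hypothetical in-memory fragment (not the wire format)."""
    message_id: bytes
    topic: str
    index: int   # 0-based position of this fragment
    total: int   # number of fragments in the message
    payload: bytes


def split_message(message_id: bytes, topic: str, data: bytes,
                  fragment_size: int = FRAGMENT_SIZE) -> List[Fragment]:
    """Cut `data` into fragment_size chunks; the last chunk may be shorter."""
    chunks = [data[i:i + fragment_size] for i in range(0, len(data), fragment_size)]
    chunks = chunks or [b""]  # an empty message still yields one fragment
    return [Fragment(message_id, topic, i, len(chunks), c) for i, c in enumerate(chunks)]


@dataclass
class _Pending:
    total: int
    first_seen: float
    parts: Dict[int, bytes] = field(default_factory=dict)


class ReassemblyBuffer:
    """Reassembly state keyed by (peer_id, message_id)."""

    def __init__(self, max_pending_per_peer: int = 16, timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_pending_per_peer = max_pending_per_peer
        self.timeout_s = timeout_s
        self.clock = clock
        self.pending: Dict[Tuple[str, bytes], _Pending] = {}

    def add(self, peer_id: str, frag: Fragment) -> Optional[bytes]:
        """Store one fragment (any order); return the full message once complete."""
        key = (peer_id, frag.message_id)
        entry = self.pending.get(key)
        if entry is None:
            open_for_peer = sum(1 for p, _ in self.pending if p == peer_id)
            if open_for_peer >= self.max_pending_per_peer:
                return None  # per-peer limit reached: drop the fragment
            entry = _Pending(total=frag.total, first_seen=self.clock())
            self.pending[key] = entry
        if frag.total != entry.total or not 0 <= frag.index < entry.total:
            return None  # inconsistent header: ignored here (no scoring in this sketch)
        entry.parts.setdefault(frag.index, frag.payload)  # duplicates are ignored
        if len(entry.parts) < entry.total:
            return None
        del self.pending[key]
        return b"".join(entry.parts[i] for i in range(entry.total))

    def prune_expired(self) -> int:
        """Drop incomplete messages older than timeout_s; return how many were dropped."""
        now = self.clock()
        expired = [k for k, e in self.pending.items() if now - e.first_seen > self.timeout_s]
        for k in expired:
            del self.pending[k]
        return len(expired)
```

A 512 KiB message yields 8 fragments, which can be fed to `add()` in any order. Calling `prune_expired()` from the heartbeat would match the cleanup described in §7.1.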
+ + **Rationale**: + +| Fragment Size | Pros | Cons | +|---------------|------|------| +| 16 KiB | Maximum parallelism | High per-fragment overhead; many small RPC messages | +| 32 KiB | Good parallelism | Moderate overhead | +| **64 KiB** | **Best latency/overhead balance (empirically validated)** | **Moderate parallelism** | +| 128 KiB | Low overhead | Reduced pipeline benefit; closer to store-and-forward | +| 256 KiB | Minimal overhead | Minimal pipeline benefit | + +The 64 KiB default aligns with: +- The research paper defaults used in Shadow simulations +- Typical OS socket buffer sizes + +The fragment size is independent of the network MTU: TCP and QUIC segment each +fragment into MTU-sized packets regardless of its size, so no choice in this +range causes IP fragmentation. + +### 4.4 Configurable Thresholds + +**Decision**: All thresholds (`fragmentation_threshold`, `stagger_threshold`, +`preamble_threshold`) are configurable parameters, not hard-coded values. + +**Rationale**: Different deployments have different message size distributions: +- Ethereum networks: most messages are small (transactions), with periodic + large bursts (blocks with blobs) +- AI coordination: consistently large payloads +- General pubsub: unpredictable mix + +Application operators must be able to tune when each mechanism activates. + +### 4.5 IMRECEIVING as Advisory (Not Mandatory) + +**Decision**: IMRECEIVING is advisory — a peer MAY still send after receiving +IMRECEIVING, and doing so MUST NOT be penalized. + +**Rationale**: +- **Non-binding signal**: IMRECEIVING signals *intent* to receive, not + *confirmed* reception. The reception might fail (network error, timeout). +- **Preventing censorship**: If IMRECEIVING were binding, a malicious peer + could send IMRECEIVING for messages it never intends to receive, effectively + censoring message delivery to its neighbors. +- **Consistent with IDONTWANT**: v1.2's IDONTWANT uses the same advisory + model. Maintaining consistency simplifies implementation.
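The staggered send from §3.1, combined with the advisory handling above, can be sketched as below. The function and parameter names are hypothetical; `suppressed` stands in for whatever IDONTWANT/IMRECEIVING bookkeeping an implementation keeps (similar in spirit to py-libp2p's `dont_send_message_ids`). The code is synchronous with an injectable `sleep` for brevity, whereas a real node would update the suppression state from its control-message handler while the pause is in progress. Because IMRECEIVING is advisory, skipping a suppressed peer is a local policy this sketch chooses, not a protocol requirement.

```python
import time
from typing import Callable, Dict, Iterable, List, Set


def staggered_forward(message_id: str,
                      mesh_peers: Iterable[str],
                      send: Callable[[str], None],
                      suppressed: Dict[str, Set[str]],
                      stagger_interval_s: float = 0.2,
                      parallel: int = 1,
                      sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Forward `message_id` to mesh peers `parallel` at a time, pausing between groups.

    suppressed[peer] holds message IDs that peer has announced via IDONTWANT
    or IMRECEIVING. Both signals are advisory, so skipping such a peer is a
    local policy choice, not a protocol requirement.
    """
    peers = list(mesh_peers)
    sent: List[str] = []
    for start in range(0, len(peers), parallel):
        # Re-read the suppression state before each group so that signals
        # received during the previous pause take effect.
        group = [p for p in peers[start:start + parallel]
                 if message_id not in suppressed.get(p, set())]
        for peer in group:  # an async implementation would send these concurrently
            send(peer)
            sent.append(peer)
        if group and start + parallel < len(peers):
            sleep(stagger_interval_s)  # leave time for IDONTWANT / IMRECEIVING
    return sent
```

Setting `parallel` to 1, 2, 3, or 4 mirrors the group sizes explored on the nim-libp2p stagger-send branches; the 200 ms default interval is the value discussed in the nim-libp2p design lessons.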
+ + ### 4.6 Fragment Forwarding Before Validation + +**Decision**: Fragments MAY be forwarded before the full message is reassembled +and validated. Forwarded fragments are *tentatively valid*. + +**Rationale**: This is the single most important design decision for latency +reduction. Waiting for full reassembly before forwarding would eliminate the +pipeline parallelism benefit entirely. The tradeoff: + +- **Risk**: A peer may forward fragments of a message that ultimately fails + validation, wasting bandwidth. +- **Mitigation**: Scoring penalties (P₄ from v1.1) are applied retroactively, + once reassembly or validation fails, to the direct peer that delivered the + fragments (peer scores are only kept for direct neighbors). +- **Bounded risk**: Fragment forwarding only occurs between v1.4-capable peers. + Non-v1.4 peers always receive fully validated messages. + +--- + +## 5. Protocol Flow Diagrams + +### 5.1 Complete v1.4 Message Flow (Happy Path) + +``` +Publisher Peer A Peer B Peer C + │ │ │ │ + │ Large msg M │ │ │ + │ (512 KB) │ │ │ + │ │ │ │ + │──PREAMBLE(M)──▶│ │ │ + │──PREAMBLE(M)───────────────────────────────────────▶│ + │──PREAMBLE(M)──────────────────▶│ │ + │ │ │ │ + │ │ ◄──IMRECEIVING(M)──(Peer B tells │ + │ │ C it's getting M) │ + │ │ │──IMRECEIVING(M)─▶│ + │ │ │ │ + │──Frag[0]──────▶│ │ │ + │ (stagger │──Frag[0]────────▶│ │ + │ 200ms wait) │ │──Frag[0]────────▶│ + │ │ │ │ + │──Frag[1]──────▶│ │ │ + │ │──Frag[1]────────▶│ │ + │ │ │──Frag[1]────────▶│ + │ ... │ ... │ ...
│ + │ │ │ │ + │──Frag[7]──────▶│ │ │ + │ │ IDONTWANT(M)───▶│ │ + │ │ (A has full M) │ │ + │ │──Frag[7]────────▶│ │ + │ │ │──Frag[7]────────▶│ + │ │ │ │ + │ │ Reassemble M │ Reassemble M │ + │ │ Validate M ✓ │ Validate M ✓ │ +``` + +### 5.2 Redundancy Suppression Flow + +``` + Without v1.4 With v1.4 + ┌──────────────────────────┐ ┌──────────────────────────┐ + │ │ │ │ + │ Peer X ──1MB──▶ B │ │ Peer X ──PREAMBLE──▶ B │ + │ Peer Y ──1MB──▶ B │ │ B ──IMRECEIVING──▶ Y, Z │ + │ Peer Z ──1MB──▶ B │ │ Peer Y: skip B │ + │ │ │ Peer Z: skip B │ + │ B receives 3 copies │ │ B receives 1 copy │ + │ Bandwidth: 3 MB │ │ Bandwidth: 1 MB │ + │ │ │ │ + └──────────────────────────┘ └──────────────────────────┘ +``` + +### 5.3 Fragment Reassembly State Machine + +``` + ┌──────────────┐ + │ IDLE │ + │ (no state) │ + └──────┬───────┘ + │ Receive first fragment + │ OR PREAMBLE + ▼ + ┌──────────────┐ + ┌────▶│ RECEIVING │◀─── Receive fragment + │ │ │ (store in buffer) + │ └──────┬───────┘ + │ │ + │ ┌──────┴───────┐ + │ │ │ + ▼ ▼ ▼ + ┌──────────────┐ ┌──────────────┐ + │ TIMEOUT │ │ COMPLETE │ + │ │ │ │ + │ Discard │ │ Reassemble │ + │ fragments │ │ Validate │ + │ Clear buffer │ │ Deliver/Fwd │ + └──────────────┘ └──────────────┘ +``` + +--- + +## 6. Wire Format Design + +### 6.1 Protobuf Integration Strategy + +The v1.4 messages integrate into the existing gossipsub RPC framing defined in +[extensions.proto](./extensions/extensions.proto): + +``` +RPC +├── subscriptions[] (existing) +├── publish[] (existing) +├── control (existing ControlMessage) +│ ├── ihave[] (v1.0) +│ ├── iwant[] (v1.0) +│ ├── graft[] (v1.0) +│ ├── prune[] (v1.0) +│ ├── idontwant[] (v1.2) +│ ├── extensions (v1.3) +│ ├── preamble[] (v1.4 — NEW) +│ └── imreceiving[] (v1.4 — NEW) +├── partial (extension) +└── largeMessageFragments[] (v1.4 — NEW) +``` + +**Design choice**: `LargeMessageFragment` is placed in the `RPC` message (not +`ControlMessage`) because fragments carry application data payload, not control +signaling.
This follows the pattern of `publish[]` (which also carries data) +vs. `ControlMessage` (which carries routing metadata). + +### 6.2 Field Number Allocation + +| Message | Field | Number | Rationale | +|---------|-------|--------|-----------| +| `ControlExtensions.largeMessageHandling` | bool | 11 | Next canonical extension after `partialMessages` (10) | +| `ControlMessage.preamble` | repeated | 7 | Next after `extensions` (6) | +| `ControlMessage.imreceiving` | repeated | 8 | Sequential after `preamble` (7) | +| `RPC.largeMessageFragments` | repeated | 12 | Next available after `partial` (10), skipping 11 | + +Canonical field numbers (1–15, whose field tags encode in a single byte) are +used because this is a formal protocol version, not an experimental extension. + +### 6.3 Message Size Overhead Analysis + +| Control Message | Encoded Size (typical) | When Sent | +|-----------------|------------------------|-----------| +| `ControlPreamble` | ~40 bytes (32B msgID + 8B size + topic) | Once per large message per peer | +| `ControlImReceiving` | ~34 bytes (32B msgID + overhead) | Once per large message per peer | +| `LargeMessageFragment` header | ~44 bytes (32B msgID + indices + topic) | Per fragment | + +For a 512 KB message split into 8 fragments: +- Fragment header overhead: 8 × 44 = 352 bytes (0.07% of payload) +- PREAMBLE overhead: ~40 bytes per peer (negligible) +- IMRECEIVING overhead: ~34 bytes per peer (negligible) + +Total protocol overhead: < 0.1% of payload size. + +--- + +## 7. Interaction with Existing Protocol Mechanisms + +### 7.1 Interaction Matrix + +| Existing Mechanism | Interaction with v1.4 | Notes | +|--------------------|----------------------|-------| +| **IHAVE/IWANT** (v1.0) | Compatible. IHAVE/IWANT operate on full message IDs, not fragments. A peer that has received a PREAMBLE for a message MAY delay sending IWANT for it (in response to IHAVE) while the transfer is in progress. | | +| **Peer Scoring** (v1.1) | Extended. P₄ (Invalid Messages) applies to reassembly failures.
P₇ (Behavioural Penalty) applies to PREAMBLE/IMRECEIVING abuse. | | +| **IDONTWANT** (v1.2) | Synergistic. Staggering is specifically designed to give IDONTWANT time to propagate. IMRECEIVING supplements IDONTWANT during the reception window. | | +| **Extensions** (v1.3) | Used for capability advertisement via `largeMessageHandling` in `ControlExtensions`. | | +| **Message Cache** (v1.0) | Unchanged. Only fully reassembled messages enter `mcache`. | | +| **Heartbeat** (v1.0) | Extended. Heartbeat now also prunes expired `receiving_messages` and fragment buffers. | | + +### 7.2 Backwards Compatibility Matrix + +| Sender | Receiver | Behavior | +|--------|----------|----------| +| v1.4 | v1.4 | Full pipeline: PREAMBLE → IMRECEIVING → staggered fragments | +| v1.4 | v1.0–v1.3 | v1.4 sender waits for full reassembly, sends complete message | +| v1.0–v1.3 | v1.4 | v1.4 receiver operates normally (no fragmentation) | +| v1.0–v1.3 | v1.0–v1.3 | No change (existing behavior) | + +--- + +## 8. Security Analysis + +### 8.1 Threat Model + +| Threat | Attack Vector | Mitigation | +|--------|---------------|------------| +| **Fragment flooding** | Send many fragments for non-existent messages | `max_pending_fragments` (16 per peer), global memory limit, P₄ penalty on reassembly failure | +| **PREAMBLE spam** | Send PREAMBLEs without follow-up data | `fragment_timeout` (30s) + P₇ behavioral penalty + rate limiting | +| **IMRECEIVING abuse** | Falsely claim to be receiving messages | Advisory nature limits impact; entries pruned during heartbeat; fallback via IHAVE/IWANT gossip | +| **Fragment injection** | Inject malicious fragments into legitimate reassembly | Fragments are keyed by `(sender, messageID)` — an attacker would need to predict the message ID and impersonate the sender | +| **Memory exhaustion** | Exhaust reassembly buffer memory across many peers | Per-peer limits + global memory cap + timeout-based cleanup | + +### 8.2 Resource Consumption Bounds + +For an 
implementation with default parameters: + +``` +Per-peer reassembly memory (lower bound; every buffered message just exceeds the threshold): + max_pending_fragments × fragmentation_threshold = 16 × 64 KiB = 1 MiB + +Per-peer reassembly memory (worst case, assuming a 1 MiB maximum message size): + max_pending_fragments × maximum message size = 16 × 1 MiB = 16 MiB + +Global limit recommendation: + Total peers × per-peer limit, capped at a configurable maximum + (e.g., 50 mesh peers × 16 MiB = 800 MiB uncapped, so cap at 256 MiB) +``` + +--- + +## 9. Performance Expectations + +### 9.1 Expected Improvements (from Shadow Simulations) + +| Metric | Baseline (v1.2) | With v1.4 | Improvement | +|--------|-----------------|-----------|-------------| +| Bandwidth per 1 MB message (D=8) | ~8 MB/peer outbound | ~3.1 MB/peer outbound | **~61% reduction** | +| Dissemination latency (1 MB, 10K nodes) | ~12 seconds | ~7.8 seconds | **~35% reduction** | +| Redundant message copies per peer | ~4.2 | ~1.3 | **~69% reduction** | + +### 9.2 Parameter Sensitivity + +| Parameter | Low Value Effect | High Value Effect | Sweet Spot | +|-----------|-----------------|-------------------|------------| +| `stagger_interval` | IDONTWANT can't propagate in time | Total relay time increases | 150–250 ms | +| `fragment_size` | High header overhead | Reduced pipeline benefit | 32–128 KiB | +| `fragment_timeout` | Premature discard of slow transfers | Prolonged memory usage | 15–60 seconds | + +--- + +## 10. Implementation Guidance + +### 10.1 Recommended Implementation Order + +For implementers adding v1.4 support to an existing gossipsub implementation: + +1. **IDONTWANT integration check** — Ensure v1.2 IDONTWANT is fully + implemented and the `dont_send_message_ids` infrastructure exists. +2. **PREAMBLE + IMRECEIVING** — Easiest to implement; immediate bandwidth + benefit with minimal code changes. +3. **Message Staggering** — Modify the forwarding loop to be sequential with + delays. Requires async/timer infrastructure. +4.
**Message Fragmentation** — Most complex; requires reassembly buffers, + timeout management, and fragment forwarding logic. + +### 10.2 Testing Recommendations + +- **Unit tests**: Fragment/reassemble round-trip, out-of-order delivery, + timeout cleanup, PREAMBLE→IMRECEIVING flow. +- **Integration tests**: Mixed v1.4/v1.2 mesh, large message delivery + confirmation, scoring penalty verification. +- **Simulation**: Shadow simulator with 1,000+ nodes, varying message sizes + (64 KiB – 2 MB), varying mesh degrees. + +--- + +## 11. References + +- [1] M. U. Farooq, T. Cizain, D. Kaiser. "Staggering and Fragmentation for + Improved Large Message Handling in libp2p GossipSub." arXiv:2504.10365, 2025. +- [2] M. U. Farooq, D. Kaiser. "PREAMBLE and IMRECEIVING for Improved Large + Message Handling in libp2p GossipSub." arXiv:2505.17337, 2025. +- [3] vacp2p/nim-libp2p — https://github.com/vacp2p/nim-libp2p +- [4] libp2p/py-libp2p — https://github.com/libp2p/py-libp2p +- [5] Vac Research Blog — https://vac.dev/rlog/gossipsub-stagger-idontwant/ +- [6] gossipsub v1.2 (IDONTWANT) — + https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.2.md +- [7] gossipsub v1.3 (Extensions) — + https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.3.md +- [8] dst-gossipsub-test-node (Shadow testing) — + https://github.com/vacp2p/dst-gossipsub-test-node diff --git a/pubsub/gossipsub/extensions/extensions.proto b/pubsub/gossipsub/extensions/extensions.proto index 04f539d58..981775c8a 100644 --- a/pubsub/gossipsub/extensions/extensions.proto +++ b/pubsub/gossipsub/extensions/extensions.proto @@ -3,6 +3,10 @@ syntax = "proto2"; message ControlExtensions { optional bool partialMessages = 10; + // gossipsub v1.4: Large Message Handling extension. + // See gossipsub-v1.4.md for the specification. 
+ optional bool largeMessageHandling = 11; + // Experimental extensions must use field numbers larger than 0x200000 to be // encoded with at least 4 bytes @@ -17,6 +21,12 @@ message ControlMessage { repeated ControlPrune prune = 4; repeated ControlIDontWant idontwant = 5; optional ControlExtensions extensions = 6; + + // gossipsub v1.4: Large Message Handling control messages. + // PREAMBLE announces a large message before transmission begins. + repeated ControlPreamble preamble = 7; + // IMRECEIVING signals that a large message is currently being received. + repeated ControlImReceiving imreceiving = 8; } message RPC { @@ -41,6 +51,10 @@ message RPC { // Canonical Extensions should register their messages here. optional PartialMessagesExtension partial = 10; + // gossipsub v1.4: Large message fragments. + // See gossipsub-v1.4.md for the specification. + repeated LargeMessageFragment largeMessageFragments = 12; + // Experimental Extensions should register their messages here. They // must use field numbers larger than 0x200000 to be encoded with at least 4 // bytes @@ -59,3 +73,30 @@ message PartialMessagesExtension { // An encoded representation of the parts a peer has and wants. optional bytes partsMetadata = 4; } + +// ===== gossipsub v1.4: Large Message Handling ===== +// See gossipsub-v1.4.md for the full specification. + +// PREAMBLE is sent before transmitting a large message to announce its +// messageID and size, allowing receivers to prepare and coordinate. +message ControlPreamble { + optional bytes messageID = 1; // ID of the message about to be transmitted + optional uint64 messageSize = 2; // Total size in bytes of the full message + optional string topicID = 3; // Topic the message belongs to +} + +// IMRECEIVING signals that the sender is currently in the process of receiving +// a large message, allowing neighbors to suppress redundant sends. 
+message ControlImReceiving { + optional bytes messageID = 1; // ID of the message currently being received +} + +// LargeMessageFragment carries a single fragment of a large message that has +// been split for pipeline-parallel relay across the mesh. +message LargeMessageFragment { + optional bytes messageID = 1; // Full message ID this fragment belongs to + optional uint32 fragmentIndex = 2; // 0-based index of this fragment + optional uint32 totalFragments = 3;// Total number of fragments + optional bytes fragmentData = 4; // The fragment payload + optional string topicID = 5; // Topic the original message belongs to +} diff --git a/pubsub/gossipsub/gossipsub-v1.4.md b/pubsub/gossipsub/gossipsub-v1.4.md new file mode 100644 index 000000000..0698eb14b --- /dev/null +++ b/pubsub/gossipsub/gossipsub-v1.4.md @@ -0,0 +1,399 @@ +# gossipsub v1.4: Large Message Propagation + +| Lifecycle Stage | Maturity | Status | Latest Revision | +| --------------- | ------------- | ------ | --------------- | +| 1A | Working Draft | Active | r0, 2026-05-11 | + +Authors: [@NomzzNJS] + +Interest Group: TBD + +[@NomzzNJS]: https://github.com/NomzzNJS + +See the [lifecycle document][lifecycle-spec] for context about the maturity level +and spec status. + +[lifecycle-spec]: https://github.com/libp2p/specs/blob/master/00-framework-01-spec-lifecycle.md + +--- + +## Overview + +This document specifies extensions to [gossipsub v1.2](gossipsub-v1.2.md) and +the [v1.3 Extensions framework](gossipsub-v1.3.md) to support efficient +propagation of large messages in gossipsub mesh networks. + +Current gossipsub implementations are optimized for relatively small messages. +However, the store-and-forward nature of the protocol introduces compounding +latency and excessive bandwidth usage when messages grow large (e.g. >256 KiB). 
+At each hop, a peer must fully receive a message before relaying it, and the +`IDONTWANT` control message (v1.2) can only be sent *after* full reception, +leaving a window for redundant duplicate transmissions. + +This specification introduces four complementary mechanisms to address these +problems: + +1. **Message Fragmentation** — splitting large messages into smaller fragments + that can be relayed independently, enabling pipeline-parallel propagation. +2. **Message Staggering** — forwarding messages to mesh peers sequentially + rather than simultaneously, giving `IDONTWANT` messages time to propagate. +3. **`PREAMBLE` control message** — announcing a message ID and size before + transmission begins, enabling receivers to prepare and coordinate. +4. **`IMRECEIVING` control message** — signaling that a message is currently + being received, allowing neighbors to suppress redundant sends immediately + (without waiting for full reception). + +These extensions are backwards-compatible. Peers that do not support v1.4 will +continue to function normally, receiving full messages from v1.4 peers. + +## Motivation + +Emerging decentralized systems increasingly require reliable dissemination of +large payloads, including: + +- Large Ethereum blocks and blobs (EIP-4844 and beyond) +- Distributed AI model updates and gradient aggregations +- Large event logs and telemetry streams +- State snapshots in decentralized coordination systems +- Agent communication payloads in multi-agent systems + +Research conducted using the Shadow network simulator demonstrates that the +combination of fragmentation, staggering, and early-notification control +messages reduces bandwidth utilization by up to 61% and message dissemination +time by up to 35% for large messages [1][2]. 
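The compounding store-and-forward delay described in the overview can be made concrete with an idealized model. The sketch below is a back-of-the-envelope illustration, not a result from the cited simulations. It assumes a single chain of `hops` relays, a fixed per-link bandwidth, and no propagation delay, per-fragment header overhead, validation time, staggering, or fan-out to multiple mesh peers. Under those assumptions a full message costs `hops × S/B`, while `k` pipelined fragments cost `(hops + k - 1) × (S/k)/B`. The function names are hypothetical.

```python
import math


def store_and_forward_latency(size_bytes: int, bandwidth_bps: float, hops: int) -> float:
    """Every relay must receive the whole message before forwarding it."""
    per_hop = size_bytes * 8 / bandwidth_bps  # time to push the full message over one link
    return hops * per_hop


def pipelined_latency(size_bytes: int, fragment_size: int,
                      bandwidth_bps: float, hops: int) -> float:
    """Relays forward each fragment as soon as it arrives.

    The first fragment crosses all `hops` links; each remaining fragment
    then arrives one fragment-time later. Treating every fragment as
    full-sized slightly overestimates when the last fragment is shorter.
    """
    num_fragments = math.ceil(size_bytes / fragment_size)
    per_fragment = min(fragment_size, size_bytes) * 8 / bandwidth_bps
    return (hops + num_fragments - 1) * per_fragment


if __name__ == "__main__":
    # Hypothetical example: 1 MiB message, 64 KiB fragments, 100 Mbit/s links, 5 hops.
    size, frag, bw, hops = 1 << 20, 64 * 1024, 100e6, 5
    print(f"store-and-forward: {store_and_forward_latency(size, bw, hops):.4f} s")
    print(f"pipelined:         {pipelined_latency(size, frag, bw, hops):.4f} s")
```

With these made-up numbers, the pipelined relay finishes in about a quarter of the time (roughly 0.105 s versus 0.419 s). Real networks will differ, because each relay also splits its uplink across several mesh peers and pays per-fragment overhead.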
+ +### Relationship to Existing Specifications + +- **v1.2 (`IDONTWANT`)**: v1.4 builds on `IDONTWANT` — staggering is designed + specifically to give `IDONTWANT` messages more time to propagate before + redundant sends occur. +- **v1.3 (Extensions)**: v1.4 capabilities are advertised using the v1.3 + Extensions Control Message framework. +- **Partial Messages Extension**: The Partial Messages extension addresses an + *application-layer* concern (transmitting only missing parts of structured + data). v1.4 fragmentation addresses a *transport-layer* concern (breaking + monolithic messages for efficient relay). They are complementary. + +## Protocol ID + +Nodes that support this extension SHOULD advertise the version number `1.4.0`. +Gossipsub nodes can advertise their own protocol-id prefix; by default this is +`meshsub`, giving the default protocol id: + +- `/meshsub/1.4.0` + +## Parameters + +| Parameter | Description | Reasonable Default | +| -------------------- | ---------------------------------------------------------------------------------- | ------------------ | +| `fragment_size` | Maximum size of each message fragment in bytes | 65536 (64 KiB) | +| `fragmentation_threshold` | Minimum message size before fragmentation is applied | 65536 (64 KiB) | +| `stagger_interval` | Delay between forwarding a message to successive mesh peers | 200 ms | +| `stagger_threshold` | Minimum message size before staggering is applied | 65536 (64 KiB) | +| `preamble_threshold` | Minimum message size before a `PREAMBLE` is sent | 65536 (64 KiB) | +| `fragment_timeout` | Maximum time to wait for all fragments of a message before discarding | 30 seconds | +| `max_pending_fragments` | Maximum number of messages being reassembled concurrently per peer | 16 | + +## Message Fragmentation + +### Overview + +When a message exceeds `fragmentation_threshold` bytes, the sender MUST split it +into fragments of at most `fragment_size` bytes before transmission. 
Fragments +are transmitted as `LargeMessageFragment` messages within the RPC. + +Fragmentation enables **pipeline-parallel relay**: an intermediate peer can begin +forwarding early fragments to its own mesh peers before it has received the +complete message. This eliminates the multiplicative store-and-forward delay +that accumulates at each hop in the mesh. + +### Fragmentation Algorithm + +Given a message `M` of size `S`: + +``` +if S <= fragmentation_threshold: + transmit M as a normal gossipsub message + return + +num_fragments = ceil(S / fragment_size) +message_id = compute_message_id(M) + +for i in 0..num_fragments: + fragment = M[i * fragment_size : min((i+1) * fragment_size, S)] + send LargeMessageFragment { + messageID: message_id, + fragmentIndex: i, + totalFragments: num_fragments, + fragmentData: fragment, + topicID: M.topic + } +``` + +### Reassembly + +Upon receiving fragments, a peer: + +1. Allocates a reassembly buffer keyed by `(sender, messageID)`. +2. Stores each fragment by its `fragmentIndex`. +3. When all `totalFragments` fragments have been received, reconstructs the + original message and delivers it to the application validator. +4. If `fragment_timeout` elapses before all fragments arrive, the reassembly + buffer MUST be discarded. + +Peers MUST limit the number of concurrent reassembly buffers per peer to +`max_pending_fragments` to prevent resource exhaustion. + +### Fragment Forwarding + +A peer that supports fragmentation SHOULD forward individual fragments to its +mesh peers as they arrive, without waiting for full reassembly. This is the key +mechanism that reduces store-and-forward latency. + +When forwarding fragments to a peer that does not support v1.4, the forwarding +peer MUST wait for full reassembly and send the complete message. + +### Interaction with Message Validation + +Application-level validation can only occur after full reassembly. Prior to +validation, forwarded fragments are considered *tentatively valid*. 
If the +reassembled message fails validation, the peer SHOULD apply scoring penalties +to the original sender as defined in v1.1. + +### Interaction with Message Cache + +Fragments do not replace the message cache (`mcache`). The fully reassembled +message is placed in the `mcache` after successful reassembly. Fragment-level +caching is an implementation detail outside the scope of this specification. + +## Message Staggering + +### Overview + +Instead of forwarding a message (or its fragments) to all mesh peers +simultaneously, a staggering peer forwards to one peer at a time, with a +`stagger_interval` delay between each. + +This provides time for `IDONTWANT` messages from earlier recipients to propagate +back before later recipients receive their copy, significantly reducing +redundant transmissions. + +### Algorithm + +``` +if message_size < stagger_threshold: + forward to all mesh peers simultaneously (standard behavior) + return + +peers = mesh[topic] +sort peers by some heuristic (e.g., score, latency, random) + +for peer in peers: + if messageID not in peer.dont_send_message_ids: + forward message (or fragments) to peer + wait stagger_interval +``` + +### Interaction with IDONTWANT + +Staggering amplifies the effectiveness of `IDONTWANT` (v1.2). When a peer +receives a message from the first staggered send, it immediately broadcasts +`IDONTWANT` to its mesh peers. Because subsequent staggered sends are delayed, +there is a high probability that the `IDONTWANT` arrives before the next +staggered send, preventing the redundant transmission entirely. + +Without staggering, simultaneous sends mean that `IDONTWANT` messages arrive +too late to prevent any duplicates. + +## PREAMBLE Control Message + +### Overview + +The `PREAMBLE` is a lightweight control message sent by a peer *before* it begins +transmitting a large message (or its fragments). It announces the `messageID`, +the total message size, and the topic. 
+ +### Sender Behavior + +When a peer is about to relay a message with size exceeding `preamble_threshold`: + +1. Send a `ControlPreamble { messageID, messageSize, topicID }` to each mesh + peer in the topic. +2. Proceed with message/fragment transmission (potentially staggered). + +The `PREAMBLE` SHOULD be sent immediately, even when staggering is in effect. + +### Receiver Behavior + +Upon receiving a `PREAMBLE`, a peer: + +1. Records that a message with the given `messageID` is incoming. +2. MAY use the `messageSize` to pre-allocate buffers. +3. MAY delay responding to `IHAVE` messages for this `messageID`, as the full + message is expected to arrive shortly. +4. If the peer already has the message, it MAY immediately respond with + `IDONTWANT` to suppress the transmission. + +## IMRECEIVING Control Message + +### Overview + +`IMRECEIVING` is sent by a peer that is *currently in the process* of receiving +a large message (i.e., it has received the `PREAMBLE` or the first fragment, +but not the complete message yet). It notifies mesh neighbors that they should +suppress sending this message. + +This fills the gap left by `IDONTWANT`, which can only be sent after full +reception: `IMRECEIVING` provides immediate suppression during the (potentially +long) reception window of a large message. + +### Sender Behavior + +When a peer begins receiving a large message (triggered by receiving a +`PREAMBLE` or the first `LargeMessageFragment`): + +1. Immediately send `ControlImReceiving { messageID }` to all mesh peers in + the topic. + +### Receiver Behavior + +Upon receiving `IMRECEIVING` from a peer: + +1. Add the `messageID` to the peer's `dont_send_message_ids` set (same set + used by `IDONTWANT`). +2. When later relaying this `messageID`, skip this peer. + +`IMRECEIVING` is advisory, like `IDONTWANT`. A sender MAY still transmit the +message after receiving `IMRECEIVING`; doing so MUST NOT be penalized. 
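The receiver rule above interacts directly with staggered forwarding. A staggering peer checks `dont_send_message_ids` just before each send, so an `IMRECEIVING` that arrives during a `stagger_interval` wait can still suppress a later send. The asyncio sketch below illustrates this interaction under several simplifying assumptions. Peers are plain strings, and `send` is a hypothetical callback standing in for the real RPC write. The mesh order is taken as given, since the spec leaves the ordering heuristic open. Heartbeat pruning of entries is omitted. `StaggeringRouter` and `PeerState` are illustrative names, not an existing library API.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class PeerState:
    # Message IDs this peer asked us not to send (via IDONTWANT or IMRECEIVING).
    dont_send_message_ids: set[bytes] = field(default_factory=set)


class StaggeringRouter:
    """Hypothetical router fragment: IMRECEIVING handling plus staggered relay."""

    def __init__(self, send: Callable[[str, bytes], Awaitable[None]],
                 stagger_interval: float = 0.2) -> None:
        self.peers: dict[str, PeerState] = {}
        self.send = send
        self.stagger_interval = stagger_interval

    def _state(self, peer: str) -> PeerState:
        return self.peers.setdefault(peer, PeerState())

    def handle_imreceiving(self, from_peer: str, message_id: bytes) -> None:
        # Same set as IDONTWANT: later relays of this message skip the peer.
        self._state(from_peer).dont_send_message_ids.add(message_id)

    async def staggered_forward(self, mesh: list[str], message_id: bytes) -> list[str]:
        sent: list[str] = []
        for peer in mesh:
            # Checked at each peer's turn, so suppressions received while
            # waiting between earlier sends still take effect.
            if message_id in self._state(peer).dont_send_message_ids:
                continue
            await self.send(peer, message_id)
            sent.append(peer)
            await asyncio.sleep(self.stagger_interval)
        return sent


async def demo() -> list[str]:
    async def send(peer: str, message_id: bytes) -> None:
        if peer == "A":
            # Simulate C's IMRECEIVING arriving before our next staggered send.
            router.handle_imreceiving("C", message_id)

    router = StaggeringRouter(send, stagger_interval=0.0)
    router.handle_imreceiving("B", b"m1")  # B announced IMRECEIVING up front
    return await router.staggered_forward(["A", "B", "C", "D"], b"m1")


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['A', 'D']
```

In the demo, `B` is skipped because its `IMRECEIVING` arrived before forwarding began. `C` is skipped because its `IMRECEIVING` arrived after the send to `A`. Only `A` and `D` receive the message.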
+ +### Comparison with IDONTWANT + +| Property | IDONTWANT (v1.2) | IMRECEIVING (v1.4) | +| -------------------- | ------------------------ | ------------------------------------ | +| When sent | After full reception | At start of reception | +| Suppression window | Future sends only | Immediate (during reception) | +| Certainty | Definitive (has message) | Probabilistic (reception in progress)| +| Message size benefit | All sizes | Primarily large messages | + +## Router State Changes + +### New Per-Peer State + +In addition to the existing per-peer state, v1.4 peers maintain: + +- `receiving_messages`: a set of `messageID`s currently being received + (populated by `PREAMBLE` / first fragment, cleared on reassembly completion + or timeout). +- `fragment_buffers`: a map of `messageID → fragment[]` for reassembly. + +### Modified Message Processing + +When processing incoming messages, the router additionally: + +1. On receiving `ControlPreamble`: records the incoming message, optionally + sends `IDONTWANT` (if already have) or `IMRECEIVING` (if receiving from + another peer). +2. On receiving `ControlImReceiving`: adds `messageID` to the sender's + `dont_send_message_ids`. +3. On receiving `LargeMessageFragment`: stores in fragment buffer, forwards + fragment to mesh peers (if forwarding fragments), and sends `IMRECEIVING` + if this is the first fragment for this message. + +## Heartbeat Changes + +During heartbeat processing: + +- Expired entries in `receiving_messages` (older than `fragment_timeout`) + SHOULD be cleaned up, and corresponding fragment buffers discarded. +- Expired entries in `dont_send_message_ids` populated by `IMRECEIVING` + SHOULD be pruned (same pruning strategy as `IDONTWANT` entries). + +## Scoring Implications + +- A peer that sends an excessive number of `PREAMBLE` messages without + following up with actual message data SHOULD be penalized through P₇ + (Behavioural Penalty) as defined in v1.1. 
+- Fragment flooding (sending many fragments for non-existent messages) SHOULD + trigger P₄ (Invalid Messages) penalties upon reassembly failure. +- A peer that consistently sends `IMRECEIVING` without subsequently delivering + or forwarding the message MAY be penalized through P₇. + +## Security Considerations + +### Fragment Flooding + +An attacker could send a large number of fragments for fake messages to exhaust +memory. The `max_pending_fragments` parameter limits per-peer reassembly state. +Implementations SHOULD also apply a global limit on total reassembly buffers. + +### PREAMBLE Abuse + +Sending `PREAMBLE` messages for messages that never arrive wastes receiver +resources (pre-allocated buffers). The `fragment_timeout` and behavioural +penalties (P₇) mitigate this. Implementations MAY rate-limit `PREAMBLE` +messages from a single peer. + +### IMRECEIVING Abuse + +A malicious peer could send `IMRECEIVING` for messages it never intends to +receive, causing neighbors to skip sending to it and degrading its mesh +participation. Since `IMRECEIVING` entries are pruned during heartbeat and +the message will still be available via `IHAVE`/`IWANT` gossip, the impact is +limited. + +### Fragment Reassembly Resource Exhaustion + +Implementations MUST bound the total memory used for fragment reassembly. +Recommended strategy: track total bytes across all reassembly buffers and +reject new fragments when a configurable limit is exceeded. + +## Backwards Compatibility + +All extensions in this specification are backwards-compatible with gossipsub +v1.0, v1.1, v1.2, and v1.3: + +- Peers that do not support v1.4 will ignore unknown control messages + (`PREAMBLE`, `IMRECEIVING`) and unknown RPC fields (`LargeMessageFragment`) + per standard protobuf behavior. +- A v1.4 peer MUST detect whether its mesh peers support v1.4 (via the v1.3 + Extensions Control Message). For peers that do not support v1.4, the peer + MUST send complete, unfragmented messages. 
+- Staggering is a local behavior change and does not affect interoperability. + +## Protobuf + +The protobuf messages are defined in the +[extensions.proto](./extensions/extensions.proto) file. The following new +messages are introduced: + +```protobuf +message ControlPreamble { + optional bytes messageID = 1; + optional uint64 messageSize = 2; + optional string topicID = 3; +} + +message ControlImReceiving { + optional bytes messageID = 1; +} + +message LargeMessageFragment { + optional bytes messageID = 1; + optional uint32 fragmentIndex = 2; + optional uint32 totalFragments = 3; + optional bytes fragmentData = 4; + optional string topicID = 5; +} +``` + +These are integrated into the existing `ControlMessage` and `RPC` messages. +See `extensions.proto` for the complete definition. + +## References + +- [1] M. U. Farooq, T. Cizain, D. Kaiser. "Staggering and Fragmentation for + Improved Large Message Handling in libp2p GossipSub." arXiv:2504.10365, 2025. +- [2] M. U. Farooq, D. Kaiser. "PREAMBLE and IMRECEIVING for Improved Large + Message Handling in libp2p GossipSub." arXiv:2505.17337, 2025. +- [3] gossipsub v1.2: IDONTWANT. https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.2.md +- [4] gossipsub v1.3: Extensions Control Message. https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.3.md diff --git a/pubsub/gossipsub/implementation-status.md b/pubsub/gossipsub/implementation-status.md index 01919efea..ed0bb7b63 100644 --- a/pubsub/gossipsub/implementation-status.md +++ b/pubsub/gossipsub/implementation-status.md @@ -15,13 +15,14 @@ Gossipsub versions and Extensions. 
## Gossipsub Extensions -| | [Choke Extensions] | [Partial Messages] | -| ------------- | ------------------ | --------------------------------------------------------- | -| [Go libp2p] | Not Implemented | [PR](https://github.com/libp2p/go-libp2p-pubsub/pull/631) | -| [Rust libp2p] | Not Implemented | Not Implemented | -| [JS libp2p] | Not Implemented | Not Implemented | -| [Nim libp2p] | Not Implemented | Not Implemented | -| [Java libp2p] | Not Implemented | Not Implemented | +| | [Choke Extensions] | [Partial Messages] | [Large Message Handling] | +| ------------- | ------------------ | --------------------------------------------------------- | ------------------------ | +| [Go libp2p] | Not Implemented | [PR](https://github.com/libp2p/go-libp2p-pubsub/pull/631) | Not Implemented | +| [Rust libp2p] | Not Implemented | Not Implemented | Not Implemented | +| [JS libp2p] | Not Implemented | Not Implemented | Not Implemented | +| [Nim libp2p] | Not Implemented | Not Implemented | Prototype | +| [Java libp2p] | Not Implemented | Not Implemented | Not Implemented | +| [Py libp2p] | Not Implemented | Not Implemented | Prototype | ## Gossipsub Implementation Improvements @@ -42,6 +43,9 @@ Gossipsub versions and Extensions. [1.3-alpha]: https://github.com/libp2p/specs/issues/687 [Choke Extensions]: https://github.com/libp2p/specs/pull/681 [Partial Messages]: https://github.com/libp2p/specs/pull/685 +[Large Message Handling]: https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.4.md +[Py libp2p]: https://github.com/libp2p/py-libp2p/tree/master/libp2p/pubsub [Batch Publishing]: https://ethresear.ch/t/improving-das-performance-with-gossipsub-batch-publishing/21713 [IDONTWANT on first Publish]: https://github.com/libp2p/go-libp2p-pubsub/issues/610 [WFR Gossip]: https://ethresear.ch/t/the-paths-of-least-resistance-introducing-wfr-gossip/22671/3 +