Permissionless multi-operator model training on Tangle using DeMo (Decoupled Momentum Optimization) for 10,000x communication reduction.
This Blueprint enables distributed model training across permissionless operators. Operators join and leave freely at epoch boundaries, the protocol coordinates work via the DeMo algorithm, and the Tangle chain enforces payment and slashing.
DeMo (from Nous Research) reduces inter-operator communication by 10,000x compared to naive gradient synchronization:
- Each operator trains locally with AdamW on its data shard
- Every K steps, operators synchronize by applying DCT to momentum buffers
- Top-k sparsification keeps only 0.1% of DCT coefficients (~100KB vs ~10GB)
- Compressed momentum is broadcast via libp2p gossip and aggregated
- Inverse DCT reconstructs the shared momentum update
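The compress side of these steps can be sketched as follows. This is a minimal illustration over a 1-D momentum slice with a naive O(n²) DCT-II; the function names are illustrative, not the actual `demo.rs` API, and a real implementation would use a fast chunked transform over tensor tiles:

```rust
/// Naive DCT-II over a 1-D momentum slice (illustrative; not the demo.rs API).
fn dct(signal: &[f64]) -> Vec<f64> {
    let n = signal.len();
    (0..n)
        .map(|k| {
            signal
                .iter()
                .enumerate()
                .map(|(i, &x)| {
                    // DCT-II basis: cos(pi/n * (i + 0.5) * k)
                    x * (std::f64::consts::PI / n as f64 * (i as f64 + 0.5) * k as f64).cos()
                })
                .sum()
        })
        .collect()
}

/// Keep only the k largest-magnitude coefficients as (index, value) pairs,
/// mirroring DeMo's top-k sparsification (k is roughly 0.1% of coefficients).
fn top_k(coeffs: &[f64], k: usize) -> Vec<(usize, f64)> {
    let mut indexed: Vec<(usize, f64)> = coeffs.iter().cloned().enumerate().collect();
    indexed.sort_by(|a, b| b.1.abs().partial_cmp(&a.1.abs()).unwrap());
    indexed.truncate(k);
    indexed
}

fn main() {
    // Toy momentum buffer: a smooth signal, so energy concentrates in few coefficients.
    let momentum: Vec<f64> = (0..64)
        .map(|i| (2.0 * std::f64::consts::PI * i as f64 / 64.0).cos())
        .collect();
    let sparse = top_k(&dct(&momentum), 2); // gossip 2 of 64 coefficients
    println!("kept {} of 64 coefficients", sparse.len());
}
```

Because momentum buffers are smooth relative to raw gradients, most DCT energy lands in a few coefficients, which is what makes the 0.1% sparsification tolerable.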
| Component | Description |
|---|---|
| `operator/src/demo.rs` | DeMo optimizer — DCT, top-k sparsification, aggregation |
| `operator/src/coordinator.rs` | Multi-operator coordination — shard assignment, join/leave, sync barriers |
| `operator/src/network.rs` | libp2p gossip networking for momentum sync and checkpoint transfer |
| `operator/src/training.rs` | Training backend interface (axolotl, unsloth, torchtune) |
| `operator/src/checkpoint.rs` | Checkpoint save/load/hash for on-chain submission |
| `operator/src/verification.rs` | TOPLOC-inspired state transition proofs and gradient norm verification |
| `operator/src/server.rs` | HTTP API for job management |
| `operator/src/qos.rs` | Heartbeat with training-specific metrics |
| `contracts/` | DistributedTrainingBSM Solidity contract |
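The shard assignment recomputed by the coordinator at each epoch boundary can be pictured as a deterministic round-robin over the currently active operator set. This is a hypothetical sketch, not the actual `coordinator.rs` logic, which also handles sync barriers and mid-epoch state:

```rust
/// Hypothetical round-robin shard assignment, recomputed at epoch boundaries.
/// Because the mapping depends only on the active operator list, every
/// operator derives the same assignment without extra coordination.
fn assign_shards(operators: &[&str], num_shards: usize) -> Vec<(usize, String)> {
    (0..num_shards)
        .map(|s| (s, operators[s % operators.len()].to_string()))
        .collect()
}

fn main() {
    // Epoch N: four operators, four shards, one shard each.
    println!("{:?}", assign_shards(&["opA", "opB", "opC", "opD"], 4));
    // Epoch N+1: opD left at the boundary; opA picks up the orphaned shard.
    println!("{:?}", assign_shards(&["opA", "opB", "opC"], 4));
}
```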
| Method | Description |
|---|---|
| SFT | Supervised fine-tuning on instruction datasets |
| DPO | Direct Preference Optimization for alignment |
| GRPO | Group Relative Policy Optimization |
| Pretrain | Continued pre-training on domain corpora |
Operators run inside Trusted Execution Environments (Intel TDX / AMD SEV). The TeeLayer in the job router ensures all on-chain job results are attested, preventing operators from submitting fabricated training outputs.
```sh
# Build the operator
cargo build --release

# Configure (see operator config for all options)
export TRAIN_OP_TANGLE__RPC_URL=https://rpc.tangle.tools
export TRAIN_OP_TANGLE__OPERATOR_KEY=0x...
export TRAIN_OP_TRAINING__ENDPOINT=http://localhost:5000
export TRAIN_OP_GPU__GPU_COUNT=4
export TRAIN_OP_GPU__TOTAL_VRAM_MIB=327680

# Run
./target/release/training-operator
```

```
                Tangle Chain
                     |
           DistributedTrainingBSM
           /      |      |      \
        Op A    Op B   Op C    Op D     (permissionless operators)
         |       |      |       |
      shard0  shard1  shard2  shard3    (data parallelism)
         |       |      |       |
      AdamW   AdamW   AdamW   AdamW     (local training)
           \      |      |      /
          DeMo Sync (every K steps)
          - DCT transform momentum
          - Top-k sparsify (0.1%)
          - Gossip broadcast (~100KB)
          - Aggregate + inverse DCT
```
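The aggregate step in the diagram can be sketched as a per-index mean over the sparse coefficient sets gossiped by each operator. A minimal sketch, assuming a plain unweighted mean (the actual aggregation rule may weight or clip contributions):

```rust
use std::collections::HashMap;

/// Merge top-k sparse DCT coefficients from all operators: sum values per
/// index, then divide by the operator count. Indices missing from an
/// operator's update contribute zero, so rarely-selected coefficients shrink.
fn aggregate(updates: &[Vec<(usize, f64)>]) -> Vec<(usize, f64)> {
    let n = updates.len() as f64;
    let mut acc: HashMap<usize, f64> = HashMap::new();
    for update in updates {
        for &(idx, val) in update {
            *acc.entry(idx).or_insert(0.0) += val;
        }
    }
    let mut out: Vec<(usize, f64)> = acc.into_iter().map(|(i, v)| (i, v / n)).collect();
    out.sort_by_key(|&(i, _)| i);
    out
}

fn main() {
    // Two operators' sparse updates; index 3 overlaps, 7 and 9 do not.
    let op_a = vec![(3, 4.0), (7, 2.0)];
    let op_b = vec![(3, 2.0), (9, 6.0)];
    println!("{:?}", aggregate(&[op_a, op_b]));
}
```

The merged sparse vector is then fed through the inverse DCT to reconstruct the shared momentum update on every operator.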
- blueprint — Blueprint SDK
- vllm-inference-blueprint — Single-operator inference Blueprint
- gadget — Tangle operator node
- DeMo / DisTrO — Nous Research, 10,000x communication reduction
- OpenDiLoCo — Prime Intellect training
- INTELLECT-2 — Async GRPO with TOPLOC verification
MIT OR Apache-2.0