HIP-0023: Decentralized AI Compute Swarm Protocol
Abstract
This HIP specifies a protocol for distributing AI compute tasks -- inference, training, and embedding generation -- across a decentralized peer-to-peer network of heterogeneous GPU providers. The protocol uses libp2p for networking, model-aware scheduling for task placement, pipeline parallelism for large model distribution, and Proof of AI (PoAI) consensus for result verification. Economic settlement occurs via the $AI token through the Hamiltonian Market Maker (HIP-0008). The swarm aggregates idle GPUs globally -- university clusters, consumer gaming hardware, retired mining rigs, and enterprise surplus -- into a unified elastic compute layer that scales beyond any single cloud provider.
Motivation
The GPU Scarcity Problem
AI inference and training demand is growing faster than centralized GPU supply. As of 2025:
- NVIDIA H100 clusters have 6-12 month lead times from major cloud providers
- AWS, GCP, and Azure GPU instance availability is frequently constrained in peak regions
- Lambda Labs, CoreWeave, and similar GPU clouds face the same upstream supply bottleneck
- Spot/preemptible GPU pricing fluctuates 3-10x depending on demand cycles
- Small teams and researchers are priced out of frontier model training entirely
Meanwhile, millions of capable GPUs sit idle worldwide:
- University HPC clusters average 40-60% utilization outside academic cycles
- Consumer RTX 4090 / 5090 cards (24-32 GB VRAM) idle 90%+ of the time
- Former cryptocurrency mining farms hold thousands of GPUs with no profitable workload
- Enterprise data centers overprovision GPU capacity for peak loads
A decentralized swarm protocol bridges this gap by creating a permissionless marketplace where any GPU owner can contribute compute and earn $AI tokens, while any consumer can submit AI workloads at competitive prices set by open market dynamics.
Why Existing Solutions Fall Short
- Akash Network: General-purpose container orchestration. No model-aware scheduling, no VRAM-based placement, no pipeline parallelism for large models. Deploying a 70B parameter model across multiple Akash containers requires manual sharding.
- Render Network: Optimized for GPU rendering (3D, video). The scheduling model assumes independent, embarrassingly parallel frames -- not sequential transformer layers with inter-node communication.
- Golem Network: Low-level WASM task distribution. No native understanding of AI model formats (safetensors, GGUF), no tensor-parallel or pipeline-parallel primitives, no AI-specific verification.
- io.net: Aggregates GPU clusters but relies on centralized orchestration. No on-chain verification of compute correctness. Trust model depends on provider reputation alone.
- Together AI / Petals: Closer to the right model (distributed inference), but centrally coordinated. Petals uses volunteer nodes with no economic incentive layer or Byzantine fault tolerance.
The Hanzo Swarm Protocol is purpose-built for AI: it understands model architectures, optimizes placement based on VRAM and interconnect bandwidth, distributes transformer layers across nodes via pipeline parallelism, and verifies results through PoAI consensus -- all settled economically through the $AI token and HMM.
Core Design Goals
- Permissionless participation: Any node with a supported GPU can join and earn
- Model-aware scheduling: Tasks are placed on nodes with sufficient VRAM, bandwidth, and model compatibility
- Pipeline parallelism: Large models are split across multiple nodes by transformer layer
- Verified computation: PoAI ensures providers actually performed the work correctly
- Fair economics: $AI token payments via HMM with slashing for dishonest providers
- Fault tolerance: Automatic failover, redundant computation, Byzantine resistance
- Multi-GPU support: NVIDIA (CUDA), AMD (ROCm), Apple Silicon (Metal)
Design Philosophy
Why Decentralized Compute
Centralized GPU clouds (AWS, GCP, Lambda Labs) operate on a capacity-planning model: they purchase hardware based on demand projections, charge margins sufficient to cover capital expenditure, and allocate resources through reservation systems. This model has three structural problems:
- Capacity ceilings: A single provider can only deploy as many GPUs as they can purchase and house. During supply crunches (e.g., the H100 shortage of 2024-2025), no amount of money buys more capacity.
- Geographic concentration: Major GPU clouds concentrate in a handful of regions (us-east-1, us-west-2, europe-west1). Latency-sensitive inference from other geographies suffers.
- Pricing inefficiency: Fixed hourly rates do not reflect real-time supply and demand. A GPU idle at 3 AM costs the same as one at peak load.
A decentralized swarm inverts all three constraints. Capacity scales with the global population of idle GPUs -- there is no single procurement bottleneck. Geographic distribution is inherent: nodes exist wherever GPU owners are. And pricing is set by open market dynamics through HMM, naturally reflecting real-time supply and demand.
Why libp2p Networking
libp2p is the peer-to-peer networking stack used by IPFS, Filecoin, Ethereum's consensus layer, and Polkadot. It provides:
- Kademlia DHT for decentralized node discovery without a central registry
- NAT traversal (hole punching, relay circuits) so nodes behind home routers can participate
- Multiplexed streams over a single connection for parallel task communication
- Noise protocol encryption for all peer-to-peer traffic
- PeerID cryptographic identity tied to Ed25519 keypairs
- GossipSub for efficient pub/sub message propagation across the swarm
We do not need to build networking primitives: libp2p is already deployed in large production networks. The Rust implementation (rust-libp2p) is mature and integrates directly with the Tokio async runtime.
Why Proof of AI (PoAI) Consensus
Traditional blockchain consensus mechanisms validate that nodes followed protocol rules (PoW: found a hash, PoS: attested to a block). They do not verify that useful computation occurred. PoAI extends consensus to validate AI inference and training results:
- A compute provider executes an AI task and submits the result with a cryptographic commitment
- A randomly selected subset of verifier nodes re-executes the same task independently
- Results are compared: if the provider's output matches verifier consensus (within floating-point tolerance for non-deterministic operations), the result is accepted
- Providers who submit incorrect results are slashed (a portion of their $AI stake is forfeited); honest providers are rewarded
This creates a trustless compute layer where consumers do not need to trust individual providers -- the protocol guarantees correctness through economic incentives and redundant verification.
For deterministic operations (embeddings, quantized inference with fixed seeds), verification is exact hash comparison. For non-deterministic operations (sampling-based generation), verification uses semantic similarity thresholds and statistical consistency checks.
How It Connects to HMM (HIP-0008)
The Hamiltonian Market Maker (HMM) provides the economic settlement layer for the swarm:
- Compute resource pools: HMM maintains liquidity pools for GPU-hours, priced in $AI
- Dynamic pricing: Supply (available GPUs) and demand (queued tasks) set real-time prices via the automated market maker curve
- Instant settlement: Task completion triggers automatic $AI transfer from consumer to provider
- Quality tiers: Different pools for different GPU classes (H100, A100, RTX 4090, etc.) with distinct pricing curves
- Slashing integration: PoAI verification failures trigger automatic stake slashing through HMM's penalty mechanism
The swarm protocol handles compute orchestration; HMM handles economics. They are complementary layers.
Specification
Node Types
The network consists of three node roles. A single physical node may serve multiple roles simultaneously.
Compute Provider
Contributes GPU resources to the swarm. Requirements:
| Field | Minimum | Recommended |
|---|---|---|
| GPU VRAM | 8 GB | 24+ GB |
| System RAM | 16 GB | 64+ GB |
| Storage | 100 GB SSD | 1 TB NVMe |
| Bandwidth | 100 Mbps | 1 Gbps |
| $AI Stake | 1,000 $AI | 10,000 $AI |
Supported GPU backends:
- NVIDIA: CUDA 11.8+ (Ampere, Ada Lovelace, Hopper, Blackwell)
- AMD: ROCm 5.7+ (RDNA 3, CDNA 2/3)
- Apple Silicon: Metal 3 (M1 Pro/Max/Ultra, M2, M3, M4)
Validator
Verifies compute results via PoAI. Validators re-execute a random subset of tasks and compare results. Requirements:
- Must run at least one supported GPU for re-computation
- Minimum 5,000 $AI stake (higher than providers to prevent Sybil attacks on verification)
- Reputation score >= 0.8 (earned through consistent honest validation)
Validators are selected per task via a verifiable random function (VRF) seeded by the task hash and the current block, so providers cannot predict which validators will check their work.
Coordinator
Manages task scheduling and piece distribution. In Phase 1, coordinators are centralized (operated by Hanzo). In Phase 2, coordinators are semi-decentralized (elected). In Phase 3, coordinator logic moves on-chain to the Hanzo L1 (HIP-0024), making scheduling fully decentralized.
Coordinator responsibilities:
- Accept task submissions from consumers
- Decompose tasks into pieces
- Match pieces to providers based on capability and reputation
- Track piece state and trigger verification
- Aggregate verified results and return to consumers
Networking Layer
All nodes communicate via libp2p with the following protocol stack:
Application: /hanzo/swarm/1.0.0 (custom protocol)
PubSub: GossipSub v1.1 (task announcements, heartbeats)
Discovery: Kademlia DHT (peer discovery, provider capability ads)
Transport: QUIC (primary), TCP+Noise (fallback)
Identity: Ed25519 PeerID (linked to $AI wallet address via DID)
Protocol Messages:
| Message | Direction | Description |
|---|---|---|
| Announce | Provider -> DHT | Advertise GPU capabilities and available capacity |
| TaskSubmit | Consumer -> Coordinator | Submit a compute task |
| PieceAssign | Coordinator -> Provider | Assign a piece to a provider |
| PieceResult | Provider -> Coordinator | Return computed result with proof |
| VerifyRequest | Coordinator -> Validator | Request PoAI verification |
| VerifyResult | Validator -> Coordinator | Verification outcome |
| Heartbeat | All -> GossipSub | Liveness signal (every 30s) |
| TaskComplete | Coordinator -> Consumer | Aggregated verified results |
Task Model
pub struct ComputeTask {
pub id: TaskId, // Blake3 hash of (submitter, nonce, timestamp)
pub task_type: TaskType,
pub priority: u32, // 0 (lowest) - 1000 (highest)
pub deadline: Option<u64>, // Unix timestamp, optional
pub budget: u64, // Max $AI willing to spend
pub redundancy: usize, // Verification redundancy (default: 3)
pub submitter: WalletAddress,
pub created_at: u64,
}
pub enum TaskType {
/// LLM text generation
Inference {
model: ModelSpec,
prompt: Vec<Message>,
params: SamplingParams,
},
/// Vector embedding generation
Embedding {
model: ModelSpec,
texts: Vec<String>,
dimensions: Option<usize>,
},
/// Model fine-tuning (LoRA or full)
Training {
base_model: ModelSpec,
dataset: DatasetRef, // IPFS CID or swarm storage hash
method: TrainingMethod, // LoRA, QLoRA, Full
hyperparams: TrainingHyperparams,
},
/// Batch inference (multiple prompts)
Batch {
model: ModelSpec,
requests: Vec<InferenceRequest>,
},
}
pub struct ModelSpec {
pub name: String, // e.g., "zen-72b"
pub format: ModelFormat, // Safetensors, GGUF, ONNX
pub size_bytes: u64,
pub vram_required_mb: u32, // Minimum VRAM for single-GPU
pub quantization: Option<Quantization>, // Q4_K_M, Q8_0, FP16, etc.
pub hash: String, // Blake3 hash of model weights
}
pub enum ModelFormat {
Safetensors, // HuggingFace standard
GGUF, // llama.cpp / whisper.cpp
ONNX, // Cross-platform inference
}
Piece Decomposition
Tasks are decomposed into independently schedulable pieces. The decomposition strategy depends on task type:
Inference tasks: If the model fits on a single node, the task is a single piece. If the model requires pipeline parallelism, it is split into N pieces corresponding to N pipeline stages (groups of transformer layers).
Embedding tasks: Each batch of texts becomes a separate piece. Pieces are embarrassingly parallel.
Training tasks: Data-parallel decomposition. Each piece processes a shard of the training dataset on a separate node. Gradient aggregation occurs at the coordinator.
Batch inference: Each request (or group of requests) becomes a separate piece.
pub struct Piece {
pub task_id: TaskId,
pub index: usize,
pub state: PieceState,
pub input: PieceInput, // Serialized input data
pub input_hash: String, // Blake3 hash for verification
pub assigned_providers: Vec<PeerId>,
pub results: HashMap<PeerId, PieceResult>,
pub verified_result: Option<Vec<u8>>,
pub pipeline_stage: Option<PipelineStage>, // For pipeline parallelism
pub redundancy: usize,
pub deadline: Option<u64>,
pub priority: u32,
pub retry_count: usize,
pub max_retries: usize, // Default: 3
}
pub enum PieceState {
Pending, // Awaiting provider assignment
Assigned, // Provider(s) assigned, not yet started
InProgress, // Active computation underway
Computed, // Result(s) received, awaiting verification
Verified, // PoAI consensus reached
Failed, // Exhausted retries
Cancelled, // Cancelled while pending (see Piece State Transitions)
}
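The embedding case of the decomposition rules above can be sketched in a few lines. This is a minimal illustration, not the reference implementation: `EmbeddingPiece` and `decompose_embedding` are hypothetical names, and the sketch assumes a fixed `batch_size` chosen by the coordinator.

```rust
/// One schedulable unit of an embedding task.
/// Hypothetical, simplified stand-in for the full `Piece` struct.
#[derive(Debug, Clone, PartialEq)]
pub struct EmbeddingPiece {
    pub index: usize,
    pub texts: Vec<String>,
}

/// Split `texts` into pieces of at most `batch_size` items each.
/// The last piece may be smaller. A `batch_size` of zero yields no pieces,
/// since no valid split exists.
pub fn decompose_embedding(texts: &[String], batch_size: usize) -> Vec<EmbeddingPiece> {
    if batch_size == 0 {
        return Vec::new();
    }
    texts
        .chunks(batch_size)
        .enumerate()
        .map(|(index, chunk)| EmbeddingPiece {
            index,
            texts: chunk.to_vec(),
        })
        .collect()
}
```

With 100 texts and a batch size of 25 this produces four pieces of 25 texts each. Because the pieces share no state, they can be assigned to providers in any order.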
Model-Aware Scheduling
The scheduler places pieces on providers based on hardware capabilities, not just availability. This is the critical differentiation from general-purpose compute networks.
Capability Matching:
pub struct ProviderCapabilities {
pub peer_id: PeerId,
pub gpus: Vec<GpuInfo>,
pub total_vram_mb: u32,
pub system_ram_mb: u64,
pub storage_available_gb: u32,
pub bandwidth_mbps: u32,
pub cached_models: Vec<ModelHash>, // Models already loaded in VRAM/disk
pub supported_formats: Vec<ModelFormat>,
pub compute_backend: ComputeBackend, // CUDA, ROCm, Metal
pub max_concurrent_pieces: usize,
pub current_load: f64, // 0.0 - 1.0
}
pub struct GpuInfo {
pub name: String, // e.g., "NVIDIA RTX 4090"
pub vram_mb: u32, // e.g., 24576
pub compute_capability: String, // e.g., "8.9"
pub backend: ComputeBackend,
}
Scheduling Algorithm:
The scheduler scores each candidate provider for a given piece:
score = model_fit_score * 0.35
+ cache_bonus * 0.25
+ reputation * 0.20
+ latency_score * 0.10
+ load_score * 0.10
Where:
- model_fit_score: 1.0 if model fits in provider's VRAM, 0.0 otherwise (hard constraint)
- cache_bonus: 1.0 if model is already cached on provider, 0.0 otherwise (avoids transfer time)
- reputation: Provider's rolling reputation score (0.0 - 1.0)
- latency_score: Inverse of network latency between coordinator and provider
- load_score: Inverse of current provider load (prefer idle nodes)
Providers that do not meet the hard VRAM constraint are excluded entirely. Among qualifying providers, the highest-scoring node is selected.
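The scoring rule can be sketched as follows. The weights come from the formula above; the `Candidate` struct and the function names are hypothetical, and the sketch assumes `latency_score` and `load_score` have already been normalized to the range 0.0 to 1.0 (the normalization itself is not specified here).

```rust
/// Inputs to the scoring formula for one candidate provider.
/// `reputation`, `latency_score`, and `load_score` are assumed to lie in 0.0..=1.0.
#[derive(Debug, Clone, Copy)]
pub struct Candidate {
    pub model_fits: bool,
    pub model_cached: bool,
    pub reputation: f64,
    pub latency_score: f64,
    pub load_score: f64,
}

/// Weighted score, or `None` when the hard VRAM constraint is not met.
pub fn score(c: &Candidate) -> Option<f64> {
    if !c.model_fits {
        return None; // excluded entirely, not merely scored low
    }
    let model_fit_score = 1.0;
    let cache_bonus = if c.model_cached { 1.0 } else { 0.0 };
    Some(
        model_fit_score * 0.35
            + cache_bonus * 0.25
            + c.reputation * 0.20
            + c.latency_score * 0.10
            + c.load_score * 0.10,
    )
}

/// Index of the highest-scoring eligible candidate, if any.
pub fn select_provider(candidates: &[Candidate]) -> Option<usize> {
    candidates
        .iter()
        .enumerate()
        .filter_map(|(i, c)| score(c).map(|s| (i, s)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

Returning `None` for providers that fail the VRAM check keeps the hard constraint separate from the weighted terms, so a high reputation can never compensate for a model that does not fit.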
Pipeline Parallelism
For models too large to fit on a single node's VRAM, the swarm distributes transformer layers across multiple nodes in a pipeline:
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Node A │ │ Node B │ │ Node C │ │ Node D │
│ Layers 0-15 │───>│ Layers 16-31 │───>│ Layers 32-47 │───>│ Layers 48-63 │
│ (16 GB VRAM) │ │ (16 GB VRAM) │ │ (16 GB VRAM) │ │ (16 GB VRAM) │
└──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘
Stage 0 Stage 1 Stage 2 Stage 3
Pipeline Protocol:
- Coordinator determines the number of pipeline stages based on model size and available provider VRAM
- Model layers are assigned to stages. Each stage is a piece.
- Coordinator selects providers for each stage, preferring providers with low mutual latency
- Inference flows sequentially: Stage 0 processes input, sends hidden states to Stage 1, etc.
- Inter-stage communication uses direct libp2p streams between providers (not routed through coordinator)
- Micro-batching: multiple requests are pipelined to keep all stages busy
pub struct PipelineStage {
pub stage_index: usize,
pub total_stages: usize,
pub layer_range: (usize, usize), // Start and end layer indices
pub upstream_peer: Option<PeerId>, // Previous stage provider
pub downstream_peer: Option<PeerId>, // Next stage provider
pub activation_size_bytes: u64, // Size of inter-stage tensor transfer
}
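One way to fill `layer_range` for each stage is an even contiguous split. The sketch below is an assumption rather than the specified algorithm: it treats the range as inclusive on both ends (matching the "Layers 0-15" notation in the diagram) and ignores per-node VRAM differences, which a real scheduler would need to weigh. `partition_layers` is a hypothetical helper.

```rust
/// Split `num_layers` transformer layers into `num_stages` contiguous ranges.
/// Each range is (first_layer, last_layer), inclusive on both ends.
/// When the split is uneven, earlier stages take one extra layer each.
/// Returns an empty vector if there are no stages or fewer layers than stages.
pub fn partition_layers(num_layers: usize, num_stages: usize) -> Vec<(usize, usize)> {
    if num_stages == 0 || num_layers < num_stages {
        return Vec::new();
    }
    let base = num_layers / num_stages;
    let extra = num_layers % num_stages;
    let mut ranges = Vec::with_capacity(num_stages);
    let mut start = 0;
    for stage in 0..num_stages {
        // The first `extra` stages absorb the remainder.
        let len = base + usize::from(stage < extra);
        ranges.push((start, start + len - 1));
        start += len;
    }
    ranges
}
```

For a 64-layer model on four stages this reproduces the layout in the diagram: layers 0-15, 16-31, 32-47, and 48-63.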
Latency Considerations:
Pipeline parallelism introduces inter-node communication overhead. The protocol optimizes for this:
- Providers in the same geographic region are preferred for pipeline stages
- Activation tensors are compressed (FP16 or quantized) before transfer
- Micro-batch size is tuned to amortize communication latency
- Minimum bandwidth requirement for pipeline nodes: 1 Gbps
Proof of AI (PoAI) Verification
PoAI is the consensus mechanism that ensures compute results are correct without trusting individual providers. It builds on the AI Mining Protocol concepts from HIP-0006 and adapts them for swarm verification.
Verification Flow:
1. Provider submits result R with commitment C = Blake3(R)
2. Coordinator selects K verifiers via VRF (default K = 3)
3. Each verifier independently re-computes the task
4. Verifiers submit their results V_i with commitments
5. Coordinator reveals all commitments simultaneously (commit-reveal scheme)
6. Consensus check:
- Deterministic tasks: all hashes must match exactly
- Non-deterministic tasks: semantic similarity >= threshold (default 0.95)
7. If consensus reached:
- Provider rewarded with $AI from task budget
- Verifiers rewarded with verification fee (5% of task cost)
8. If consensus fails:
- Provider slashed (10% of stake)
- Minority verifiers slashed (potential collusion)
- Task re-assigned to new provider
Verification Methods:
| Method | Use Case | Threshold | Cost |
|---|---|---|---|
| ExactHash | Embeddings, quantized inference (fixed seed) | 100% match | Low |
| FloatingPointTolerance | FP16/FP32 inference (hardware differences) | L2 distance < epsilon | Low |
| SemanticSimilarity | Text generation (sampling-based) | Cosine similarity >= 0.95 | Medium |
| StatisticalConsistency | Training loss curves | Kolmogorov-Smirnov test p > 0.05 | Medium |
| TEEAttestation | Sensitive/private inference | SGX/TDX attestation | High |
| Supermajority | General-purpose fallback | >= 67% agreement | Medium |
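The first three methods in the table reduce to simple comparisons, sketched below. The function names are hypothetical. The sketch assumes commitments are 32-byte digests (the Blake3 output size) and that, for the similarity check, both outputs have already been mapped to numeric vectors; how text is embedded for that comparison is not specified here.

```rust
/// ExactHash: every verifier commitment must equal the provider's commitment.
/// An empty verifier set never reaches consensus.
pub fn exact_hash_consensus(provider: &[u8; 32], verifiers: &[[u8; 32]]) -> bool {
    !verifiers.is_empty() && verifiers.iter().all(|v| v == provider)
}

/// FloatingPointTolerance: Euclidean (L2) distance strictly below `epsilon`.
/// Vectors of different lengths never match.
pub fn within_l2_tolerance(a: &[f64], b: &[f64], epsilon: f64) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let squared: f64 = a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum();
    squared.sqrt() < epsilon
}

/// Cosine similarity; `None` for empty, mismatched, or zero-norm vectors.
pub fn cosine_similarity(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// SemanticSimilarity: cosine similarity at or above `threshold` (default 0.95).
pub fn semantically_consistent(a: &[f64], b: &[f64], threshold: f64) -> bool {
    cosine_similarity(a, b).map_or(false, |s| s >= threshold)
}
```

Undefined comparisons (mismatched lengths, zero vectors) are treated as verification failures rather than errors, which keeps the consensus check a plain yes-or-no decision.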
Non-Determinism Handling:
LLM text generation with temperature > 0 is inherently non-deterministic. The protocol handles this:
- For verification purposes, tasks include a verification_seed that forces deterministic sampling during the verification pass
- The consumer receives the original (non-deterministic) result; the verification pass uses the seed only to confirm the provider ran the correct model with the correct input
- Verifiers check that the output is a plausible generation from the specified model, not that it is character-identical
Payment and Settlement
All payments flow through the $AI token and HMM (HIP-0008):
Task Budget Flow:
Consumer deposits $AI into escrow ──> Task executes
├── 90% to Compute Provider (on verified completion)
├── 5% to Verifiers (split among K verifiers)
├── 3% to Coordinator (scheduling fee)
└── 2% to Protocol Treasury (network maintenance)
Slashing Flow:
Provider stake (locked $AI)
├── Correct computation: stake returned + reward
└── Failed verification: 10% of stake burned, remainder returned
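The two flows above amount to integer arithmetic on token amounts. The sketch below is illustrative only: `Payout`, `split_budget`, and `slash` are hypothetical names, amounts are assumed to be in the smallest $AI unit, and sending rounding dust to the treasury is an assumption, not something this HIP specifies.

```rust
/// Result of distributing an escrowed task budget.
#[derive(Debug, PartialEq, Eq)]
pub struct Payout {
    pub provider: u64,
    pub per_verifier: u64,
    pub coordinator: u64,
    pub treasury: u64,
}

/// Split `amount` 90/5/3/2 among provider, verifiers, coordinator, and treasury.
/// The 5% verifier pool is divided equally among `num_verifiers`.
/// Any remainder from integer division goes to the treasury (assumption).
pub fn split_budget(amount: u64, num_verifiers: u64) -> Option<Payout> {
    if num_verifiers == 0 {
        return None;
    }
    // Widen to u128 so the percentage multiplication cannot overflow.
    let pct = |p: u128| (amount as u128 * p / 100) as u64;
    let provider = pct(90);
    let per_verifier = pct(5) / num_verifiers;
    let coordinator = pct(3);
    let treasury = amount - provider - per_verifier * num_verifiers - coordinator;
    Some(Payout { provider, per_verifier, coordinator, treasury })
}

/// Failed verification: 10% of stake is burned, the remainder is returned.
/// Returns (burned, returned).
pub fn slash(stake: u64) -> (u64, u64) {
    let burned = stake / 10;
    (burned, stake - burned)
}
```

Computing the treasury share as the remainder guarantees that the four parts always sum to the escrowed amount, whatever the rounding.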
Pricing via HMM:
The cost of a compute task is determined by the HMM liquidity pools:
- Each GPU class has a pool (e.g., $AI/H100-hour, $AI/RTX4090-hour)
- Pool reserves set the instantaneous price via the constant-product formula
- High demand (many queued tasks) increases price; high supply (many idle GPUs) decreases price
- Consumers can set a max_budget and tasks queue until price falls within budget
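The constant-product rule mentioned above can be illustrated with a generic x * y = k pool. This is a textbook sketch under stated assumptions, not the HMM curve defined in HIP-0008: `GpuPool` and its methods are hypothetical, fees are ignored, and reserves are modeled as floating-point numbers for readability.

```rust
/// A single GPU-class pool holding $AI and GPU-hours.
#[derive(Debug, Clone, Copy)]
pub struct GpuPool {
    pub ai_reserve: f64,
    pub gpu_hour_reserve: f64,
}

impl GpuPool {
    /// Instantaneous price of one GPU-hour in $AI.
    pub fn spot_price(&self) -> f64 {
        self.ai_reserve / self.gpu_hour_reserve
    }

    /// $AI a consumer must pay to take `hours` out of the pool while keeping
    /// ai_reserve * gpu_hour_reserve constant. `None` if the request is not
    /// positive or would drain the pool.
    pub fn quote_buy(&self, hours: f64) -> Option<f64> {
        if !(hours > 0.0 && hours < self.gpu_hour_reserve) {
            return None;
        }
        let k = self.ai_reserve * self.gpu_hour_reserve;
        Some(k / (self.gpu_hour_reserve - hours) - self.ai_reserve)
    }
}
```

Under this rule the average price of a purchase always exceeds the spot price, and the gap widens as the order size approaches the pool's remaining GPU-hours. This is the mechanism by which queued demand pushes prices up.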
Peer Reputation System
Every node maintains a reputation score that affects scheduling priority, verification selection, and staking requirements.
pub struct NodeReputation {
pub peer_id: PeerId,
pub score: f64, // 0.0 - 1.0
pub total_tasks: u64,
pub successful_tasks: u64,
pub failed_tasks: u64,
pub slashed_count: u32,
pub uptime_ratio: f64, // Rolling 30-day uptime
pub avg_latency_ms: u64,
pub joined_at: u64,
}
Reputation Update Rules:
| Event | Score Change |
|---|---|
| Verified computation (correct) | +0.01 (capped at 1.0) |
| Failed verification (incorrect) | -0.10 |
| Slashed (malicious) | -0.25 |
| Task timeout (no result) | -0.05 |
| Heartbeat missed | -0.02 |
| Consistent uptime (30 days) | +0.05 bonus |
Minimum Reputation Thresholds:
| Action | Minimum Score |
|---|---|
| Accept compute tasks | 0.3 |
| Accept high-priority tasks | 0.7 |
| Serve as validator | 0.8 |
| Serve as coordinator candidate | 0.9 |
New nodes start at 0.5 and must build reputation through honest participation.
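The update rules and thresholds above can be expressed as one clamped adjustment per event. The deltas are taken from the table; the enum and function names are hypothetical, and clamping to the range 0.0 to 1.0 follows the score bounds in `NodeReputation`.

```rust
/// Events that change a node's reputation score.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReputationEvent {
    VerifiedCorrect,
    FailedVerification,
    Slashed,
    TaskTimeout,
    HeartbeatMissed,
    UptimeBonus,
}

/// Apply one event to `score`, keeping the result within 0.0..=1.0.
pub fn apply_event(score: f64, event: ReputationEvent) -> f64 {
    let delta = match event {
        ReputationEvent::VerifiedCorrect => 0.01,
        ReputationEvent::FailedVerification => -0.10,
        ReputationEvent::Slashed => -0.25,
        ReputationEvent::TaskTimeout => -0.05,
        ReputationEvent::HeartbeatMissed => -0.02,
        ReputationEvent::UptimeBonus => 0.05,
    };
    (score + delta).clamp(0.0, 1.0)
}

/// Minimum score needed to accept ordinary compute tasks.
pub fn can_accept_compute(score: f64) -> bool {
    score >= 0.3
}

/// Minimum score needed to serve as a validator.
pub fn can_validate(score: f64) -> bool {
    score >= 0.8
}
```

Starting from 0.5, ten verified tasks raise the score to about 0.6, whereas a single slash removes 0.25. The asymmetry means trust is much slower to earn than to lose.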
Protocol State Machine
Task Lifecycle:
Submitted -> Decomposed -> Scheduling -> InProgress -> Verification -> Complete
│ │ │ │ │
└─ Rejected └─ Failed └─ NoNodes └─ Timeout └─ Slashed
(budget) (queue) (retry) (re-assign)
Piece State Transitions:
Pending ──assign──> Assigned ──start──> InProgress ──result──> Computed ──verify──> Verified
│ │ │ │
│ └──timeout──> Pending (retry) └──fail──> Pending (retry)
│ or Failed (max retries)
└──cancel──> Cancelled
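The piece transitions can be read as a small transition function. The sketch below makes several assumptions that the diagram does not pin down: timeouts are taken to occur in InProgress, verification failures in Computed, cancellation only in Pending, and `retry_count` is taken to mean the number of retries already used. The event names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PieceState {
    Pending,
    Assigned,
    InProgress,
    Computed,
    Verified,
    Failed,
    Cancelled,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PieceEvent {
    Assign,
    Start,
    ResultReceived,
    Verify,
    Timeout,
    VerifyFail,
    Cancel,
}

/// Next state for a piece, or `None` if the event is not valid in `state`.
pub fn next_state(
    state: PieceState,
    event: PieceEvent,
    retry_count: usize,
    max_retries: usize,
) -> Option<PieceState> {
    // A failed attempt returns to Pending until the retry budget is spent.
    let retry_or_fail = if retry_count < max_retries {
        PieceState::Pending
    } else {
        PieceState::Failed
    };
    match (state, event) {
        (PieceState::Pending, PieceEvent::Assign) => Some(PieceState::Assigned),
        (PieceState::Pending, PieceEvent::Cancel) => Some(PieceState::Cancelled),
        (PieceState::Assigned, PieceEvent::Start) => Some(PieceState::InProgress),
        (PieceState::InProgress, PieceEvent::ResultReceived) => Some(PieceState::Computed),
        (PieceState::InProgress, PieceEvent::Timeout) => Some(retry_or_fail),
        (PieceState::Computed, PieceEvent::Verify) => Some(PieceState::Verified),
        (PieceState::Computed, PieceEvent::VerifyFail) => Some(retry_or_fail),
        _ => None,
    }
}
```

Returning `None` for unlisted pairs makes Verified, Failed, and Cancelled terminal, so a late or duplicate message cannot move a finished piece back into the pipeline.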
Configuration
pub struct SwarmConfig {
/// libp2p identity keypair
pub keypair: Keypair,
/// Listen addresses (e.g., /ip4/0.0.0.0/udp/9000/quic-v1)
pub listen_addrs: Vec<Multiaddr>,
/// Bootstrap peers for initial DHT discovery
pub bootstrap_peers: Vec<Multiaddr>,
/// Scheduling weights
pub scheduling_weights: SchedulingWeights,
/// Default verification method
pub verification_method: VerificationMethod,
/// Default redundancy for verification
pub default_redundancy: usize, // Default: 3
/// Maximum retries per piece
pub max_retries: usize, // Default: 3
/// Piece computation timeout
pub piece_timeout: Duration, // Default: 300s
/// Minimum provider reputation for task assignment
pub min_provider_reputation: f64, // Default: 0.3
/// Heartbeat interval
pub heartbeat_interval: Duration, // Default: 30s
/// Maximum concurrent pieces per node
pub max_concurrent_pieces: usize, // Default: 4
/// Pipeline parallelism minimum bandwidth (Mbps)
pub pipeline_min_bandwidth_mbps: u32, // Default: 1000
}
Events
pub enum SwarmEvent {
// Peer events
PeerDiscovered(PeerId, ProviderCapabilities),
PeerConnected(PeerId),
PeerDisconnected(PeerId),
PeerReputationUpdated(PeerId, f64),
// Task events
TaskSubmitted(TaskId, TaskType),
TaskDecomposed(TaskId, usize), // task_id, num_pieces
TaskCompleted(TaskId, Vec<u8>),
TaskFailed(TaskId, TaskError),
// Piece events
PieceAssigned { task_id: TaskId, piece_index: usize, provider: PeerId },
PieceComputed { task_id: TaskId, piece_index: usize, provider: PeerId },
PieceVerified { task_id: TaskId, piece_index: usize },
PieceFailed { task_id: TaskId, piece_index: usize, reason: String },
PieceRetried { task_id: TaskId, piece_index: usize, attempt: usize },
// Pipeline events
PipelineStageReady { task_id: TaskId, stage: usize },
PipelineActivationTransfer { from: PeerId, to: PeerId, bytes: u64 },
// Economic events
PaymentEscrowed { task_id: TaskId, amount: u64 },
ProviderRewarded { peer_id: PeerId, amount: u64 },
ProviderSlashed { peer_id: PeerId, amount: u64, reason: String },
}
Implementation
Reference Implementation
The reference implementation lives in the Hanzo node repository:
| Component | Repository | Language | Status |
|---|---|---|---|
| Node runtime | github.com/hanzoai/node | Rust | Active development |
| Swarm protocol | hanzo-node/hanzo-libs/hanzo-compute/ | Rust | Alpha |
| libp2p networking | hanzo-node/hanzo-libs/hanzo-p2p/ | Rust | Alpha |
| Coordinator | github.com/hanzoai/coordinator | Rust | Centralized (Phase 1) |
| CLI | hanzo-node/hanzo-cli/ | Rust | Alpha |
| Provider dashboard | github.com/hanzoai/swarm-ui | TypeScript | Planned |
Dependencies
[dependencies]
libp2p = { version = "0.54", features = ["tokio", "quic", "tcp", "noise", "yamux", "kad", "gossipsub", "identify"] }
tokio = { version = "1", features = ["full"] }
blake3 = "1.5"
serde = { version = "1", features = ["derive"] }
safetensors = "0.4"
candle-core = { git = "https://github.com/hanzoai/candle" }
candle-transformers = { git = "https://github.com/hanzoai/candle" }
Usage Example
use hanzo_compute::{
ComputeSwarm, Message, ModelFormat, ModelSpec, SamplingParams, SwarmConfig, TaskType,
};
#[tokio::main]
async fn main() -> anyhow::Result<()> {
// Initialize swarm node
let config = SwarmConfig::default()
.with_listen_addr("/ip4/0.0.0.0/udp/9000/quic-v1".parse()?)
.with_bootstrap_peers(vec![
"/dns4/bootstrap.hanzo.ai/udp/9000/quic-v1".parse()?,
]);
let swarm = ComputeSwarm::new(config).await?;
// Submit an inference task
let task_id = swarm.submit_task(TaskType::Inference {
model: ModelSpec {
name: "zen-72b".into(),
format: ModelFormat::Safetensors,
size_bytes: 145_000_000_000,
vram_required_mb: 48_000,
quantization: None,
hash: "abc123...".into(),
},
prompt: vec![Message::user("Explain quantum computing in simple terms.")],
params: SamplingParams {
max_tokens: 1024,
temperature: 0.7,
top_p: 0.9,
..Default::default()
},
})
.with_budget(100) // 100 $AI max
.with_redundancy(3) // 3x verification
.send()
.await?;
// Wait for verified result
let result = swarm.await_result(task_id).await?;
println!("Result: {}", String::from_utf8_lossy(&result));
Ok(())
}
Provider Setup
# Install Hanzo node
cargo install hanzo-node
# Initialize provider configuration
hanzo-node init --role provider
# Register GPU capabilities (auto-detected)
hanzo-node gpu detect
# Output: Found NVIDIA RTX 4090 (24576 MB VRAM, CUDA 12.4)
# Stake $AI tokens
hanzo-node stake deposit --amount 1000
# Start provider node
hanzo-node start --provider
# Listening on /ip4/0.0.0.0/udp/9000/quic-v1
# Connected to 47 peers via DHT
# Registered capabilities: RTX 4090, 24 GB VRAM, CUDA
# Ready for compute tasks
Rollout Phases
| Phase | Timeline | Coordinator | Verification | Settlement |
|---|---|---|---|---|
| Phase 1 | Q1 2026 | Centralized (Hanzo-operated) | PoAI via centralized verifiers | $AI on Lux testnet |
| Phase 2 | Q3 2026 | Semi-decentralized (elected coordinators) | PoAI via staked validators | $AI on Lux mainnet |
| Phase 3 | Q1 2027 | Fully on-chain (Hanzo L1, HIP-0024) | PoAI on-chain consensus | Native Hanzo L1 settlement |
Security Considerations
Sybil Attacks
An attacker creates many fake nodes to dominate task assignment or verification.
Mitigations:
- $AI stake requirement for all roles (minimum 1,000 $AI for providers, 5,000 for validators)
- Reputation system requires history of honest participation -- new nodes start at 0.5 and cannot access high-value tasks
- DID-based identity (hanzo-did crate) links PeerIDs to verifiable credentials
- VRF-based validator selection prevents an attacker from predicting which nodes will verify their work
Result Manipulation
A provider returns incorrect results to save compute resources (e.g., returns random bytes instead of running inference).
Mitigations:
- Redundant computation: default 3 providers compute the same piece
- PoAI verification: validators independently re-compute and compare
- Commit-reveal scheme: results are committed before reveal, preventing providers from copying others
- Slashing: 10% stake loss per failed verification, making cheating economically irrational
Model Weight Theft
An attacker joins as a provider to steal proprietary model weights.
Mitigations:
- Model weights are encrypted in transit (TLS 1.3 on QUIC connections, Noise on the TCP fallback)
- TEE (Trusted Execution Environment) support: models can be loaded inside SGX/TDX enclaves where the provider cannot read the weights
- For public models (open weights), this is not a concern
- For proprietary models, only TEE-attested providers are eligible
Eclipse Attacks
An attacker surrounds a target node with malicious peers to control its view of the network.
Mitigations:
- Kademlia DHT with configurable replication factor (default K=20)
- Persistent connections to Hanzo bootstrap nodes
- Peer diversity requirements: scheduler prefers providers from distinct AS numbers / IP ranges
- GossipSub mesh maintenance resists partitioning
Denial of Service
An attacker floods the network with tasks or spams heartbeats.
Mitigations:
- Task submission requires $AI escrow (economic cost to submit)
- Rate limiting per PeerId at the libp2p protocol level
- Priority scheduling ensures legitimate high-priority tasks are served first
- Heartbeat protocol uses GossipSub with configurable message rate limits
Encrypted Inference
For sensitive workloads (medical, financial, personal data):
- Input data encrypted with consumer's public key
- Decryption only inside provider's TEE enclave
- Result encrypted with consumer's key before leaving the enclave
- Provider never sees plaintext input or output
- Attestation proof confirms correct TEE execution
Backwards Compatibility
This is a new protocol. No backwards compatibility concerns exist for the initial deployment.
Future versions of the protocol will maintain backwards compatibility by:
- Semantic versioning of the libp2p protocol ID (/hanzo/swarm/1.0.0, /hanzo/swarm/2.0.0)
- Negotiation during connection handshake to determine supported protocol version
- Graceful degradation: newer nodes support older protocol versions for a deprecation period
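The selection logic behind version negotiation can be sketched as picking the highest protocol version both peers list. This is only an illustration of that policy: `parse_version` and `negotiate` are hypothetical helpers, and in a libp2p deployment the wire-level negotiation of protocol IDs is handled by multistream-select rather than by application code like this.

```rust
/// Parse "/hanzo/swarm/MAJOR.MINOR.PATCH" into a comparable version tuple.
/// Returns `None` for any other prefix or a malformed version.
pub fn parse_version(protocol_id: &str) -> Option<(u32, u32, u32)> {
    let version = protocol_id.strip_prefix("/hanzo/swarm/")?;
    let mut parts = version.split('.').map(|p| p.parse::<u32>().ok());
    // The first `?` handles a missing component, the second a non-numeric one.
    let parsed = (parts.next()??, parts.next()??, parts.next()??);
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some(parsed)
}

/// Highest-versioned protocol ID that appears in both lists, if any.
pub fn negotiate<'a>(local: &[&'a str], remote: &[&str]) -> Option<&'a str> {
    local
        .iter()
        .copied()
        .filter(|id| remote.iter().any(|r| r == id))
        .filter_map(|id| parse_version(id).map(|v| (v, id)))
        .max_by_key(|(v, _)| *v)
        .map(|(_, id)| id)
}
```

A node that lists both 1.0.0 and 2.0.0 settles on 1.0.0 with an older peer and on 2.0.0 with a newer one, which is the graceful degradation described above.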
Test Vectors
Piece Decomposition
Input: Embedding task with 100 texts, batch_size = 25
Expected: 4 pieces, each containing 25 texts
Piece hashes:
piece[0] = Blake3("task_id:0:" + serialize(texts[0..25]))
piece[1] = Blake3("task_id:1:" + serialize(texts[25..50]))
piece[2] = Blake3("task_id:2:" + serialize(texts[50..75]))
piece[3] = Blake3("task_id:3:" + serialize(texts[75..100]))
Reputation Scoring
Initial score: 0.5
After 10 successful tasks: min(1.0, 0.5 + 10 * 0.01) = 0.6
After 1 failed verification: max(0.0, 0.6 - 0.10) = 0.5
After 1 slash: max(0.0, 0.5 - 0.25) = 0.25
Status: Below compute threshold (0.3), node must rebuild reputation
Scheduling Score
Provider A: model fits (1.0), cached (1.0), reputation 0.8, latency 20ms (0.9), load 0.2 (0.8)
Score = 1.0 * 0.35 + 1.0 * 0.25 + 0.8 * 0.20 + 0.9 * 0.10 + 0.8 * 0.10
= 0.35 + 0.25 + 0.16 + 0.09 + 0.08 = 0.93
Provider B: model fits (1.0), not cached (0.0), reputation 0.9, latency 50ms (0.7), load 0.5 (0.5)
Score = 1.0 * 0.35 + 0.0 * 0.25 + 0.9 * 0.20 + 0.7 * 0.10 + 0.5 * 0.10
= 0.35 + 0.00 + 0.18 + 0.07 + 0.05 = 0.65
Result: Provider A selected (0.93 > 0.65), cache hit avoids model transfer
PoAI Verification (Exact Hash)
Task: Embedding generation for "Hello world" with model zen-embed-v1
Provider result hash: Blake3("0.123,0.456,...,0.789") = "a1b2c3..."
Verifier 1 result hash: Blake3("0.123,0.456,...,0.789") = "a1b2c3..."
Verifier 2 result hash: Blake3("0.123,0.456,...,0.789") = "a1b2c3..."
Consensus: 3/3 match = 100% -> Verified (ExactHash method)
Related HIPs
| HIP | Title | Relationship |
|---|---|---|
| HIP-0001 | $AI Token | Native currency for compute payments and staking |
| HIP-0005 | Post-Quantum Security | Cryptographic primitives for peer identity and proofs |
| HIP-0006 | Per-User Fine-Tuning | PoAI verification concepts adapted for swarm |
| HIP-0008 | Hamiltonian Market Maker | Economic settlement layer for compute pricing |
| HIP-0009 | Agent SDK | Higher-level task orchestration that submits to the swarm |
| HIP-0020 | Blockchain Node Standard | Node runtime that hosts the swarm protocol |
| HIP-0024 | Hanzo Sovereign L1 | On-chain settlement and decentralized coordinator (Phase 3) |
| HIP-0025 | Bot/Agent Wallet Protocol | Agent wallets that interact with the swarm programmatically |
Open Questions
- Pipeline parallelism latency: What is the maximum acceptable inter-node latency for pipeline stages before throughput degrades below single-node quantized inference? Benchmarking needed.
- Non-deterministic verification thresholds: The semantic similarity threshold (0.95) for text generation verification needs empirical tuning across model families and task types.
- Coordinator decentralization timeline: Moving from centralized to on-chain coordinator requires the Hanzo L1 (HIP-0024) to be production-ready. Timeline is coupled.
- Cross-chain settlement: Should the swarm support payment in tokens other than $AI (e.g., ETH, USDC) via cross-chain bridges? This adds complexity but may improve adoption.
- Privacy-preserving verification: Can PoAI verification be done without re-executing the full task? Zero-knowledge proofs for AI inference are an active research area.
Copyright
Copyright and related rights waived via CC0.