Early access. The mesh is live and under active development — latency and uptime vary. Contributors earn credits (not cash yet). Security model →

How it works · deep dive

One collective computer.Made of every machine that joins.

Senda runs open-weight models end-to-end on the hardware contributors already own. The interesting part isn't that it's peer-to-peer — plenty of those have failed — it's that the whole design is built around the one constraint that killed the others: the physics of residential internet. This page is the honest version of how that works.

Reading the mesh…

A luminous path winding through a branching leaf-vein network

The constraint

The unit of work is a session, not a token.

Datacenter GPUs talk to each other over NVLink and InfiniBand at roughly 50–200 microseconds round-trip. Residential internet is 20–200 milliseconds — three to four orders of magnitude slower, and two to three orders worse on bandwidth. No amount of clever code closes that gap. Every architectural decision below follows from taking it seriously instead of pretending it away.

Per-token cross-peer traffic is fatal

Put the network on the per-token critical path and a 70B model that should decode around 30 tokens/sec collapses below 1 under real-world latency. This is exactly where Petals, BitTensor inference, and the earlier Mesh-LLM forks died.

Per-session cross-peer traffic is fine

A one-second setup and a few-millisecond handoff per thousand tokens is invisible to a user. So Senda routes a whole session to one peer — it doesn't stitch fragments of a forward pass across slow links mid-decode.

Speculative decoding is the exception

It's the one multi-peer pattern where a single network hop amortises across a whole batch of tokens. That's why it's the only cross-peer cooperation Senda leans on, and why it's the path to models bigger than one peer can solo.

Architecture

Two layers, one product.

Senda is split between a thin product surface — the chat UI you're using right now — and a peer-to-peer inference runtime that handles model loading, routing, and distribution across machines. They're shipped and versioned separately.

Chat surface

Senda

Where you actually use the thing.

A web chat at senda.network — open it and start typing.
A native desktop app that ships the same chat plus the controls for running a node yourself.
Streaming responses, thread persistence, model picker, OpenAI-compatible API for tools and agents.

Inference runtime · open source

Senda LLM

The peer-to-peer engine that serves the chat.

Runs on machines volunteered to the mesh — Apple Silicon Macs, NVIDIA / AMD / Intel GPU boxes, on-prem workstations.
Replication-first: a model that fits on one peer runs there end-to-end, full quality, zero per-token network overhead.
Same-box speculative decoding — a small fast draft, a larger verifier — as an optional decode speedup on a single peer.
Capability-aware routing: requests only go to peers that can actually serve them.
Built on an Iroh QUIC overlay with a gossip protocol for capability announcement.

github.com/senda-network/senda-llm →

Cooperation

How a session actually gets served.

In order of how often they're the right answer. The first is the common case the whole system is tuned for; the last is a power-user fallback we'd rather you didn't need.

Replication — the default

One peer serves a whole session end-to-end at full quality. A model that fits on one machine runs there, with zero per-token network overhead. This is the common case and the one Senda optimises for.

Speculative decoding — same box

On a single peer, a small fast draft proposes tokens and a larger verifier accepts them in batched passes — a decode speedup with no network in the loop. The cross-peer variant (draft and verifier on different machines) was benchmarked and shelved: the WAN round-trip erased the gain.

Inter-model collaboration

Several peers can quietly contribute to one answer — a multi-modal input handled by one model, a second opinion from another. The caller still sees a single streamed response.

Pipeline / expert split — the fallback

For models too large for any single peer, weights can be split across machines. It's documented and available, but deprecated as a daily driver: it puts the network back on the critical path, which the physics above says to avoid.

The path of a request

From your keystroke to a peer and back.

Anyone can chat without running anything. Inference is served by peers who've chosen to contribute compute by running the Senda LLM runtime on their own hardware. Anybody can be one, both, or neither.

Chat

Web at senda.network or in the desktop app. Type a message, get a streamed response. No account, no setup, nothing to install.

Mesh entry + routing

Requests land at the public mesh entry point. A capability-aware router picks a peer that can actually serve the requested model — by backend, memory, loaded models, load, and latency — using session-sticky hashing so follow-up turns prefer the peer that already holds the KV cache.

Compute peers

Volunteered nodes serve each session end-to-end on whichever peer fits the model. The router auto-routes around offline ones. A model too big for any single peer can be split across several — a power-user fallback, not the usual path.

The app

What you actually install and use.

The desktop app is a thin native shell around the same runtime peers run in production — chat, model management, mesh status, and a one-click path to start contributing.

Senda — Chat

Chat

Private LLM, right in the app

Stream answers from the mesh with model selection, thread history, and no account.

Get the app →

Senda — Setup

Setup

One click to join the mesh

Install the runtime, opt into staying live on login, and start contributing capacity.

How to contribute →

Senda — Web

Web

Try it in the browser first

No install required — chat at senda.network, then graduate to the desktop app when you want to run a node.

Try the mesh →

Privacy & trust

What you're trusting — and what you're not.

The honest version: when you chat, you're trusting the peer your session lands on, the mesh entry node, and the chat UI. You are nottrusting any third-party AI provider. That's the trade — here's what backs it.

Open-source runtime

Every peer runs the same open-source runtime, so what a peer can and can't do is auditable. There's no closed black box deciding what happens to your prompt.

Session pseudonymity

No login. A peer doesn't know who you are unless your prompt reveals it, and sessions aren't tied to an identity. Traffic to the entry node is TLS-encrypted.

Verified peers

Each peer publishes a deterministic model-identity fingerprint, and the network re-runs an unpredictable synthetic probe to confirm it actually serves the model it advertises. A peer can't claim a big model while quietly serving a smaller one. Only synthetic probes are replayed — never your prompts.

Run your own peer

For work you don't want to trust to anyone else, the runtime other peers run is the runtime you can run yourself. Nothing about the design forces you to share compute or rent it from others.

Hardware support

Whatever the team is already running.

The installer detects OS, CPU architecture and GPU vendor, then pulls the matching runtime build. You can also pin a backend explicitly for unusual setups. Apple Silicon is the hero hardware — M-series unified memory is what makes a consumer machine genuinely capable of 30B–70B models — but the mesh is heterogeneous on purpose.

OS	Hardware	Backend
macOS	Apple Silicon	Metal
Linux	x86_64 · NVIDIA	CUDA
Linux	x86_64 · AMD	ROCm
Linux	x86_64 · Intel / other	Vulkan
Linux	x86_64 · CPU-only	CPU
Linux	aarch64	Vulkan / CPU
Windows 10/11	x86_64 · NVIDIA	CUDA
Windows 10/11	x86_64 · AMD / Intel / other	Vulkan
WSL2	x86_64 · NVIDIA passthrough	CUDA

Limits

What Senda isn't.

Stating the obvious objections before you do. Senda is for latency-tolerant, private, high-volume work — not for everything.

Not a frontier-model network

There are no GPT-class closed weights here. Senda serves open-weight models, which have caught up on most non-frontier work but aren't the top of the leaderboard.

Not the fastest median chat

A hosted API wins on first-token latency for a single quick reply. Senda is the wrong tool for shaving a second off every message and the right one for work where an instant answer isn't the point.

Not a training network

No gradient passes across the mesh. The residential-WAN physics that make per-token cross-peer traffic fatal make distributed training a non-starter — it's explicitly out of scope.

Not fungible compute

The unit is a session of a specific model served at measured quality — not an interchangeable GPU-second. A token from a 0.6B draft and a token from a 70B verifier are different products, and Senda prices them that way.