Your private LLM.On hardware people own.
A peer-to-peer mesh of contributed machines running open-weight models end-to-end — private, low-cost inference for summarizing, classifying, and background agent work. No third-party AI provider in the middle.
Two roles. One mesh.
Anyone can chat — at closedmesh.com or in the desktop app — without running anything themselves. Inference is served by peers who've chosen to contribute compute by running the ClosedMesh LLM runtime on their own hardware. Anybody can be one, both, or neither.
Web at closedmesh.com or in the desktop app. Type a message, get a streamed response. No account, no setup, nothing to install.
Requests land at the public mesh entry point and are routed to a peer that can serve the requested model — by capability, by load, by latency.
Volunteered nodes running ClosedMesh LLM serve each session end-to-end on whichever peer fits the model. Auto-routes around offline ones; can pair two peers via speculative decoding for the mid-tier.
Capacity is everywhere. ClosedMesh just uses it.
Prompts go to a peer running an open-weight model on hardware someone in the mesh owns. No OpenAI, Anthropic or Google in the loop — nothing to revoke, no provider terms to read.
On an M3 Max or M4 Max with 64–128 GB of unified memory, a $2.5–4.5k laptop becomes a 30B–70B-capable inference box at speeds same-price Windows GPU setups can't match. CUDA / ROCm / Vulkan boxes join too — each shines at different model sizes.
A model that fits on one peer runs there end-to-end, full quality, zero per-token network overhead. For the mid-tier, two peers pair via speculative decoding — a fast draft proposes, a larger verifier accepts in one batched pass.
The mesh checks that a peer actually runs the model it advertises: each publishes a deterministic fingerprint and the network re-runs an unpredictable synthetic probe to compare. A peer can't claim a big model while quietly serving a smaller one. Real prompts are never replayed.
Every peer exposes a standard /v1/chat/completions endpoint. Drop-in for any tool that speaks OpenAI — agents, IDE plugins, internal scripts.
Don't want to trust anyone else? The runtime other peers run is the runtime you can run yourself. It's fully open source — share compute, rent it, or keep it entirely in-house.
Use the mesh. Or become part of it.
A private LLM in your browser — no signup, nothing to install. Your prompts go to a peer, never a third-party AI provider.
Try the mesh →Have a capable Mac or GPU box? Download the desktop app or curl the runtime. It autostarts and joins the mesh, adding capacity for everyone.
Download →An emerging marketplace pays peers for the sessions they serve, with reputation and sample-and-verify keeping it honest. Rolling out as the network grows.
How it works →Built for the work you keep in-house.
ClosedMesh is private, low-cost inference for the work open-weight models do well. It's for teams where keeping data in-house and keeping per-token costs flat matter more than shaving a second off every reply.
- Summarizing documents and codebases
- Classifying or labeling data at scale
- Long-running background agents and pipelines
- Synthetic-data generation
- Anything private or high-volume where an instant answer isn't the point
- Private by default — prompts go to a peer, never a third-party AI provider
- Yours to control — runs on your own hardware and the mesh, not a rented black-box endpoint
- No lock-in — OpenAI-compatible API, fully open-source runtime
- Verified peers — each one proves it runs the model it advertises
The questions people ask first
What is ClosedMesh?
A peer-to-peer mesh that runs open-weight models end-to-end on hardware contributors already own. Chat with it in your browser; behind the scenes a capability-aware router sends each session to a peer that can serve it. No third-party AI provider sits in the middle.
Do I need to sign up or install anything to chat?
No. Open closedmesh.com and start typing — no account, no install. The desktop app is only needed if you want to run a node and contribute compute.
Can a peer read my prompts?
The peer serving your session has to read the prompt to run inference — that's the honest trade versus a hosted API. The runtime is open source so peers can be audited, sessions aren't tied to an identity, and for anything you don't want to trust to others, you can run your own peer with the same runtime.
Which models can I use?
Open-weight models served by live peers — the set changes as peers come and go, which is why the live status above lists what's serving right now. Apple Silicon Macs with enough memory can serve 30B–70B-class models at full quality; smaller machines serve smaller models well.
What hardware can contribute?
Apple Silicon Macs (Metal), and NVIDIA (CUDA), AMD (ROCm) or Intel/other (Vulkan) GPU boxes on macOS, Linux, or Windows. The installer detects your OS, CPU architecture and GPU vendor and pulls the matching build.
Bring a real, private LLM into your work.
Chat with the mesh in your browser, or lend your hardware and grow it for everyone.