How to run AI coding offline on Apple Silicon
- Any Apple Silicon Mac (M1+) with 16 GB+ RAM runs the whole offline coding workflow.
- Core 27B is the strongest coding tier. That's enough for real refactors.
- Kill the wifi and it keeps working. Chat, agent, file edits, project memory.
- You give up about 4× the speed of cloud. You get back zero API cost, no rate limit, no data leaving the box.
Got an M-series Mac with 16 GB of RAM or more? Then you can run a real AI coding workflow with no internet, no API key, nothing touching the cloud. This is what that actually looks like in 2026: the stack I run, the tradeoffs I've lived with, and the spots where local still loses to the cloud.
The 30-second setup
Shortest path is a Mac-native app that bundles the inference runtime, the model weights, and a chat / agent UI in one download. Outlier is one of those. The whole sequence:
- Grab the signed Mac DMG from outlier.host. The installer is about 150 MB.
- First launch pulls Nano 4B (~3 GB) from Hugging Face. After that one download, inference never touches the network again.
- Want proof? Switch off your wifi. Chat, agent runs, file edits, project memory. All of it still runs.
You've got other choices here. Ollama, LM Studio, Jan, llama.cpp straight up. They differ on defaults, on how polished the UI is, on which models they make painless to load. The offline part is the same no matter which one you pick.
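If you go the Ollama route, here's a minimal sketch of talking to the local server from Python. It assumes Ollama is running on its default port (11434) and that you've already pulled a model. The tag `qwen2.5-coder:7b` is only an example I picked for illustration, not something from the Outlier lineup, and the helper names (`build_request`, `ask`, `tokens_per_second`) are made up for this sketch. Everything goes to `localhost`, so it should behave the same with wifi off.

```python
import json
import urllib.request

# Ollama's default local endpoint. Nothing here leaves the machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "qwen2.5-coder:7b") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server.

    The model tag is an assumption: swap in whichever model you have pulled.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "qwen2.5-coder:7b") -> dict:
    """Send the prompt to the local server and return the parsed JSON reply."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=300) as resp:
        return json.load(resp)


def tokens_per_second(reply: dict) -> float:
    """Generation throughput from Ollama's timing fields.

    eval_count is the number of generated tokens; eval_duration is in nanoseconds.
    """
    return reply["eval_count"] / (reply["eval_duration"] / 1e9)
```

With the server up, `ask("Explain this stack trace: ...")["response"]` returns the generated text, and `tokens_per_second(...)` on the same reply gives you a throughput figure for your own machine.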
What runs on what hardware
Apple Silicon uses unified memory: the CPU and GPU share one pool of RAM. So the rule is dead simple: the largest model you can keep fully in memory is roughly your RAM, minus a few GB for the OS and the app.
| RAM | What runs comfortably |
|---|---|
| 16 GB | Nano 4B and Lite 9B |
| 24-32 GB | Adds Quick 26B, Core 27B, Code 27B, and Vision 35B |
| 64 GB | Adds Plus 397B via the V9 paged engine at ~2.1 tok/s |
| 96+ GB | Plus 397B in Speed mode (fully resident in RAM) for higher throughput |
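The "RAM minus a few GB" rule can be turned into a rough back-of-envelope check. This is a sketch under loud assumptions: the 4.5 bits per weight and the 6 GB reserved for the OS and app are my own illustrative defaults, not Outlier's published numbers, and the estimate ignores the KV cache and anything a paged engine does to stream weights from disk. The function names are hypothetical.

```python
def estimated_weights_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of quantized weights in GB (1 GB = 1e9 bytes).

    bits_per_weight is an assumed average for a 4-bit-style quantization.
    """
    return params_billion * bits_per_weight / 8


def fits_in_ram(
    params_billion: float,
    ram_gb: float,
    bits_per_weight: float = 4.5,
    reserved_gb: float = 6.0,
) -> bool:
    """True if the weights fit in RAM after reserving headroom for the OS and app.

    reserved_gb is an assumption standing in for "a few GB".
    """
    return estimated_weights_gb(params_billion, bits_per_weight) <= ram_gb - reserved_gb


if __name__ == "__main__":
    for ram in (16, 24, 32):
        for size in (4, 9, 27):
            verdict = "fits" if fits_in_ram(size, ram) else "too big"
            print(f"{size}B on {ram} GB: {verdict}")
```

Under those assumptions a 27B model comes out at about 15 GB of weights, which is why it lands in the 24 GB row and not the 16 GB one. Treat the output as a sanity check, not a guarantee.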
For coding, Core 27B is the workhorse, the strongest coding tier in the lineup. When I ran a head-to-head against Claude Opus, I genuinely couldn't tell the outputs apart on refactors, reasoning, knowledge questions, writing, or translation.
What the offline workflow gives up
Two things, and they're both real:
- Speed feel. Core 27B on my M1 Ultra runs at about 22 tok/s. Claude Opus or GPT-5 in the cloud do 80 to 100 tok/s. On a 5,000-token reply that works out to roughly 4 minutes locally against about 1 minute online. Same answer at the end. The wait is the price.
- Research-grade ceiling. Truly novel research, 50k+ token contexts, PhD-level multi-hop reasoning: the cloud flagships still win those. But for the everyday 95% (refactors, explanations, comparisons, debugging), Outlier's local Core 27B held its own against Claude on output quality, including the hardest cases I threw at it (a chess engine, a Raft/Paxos explanation, ZK proofs, that kind of thing).
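The wait-time arithmetic in the speed bullet is easy to redo for your own numbers. A minimal sketch, using the throughput figures quoted above (22 tok/s local, 80 to 100 tok/s cloud); the function name is hypothetical, and it only counts generation time, not prompt processing or network latency.

```python
def generation_seconds(tokens: int, tok_per_s: float) -> float:
    """Time to generate `tokens` at a steady throughput, in seconds."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return tokens / tok_per_s


if __name__ == "__main__":
    reply_tokens = 5_000
    local = generation_seconds(reply_tokens, 22)
    cloud_slow = generation_seconds(reply_tokens, 80)
    cloud_fast = generation_seconds(reply_tokens, 100)
    print(f"local: {local / 60:.1f} min")
    print(f"cloud: {cloud_fast / 60:.1f} to {cloud_slow / 60:.1f} min")
    print(f"slowdown: {local / cloud_slow:.1f}x to {local / cloud_fast:.1f}x")
```

At 22 tok/s a 5,000-token reply takes about 227 seconds, a bit under 4 minutes, against 50 to 62.5 seconds at cloud speeds. That is where the "about 4×" figure comes from.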
What it gives you back
Nothing leaves the Mac. No rate limit. No token bill. No surprise email telling you your model got deprecated. Your code, your repos, your chat history all sit on your own disk. It works on a plane. Behind a corporate firewall. In a coffee shop running on garbage wifi. And the context basically never resets. I started a chat in March and picked it right back up in June, no "your context window has been cleared."
If you work under a strict data-handling policy, the offline guarantee is the entire reason to do this. For everyone else, it's insurance against the day your cloud provider quietly rewrites the terms.
Frequently asked questions
Can AI coding work completely offline?
Yes. After the first model download, chat, agent runs, file edits, and project memory all work with wifi turned off on any Apple Silicon Mac.
What Mac do I need for offline AI coding?
Any Apple Silicon Mac (M1 or newer) with 16 GB or more of RAM. Core 27B needs at least 24 GB, and serious coding with it is best on 32 GB or more.
How does offline local AI compare to cloud tools?
On everyday coding work, local Core 27B holds its own against Claude Opus on output quality. The main tradeoff is speed: roughly 22 tok/s locally versus 80 to 100 in the cloud.
Try Outlier free
Free Nano + Lite: local, private, no account. Pro $20/mo or $149/yr adds everything (Plus 397B, Marathon mode, Computer use, Deep Research v3, long context to 128K). Lifetime Pro from $99 (Founding 200, first 200 seats) or $200 (Founders 500). Apple Silicon only.