How to run AI coding offline on Apple Silicon

Outlier · solo-built in Grand Rapids · published 2026-05-19 · last updated 2026-05-20

Quick answer

Any Apple Silicon Mac (M1+) with 16 GB+ RAM runs the whole offline coding workflow.
With 24 GB or more you can run Core 27B, the strongest coding tier. That's enough for real refactors.
Kill the wifi and it keeps working. Chat, agent, file edits, project memory.
You give up speed: the cloud is roughly 4× faster. You get back zero API cost, no rate limits, and no data leaving the box.

Got an M-series Mac with 16 GB of RAM or more? Then you can run a real AI coding workflow with no internet, no API key, nothing touching the cloud. This is what that actually looks like in 2026: the stack I run, the tradeoffs I've lived with, and the spots where local still loses to the cloud.

The 30-second setup

The shortest path is a Mac-native app that bundles the inference runtime and a chat/agent UI in one download and handles fetching the model weights for you. Outlier is one of those. The whole sequence:

Grab the signed Mac DMG from outlier.host. The installer is about 150 MB.
First launch pulls Nano 4B (~3 GB) from Hugging Face. After that one download, inference never touches the network again.
Want proof? Switch off your wifi. Chat, agent runs, file edits, project memory. All of it still runs.

You've got other choices here: Ollama, LM Studio, Jan, or plain llama.cpp. They differ on defaults, on how polished the UI is, and on which models they make painless to load. The offline part works the same whichever one you pick.

What runs on what hardware

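Apple Silicon shares one pool of RAM between the GPU and CPU; that's what "unified memory" means. So the rule is dead simple: your RAM, minus a few GB for the OS and the app, is roughly the largest model you can load. The paged engine in the 64 GB row is the exception, since it doesn't keep the whole model resident in RAM.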

RAM	What runs comfortably
16 GB	Nano 4B and Lite 9B
24-32 GB	Adds Quick 26B, Core 27B, Code 27B, and Vision 35B
64 GB	Adds Plus 397B via the V9 paged engine at ~2.1 tok/s
96+ GB	Plus 397B in Speed mode (fully resident in RAM) for higher throughput
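The RAM rule of thumb can be turned into a quick back-of-envelope check. This is a minimal sketch, not how Outlier sizes its tiers: it assumes 4-bit quantized weights (about half a byte per parameter) and a hypothetical 4 GB reserve for macOS and the app, and it ignores the KV cache and runtime buffers, which grow with context length. The function names are made up for illustration.

```python
# Back-of-envelope check: do a model's weights fit in unified memory?
# Assumptions (hypothetical, not Outlier's numbers): 4-bit weights,
# a fixed 4 GB reserve for macOS and the app, and no allowance for
# the KV cache or runtime buffers, which grow with context length.

OS_RESERVE_GB = 4.0      # assumed headroom for macOS + the app
BITS_PER_WEIGHT = 4.0    # assumed quantization level


def weights_gb(params_billion: float, bits_per_weight: float = BITS_PER_WEIGHT) -> float:
    """Approximate weight footprint in GB: parameters x bits / 8."""
    return params_billion * bits_per_weight / 8


def fits_in_ram(ram_gb: float, params_billion: float,
                bits_per_weight: float = BITS_PER_WEIGHT,
                reserve_gb: float = OS_RESERVE_GB) -> bool:
    """True if the weights fit in the RAM left after the OS/app reserve."""
    return weights_gb(params_billion, bits_per_weight) <= ram_gb - reserve_gb


if __name__ == "__main__":
    for ram in (16, 24, 32):
        for params in (4, 9, 27):
            need = weights_gb(params)
            verdict = "fits" if fits_in_ram(ram, params) else "too big"
            print(f"{ram} GB RAM, {params}B params: ~{need:.1f} GB weights -> {verdict}")
```

Under these assumptions a 27B model needs about 13.5 GB for weights alone, more than a 16 GB Mac has left after the reserve. Swap in the quantization your runtime actually uses for a closer estimate.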

For coding, Core 27B is the workhorse and the strongest coding tier in the lineup. When I ran a head-to-head against Claude Opus, I genuinely couldn't tell the outputs apart on refactors, reasoning, knowledge questions, writing, or translation.

What the offline workflow gives up

Two things, and they're both real:

Speed feel. Core 27B on my M1 Ultra runs about 22 tok/s; Macs with less memory bandwidth than an Ultra chip will be slower. Claude Opus or GPT-5 in the cloud do 80 to 100 tok/s. On a 5,000-token reply that works out to roughly 4 minutes locally against about 1 minute online. The quality at the end is comparable; the wait is the price.
Research-grade ceiling. Truly novel research, 50k+ token contexts, PhD-level multi-hop reasoning: the cloud flagships still win those. But for the everyday 95% of work (refactors, explanations, comparisons, debugging), Outlier's local Core 27B held its own against Claude on output quality, including the hardest cases I threw at it: a chess engine, a Raft/Paxos explanation, ZK proofs.
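The speed numbers above are simple arithmetic: generation time is reply length divided by decode speed. Here is a minimal sketch using the figures quoted in the text (22 tok/s locally, 80 to 100 tok/s in the cloud). It ignores prompt processing and network latency, so real waits run a bit longer; `reply_seconds` is a name made up for illustration.

```python
# Estimate how long a reply takes to stream at a steady decode speed.
# Ignores prompt processing (prefill) and network latency.

def reply_seconds(tokens: int, tokens_per_second: float) -> float:
    """Seconds to generate `tokens` at a constant `tokens_per_second`."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


if __name__ == "__main__":
    reply = 5_000  # tokens: the example reply length from the text
    for label, speed in [("local Core 27B (M1 Ultra)", 22),
                         ("cloud, low end", 80),
                         ("cloud, high end", 100)]:
        secs = reply_seconds(reply, speed)
        print(f"{label:26} {speed:>3} tok/s -> {secs:6.1f} s ({secs / 60:.1f} min)")
```

That works out to about 227 seconds (3.8 minutes) locally versus 50 to 62.5 seconds in the cloud, which is where both the "4 minutes against 1 minute" comparison and the roughly 4× gap come from.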

What it gives you back

Nothing leaves the Mac. No rate limit. No token bill. No surprise email telling you your model got deprecated. Your code, your repos, your chat history all sit on your own disk. It works on a plane. Behind a corporate firewall. In a coffee shop with garbage wifi. And the conversation basically never resets: I started a chat in March and picked it right back up in June, with no "your context window has been cleared."

If you work under a strict data-handling policy, the offline guarantee is the entire reason to do this. For everyone else, it's insurance against the day your cloud provider quietly rewrites the terms.

Frequently asked questions

Can AI coding work completely offline?

Yes. After the first model download, chat, agent runs, file edits, and project memory all work with wifi turned off on any Apple Silicon Mac with 16 GB or more of RAM.

What Mac do I need for offline AI coding?

Any Apple Silicon Mac (M1 or newer) with 16 GB or more of RAM. Serious coding with Core 27B needs at least 24 GB and is best on 32 GB or more.

How does offline local AI compare to cloud tools?

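On everyday coding work, local Core 27B holds its own against Claude Opus on output quality. The main tradeoff is speed: roughly 22 tok/s locally on an M1 Ultra versus 80 to 100 tok/s in the cloud. Cloud flagships still win on truly novel research, 50k+ token contexts, and PhD-level multi-hop reasoning.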

Try Outlier free

Free Nano + Lite: local, private, no account. Pro $20/mo or $149/yr adds everything (Plus 397B, Marathon mode, Computer use, Deep Research v3, long context to 128K). Lifetime Pro from $99 (Founding 200, first 200 seats) or $200 (Founders 500). Apple Silicon only.
