
How to run AI coding offline on Apple Silicon

Quick answer
  • Any Apple Silicon Mac (M1+) with 16 GB+ RAM runs the whole offline coding workflow.
  • Core 27B is the strongest coding tier, and it's good enough for real refactors. It wants 24 GB of RAM or more.
  • Kill the wifi and it keeps working. Chat, agent, file edits, project memory.
  • Local runs roughly 4× slower than the cloud. In return you get zero API cost, no rate limit, and no data leaving the box.

Got an M-series Mac with 16 GB of RAM or more? Then you can run a real AI coding workflow with no internet, no API key, nothing touching the cloud. This is what that actually looks like in 2026: the stack I run, the tradeoffs I've lived with, and the spots where local still loses to the cloud.

The 30-second setup

Shortest path is a Mac-native app that bundles the inference runtime, the model weights, and a chat / agent UI in one download. Outlier is one of those. The whole sequence:

  1. Grab the signed Mac DMG from outlier.host. The installer is about 150 MB.
  2. First launch pulls Nano 4B (~3 GB) from HuggingFace. After that one download, inference never touches the network again.
  3. Want proof? Switch off your wifi. Chat, agent runs, file edits, project memory. All of it still runs.
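If you'd rather confirm the machine is truly offline before you test step 3, a quick reachability probe does the job. This is a minimal sketch and not part of Outlier; the helper name `can_reach` is made up for illustration, and the host you probe is your choice.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, timeouts, and "no route".
        return False
```

With wifi off, a call like `can_reach("example.com", 443)` should come back False while the chat keeps answering.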

You've got other choices here. Ollama, LM Studio, Jan, llama.cpp straight up. They differ on defaults, on how polished the UI is, on which models they make painless to load. The offline part is the same no matter which one you pick.
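To make "the offline part is the same" concrete, here's what talking to one of those runtimes looks like from a script. Ollama serves an HTTP API on localhost (port 11434 by default), so the request never leaves the machine. This is a sketch under assumptions: the function names are mine, the model tag is whatever you have pulled locally, and it says nothing about how Outlier's own runtime is wired.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_request(prompt: str, model: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=300) as resp:
        return json.loads(resp.read())["response"]
```

`generate(...)` only works while the Ollama server is running and the named model is already on disk. The point is the URL: everything goes to localhost.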

What runs on what hardware

Apple Silicon shares one pool of RAM between the GPU and CPU (that's unified memory), so there's no separate VRAM ceiling to worry about. The rule is dead simple: your usable model size is your RAM, minus a few GB for the OS and the app.

RAM        What runs comfortably
16 GB      Nano 4B and Lite 9B
24-32 GB   Adds Quick 26B, Core 27B, Code 27B, and Vision 35B
64 GB      Adds Plus 397B via the V9 paged engine at ~2.1 tok/s
96+ GB     Plus 397B in Speed mode (fully resident in RAM) for higher throughput
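If you want the table as something you can query, here's a small lookup. It's a sketch: the thresholds and tier names are copied from the table, the function names are mine, and the 4 GB reserve is an assumed stand-in for "a few GB", not a measured figure.

```python
# RAM thresholds (GB) and the tiers each one adds, copied from the table.
TIERS_BY_MIN_RAM = [
    (16, ["Nano 4B", "Lite 9B"]),
    (24, ["Quick 26B", "Core 27B", "Code 27B", "Vision 35B"]),
    (64, ["Plus 397B (paged, ~2.1 tok/s)"]),
    (96, ["Plus 397B (Speed mode)"]),
]


def comfortable_tiers(ram_gb: int) -> list[str]:
    """Return every tier the table lists at or below the given RAM size."""
    tiers: list[str] = []
    for min_ram, names in TIERS_BY_MIN_RAM:
        if ram_gb >= min_ram:
            tiers.extend(names)
    return tiers


def usable_model_gb(ram_gb: float, reserved_gb: float = 4.0) -> float:
    """Rule of thumb: total RAM minus what the OS and the app keep.

    reserved_gb=4.0 is an assumption standing in for "a few GB".
    """
    return max(ram_gb - reserved_gb, 0.0)
```

So `comfortable_tiers(32)` includes Core 27B, while `comfortable_tiers(16)` stops at Nano and Lite.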

For coding, Core 27B is the workhorse — the strongest coding tier in the lineup. When I ran a head-to-head against Claude Opus, I genuinely couldn't tell the outputs apart on refactors, reasoning, knowledge questions, writing, or translation.

What the offline workflow gives up

Two things, and they're both real:

What it gives you back

Nothing leaves the Mac. No rate limit. No token bill. No surprise email telling you your model got deprecated. Your code, your repos, your chat history all sit on your own disk. It works on a plane. Behind a corporate firewall. In a coffee shop running on garbage wifi. And the context basically never resets. I've started a chat in March and picked it right back up in June, no "your context window has been cleared."

If you work under a strict data-handling policy, the offline guarantee is the entire reason to do this. For everyone else, it's insurance against the day your cloud provider quietly rewrites the terms.

Frequently asked questions

Can AI coding work completely offline?

Yes. After the first model download, chat, agent runs, file edits, and project memory all work with wifi turned off on any Apple Silicon Mac.

What Mac do I need for offline AI coding?

Any Apple Silicon Mac (M1 or newer) with 16 GB or more of RAM. Serious coding with Core 27B is best on 32 GB or more.

How does offline local AI compare to cloud tools?

On everyday coding work, local Core 27B holds its own against Claude Opus on output quality. The main tradeoff is speed: roughly 22 tok/s locally versus 80 to 100 in the cloud.
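To see what that gap feels like in wall-clock time, divide the reply length by the decode rate. The rates below are the ones quoted in this answer; the 500-token reply is an assumed size for a medium refactor, and the sketch ignores prompt processing and network latency.

```python
def generation_seconds(tokens: int, tok_per_s: float) -> float:
    """Seconds needed to emit `tokens` at a steady decode rate."""
    return tokens / tok_per_s


LOCAL_TOK_S = 22.0           # local Core 27B figure quoted in this answer
CLOUD_TOK_S = (80.0, 100.0)  # cloud range quoted in this answer

reply_tokens = 500  # assumption: a medium-sized refactor reply

local = generation_seconds(reply_tokens, LOCAL_TOK_S)
cloud_slow = generation_seconds(reply_tokens, CLOUD_TOK_S[0])
cloud_fast = generation_seconds(reply_tokens, CLOUD_TOK_S[1])

print(f"local: {local:.1f} s, cloud: {cloud_fast:.1f} to {cloud_slow:.1f} s")
print(
    f"slowdown: {CLOUD_TOK_S[0] / LOCAL_TOK_S:.1f}x "
    f"to {CLOUD_TOK_S[1] / LOCAL_TOK_S:.1f}x"
)
```

That works out to about 23 seconds locally against roughly 5 to 6 in the cloud, which is where the "about 4×" figure comes from.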

Try Outlier free

Free Nano + Lite — local, private, no account. Pro $20/mo or $149/yr adds everything (Plus 397B, Marathon mode, Computer use, Deep Research v3, long context to 128K). Lifetime Pro from $99 (Founding 200, first 200 seats) or $200 (Founders 500). Apple Silicon only.
