How much RAM do you need to run local AI on a Mac?

Outlier · solo-built in Grand Rapids · Published 2026-06-09 · Last updated 2026-06-09

Quick answer

16 GB: Nano 4B and Lite 9B. Everyday chat, drafting, a bit of coding.
32 GB: adds Quick 26B, Core 27B, Code 27B, Vision 35B. Now you can code seriously and feed it images.
64 GB: adds Plus 397B, the largest tier, via paged expert streaming.
Rule of thumb: usable model size ≈ your RAM minus a few GB for macOS.

RAM is the deciding factor. On an Apple Silicon Mac the GPU and CPU share one pool of memory, so the amount you have caps which AI models will actually run. Short version: 16 GB runs the small ones, 32 GB runs the coding-grade models, 64 GB runs a 397-billion-parameter model. Below is the full picture.

The rule of thumb

Apple Silicon runs on unified memory. The GPU and CPU draw from the same pool of RAM. So your usable model size is roughly total RAM minus a few GB that macOS and the app keep for themselves. A model's 4-bit weights have to fit in what's left. For Mixture-of-Experts models, only the experts that are active at a given moment have to fit; the rest can stay on the SSD.

That one fact is what maps a RAM number to the models you can run.
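The arithmetic behind the rule of thumb can be sketched in a few lines of Python. This is a rough estimate, not Outlier's actual loader logic: the function names are made up for illustration, the 4 GB reserve is an assumed stand-in for "a few GB", and the estimate ignores context cache and runtime overhead.

```python
def weights_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    # Each parameter takes bits_per_weight / 8 bytes, so 4-bit is half a byte.
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, ram_gb: float, reserved_gb: float = 4.0) -> bool:
    """True if the 4-bit weights fit in RAM after the reserve.

    reserved_gb is an assumption standing in for what macOS and the app keep.
    """
    return weights_gb(params_billion) <= ram_gb - reserved_gb


print(weights_gb(27))   # 13.5
print(weights_gb(397))  # 198.5
print(fits(9, 16))      # True
print(fits(27, 16))     # False
```

By this estimate a 27B model needs about 13.5 GB for weights alone, which is why it does not fit on a 16 GB Mac once the reserve is taken out. A dense 397B model would need about 198.5 GB before any cache or overhead.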

What each RAM size runs

RAM	Mac	Models you can run
16 GB	MacBook Air	Nano 4B, Lite 9B
32 GB	MacBook Pro	+ Quick 26B, Core 27B, Code 27B, Vision 35B
64 GB	Mac Studio / MacBook Pro	All tiers, including Plus 397B
96+ GB	Mac Studio	Plus 397B with extra headroom for long context

(The coding-grade Core/Code/Vision models won't load under about 24 GB. In practice 32 GB is the bucket you actually want.)
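The table above can also be written as a small lookup. This is an illustrative sketch only: the structure and function name are hypothetical, and it uses the 16/32/64 GB buckets from the table rather than the roughly 24 GB load threshold, so a 24 GB Mac is conservatively treated as a 16 GB one.

```python
# Minimum RAM bucket (GB) and the tiers that bucket adds, taken from the table.
TIERS_BY_MIN_RAM_GB = [
    (16, ["Nano 4B", "Lite 9B"]),
    (32, ["Quick 26B", "Core 27B", "Code 27B", "Vision 35B"]),
    (64, ["Plus 397B"]),
]


def runnable_tiers(ram_gb: int) -> list[str]:
    """Return every tier whose RAM bucket is at or below ram_gb."""
    tiers: list[str] = []
    for min_ram_gb, names in TIERS_BY_MIN_RAM_GB:
        if ram_gb >= min_ram_gb:
            tiers.extend(names)
    return tiers


print(runnable_tiers(32))
```

For 32 GB this lists the two small models plus the four coding-grade and vision tiers. At 64 GB and above it lists all seven.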

How Outlier stretches your RAM

Plus 397B is the case that surprises people. A 397-billion-parameter model normally wants a server's worth of RAM, north of 200 GB. The V9 paged engine gets around that. It keeps only the active experts of this Mixture-of-Experts model in memory and streams the rest off your SSD, so the model runs on a 64 GB Mac at about 2.1 tokens/second. See how paged MoE inference works.
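The general idea of keeping a few experts in memory and paging the rest in from disk can be shown with a toy least-recently-used cache. This is not the V9 engine: the class, the `load_from_disk` callback, and the LRU eviction policy are all assumptions chosen to keep the example small.

```python
from collections import OrderedDict
from typing import Any, Callable


class ExpertCache:
    """Toy LRU cache: at most `capacity` experts stay resident in memory."""

    def __init__(self, capacity: int, load_from_disk: Callable[[int], Any]):
        self.capacity = capacity
        self.load_from_disk = load_from_disk  # hypothetical SSD reader
        self.resident: OrderedDict[int, Any] = OrderedDict()
        self.disk_reads = 0

    def get(self, expert_id: int) -> Any:
        if expert_id in self.resident:
            # Hit: mark as most recently used, no disk access needed.
            self.resident.move_to_end(expert_id)
            return self.resident[expert_id]
        # Miss: read the expert from disk, then evict the oldest if over capacity.
        weights = self.load_from_disk(expert_id)
        self.disk_reads += 1
        self.resident[expert_id] = weights
        if len(self.resident) > self.capacity:
            self.resident.popitem(last=False)
        return weights


cache = ExpertCache(capacity=2, load_from_disk=lambda i: f"weights-{i}")
for expert_id in [0, 1, 0, 2, 1]:
    cache.get(expert_id)
print(cache.disk_reads, list(cache.resident))
```

Every miss costs a disk read, which is the trade being made: far less RAM in exchange for slower generation.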

Which to buy for AI work

16 GB is fine if you mostly chat, write, and do the occasional bit of code on the free tiers.
32 GB is the sweet spot. Coding with Core/Code 27B, image work with Vision, no sweat.
64 GB is for one reason: you want the largest model, Plus 397B.

Frequently asked questions

Can I run local AI on 8 GB?

It's tight. 8 GB can run the smallest models slowly, but 16 GB is the realistic floor for a smooth experience with Nano and Lite. For coding-grade models, plan on 32 GB.

Do I really need 64 GB for a 397B model?

Yes, for Plus 397B. Outlier streams most of the model's experts from SSD so it doesn't need 200+ GB, but it still wants 64 GB of unified memory to hold the active set and the cache comfortably.

Does more RAM make it faster?

More RAM mainly lets you run bigger models, not run a given model faster. Speed depends more on the chip generation and the model size. More RAM also helps with longer conversations and context.

Try Outlier free

Free Nano + Lite — local, private, no account. Pro $20/mo or $149/yr adds everything (all 7 model tiers incl. Plus 397B). Lifetime Pro from $99 (Founding 200, first 200 seats) or $200 (Founders 500). Apple Silicon only.

Download for Mac