Local AI setup for M1 / M2 / M3 / M4 Mac Studio (2026 guide)

Quick answer
  • M1 Max 32 GB Studio handles Nano, Lite, Quick, Core, and Code 27B.
  • An M2 Ultra with 64 GB adds Vision 35B and also runs Plus 397B through V9 paged inference.
  • A Studio with 96 GB or more runs Plus 397B in capacity mode, which is faster (about 5 tok/s estimated, versus 2.1 measured in paged mode).
  • Measured M1 Ultra tok/s: Nano 71.7, Lite 53.4, Core 20.7, Plus 2.1.

The Mac Studio is about the best machine you can buy for local AI. It packs a huge pool of unified memory next to a genuinely fast SSD, and it holds its speed under sustained load. Below I break down which model tiers actually run on each Studio generation, the tok/s you should expect from them, and where the real cliffs hide.

Why Mac Studio matters for local AI

On Apple Silicon the GPU and CPU share the same RAM. So there's no model-loading step that shoves weights across a PCIe bus into a separate GPU. The whole model (or a streamed slice of it) sits in unified memory and the GPU works on it right where it is. One of the big bottlenecks PC builds have to design around just isn't there.

The Mac Studio Ultra configs (M1, M2, M3 Ultra) start at 64 GB of unified memory and reach 192 GB on the M2 Ultra, with the M3 Ultra going higher still. Sustained SSD reads land between 5 and 15 GB/s. Put those two together and a 397B-parameter MoE model becomes viable on one desktop.
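To see why SSD speed matters for a streamed model, here is a rough back-of-envelope sketch. It assumes decode is limited only by how many gigabytes must be read from SSD per generated token. The function name and the 2.0 GB-per-token figure are hypothetical placeholders for illustration, not measured Outlier numbers.

```python
def streaming_decode_ceiling(ssd_gb_per_s: float, streamed_gb_per_token: float) -> float:
    """Upper bound on decode tok/s if every token must read this many GB from SSD.

    Ignores compute time and any caching of weights in unified memory,
    so real throughput can only be at or below this ceiling.
    """
    if streamed_gb_per_token <= 0:
        raise ValueError("streamed_gb_per_token must be positive")
    return ssd_gb_per_s / streamed_gb_per_token


# Hypothetical workload: 2.0 GB of weights streamed per token (illustrative only).
ASSUMED_GB_PER_TOKEN = 2.0

# The 5 and 15 GB/s endpoints are the sustained SSD read range quoted above.
for bandwidth in (5.0, 15.0):
    ceiling = streaming_decode_ceiling(bandwidth, ASSUMED_GB_PER_TOKEN)
    print(f"{bandwidth:4.1f} GB/s SSD -> at most {ceiling:.1f} tok/s")
```

The point of the sketch is the shape of the relationship: halve the data streamed per token, or double the SSD bandwidth, and the ceiling doubles.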

Tier-by-Mac-Studio cheat sheet

Outlier ships seven model tiers. This is what each Studio can actually load.

Mac Studio               | RAM        | Recommended tiers
M1 Max (any)             | 32 GB      | Nano, Lite, Quick 26B, Core 27B, Code 27B
M1 / M2 Ultra (base)     | 64 GB      | + Vision 35B, + Plus 397B (V9 paged inference, ~2.1 tok/s)
M2 / M3 Ultra (mid)      | 96 GB      | + Plus 397B in capacity mode (faster, ~5 tok/s estimated)
M1 / M2 / M3 Ultra (max) | 128–192 GB | All tiers in capacity mode, full headroom for long context

64 GB is the line everything pivots on. Stay below it and you're capped at the 27B tiers (Core and Code) and down. Hit 64 GB and Plus 397B, the strongest model Outlier makes, opens up through the V9 paged engine. At 96 GB and above it runs in capacity mode and the streaming disappears entirely.
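The thresholds in the cheat sheet can be written down as a small lookup. This is only a sketch of the table's logic: the function name, the return shape, and the "paged" / "capacity" labels are my own, not part of any Outlier API.

```python
# Tier names and RAM thresholds transcribed from the cheat sheet.
BASE_TIERS = ["Nano", "Lite", "Quick 26B", "Core 27B", "Code 27B"]
LARGE_TIERS = ["Vision 35B", "Plus 397B"]


def tiers_for_ram(ram_gb: int) -> dict:
    """Return the tiers a Studio with `ram_gb` of unified memory can load.

    `plus_mode` is None when Plus 397B is unavailable, "paged" for
    V9 paged inference, and "capacity" when no streaming is needed.
    """
    if ram_gb < 32:
        raise ValueError("the cheat sheet starts at 32 GB")
    if ram_gb < 64:
        return {"tiers": list(BASE_TIERS), "plus_mode": None}
    mode = "paged" if ram_gb < 96 else "capacity"
    return {"tiers": BASE_TIERS + LARGE_TIERS, "plus_mode": mode}


for ram in (32, 64, 96, 192):
    result = tiers_for_ram(ram)
    print(f"{ram:>3} GB: {len(result['tiers'])} tiers, Plus mode = {result['plus_mode']}")
```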

Measured tok/s on M1 Ultra

These come straight off the Outlier Mac bench: a 192 GB M1 Ultra, batch size 1, a 4096-token prefill, and a 256-token decode.

Tier                                 | Decode tok/s
Nano 4B                              | 71.7
Lite 9B                              | 53.4
Core 27B                             | 20.7
Plus 397B (V9 paged inference, K=20) | 2.1

To put that in context, an M4 MacBook Air 16 GB does about 32 tok/s on Nano 4B from a clean boot. Newer Ultra Studios (M2, M3) beat these M1 figures thanks to a faster SoC and quicker memory. The order of magnitude per tier stays the same, though.
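Tok/s is easier to feel as wall-clock time. The sketch below converts the measured M1 Ultra decode rates into seconds for the bench's 256-token decode. It covers decode only and leaves out prefill time, which the source does not report; the helper name is hypothetical.

```python
# Measured M1 Ultra decode rates from the bench table (tok/s).
MEASURED_TOK_S = {
    "Nano 4B": 71.7,
    "Lite 9B": 53.4,
    "Core 27B": 20.7,
    "Plus 397B (V9 paged)": 2.1,
}


def decode_seconds(tokens: int, tok_per_s: float) -> float:
    """Seconds needed to generate `tokens` tokens at a steady decode rate."""
    return tokens / tok_per_s


for tier, rate in MEASURED_TOK_S.items():
    print(f"{tier:<22} {decode_seconds(256, rate):6.1f} s for 256 tokens")
```

At these rates a 256-token reply takes roughly 3.6 s on Nano 4B, 4.8 s on Lite 9B, 12.4 s on Core 27B, and about two minutes (121.9 s) on Plus 397B in paged mode.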

What to buy if you're starting from scratch

Building one mainly for coding? This is how I'd spend.

Run the math against the cloud. A $200/month AI subscription is $4,800 over two years. A used M1 Max Studio paired with Outlier Pro ($20/mo, or lifetime from $99) comes in well under that, and it gives you the same coding-quality workflow on the 95% of tasks that matter. On the lifetime plan there's no recurring bill, and either way there's no usage cap.
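The comparison is simple enough to check yourself. This sketch uses only the prices quoted in this guide and works out how much is left for hardware before the local setup costs more than the cloud subscription. It deliberately does not assume a price for the used Studio, and it ignores electricity and resale value.

```python
CLOUD_PER_MONTH = 200   # $/month cloud AI subscription
PRO_PER_MONTH = 20      # $/month Outlier Pro
PRO_LIFETIME = 99       # $ one-time, lowest lifetime price quoted
MONTHS = 24             # two-year horizon


def hardware_budget(months: int, cloud_per_month: float, software_cost: float) -> float:
    """Most you can spend on hardware and still match the cloud bill."""
    return cloud_per_month * months - software_cost


cloud_total = CLOUD_PER_MONTH * MONTHS
with_monthly = hardware_budget(MONTHS, CLOUD_PER_MONTH, PRO_PER_MONTH * MONTHS)
with_lifetime = hardware_budget(MONTHS, CLOUD_PER_MONTH, PRO_LIFETIME)

print(f"Cloud over {MONTHS} months: ${cloud_total}")
print(f"Break-even hardware budget, Pro monthly:  ${with_monthly}")
print(f"Break-even hardware budget, Pro lifetime: ${with_lifetime}")
```

Any Studio you can buy for less than the break-even figure ($4,320 on the monthly plan, $4,701 on the lifetime plan) comes out ahead over two years.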

Frequently asked questions

Which Mac Studio is best for local AI?

A 64 GB M2 Ultra is the sweet spot: it runs every tier including Plus 397B via the V9 paged engine. 32 GB handles up to Core 27B.

How fast is local AI on an M1 Ultra?

Measured decode speeds: Nano 4B 71.7 tok/s, Lite 9B 53.4, Core 27B 20.7, and Plus 397B 2.1 in V9 paged inference.

Do newer Mac Studios run faster?

Yes. M2 and M3 Ultra have faster memory and SSDs than M1, so tok/s is higher, though the order of magnitude per tier holds.

Try Outlier free

Free Nano + Lite — local, private, no account. Pro $20/mo or $149/yr adds everything (Plus 397B, Marathon mode, Computer use, Deep Research v3, long context to 128K). Lifetime Pro from $99 (Founding 200, first 200 seats) or $200 (Founders 500). Apple Silicon only.
