
Local AI benchmarks for Mac (2026) — Outlier vs Ollama vs LM Studio

Name: Local AI benchmarks on Mac, 2026 — Outlier, Ollama, LM Studio
Creator: Outlier
Published: 2026-04-30
License: creativecommons.org/licenses/by/4.0/

Outlier · solo-built in Grand Rapids · published 2026-04-30 · last updated 2026-05-20

Quick answer

For 7B-class chat, Outlier Nano 4B runs ~70 tok/s (71.7 measured) on an M1 Ultra. Ollama and LM Studio land in the 60–110 tok/s range, depending on which model you load.
For 27B-class coding, Outlier Core 27B does 20.7 tok/s on the same M1 Ultra and is the strongest coding tier in the lineup.
397B-class MoE? Only Outlier runs it on a 64 GB Mac (V9 paged engine, ~2.1 tok/s, ~11 GB RSS). Ollama and LM Studio just can't fit the thing.
The only fully published, verified accuracy figure is Nano's HumanEval 81.1% pass@1 (full 164-problem set). Other accuracy figures are still being finalized, so this page sticks to measured tok/s and memory.

Raw numbers, not vibes. This page collects benchmark data for the three major local-AI options on Apple Silicon in 2026. Outlier's figures come from my dev M1 Ultra. Ollama and LM Studio numbers come from public benchmark posts, cited where relevant.

Outlier lineup — measured (M1 Ultra, MLX 4-bit, batch 1, 4096 prefill, 256 decode)

Tier	Params	Disk	RAM (peak)	Decode tok/s
Nano 4B	4B dense	~3 GB	~4 GB	71.7
Lite 9B	9B dense	~6 GB	~7 GB	53.4
Quick 26B	26B-A4B MoE	~16 GB	~17 GB	14.6
Core 27B	27B dense	~16 GB	~17 GB	20.7
Vision 35B-A3B	35B-A3B MoE	~20 GB	~18 GB (cap) / ~3.5 GB (V10)	~16 (cap) / ~8 (V10)
Plus 397B-A17B	397B-A17B MoE	~209 GB	~11 GB (V9 paged)	2.1 (V9)

Source: Outlier FINAL_LAUNCH_NUMBERS.md. The M1 Ultra tok/s bench ran on 2026-04-29. Accuracy figures are still being finalized; the only fully published, verified number is Nano's HumanEval 81.1% pass@1 (full 164-problem set).
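A back-of-envelope check on the Disk and RAM columns: at 4-bit, each weight takes half a byte, so weight storage is roughly params × bits / 8. This sketch ignores quantization scales, any layers kept at higher precision, and runtime memory such as the KV cache, so real files and RSS run somewhat larger. The function name is hypothetical, and nothing here describes how the V9 or V10 engines actually work.

```python
def weight_gb(params_billion: float, bits: float = 4.0) -> float:
    """Rough weight storage in GB (1e9 bytes): params * bits / 8.

    Ignores quantization scales/biases, higher-precision layers, and
    runtime memory such as the KV cache, so treat it as a lower bound.
    """
    return params_billion * bits / 8


# Dense tiers: every weight is read for every token.
for name, params in [("Nano 4B", 4), ("Lite 9B", 9), ("Core 27B", 27)]:
    print(f"{name}: ~{weight_gb(params):.1f} GB of 4-bit weights")

# MoE: all weights live on disk, but only about 17B of Plus's 397B
# parameters are active for any given token.
print(f"Plus total:  ~{weight_gb(397):.1f} GB")
print(f"Plus active: ~{weight_gb(17):.1f} GB per token")
```

The totals (~2 GB for Nano, ~13.5 GB for Core, ~198.5 GB for Plus) sit just under the Disk column, as you'd expect once overhead is added. Plus's ~8.5 GB of active weights is in the same ballpark as the ~11 GB V9 RSS, which fits the idea of keeping roughly the active working set resident and paging the rest from disk. Read that as a plausibility check only, not a description of V9's internals.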

Cross-tool comparison (Mac, comparable 4B–8B, 27B–32B, and 397B-class models)

Tool	Model	Format	Mac	Decode tok/s
Outlier	Nano 4B	MLX 4-bit	M1 Ultra 64 GB	71.7
Outlier	Nano 4B	MLX 4-bit	M4 Air 16 GB	~32
Ollama	Llama 3.1 8B Q4_K_M	GGUF	M1 Ultra 64 GB	~60–80 (public posts)
LM Studio	Qwen 2.5 7B Q4	GGUF/MLX	M2 Max 64 GB	~70–100 (public posts)
Outlier	Core 27B	MLX 4-bit	M1 Ultra 64 GB	20.7
Ollama	Qwen 2.5 Coder 32B Q4	GGUF	M1 Ultra 64 GB	~15–22 (public posts)
Outlier	Plus 397B-A17B	MLX 4-bit, V9 paged	M1 Ultra 64 GB	2.1
Ollama	any 397B model	—	64 GB Mac	won't load (RAM)
LM Studio	any 397B model	—	64 GB Mac	won't load (RAM)

Source: Outlier numbers were measured locally. The Ollama and LM Studio ranges come from publicly shared benchmark posts as of 2026-05. An exact apples-to-apples comparison is genuinely hard here: prompts, prefill sizes, and batch settings all differ between posts. The ranges shown are typical values.

Cost comparison (24 months, single Mac developer)

Setup	24-month total
Outlier Free (Nano + Lite)	$0
Outlier Pro ($20/mo)	$480
Outlier Pro annual ($149/yr × 2)	$298
Outlier Founding 200 ($99 once, lifetime Pro)	$99
Outlier Founders 500 ($200 once, lifetime Pro)	$200
Ollama (OSS, free)	$0
LM Studio (free for personal use)	$0
ChatGPT Plus / Claude Pro ($20/mo)	$480
ChatGPT Pro / Claude Max ($200/mo)	$4,800
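The 24-month totals are simple arithmetic. This sketch reproduces them and adds a break-even check for the lifetime plans. It assumes list prices with no tax, price changes, or discounts, and that annual plans are billed once per started year; the function names are illustrative.

```python
import math

MONTHS = 24


def total_cost(monthly: float = 0, annual: float = 0, one_time: float = 0,
               months: int = MONTHS) -> float:
    """Total spend over `months`; annual plans are charged per started year."""
    years_billed = math.ceil(months / 12)
    return monthly * months + annual * years_billed + one_time


def breakeven_month(one_time: float, monthly: float) -> int:
    """Smallest n such that paying `monthly` for n months >= `one_time`."""
    return math.ceil(one_time / monthly)


plans = {
    "Outlier Pro ($20/mo)": total_cost(monthly=20),
    "Outlier Pro annual ($149/yr)": total_cost(annual=149),
    "Outlier Founding 200 ($99 once)": total_cost(one_time=99),
    "Outlier Founders 500 ($200 once)": total_cost(one_time=200),
    "ChatGPT Pro / Claude Max ($200/mo)": total_cost(monthly=200),
}
for name, cost in plans.items():
    print(f"{name}: ${cost:,.0f}")

print("Founding 200 vs Pro monthly, break-even month:", breakeven_month(99, 20))
```

Against $20/mo, the $99 Founding 200 seat pays for itself by month 5 and the $200 Founders 500 seat by month 10.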

What the bench doesn't measure

Prompt prefill / time to first token (TTFT). Decode tok/s is the steady-state generation rate. Feed a long prompt and your first token can still be several seconds out, even on the fast tiers.
Tier swap cost. Switch to a tier you haven't touched in a while and the cold load runs 10s–74s, scaling with size.
Quality on agentic loops. The bench tested single-turn outputs only. Multi-turn agent quality depends more on the app than on the model.
Vision tasks, rigorously. The Vision 35B numbers are language-only. A proper image-task quality bench is still coming.
Long context (50k+ tokens). Every bench prompt was capped at 4096 tokens of prefill. Past that, the cloud flagships still pull ahead by a real margin.
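Decode tok/s alone understates how long a reply takes. A rough model is latency ≈ prompt tokens / prefill rate + output tokens / decode rate. The prefill rates below are hypothetical placeholders, since this page doesn't publish them; the decode rate is Core 27B's measured 20.7 tok/s. The sketch ignores tier-swap cold loads.

```python
def estimate_latency_s(prompt_tokens: int, output_tokens: int,
                       prefill_tok_s: float, decode_tok_s: float) -> tuple[float, float]:
    """Return (time to first token, total time) in seconds.

    Simple linear model: prefill and decode each run at a constant rate.
    Real engines slow down as context grows, so long prompts will do worse.
    """
    ttft = prompt_tokens / prefill_tok_s
    total = ttft + output_tokens / decode_tok_s
    return ttft, total


# Bench shape from this page: 4096-token prefill, 256-token decode.
# The prefill rates are HYPOTHETICAL placeholders, not measurements.
for prefill_rate in (200.0, 500.0):
    ttft, total = estimate_latency_s(4096, 256, prefill_rate, decode_tok_s=20.7)
    print(f"prefill {prefill_rate:.0f} tok/s -> TTFT {ttft:.1f} s, total {total:.1f} s")
```

Swap in a measured prefill rate (for example, from Ollama's `prompt_eval` stats) to get a realistic estimate for your own prompts.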

How to reproduce

Outlier: grab it from outlier.host, then run the bundled benchmark harness in the app's developer console.
Ollama: ollama run <model> --verbose prints tok/s for every response.
LM Studio: the chat window shows tok/s in the footer under each response.
Accuracy evals: reach for lm-evaluation-harness, loading each model through its native backend.
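If you'd rather script the Ollama side than read `--verbose` output, Ollama's local REST API (`POST /api/generate`, on port 11434 by default) returns the same timing data as JSON: `eval_count` and `eval_duration` for decode, `prompt_eval_count` and `prompt_eval_duration` for prefill, with durations in nanoseconds. This sketch turns those into tok/s. The helper names are illustrative, and it assumes a running Ollama server with the model already pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def rates_from_response(resp: dict) -> dict:
    """Convert Ollama's timing fields (nanoseconds) into tokens per second."""
    rates = {"decode_tok_s": resp["eval_count"] / (resp["eval_duration"] / 1e9)}
    # Prefill fields can be missing or tiny when the prompt was served from cache.
    if resp.get("prompt_eval_count") and resp.get("prompt_eval_duration"):
        rates["prefill_tok_s"] = resp["prompt_eval_count"] / (resp["prompt_eval_duration"] / 1e9)
    return rates


def bench(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Run one non-streaming generation and return prefill/decode tok/s."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,                         # one JSON object with final stats
        "options": {"num_predict": max_tokens},  # cap decode length, like the 256-token bench
    }).encode()
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        return rates_from_response(json.load(r))
```

Usage: `bench("llama3.1:8b", "Explain MoE routing in one paragraph.")`. Average a few runs with the same prompt to smooth out run-to-run variance.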

Frequently asked questions

How fast is Outlier on a Mac?

On M1 Ultra: Nano 4B 71.7 tok/s, Lite 9B 53.4, Core 27B 20.7, Plus 397B 2.1 on the V9 paged engine. Nano hits about 32 tok/s on an M4 Air.

How does Outlier compare to Ollama and LM Studio?

Decode speed is similar on the model classes the tools share (roughly 7B to 27B). Among the three, only Outlier runs MoE models bigger than RAM, via its V9 paged engine.

What are Outlier's accuracy numbers?

Core 27B is the reasoning-strong coding tier; Nano 4B is the fast everyday tier. The only fully published, verified accuracy figure so far is Nano's HumanEval 81.1% pass@1 on the full 164-problem set. The rest are still being finalized.

Try Outlier free

Free Nano + Lite — local, private, no account. Pro $20/mo or $149/yr adds everything (Plus 397B, Marathon mode, Computer use, Deep Research v3, long context to 128K). Lifetime Pro from $99 (Founding 200, first 200 seats) or $200 (Founders 500). Apple Silicon only.
