Streaming engine benchmarks — Plus + Vision tok/s on Apple Silicon

Name: Outlier streaming engine benchmarks — V10 + V11
Creator: Outlier
Published: 2026-05-09
License: creativecommons.org/licenses/by/4.0/

Outlier · solo-built in Grand Rapids · published 2026-05-09 · last updated 2026-05-20

Quick answer

Current engine (V9 paged): Plus 397B runs at ~2.1 tok/s, ~11 GB RSS, 3.6 GB Wired Memory on a 64 GB Mac.
Both streaming engines retired: V11 in v1.11.71, V10 two builds later in v1.11.73.
Why: per-token streaming on V10/V11 committed 57–58 GB of Metal-wired memory during prefill. That's enough to crash a 32 GB Mac, and all it bought was a ~30–40% tok/s gain.
V9 paged (K=20 hot experts per layer + 8 GB LRU) holds wired memory at 3.6 GB, over 15× less than the streaming engines. On a footprint-first tier, that's the trade you want.

Engine version note: Both streaming engines have been retired — V11 in v1.11.71 and V10 in v1.11.73 — after M1 Ultra sweeps found their per-token expert streaming committed 57–58 GB of Metal-wired memory. The current shipping engine for Plus (and the Vision default) is V9 paged inference. The retired engines' numbers are kept here for transparency; new installs use V9.

All three of Outlier's MoE engines (V9 paged, V10 streaming, V11 hybrid) run on the same trick: keep the active experts in unified memory and pull cold ones off disk. Where they split is how they cache, and that split is what decided which one shipped. The streaming variants lost to V9 paged because their per-token streaming pattern committed far more Metal-wired memory. The raw comparison is below.
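To make the "hot experts resident, cold experts from disk" idea concrete, here's a minimal sketch of a pinned hot set backed by a size-bounded LRU. It is not Outlier's implementation: `ExpertCache`, `load_from_disk`, `hot_ids`, and `lru_budget_bytes` are hypothetical names, and expert weights are stood in for by plain `bytes` objects.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class ExpertCache:
    """Pinned hot set plus a size-bounded LRU for cold experts (illustrative only)."""

    def __init__(self, load_from_disk: Callable[[Hashable], bytes],
                 hot_ids: set, lru_budget_bytes: int) -> None:
        self._load = load_from_disk              # hypothetical disk loader
        # Hot experts are loaded once up front and never evicted.
        self._hot = {eid: load_from_disk(eid) for eid in hot_ids}
        self._lru: OrderedDict = OrderedDict()   # oldest entry first
        self._budget = lru_budget_bytes
        self._used = 0
        self.disk_reads = 0                      # cold misses served from disk

    def get(self, eid: Hashable) -> bytes:
        if eid in self._hot:
            return self._hot[eid]
        if eid in self._lru:
            self._lru.move_to_end(eid)           # mark as most recently used
            return self._lru[eid]
        weights = self._load(eid)                # cold miss: read from disk
        self.disk_reads += 1
        self._lru[eid] = weights
        self._used += len(weights)
        # Evict least recently used entries until back under budget.
        while self._used > self._budget and len(self._lru) > 1:
            _, evicted = self._lru.popitem(last=False)
            self._used -= len(evicted)
        return weights
```

In this toy version the resident footprint is bounded by the hot set plus the LRU budget, which is the property a paged design trades speed for. How each real engine sizes and fills its cache is not something this sketch tries to capture.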

Methodology

Hardware	M1 Ultra Mac Studio, 64 GB unified memory, `iogpu.wired_limit_mb=57344`
Model versions	Plus = Qwen3.5-397B-A17B MLX 4-bit; Vision = Qwen3.6-35B-A3B MLX 4-bit
Inference stack	mlx-lm 0.31.3, MLX 0.18.x
Aggregation	5-run mean per configuration, fresh process per run (no warm-cache bias)
Soak runs	50-turn multi-turn workloads on Plus; 20-turn on Vision
RAM measurement	OS-level RSS via `ps -o rss=`, sampled per token
Date range	2026-05-07 (V10 results), 2026-05-09 (V11 soak verdict), 2026-05-19 (V11 retired, v1.11.71), v1.11.73 (V10 retired)
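The aggregation rule in the table (5-run mean, peak RSS from per-token samples) can be sketched roughly like this. The function names and the dictionary layout are assumptions made for illustration, and the conversion assumes `ps` reports RSS in 1024-byte units and that "GB" means 1024² of those units.

```python
from statistics import mean

KIB_PER_GB = 1024 ** 2  # assumption: ps reports KiB, and "GB" here is 1024**2 KiB


def run_summary(tokens: int, seconds: float, rss_kib_samples: list[int]) -> dict:
    """Summarise one fresh-process run: throughput and peak RSS."""
    return {
        "tok_s": tokens / seconds,
        "rss_peak_gb": max(rss_kib_samples) / KIB_PER_GB,
    }


def aggregate(runs: list[dict]) -> dict:
    """Mean tok/s across runs; peak RSS is the highest value seen in any run."""
    return {
        "tok_s_mean": mean(r["tok_s"] for r in runs),
        "rss_peak_gb": max(r["rss_peak_gb"] for r in runs),
    }
```

Taking the maximum rather than the mean for RSS is a choice made in this sketch, since a peak column is about the worst case. The page doesn't say how peaks were combined across runs.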

Plus 397B-A17B — engine comparison

Engine	tok/s	RSS peak	Wired Memory peak	Status
V9 paged K=20	2.1	11 GB	3.6 GB	current — shipping default
V10 streaming K=4	2.9	13 GB	57 GB	retired v1.11.73
V11 hybrid LRU	3.0	13 GB	58 GB	retired v1.11.71

Look at the Wired Memory column. That's the one that mattered. V10 and V11 streamed experts per token, which committed 57–58 GB of Metal command-buffer memory during prefill, enough to take down a 32 GB Mac on the very first generation. The payoff was a ~30–40% tok/s bump over V9. V9 paged holds wired memory at 3.6 GB while running 2.1 tok/s, over 15× less than either streaming engine. This tier exists to fit a 209 GB model on a small Mac, so footprint beats speed and V9 ships.
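The trade-off is easy to check from the table. The snippet below just redoes the arithmetic on the rounded figures shown there; `ENGINES` and `compare` are names made up for this example.

```python
# Figures copied from the Plus 397B-A17B table (already rounded there).
ENGINES = {
    "V9 paged K=20": {"tok_s": 2.1, "wired_gb": 3.6},
    "V10 streaming K=4": {"tok_s": 2.9, "wired_gb": 57.0},
    "V11 hybrid LRU": {"tok_s": 3.0, "wired_gb": 58.0},
}


def compare(engines: dict, baseline: str) -> dict:
    """Speed gain (%) and wired-memory multiple of each engine vs the baseline."""
    base = engines[baseline]
    return {
        name: {
            "speedup_pct": (e["tok_s"] / base["tok_s"] - 1) * 100,
            "wired_ratio": e["wired_gb"] / base["wired_gb"],
        }
        for name, e in engines.items()
    }


for name, r in compare(ENGINES, "V9 paged K=20").items():
    print(f"{name}: {r['speedup_pct']:+.0f}% tok/s, {r['wired_ratio']:.1f}x wired memory")
```

On these rounded inputs V10 works out to roughly +38% tok/s for about 15.8× the wired memory, and V11 to roughly +43% for about 16.1×.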

Vision 35B-A3B — engine comparison

Engine	tok/s	RSS peak	Status
V9 paged	16.31	~18 GB	current default (24 GB+ Macs)
V10 streaming K=4	6.57	~3.5 GB	opt-in (lower RAM, slower)
V11 hybrid LRU	10.92	1.60 GB	retired v1.11.71

Vision is a little different. There's still a V10 opt-in for people on tight RAM, which trades speed (~6.6 tok/s) for a lower footprint, but the default is V9 at 16.31 tok/s on 24 GB+ Macs. The earlier 16 GB MacBook Air result leaned on V11's LRU specifically. With V11 gone, the practical floor for the default Vision engine is 24 GB.

Soak verdict — what passed and what didn't

Here's what the 2026-05-09 soak verdict on V11 turned up over 9 cycles. The quality numbers were fine. The memory number wasn't.

135/135 turns coherent. No output collapse, no repetition loops.
0 crashes, 0 segfaults across both Plus N=4 (100 turns) and Vision N=2 (35 turns).
Engine-swap V11→V9→V11 round-trip held up: post-swap tok/s 1.51 vs pre-swap 1.36, no regression.
Idle-resume across a 5-minute sleep came back clean. No orphan threads, no leaked tensors.
But memory pressure on the 50-turn Plus run peaked at 14.34 GB, against 7.34 GB in the smoke run, nearly double. That's the regression that got it retired.
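The checks above boil down to a simple gate, sketched here. Everything in this block is an assumption for illustration: the `SoakResult` fields, the `verdict` function, and especially the `max_growth` threshold, which the page does not state.

```python
from dataclasses import dataclass


@dataclass
class SoakResult:
    turns_total: int
    turns_coherent: int
    crashes: int
    soak_peak_gb: float   # peak memory during the long soak run
    smoke_peak_gb: float  # peak memory during the short smoke run

    @property
    def rss_growth(self) -> float:
        """How many times larger the soak peak is than the smoke peak."""
        return self.soak_peak_gb / self.smoke_peak_gb


def verdict(r: SoakResult, max_growth: float) -> list[str]:
    """Return the names of failed checks; an empty list means the soak passed."""
    failures = []
    if r.turns_coherent != r.turns_total:
        failures.append("coherence")
    if r.crashes:
        failures.append("crashes")
    if r.rss_growth > max_growth:   # max_growth is a hypothetical budget
        failures.append("memory")
    return failures
```

Fed the reported numbers (135/135 turns, 0 crashes, 14.34 GB vs 7.34 GB) with a hypothetical `max_growth` of 1.5, the only failing check is `"memory"`, since the growth factor is about 1.95.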

How to reproduce

Install Outlier from outlier.host. You'll need the Pro tier (or a Founders license) to run Plus 397B.
Open Settings → Engine Mode. Plus runs on Capacity (V9 paged), and any old V10/V11 preference auto-migrates to Capacity for you.
Run a multi-prompt set and grab tok/s from the response footer.
While it's generating, sample OS RSS with `ps -o pid,rss,command -p $(pgrep -f outlier-cli)`.
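If you'd rather capture a peak than eyeball single readings, the sampling step can be scripted. This is a rough sketch, not the harness used for the published numbers: `read_rss_kib` and `peak_rss_kib` are hypothetical helpers, and it polls on a fixed interval instead of once per token.

```python
import subprocess
import time
from typing import Callable, Optional


def read_rss_kib(pid: int) -> Optional[int]:
    """Return the process RSS via `ps -o rss=` (1024-byte units), or None if it has exited."""
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True,
    ).stdout.strip()
    return int(out) if out else None


def peak_rss_kib(pid: int, samples: int, interval_s: float = 0.5,
                 reader: Callable[[int], Optional[int]] = read_rss_kib) -> int:
    """Poll RSS up to `samples` times and return the highest value seen."""
    peak = 0
    for _ in range(samples):
        rss = reader(pid)
        if rss is None:          # process ended; stop sampling
            break
        peak = max(peak, rss)
        time.sleep(interval_s)
    return peak
```

The PID would come from the same `pgrep -f outlier-cli` lookup used in the step above. The `reader` parameter exists only so the polling loop can be exercised without a live process.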

Want to reproduce V10/V11 themselves? You can't from a current DMG, since neither ships anymore. The source lives in repo history before commit 525298f (v1.11.71) and the v1.11.73 V10 removal.

Frequently asked questions

What engine does Outlier use for Plus now?

V9 paged inference (K=20 hot experts per layer plus an 8 GB LRU cache). It runs Plus 397B at about 2.1 tok/s with ~11 GB RSS and 3.6 GB Wired Memory. The V10 and V11 streaming engines were retired.

Why were the streaming engines retired?

V10 and V11's per-token expert streaming committed 57–58 GB of Metal-wired memory during prefill, enough to crash a 32 GB Mac, for only a 30–40% tok/s gain. V9 paged uses 3.6 GB of wired memory (over 15× less), so on a footprint-first tier it wins.

How were the numbers measured?

Each figure is a 5-run mean per configuration, with a fresh process per run, on an M1 Ultra Mac Studio. RSS is read at OS level and sampled per token.

Try Outlier free

Free Nano + Lite — local, private, no account. Pro $20/mo or $149/yr adds everything (Plus 397B, Marathon mode, Computer use, Deep Research v3, long context to 128K). Lifetime Pro from $99 (Founding 200, first 200 seats) or $200 (Founders 500). Apple Silicon only.
