Streaming engine benchmarks — Plus + Vision tok/s on Apple Silicon
- Current engine (V9 paged): Plus 397B runs at ~2.1 tok/s, ~11 GB RSS, 3.6 GB Wired Memory on a 64 GB Mac.
- Both streaming engines retired: V11 in v1.11.71, V10 two builds later in v1.11.73.
- Why: per-token streaming on V10/V11 committed 57–58 GB of Metal-wired memory during prefill. That's enough to crash a 32 GB Mac, and all it bought was a ~30–40% tok/s gain.
- V9 paged (K=20 hot experts + 8 GB LRU) sits at 3.6 GB, 15× less wired memory. On a footprint-first tier, that's the trade you want.
All three of Outlier's MoE engines (V9 paged, V10 streaming, V11 hybrid) run on the same trick: keep the active experts in unified memory and pull cold ones off disk. Where they split is how they cache, and that split is what decided which one shipped. The streaming variants lost to V9 paged because their per-token streaming pattern committed far more Metal-wired memory. The raw comparison is below.
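To make the caching idea concrete, here is a minimal sketch of a "pinned hot set plus LRU" expert cache in Python. It is not Outlier's implementation: the class name, the `load_from_disk` callback, and the use of `bytes` objects as stand-ins for expert weights are all assumptions made for illustration, and the per-layer structure (K=20 hot experts per layer) is left out.

```python
from collections import OrderedDict
from typing import Callable


class PagedExpertCache:
    """Pinned hot experts plus a byte-budgeted LRU for everything else.

    Illustrative only: weights are modeled as bytes objects so that
    len(weights) is their size in bytes.
    """

    def __init__(self, hot_ids, load_from_disk: Callable, lru_budget_bytes: int):
        self._load = load_from_disk
        # Hot experts are loaded once up front and never evicted.
        self._hot = {eid: load_from_disk(eid) for eid in hot_ids}
        self._lru = OrderedDict()  # expert id -> weights, oldest first
        self._lru_bytes = 0
        self._budget = lru_budget_bytes
        self.disk_reads = 0  # counts cold misses only

    def get(self, eid):
        if eid in self._hot:
            return self._hot[eid]
        if eid in self._lru:
            self._lru.move_to_end(eid)  # mark as most recently used
            return self._lru[eid]
        weights = self._load(eid)  # cold miss: read from disk
        self.disk_reads += 1
        self._lru[eid] = weights
        self._lru_bytes += len(weights)
        # Evict least recently used entries until back under budget.
        while self._lru_bytes > self._budget and len(self._lru) > 1:
            _, evicted = self._lru.popitem(last=False)
            self._lru_bytes -= len(evicted)
        return weights
```

The point of the design is that resident memory is bounded by the hot set plus the LRU budget, no matter how many distinct experts a long generation touches.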
Methodology
| Parameter | Value |
|---|---|
| Hardware | M1 Ultra Mac Studio, 64 GB unified memory, iogpu.wired_limit_mb=57344 |
| Model versions | Plus = Qwen3.5-397B-A17B MLX 4-bit; Vision = Qwen3.6-35B-A3B MLX 4-bit |
| Inference stack | mlx-lm 0.31.3, MLX 0.18.x |
| Aggregation | 5-run mean per configuration, fresh process per run (no warm-cache bias) |
| Soak runs | 50-turn multi-turn workloads on Plus; 20-turn on Vision |
| RAM measurement | OS-level RSS via ps -o rss=, sampled per token |
| Date range | 2026-05-07 (V10 results), 2026-05-09 (V11 soak verdict), 2026-05-19 (V11 retired, v1.11.71), v1.11.73 (V10 retired) |
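As a rough illustration of the aggregation described in the table, the sketch below averages tok/s across runs and takes the peak of the per-token RSS samples. The record shape (`tok_s`, `rss_kb`) is hypothetical, and taking the maximum RSS across runs (rather than a mean of per-run peaks) is an assumption; the source only states that tok/s is a 5-run mean.

```python
from statistics import mean

# ps reports RSS in 1024-byte units, so "GB" here means 2**30 bytes.
KB_PER_GB = 1024 * 1024


def peak_rss_gb(samples_kb):
    """Peak of the per-token RSS samples for one run, in GB."""
    return max(samples_kb) / KB_PER_GB


def summarize(runs):
    """Aggregate the runs of one configuration.

    Each run is a dict with 'tok_s' (float) and 'rss_kb' (list of ints),
    a hypothetical record shape chosen for this sketch.
    """
    return {
        "runs": len(runs),
        "tok_s_mean": mean(r["tok_s"] for r in runs),
        "rss_peak_gb": max(peak_rss_gb(r["rss_kb"]) for r in runs),
    }
```

Each record would come from a fresh process, so a warm cache from one run cannot inflate the next run's tok/s.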
Plus 397B-A17B — engine comparison
| Engine | tok/s | RSS peak | Wired Memory peak | Status |
|---|---|---|---|---|
| V9 paged K=20 | 2.1 | 11 GB | 3.6 GB | current — shipping default |
| V10 streaming K=4 | 2.9 | 13 GB | 57 GB | retired v1.11.73 |
| V11 hybrid LRU | 3.0 | 13 GB | 58 GB | retired v1.11.71 |
Look at the Wired Memory column. That's the one that mattered. V10 and V11 streamed experts per token, which committed 57–58 GB of Metal command-buffer memory during prefill, enough to take down a 32 GB Mac on the very first generation. The payoff was a ~30–40% tok/s bump over V9. V9 paged holds 3.6 GB wired at 2.1 tok/s, roughly 15× less wired memory than the streaming engines. This tier exists to fit a 209 GB model on a small Mac, so footprint beats speed and V9 ships.
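The trade-off can be checked straight from the table rows. The short Python calculation below uses only the tok/s and Wired Memory values listed above; the helper name `compare` is just for this sketch.

```python
# Values copied from the Plus 397B engine comparison table.
plus = {
    "V9 paged K=20": {"tok_s": 2.1, "wired_gb": 3.6},
    "V10 streaming K=4": {"tok_s": 2.9, "wired_gb": 57.0},
    "V11 hybrid LRU": {"tok_s": 3.0, "wired_gb": 58.0},
}


def compare(baseline, other):
    """Return (tok/s gain in percent, wired-memory multiple) of other vs baseline."""
    gain_pct = (other["tok_s"] / baseline["tok_s"] - 1.0) * 100.0
    wired_multiple = other["wired_gb"] / baseline["wired_gb"]
    return gain_pct, wired_multiple


base = plus["V9 paged K=20"]
for name in ("V10 streaming K=4", "V11 hybrid LRU"):
    gain, mult = compare(base, plus[name])
    print(f"{name}: +{gain:.0f}% tok/s, {mult:.1f}x wired memory")
```

From the table values, V10 comes out at about 38% faster for about 15.8× the wired memory, and V11 at about 43% faster for about 16.1×, which is close to the rounded figures quoted in the text.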
Vision 35B-A3B — engine comparison
| Engine | tok/s | RSS peak | Status |
|---|---|---|---|
| V9 paged | 16.31 | ~18 GB | current default (24 GB+ Macs) |
| V10 streaming K=4 | 6.57 | ~3.5 GB | opt-in (lower RAM, slower) |
| V11 hybrid LRU | 10.92 | 1.60 GB | retired v1.11.71 |
Vision is a little different. A V10 opt-in remains for people on tight RAM: it trades speed (~6.6 tok/s) for a lower footprint (~3.5 GB RSS). The default is V9 at 16.31 tok/s on 24 GB+ Macs. The earlier 16 GB MacBook Air result depended on V11's LRU specifically, so with V11 gone, the practical floor for Vision is 24 GB.
Soak verdict — what passed and what didn't
Here's what the 2026-05-09 soak verdict on V11 turned up over 9 cycles. The quality numbers were fine. The memory number wasn't.
- 135/135 turns coherent. No output collapse, no repetition loops.
- 0 crashes, 0 segfaults across both Plus N=4 (100 turns) and Vision N=2 (35 turns).
- Engine-swap V11→V9→V11 round-trip held up: post-swap tok/s 1.51 vs pre-swap 1.36, no regression.
- Idle-resume across a 5-minute sleep came back clean. No orphan threads, no leaked tensors.
- But peak memory pressure on the 50-turn Plus soak hit 14.34 GB, versus 7.34 GB in the smoke run. That's the regression that got V11 retired.
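A soak gate of this kind boils down to comparing two peaks. The sketch below shows one way to express it in Python. The function name and the `max_ratio` threshold are hypothetical, since the source does not say what cutoff was applied; only the 7.34 GB and 14.34 GB figures come from the verdict above.

```python
def memory_regression(smoke_peak_gb, soak_peak_gb, max_ratio=1.25):
    """Flag a soak run whose peak memory grew past max_ratio x the smoke peak.

    max_ratio is a hypothetical threshold for illustration; the source does
    not state the cutoff that was actually used.
    """
    ratio = soak_peak_gb / smoke_peak_gb
    return ratio, ratio > max_ratio


ratio, failed = memory_regression(smoke_peak_gb=7.34, soak_peak_gb=14.34)
print(f"soak/smoke = {ratio:.2f}x, regression = {failed}")
```

With the reported numbers the soak peak is roughly double the smoke peak (about 1.95×), so any reasonably tight threshold would flag it.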
How to reproduce
- Install Outlier from outlier.host. Plus 397B needs the Pro or Founders tier.
- Open Settings → Engine Mode. Plus runs on Capacity (V9 paged), and any old V10/V11 preference auto-migrates to Capacity for you.
- Run a multi-prompt set and grab tok/s from the response footer.
- While it's generating, sample OS RSS with: ps -o pid,rss,command -p $(pgrep -f outlier-cli)
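If you'd rather capture a peak than eyeball single readings, the RSS step can be wrapped in a small polling loop. This is a sketch under assumptions: it polls at a fixed interval, which only approximates the per-token sampling used for the published numbers, and the function names are made up for this example. The only external command it relies on is `ps -o rss= -p PID`.

```python
import subprocess
import time


def read_rss_kb(pid):
    """Return RSS in KB for pid using `ps -o rss=`, or None if it has exited."""
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True,
    ).stdout.strip()
    return int(out) if out else None


def peak_rss_kb(pid, duration_s=60.0, interval_s=0.5,
                read=read_rss_kb, sleep=time.sleep):
    """Poll RSS at a fixed interval and return the highest value seen."""
    peak = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        rss = read(pid)
        if rss is None:  # process ended before the deadline
            break
        peak = max(peak, rss)
        sleep(interval_s)
    return peak
```

Pass it the PID reported by pgrep -f outlier-cli while a generation is running, then divide the result by 1024 × 1024 to get GB.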
Want to reproduce V10/V11 themselves? You can't from a current DMG, since neither ships anymore. The source lives in repo history before commit 525298f (v1.11.71) and the v1.11.73 V10 removal.
Frequently asked questions
What engine does Outlier use for Plus now?
V9 paged inference (K=20 hot experts per layer plus an 8 GB LRU cache). It runs Plus 397B at about 2.1 tok/s with ~11 GB RSS and 3.6 GB Wired Memory. The V10 and V11 streaming engines were retired.
Why were the streaming engines retired?
V10 and V11 per-token expert streaming committed 57–58 GB of Metal-wired memory during prefill — enough to crash a 32 GB Mac — for only a 30–40% tok/s gain. V9 paged uses 3.6 GB Wired (15× less), so on a footprint-first tier it wins.
How were the numbers measured?
5-run mean per configuration, fresh process per run, on an M1 Ultra with OS-level RSS sampled per token.