Concept

What is unified memory on Apple Silicon?

Last updated 2026-06-18 · Outlier v1.11.469

Quick answer

Unified memory is a shared address space for the CPU and GPU on Apple Silicon. There is no separate VRAM and no PCIe round-trip; the GPU reads model weights directly from main memory at the chip’s memory bandwidth.

Why does unified memory matter for local AI on Apple Silicon?

The decision to run a model locally on a Mac comes down to three numbers: weight size on disk, peak generation memory, and the memory bandwidth feeding the decode loop. Unified memory bears directly on each of them.

Apple Silicon places the CPU, GPU, Neural Engine, and memory controller on the same package and exposes a single address space to all of them. There is no separate VRAM, no PCIe bus between ‘system memory’ and ‘graphics memory’ — the GPU reads from the same RAM the CPU writes to, at the chip’s native memory bandwidth.

For a 4-bit dense language model, decode is bandwidth-bound: every generated token requires reading the weights of the model out of memory once. The throughput ceiling for a tier is therefore (unified memory bandwidth) divided by (weight bytes per token).
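The ceiling described above can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, not Outlier code: the function name `decode_ceiling_tok_s` is made up for this example, and it assumes the bytes read per token equal the weight size on disk, ignoring KV-cache and activation traffic.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, weight_gb: float) -> float:
    """Upper bound on decode speed if every token reads all weights once.

    Assumption: weight bytes per token == weight size on disk (dense model).
    """
    if weight_gb <= 0:
        raise ValueError("weight_gb must be positive")
    return bandwidth_gb_s / weight_gb


# Example: 800 GB/s of memory bandwidth feeding 2.4 GB of weights.
ceiling = decode_ceiling_tok_s(800.0, 2.4)
print(f"{ceiling:.1f} tok/s")  # prints "333.3 tok/s"
```

The result is an upper bound only; real decode speed lands below it because compute, cache traffic, and scheduling overhead all take their share.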

What is the concrete number?

On the M1 Ultra, that bandwidth is 800 GB/s; on a base M4 Air it is 120 GB/s — a 6.7× gap that mostly explains the gap in local-model decode speed.

How does this play out in the Outlier shipping lineup?

Decode tok/s tracks memory bandwidth rather than core count: M1 Ultra and M2 Ultra are 800 GB/s, M3 Ultra is 819 GB/s, M4 Ultra is 1092 GB/s, and base-tier Airs range from about 68 GB/s (M1) to 120 GB/s (M4).

What is the v1.9 implication?

For the Plus tier, the bottleneck shifts from RAM bandwidth to NVMe read bandwidth because the model spills onto disk; that is the regime the v1.9 page-aligned pread() fanout targets.

What does unified memory not mean?

“Unified memory” is sometimes used as a marketing term. The numbers cited above (800 GB/s on the M1 Ultra, 120 GB/s on a base M4 Air) are the ones taken from Apple’s spec sheets. If a cleaner number appears in someone’s pitch deck, ask for the provenance file that produced it; if there is no provenance file, treat the number as marketing.

Where can I read more about unified memory on Apple Silicon?

Apple’s developer documentation on the unified memory architecture is the upstream reference. The per-Mac bandwidth numbers in this article come from Apple’s spec sheets for each chip generation.

What does this mean for the heaviest tiers?

For dense tiers up to Vision (about 20 GB on disk), the bottleneck is unified-memory bandwidth and tok/s scales accordingly. Nano at 2.4 GB on disk runs at 71.7 tok/s on the M1 Ultra (800 GB/s); Core at 15 GB runs at 20.7 tok/s on the same machine. The ratio of those speeds (71.7 / 20.7 ≈ 3.46) is well below the ratio of the model sizes (15 / 2.4 = 6.25), so weight size alone does not predict the speed-up; the remainder is attributed to differences in compute density and quantization layout.
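One way to read the Nano and Core figures is to compare each measured speed against its bandwidth ceiling. The sketch below does that arithmetic; the helper name `bandwidth_utilisation` is illustrative, and it assumes the bytes read per token equal the on-disk weight size.

```python
M1_ULTRA_GB_S = 800.0

# (weight size on disk in GB, measured decode tok/s), as quoted in the text.
TIERS = {"Nano": (2.4, 71.7), "Core": (15.0, 20.7)}


def bandwidth_utilisation(weight_gb: float, measured_tok_s: float,
                          bandwidth_gb_s: float) -> float:
    """Fraction of the bandwidth ceiling that the measured speed reaches."""
    ceiling_tok_s = bandwidth_gb_s / weight_gb  # assumes weights read once per token
    return measured_tok_s / ceiling_tok_s


report = {
    name: bandwidth_utilisation(gb, tok_s, M1_ULTRA_GB_S)
    for name, (gb, tok_s) in TIERS.items()
}
for name, fraction in report.items():
    print(f"{name}: {fraction:.0%} of the bandwidth ceiling")
```

Under that assumption Nano reaches roughly 22% of its ceiling and Core roughly 39%, which is consistent with the larger model sitting closer to the bandwidth-bound regime.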

For the Plus tier (209 GB on disk), the bottleneck shifts. Even a 192 GB Mac Studio cannot hold the whole model in unified memory, so reads spill to NVMe. That is why the Flash-MoE technique of fan-out pread() is the Plus-tier optimization to chase: it parallelises the disk-read step that memory bandwidth alone cannot fix.
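The general shape of a page-aligned pread() fan-out can be sketched with the Python standard library. This is a generic illustration of the idea, not Outlier's or Flash-MoE's implementation: the names `pread_full`, `aligned_chunks`, and `fanout_pread`, the 1 MiB chunk size, and the worker count are all assumptions made for the example.

```python
import os
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")  # page size reported by the OS


def pread_full(fd: int, size: int, pos: int) -> bytes:
    """Call os.pread until `size` bytes are read or end of file is hit."""
    buf = bytearray()
    while len(buf) < size:
        part = os.pread(fd, size - len(buf), pos + len(buf))
        if not part:  # end of file
            break
        buf += part
    return bytes(buf)


def aligned_chunks(offset: int, length: int, chunk_size: int):
    """Split [offset, offset + length) into reads that start on page boundaries."""
    start = offset - (offset % PAGE_SIZE)  # round down to a page boundary
    end = offset + length
    # Force the chunk size to a non-zero multiple of the page size.
    chunk_size = max(PAGE_SIZE, chunk_size - chunk_size % PAGE_SIZE)
    return [(pos, min(chunk_size, end - pos)) for pos in range(start, end, chunk_size)]


def fanout_pread(path: str, offset: int, length: int,
                 chunk_size: int = 1 << 20, workers: int = 8) -> bytes:
    """Read `length` bytes at `offset` using parallel page-aligned preads."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = aligned_chunks(offset, length, chunk_size)
        # os.pread releases the GIL, so the reads can overlap in threads.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            parts = list(pool.map(lambda c: pread_full(fd, c[1], c[0]), chunks))
    finally:
        os.close(fd)
    data = b"".join(parts)
    skip = offset % PAGE_SIZE  # drop bytes read only to reach alignment
    return data[skip:skip + length]
```

Because pread() takes an explicit offset, the workers share one file descriptor without seeking, and each read starts on a page boundary. Whether that translates into higher NVMe throughput depends on the drive, the chunk size, and the queue depth, so the defaults here are placeholders to tune rather than recommendations.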

How does unified memory connect to specific tiers?

Unified memory is foundational to every Outlier tier. The smallest Mac that exercises it is an 8 GB base M1 running Nano; the largest is a 192 GB M4 Ultra running Plus.

What is the smallest configuration that exercises this concept?

An 8 GB M1 Air running Nano demonstrates the unified-memory concept end-to-end. Heavier tiers exercise more of the unified address space, but the underlying mechanism is the same.

Download Outlier for Mac

Requires Apple Silicon (M1, M2, M3, or M4) — Intel Macs are not supported. macOS 12+.

Outlier runs entirely on your Mac. No prompts leave the device. Apache 2.0 model weights.