
How to run AI on Mac without a discrete GPU

Quick answer

Yes, you can. Apple Silicon Macs have GPU cores built directly into the chip, sharing unified memory with the CPU, so you don't need a separate graphics card. Even an M4 MacBook Air can run a 7B model at 20+ tokens per second.

If you've looked into running AI locally on a PC, you've probably run into the GPU wall: most guides assume you have an NVIDIA card with dedicated VRAM. The Mac story is different, and it comes down to how Apple Silicon is physically built.

Why AI usually needs a GPU

On a typical Windows or Linux desktop, the CPU and GPU are separate chips connected by a PCIe bus. The GPU has its own pool of dedicated video RAM (VRAM) — 8 GB, 16 GB, 24 GB depending on the card. LLM inference is essentially a huge sequence of matrix multiplications, and GPUs are built to execute those in parallel across thousands of cores.

The catch: a model has to fit inside that dedicated VRAM. An RTX 4090 has 24 GB. A 70B-parameter model in 4-bit quantization needs roughly 35 GB. It won't load. You're either paying for multiple high-end cards or you're stuck with smaller models.
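The 35 GB figure follows from simple arithmetic: parameter count times bits per weight, divided by eight. Here is a minimal sketch of that estimate, assuming the weights dominate memory use and ignoring the KV cache and runtime overhead. The function name is illustrative and not taken from any library.

```python
def weights_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes).

    Assumption: only the weights are counted. KV cache, activations,
    and runtime overhead are ignored, so real usage is higher.
    """
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


VRAM_GB = 24  # RTX 4090 VRAM, as quoted in the text

for params in (7, 13, 70):
    size = weights_size_gb(params, bits_per_weight=4)
    verdict = "fits" if size <= VRAM_GB else "does not fit"
    print(f"{params}B at 4-bit: ~{size:.1f} GB, {verdict} in {VRAM_GB} GB of VRAM")
```

Because this counts weights only, a model that "fits" on paper may still need extra headroom for context.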

The other catch is the software stack. GPU-accelerated inference on PC relies almost entirely on NVIDIA's CUDA platform. AMD GPUs have ROCm, but driver and library coverage is narrower. Intel integrated graphics on Windows — the kind most laptops ship with — offer very limited acceleration for LLM workloads. Running a 7B+ model on integrated Intel or AMD graphics on a Windows laptop is possible but slow enough to be frustrating in practice.

Why Mac is different

Apple Silicon uses a unified memory architecture: the CPU, GPU, Neural Engine, and all the other processors share a single pool of fast on-package memory. There is no separate VRAM, no PCIe hop between CPU and GPU, and no memory copy from system RAM to GPU RAM. Every processor reads and writes the same memory.

This changes the math for local AI entirely. When a model loads on Apple Silicon, it lands in unified memory where both the CPU and GPU can read it directly. The GPU cores in the chip perform the matrix multiplications that drive inference — the same work a discrete GPU does on a PC — but they're operating on memory that's already shared. No transfer overhead, no 24 GB ceiling imposed by a separate card.

What the M-series chip actually contains

Apple Silicon isn't just a CPU with a small integrated GPU bolted on. Every M-series chip packs several compute engines into one package, including CPU cores, GPU cores, and the Neural Engine.

The GPU cores are real GPU cores. They're not a watered-down integrated graphics chip; they're the same tile-based deferred renderer Apple uses for gaming and creative workloads. For LLM inference, frameworks like MLX and llama.cpp's Metal backend route matrix operations directly to these cores.

What this means for local AI performance

The practical result is that a Mac without any discrete GPU can run local AI at speeds that would have required a dedicated workstation GPU a couple of years ago.

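Measured on real hardware with Outlier: 32 tok/s on an M4 MacBook Air, roughly 20 tok/s on an M1 Ultra running Core 27B, and 2.1 tok/s running Plus 397B.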

Memory bandwidth is the real limiting factor for LLM inference, more so than raw GPU core count. The GPU has to read the model weights on every forward pass, and how fast it can do that determines throughput. The M2 Ultra delivers 800 GB/s of memory bandwidth. The M4 Max delivers up to 546 GB/s. An RTX 4090 hits 1,008 GB/s, but only across 24 GB of VRAM. Once your model exceeds that 24 GB, it either fails to load or has to spill layers into system RAM across the PCIe bus, and throughput drops sharply. On Apple Silicon, larger models just use more of the shared pool.
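A rough way to see why bandwidth matters: if generating each token requires reading every weight once, throughput cannot exceed memory bandwidth divided by model size. The sketch below applies that simplified upper bound to bandwidth figures quoted in this article. It assumes a dense model and ignores compute time, caching, and KV-cache reads, so real speeds will be lower. The 3.5 GB model size is a weights-only estimate for a 7B model at 4-bit, and the function name is illustrative.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Bandwidth-bound ceiling on decode speed.

    Assumption: every generated token reads all model weights exactly
    once, and nothing else limits throughput. This is an upper bound,
    not a prediction.
    """
    return bandwidth_gb_s / model_size_gb


# Bandwidth figures as quoted in this article (GB/s).
BANDWIDTH_GB_S = {
    "M4 (MacBook Air)": 120,
    "M4 Max": 546,
    "M2 Ultra": 800,
    "RTX 4090": 1008,
}

MODEL_SIZE_GB = 3.5  # assumed: 7B parameters at 4 bits per weight

for chip, bandwidth in BANDWIDTH_GB_S.items():
    ceiling = max_tokens_per_second(bandwidth, MODEL_SIZE_GB)
    print(f"{chip}: at most ~{ceiling:.0f} tok/s for a {MODEL_SIZE_GB} GB model")
```

The ceiling scales inversely with model size: doubling the weights halves the bound on the same chip.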

| Spec | M4 MacBook Air | RTX 4090 gaming PC |
| --- | --- | --- |
| GPU cores | 10-core GPU (on-chip) | 16,384 CUDA cores (discrete) |
| Memory pool | 16–32 GB unified memory | 24 GB VRAM (system RAM separate) |
| Memory bandwidth | 120 GB/s | 1,008 GB/s |
| Max model size | Up to available RAM (7B–13B on 16 GB) | Capped at 24 GB VRAM |
| Approximate price | $1,099 (complete laptop) | ~$1,600 (GPU only) |
| Discrete GPU required | No | Yes |

The catch: you still need enough RAM

Unified memory solves the VRAM ceiling problem, but it doesn't eliminate the memory requirement. A model has to fit in your Mac's RAM, alongside macOS and whatever else is running. As a rough guide, a 16 GB Mac handles models in the 7B to 13B range.

If you're buying a Mac for local AI, 16 GB is the practical floor for everyday use. If budget allows, 32 GB gives you substantially more model range without needing any special software tricks.
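To turn a RAM tier into a rough model-size budget, you can invert the weights-size arithmetic. The sketch below assumes that about half of RAM is available for weights. That fraction is a hypothetical, conservative placeholder rather than a measured macOS limit, so treat the results as ballpark figures only.

```python
def max_params_billions(
    ram_gb: float,
    bits_per_weight: float = 4,
    usable_fraction: float = 0.5,  # hypothetical headroom for OS, apps, KV cache
) -> float:
    """Largest model (in billions of parameters) whose weights fit the budget."""
    usable_bytes = ram_gb * 1e9 * usable_fraction
    bytes_per_param = bits_per_weight / 8
    return usable_bytes / bytes_per_param / 1e9


for ram in (16, 32):
    for bits in (4, 8):
        budget = max_params_billions(ram, bits_per_weight=bits)
        print(f"{ram} GB RAM, {bits}-bit weights: up to ~{budget:.0f}B parameters")
```

Raising `usable_fraction` gives a more optimistic budget, at the cost of less room for long contexts and other apps.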

Getting started on Mac without a GPU

The minimum you need: any M1 or newer Mac, macOS 13 Ventura or later, and model weights that fit in your available RAM. No discrete GPU, no CUDA, no NVIDIA driver setup.

From there, you have a few paths:

All three use the Metal GPU backend on Apple Silicon, meaning they route matrix operations to your chip's GPU cores automatically. You don't configure anything; the Mac just works.

On the numbers: Speed figures in this article (32 tok/s on M4 Air, ~20 tok/s on M1 Ultra for Core 27B, 2.1 tok/s for Plus 397B) come from my own measured runs using Outlier on those machines. Your results may vary by model quantization and system load.

Try local AI on your Mac

Outlier runs on any M1 or newer Mac. Download the app, pick a model, and start running inference on your own hardware, with no discrete GPU and no cloud account required.
