How to run Qwen locally on a Mac
- Yes — Qwen runs locally on an Apple Silicon Mac. Use a quantized GGUF in Ollama or LM Studio, or a Mac-native app like Outlier (whose own models are Qwen-based).
- Pick a size that fits your RAM: a 7-8B model runs on 8 GB, a 32B wants ~32 GB, a 70B wants 64 GB+.
- Quantization (GGUF, usually Q4) shrinks the weights so a big model fits in your unified memory.
- Everything runs offline once downloaded: no account, no cloud, Wi-Fi can be off.
Qwen, Alibaba's open-weight model family, is one of the better open options for coding and multilingual work — and it runs entirely on your Mac. No data center, no subscription. Download a quantized copy of the weights, point a local runner at it, and you have a private assistant that works with the Wi-Fi off. The only real decision is which size your RAM can hold.
Why Qwen, and why local
Qwen ships as open weights, so anyone can download the files and run them. The family is known for solid coding and strong multilingual coverage, which makes it a popular base for a local code assistant. Running it on your own machine means prompts never leave the laptop, there's no monthly meter, and the model can't be retired out from under you, the way OpenAI pulled GPT-4 out of ChatGPT in April 2025. A file on your disk doesn't get deprecated on someone else's schedule.
What you need first
You need an Apple Silicon Mac (M1 or newer). The CPU and GPU share a single pool of unified memory, so the model and its context live in the same RAM. That makes sizing simple: the quantized model plus a working context window has to fit in your RAM, with headroom for macOS and your other apps. Quantization is what makes it fit. At 16-bit precision, a 32B model's weights alone take about 64 GB (32 billion parameters at 2 bytes each); a 4-bit GGUF quant comes to roughly 18-20 GB, with little quality loss for everyday work.
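The sizing rule above can be checked with back-of-the-envelope arithmetic. This is a minimal sketch, not a precise calculator: the 4.5 bits per weight used for a Q4-style quant and the 3 GB default headroom are assumptions chosen for illustration, and the function names are hypothetical. Real GGUF files vary by quant type, and the context window needs memory on top of the weights.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    # billions of parameters * bits per parameter / 8 bits per byte = GB
    return params_billion * bits_per_weight / 8


def fits_in_ram(params_billion: float, bits_per_weight: float,
                ram_gb: float, headroom_gb: float = 3.0) -> bool:
    """True if the weights plus an assumed headroom fit in unified memory.

    headroom_gb stands in for context, macOS and other apps; 3 GB is an
    illustrative guess, not a measured figure.
    """
    return weights_gb(params_billion, bits_per_weight) + headroom_gb <= ram_gb


print(weights_gb(32, 16))         # 64.0  (unquantized 16-bit weights)
print(weights_gb(32, 4.5))        # 18.0  (assumed ~4.5 bits/weight for Q4)
print(fits_in_ram(32, 4.5, 32))   # True
print(fits_in_ram(32, 4.5, 16))   # False
```

Raise `headroom_gb` if you keep many apps open or plan to use long contexts.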
How much RAM for which Qwen size
Use this as a sizing guide for quantized (roughly Q4) GGUF weights. Bigger is smarter and slower; pick the largest row your Mac can hold with room to spare.
| Qwen size | Quantized file (~Q4) | Mac RAM | Good for |
|---|---|---|---|
| 1.5-3B | ~1-2 GB | 8 GB | Quick chat, autocomplete, drafts |
| 7-8B | ~4-5 GB | 8 GB (16 GB comfortable) | Everyday assistant, light coding |
| 14B | ~8-9 GB | 16 GB | Stronger reasoning and code |
| 32B | ~18-20 GB | 32 GB | Serious coding, long context |
| 70B-class | ~40+ GB | 64 GB+ | Top open-model quality, slowest |
Sizes are approximate and vary by the exact quant you choose. If a model won't load or your Mac starts swapping, drop to a smaller model or a more aggressive (lower-bit) quant. There's more detail in our RAM guide for local AI.
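The table lookup can also be done in a few lines of code. The sketch below copies the RAM column from the table and returns the largest row a given amount of RAM satisfies. The names `SIZING_TABLE`, `largest_qwen_for` and `mac_ram_gb` are hypothetical; `mac_ram_gb` shells out to `sysctl -n hw.memsize`, which reports installed memory in bytes on macOS and will not work on other systems.

```python
import subprocess

# (Qwen size, minimum Mac RAM in GB), copied from the table, smallest first.
SIZING_TABLE = [
    ("1.5-3B", 8),
    ("7-8B", 8),
    ("14B", 16),
    ("32B", 32),
    ("70B-class", 64),
]


def largest_qwen_for(ram_gb: float):
    """Return the largest table row whose RAM requirement is met, or None."""
    best = None
    for label, min_ram_gb in SIZING_TABLE:
        if ram_gb >= min_ram_gb:
            best = label  # later rows are larger, so keep overwriting
    return best


def mac_ram_gb() -> float:
    """Installed RAM in GiB, read from `sysctl -n hw.memsize` (macOS only)."""
    out = subprocess.run(
        ["sysctl", "-n", "hw.memsize"],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip()) / 2**30


print(largest_qwen_for(16))  # 14B
print(largest_qwen_for(24))  # 14B
```

On a Mac, `largest_qwen_for(mac_ram_gb())` gives the table's upper bound for that machine; stepping down one row leaves more room for context and other apps.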
The general path: Ollama or LM Studio
Two tools cover almost everyone, both free, both running GGUF weights through llama.cpp under the hood.
- Ollama (one command). Install Ollama, then in Terminal run `ollama run qwen2.5-coder:7b` (swap in the size and variant you want). It pulls the quantized model and drops you into a chat prompt, and exposes a local API for editors and scripts. The developer favorite.
- LM Studio (no terminal). Install LM Studio, open the model browser, search "Qwen," and pick a quant the app flags as fitting your RAM. Download, then chat in the built-in window. It runs a local server too. Best if you'd rather skip the command line.
Either way: pick the variant (a code-tuned Qwen for programming), pick a size from the table, download once, and you're offline-capable from then on. Other open runners like Jan (open-source, MIT) and GPT4All follow the same GGUF pattern if you want to shop around. You can also walk the full install step by step.
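As an illustration of the local API that Ollama exposes for editors and scripts, here is a minimal sketch using only the Python standard library. It assumes Ollama is running on its default port (11434) and that the `qwen2.5-coder:7b` model from the command above has already been pulled. The helper names `build_request` and `ask` are hypothetical.

```python
import json
import urllib.request

# Ollama's default local endpoint; assumes the server is running on this Mac.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "qwen2.5-coder:7b") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "qwen2.5-coder:7b") -> str:
    """Send the prompt and return the generated text (needs Ollama running)."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask("Write a Python function that reverses a string.")` returns the model's reply as a string. Nothing leaves the machine, since the request goes to `localhost`.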
The Mac-native shortcut: Outlier
For the batteries-included version, Outlier is a Mac-native app whose own models are Qwen-based, so the same daily coding and writing work is covered without assembling a stack. One signed download, no account, no terminal, no Docker, and it works with Wi-Fi off. The difference from the runners above is the engine. GGUF tools are RAM-bound, meaning the model has to fit in memory; Outlier's patent-pending paged inference streams experts on demand, so it runs models bigger than your RAM. Its 397B-class Qwen-based Plus tier runs at about 11 GB peak RSS on a 64 GB Mac. Free Nano and Lite run with no account; Pro and lifetime seats add the larger tiers, the coding agent, and deep research, and you can import your own MLX model. The 397B-on-a-Mac breakdown shows how the paging works.
Frequently asked questions
Is Qwen good for coding?
Yes. Qwen is open-weight and one of the stronger open families at coding and multilingual work, which is why it's a common pick for a local coding assistant. Outlier's own models are built on the Qwen family; on a 54-prompt comparison Outlier's Core 27B matched Claude Opus on 98.9% of rubric checks. For local coding, run a code-tuned Qwen variant in Ollama or LM Studio sized to your RAM, or use Outlier's Code 27B.
How much RAM do you need to run Qwen on a Mac?
It depends on the size you pick. As a rough guide for quantized GGUF: a 7-8B Qwen runs on 8 GB (16 GB comfortable), a 14B wants 16 GB, a 32B wants about 32 GB, and 70B-class models want 64 GB or more. The model plus its context has to fit in unified memory. Outlier is the exception, since its paged inference engine runs a 397B-class Qwen-based model at about 11 GB peak RSS on a 64 GB Mac.
What's the easiest way to run Qwen on a Mac?
For a no-terminal install, LM Studio gives you a model browser and chat window: search Qwen, pick a quant that fits your RAM, download, chat. Ollama is the developer favorite if you're comfortable with one command. Outlier is the batteries-included Mac-native route: one signed download, no terminal or Docker, and its own models are Qwen-based, including a 397B-class tier that runs on a 64 GB Mac.
Try Outlier free
Free Nano + Lite — local, private, no account. Pro $20/mo or $149/yr adds everything (all 7 model tiers incl. Plus 397B). Lifetime Pro from $99 (Founding 200, first 200 seats) or $200 (Founders 500). Apple Silicon only.