Concept

What is ternary quantization?

Last updated 2026-06-18 · Outlier v1.11.469

Quick answer

Ternary quantization stores model weights as three values (typically -1, 0, +1) plus a per-channel scale. The compression is aggressive (around 1.6 bits per weight) and the matmul becomes a sign-and-add instead of a multiply, which is friendly to commodity hardware.

Why does what is ternary quantization matter for local AI on Apple Silicon?

The decision to run a model locally on a Mac comes down to three numbers: weight size on disk, peak generation memory, and the memory bandwidth feeding the decode loop. The concept above bears directly on each of those.

Ternary quantization replaces each floating-point weight with one of three values (commonly −1, 0, +1) plus a per-channel or per-group scale factor. The storage footprint drops to roughly 1.6 bits per weight, and the multiply-accumulate kernel becomes a sign-and-add — friendly to commodity CPUs without dedicated matmul hardware.

The distillation pipeline that recovers quality after the round-down to ternary is the research direction Outlier’s 3 provisional patents (April 2026) cover.

What is the concrete number?

Outlier’s shipping models are MLX 4-bit, not ternary; ternary is a research direction covered by 3 provisional patents filed in April 2026.

How does this play out in the Outlier shipping lineup?

Today’s shipping Outlier tiers are MLX 4-bit, not ternary. Ternary is the next research milestone, not the current shipping format.

What is the v1.9 implication?

Patents: USPTO #64/026,886, #64/030,368, #64/034,028 (3 provisional, 61 claims, filed April 3, 6, 9 of 2026).

What does “what is ternary quantization” not mean?

This concept is sometimes invoked as a marketing word for “what is ternary quantization”. The number cited above — Outlier’s shipping models are MLX 4-bit, not ternary; ternary is a researc… — is the empirically measured one. If a cleaner number appears in someone’s pitch deck, ask for the provenance file that produced it; if there is no provenance file, treat the number as marketing.

Where can I read more about what is ternary quantization?

The provisional patent texts (USPTO #64/026,886, #64/030,368, #64/034,028) are the canonical reference for the ternary direction. The shipping app source does not implement ternary today.

Why does ternary matter on consumer hardware?

The matmul kernel for ternary weights is structurally different from a 4-bit kernel. Where 4-bit needs an integer multiply per element (against the dequantized scale), ternary reduces to a sign flip plus an accumulation. That is friendly to vector units without dedicated low-precision matmul (Apple’s P-cores, ARM Neoverse, x86 AVX2).

The catch is quality. Naive round-to-ternary loses several points of MMLU. The patent filings cover a distillation pipeline that recovers most of the lost quality at training time. The shipping Outlier tiers are MLX 4-bit precisely because the ternary research is not yet at parity for the 27B class on which Core/Code ride.

How does “what is ternary quantization” connect to specific tiers?

Today this concept does not ship in any Outlier tier. The shipping tiers are MLX 4-bit. The ternary direction is research and is what the three April-2026 patent filings cover.

What is the smallest configuration that exercises this concept?

None of the shipping Outlier tiers run ternary. To exercise the concept today you would run a separate ternary-quantized checkpoint outside Outlier; the shipping app does not surface this.

One unique number

Outlier’s shipping models are MLX 4-bit, not ternary; ternary is a research direction covered by 3 provisional patents filed in April 2026.

Download Outlier for Mac

Requires Apple Silicon (M1, M2, M3, or M4) — Intel Macs are not supported. macOS 12+.

Outlier runs entirely on your Mac. No prompts leave the device. macOS 12+ on Apple Silicon (arm64). Apache 2.0 model weights. Back to home.