Concept

What is K_override on the Plus tier?

Last updated 2026-06-18 · Outlier v1.11.469

Quick answer

K_override sets how many experts the paged engine keeps resident in RAM at any moment. Higher K reduces cache misses but inflates RAM and can regress decode speed if the model is compute-bound rather than I/O-bound.

Why does what is k_override on the plus tier matter for local AI on Apple Silicon?

The decision to run a model locally on a Mac comes down to three numbers: weight size on disk, peak generation memory, and the memory bandwidth feeding the decode loop. The concept above bears directly on each of those.

K_override sets the resident expert pool size on Outlier’s Plus tier. The model’s declared num_experts_per_tok is 10; K_override is how many experts the engine keeps in RAM at any moment. Higher K reduces cache misses on cold expert selections but inflates RAM and can regress decode speed if the model is compute-bound at this size.

The 2026-05-01 sweep tested K=4, K=20, K=32, K=48 with a 5-prompt aggregate. K=4 fails coherence (3/5), K=32 regresses 2.5% with 60% more RAM, K=48 adds 1.3% (within noise) at 33.78 GB peak. K=20 gives 1.59 tok/s with 14.04 GB peak gen and is locked.

What is the concrete number?

On Outlier’s Plus tier, K=20 produces 1.59 tok/s with a 14.04 GB peak generation footprint, locked in by a 4-point sweep on 2026-05-01.

How does this play out in the Outlier shipping lineup?

K=4 is a hard model constraint, not a tunable: routing top-10 cannot fit in a resident pool of 4 without first-token control-token escapes.

What is the v1.9 implication?

Source: sprints/v18_plus_ship/artifacts/K_SWEEP_RESULTS.md (2026-05-01, 5-prompt agg, isolated subprocess per K, mx.metal.get_peak_memory).

What does “what is k_override on the plus tier” not mean?

This concept is sometimes invoked as a marketing word for “what is k_override on the plus tier”. The number cited above — On Outlier’s Plus tier, K=20 produces 1.59 tok/s with a 14.04 GB peak gene… — is the empirically measured one. If a cleaner number appears in someone’s pitch deck, ask for the provenance file that produced it; if there is no provenance file, treat the number as marketing.

Where can I read more about what is k_override on the plus tier?

The K-sweep raw artifacts (k_sweep_K4.json, K20.json, K32.json, K48.json plus matching .log files) live under sprints/v18_plus_ship/artifacts/; the harness is k_sweep.py in the same directory.

What other knobs were tested alongside K?

The 2026-04-30 Plus-tier work also probed cache_gb (4, 8, 12), OUTLIER_MMAP_EXPERTS (0, 1), and lazy_load (True, False). cache_gb=8 dominated 4 (cold-miss penalties) and tied 12 (no cache-hit lift at 8-prompt agg). OUTLIER_MMAP_EXPERTS=1 regressed throughput 8× on macOS and was retired. lazy_load=True was kept because the streaming-loader fix (commit 95c8cc8 on the v17 branch) made it strictly better.

The locked Plus-tier configuration is therefore K_override=20, cache_gb=8.0, OUTLIER_MMAP_EXPERTS=0, lazy_load=True; this is what ships in v1.11.469.

How does “what is k_override on the plus tier” connect to specific tiers?

K_override is Plus-tier-only. The other six tiers either load entirely into unified memory or need not page experts; their inference path does not even consult the K_override knob.

What is the smallest configuration that exercises this concept?

You need at least 32 GB unified memory to load Plus with K_override=20. K_override=4 is testable at lower RAM but fails coherence; the locked production knob is K=20 and that is what the bench machine runs.

One unique number

On Outlier’s Plus tier, K=20 produces 1.59 tok/s with a 14.04 GB peak generation footprint, locked in by a 4-point sweep on 2026-05-01.

Download Outlier for Mac

Requires Apple Silicon (M1, M2, M3, or M4) — Intel Macs are not supported. macOS 12+.

Outlier runs entirely on your Mac. No prompts leave the device. macOS 12+ on Apple Silicon (arm64). Apache 2.0 model weights. Back to home.