
A local AI agent with computer use, fully offline on a Mac

Quick answer
  • Computer use means the AI reads screenshots and then clicks or types through macOS Accessibility.
  • Vision 35B does the screen-reading locally on a 24 GB+ Mac (~16 tok/s on the V9 engine).
  • By default it asks before every action. You can scope-grant a whole workflow so it stops asking.
  • You give up some visual-reasoning quality compared with cloud. In exchange, nothing on your screen ever leaves the machine.

"Computer use" is the AI capability where the model looks at a screenshot of your desktop, decides what to click or type, then does it. Anthropic's cloud version gets most of the attention. The local version gets less, and it does lag on a few things. But it works, and this is what it actually looks like on a Mac.

The three primitives

Three things have to be in place.

  1. Screen perception. A model that can take a screenshot as input and actually reason about what's in it. In Outlier that's the Vision 35B-A3B tier, a multimodal MoE that handles image plus text.
  2. Action emission. The model emits structured tool calls. Things like "click at (x, y)" or "type 'hello'." Outlier's agent loop takes those and hands them to a sandboxed executor.
  3. Action execution. A small driver that does the physical part: moves the mouse, presses keys, types. macOS exposes this through the Accessibility framework. You grant the app Accessibility permission once and you're done.

Not one of those steps touches the network. The screenshot is captured on your machine, the model runs on your machine, the clicks happen on your machine.

What Vision 35B sees and decides

Vision 35B is a 35-billion-parameter Mixture-of-Experts model with an image encoder, and only 3.6B of those parameters fire per token. You need a 24 GB+ Mac. It runs on the V9 paged engine at roughly 16 tok/s. On a 64 GB Mac Studio you've got plenty of room left over for long image-heavy sessions.
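One plausible way to see why 24 GB is the floor and why each step takes seconds is a quick back-of-envelope calculation. The 35B-total, 3.6B-active, and 16 tok/s figures come from above. Two inputs are assumptions and not Outlier's published numbers: weights stored at 4 bits per parameter, and a reply of about 50 tokens per action. The calculation also ignores the KV cache, the image encoder's working memory, and the time spent reading the screenshot itself.

```python
# Back-of-envelope sizing for a 35B-total / 3.6B-active MoE model.
# ASSUMPTIONS (not from Outlier's docs): 4-bit weights, ~50-token reply per action.

TOTAL_PARAMS = 35e9     # every expert has to sit in memory
ACTIVE_PARAMS = 3.6e9   # parameters actually used for each token
BITS_PER_PARAM = 4      # assumed quantization
TOKENS_PER_SEC = 16     # figure quoted for the V9 engine
REPLY_TOKENS = 50       # assumed length of one action reply

weight_gb = TOTAL_PARAMS * BITS_PER_PARAM / 8 / 1e9
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
step_seconds = REPLY_TOKENS / TOKENS_PER_SEC

print(f"weights ~ {weight_gb:.1f} GB")                 # weights ~ 17.5 GB
print(f"active per token ~ {active_fraction:.1%}")     # active per token ~ 10.3%
print(f"one action reply ~ {step_seconds:.1f} s")      # one action reply ~ 3.1 s
```

Under those assumptions the weights alone take about 17.5 GB, which leaves only a few gigabytes of headroom on a 24 GB machine. Only about a tenth of the model runs per token, which is what keeps generation speed usable.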

The prompt for a computer-use task looks roughly like this: "Here's a screenshot of my desktop. The user wants the export-to-PDF button in this app. Where do I click?" The model hands back rough coordinates or a description of the element it found. Then the Outlier agent does one of three things: clicks it, asks you first, or gives control back.
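The three-way choice at the end of that loop can be sketched as a small function. Everything here is an assumption for illustration. The decision rule, the `decide` name, and the `needs_approval` flag are all hypothetical. The sketch also assumes the model answers in screenshot pixels while clicks are issued in screen points. That mismatch is what you get on a 2x Retina display, where a 1440x900-point screen captures at 2880x1800 pixels.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

class Outcome(Enum):
    CLICK = "click"
    ASK = "ask"
    HAND_BACK = "hand_back"

@dataclass
class Decision:
    outcome: Outcome
    point: Optional[Tuple[float, float]] = None  # target in screen points

def decide(model_xy: Optional[Tuple[int, int]],
           screenshot_px: Tuple[int, int],
           screen_pt: Tuple[float, float],
           needs_approval: bool) -> Decision:
    """Scale the model's pixel guess to screen points, then pick click / ask / hand back."""
    if model_xy is None:
        # The model didn't find the element: give control back to the user.
        return Decision(Outcome.HAND_BACK)
    px_w, px_h = screenshot_px
    pt_w, pt_h = screen_pt
    x = model_xy[0] * pt_w / px_w
    y = model_xy[1] * pt_h / px_h
    if not (0 <= x < pt_w and 0 <= y < pt_h):
        # Off-screen coordinates most likely mean a misread, so don't guess.
        return Decision(Outcome.HAND_BACK)
    return Decision(Outcome.ASK if needs_approval else Outcome.CLICK, (x, y))

d = decide((824, 176), (2880, 1800), (1440, 900), needs_approval=True)
print(d.outcome, d.point)  # Outcome.ASK (412.0, 88.0)
```

Handing back on an off-screen or missing answer is the conservative choice. A wrong click costs more than one extra prompt to the user.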

Where the local version trails

No point pretending it's even with cloud. It isn't, in three places.

What the local version gives back

How to try it

Open the Outlier app and switch to Vision 35B. Any Pro tier includes it ($20/mo, $149/yr, or $99 lifetime via Founding 200). Go into Agent mode. Grant Accessibility permission when it asks. Then try something small: "open Calculator and compute the square root of 169." The agent grabs a screenshot, works out what to do, asks for your OK, and acts. That approval gate is on by default for anything with side effects. If you'd rather it run start to finish without interrupting, scope-grant the workflow.
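The approval gate and scope grants can be modeled as a small policy object: side-effecting actions ask by default, unless the user has granted the whole workflow. This is a minimal sketch, not Outlier's implementation. `ApprovalGate`, `SIDE_EFFECT_TOOLS`, and the workflow names are hypothetical, and treating a screenshot as side-effect-free is an assumption.

```python
from dataclasses import dataclass, field

# Hypothetical classification: which tools count as having side effects.
SIDE_EFFECT_TOOLS = {"click", "type", "key"}

@dataclass
class ApprovalGate:
    granted_workflows: set[str] = field(default_factory=set)

    def scope_grant(self, workflow: str) -> None:
        """User approves an entire workflow up front, so its steps stop prompting."""
        self.granted_workflows.add(workflow)

    def needs_approval(self, tool: str, workflow: str) -> bool:
        if tool not in SIDE_EFFECT_TOOLS:
            return False  # read-only steps such as a screenshot never prompt
        return workflow not in self.granted_workflows

gate = ApprovalGate()
print(gate.needs_approval("screenshot", "calc-sqrt"))  # False
print(gate.needs_approval("click", "calc-sqrt"))       # True: default is to ask
gate.scope_grant("calc-sqrt")
print(gate.needs_approval("click", "calc-sqrt"))       # False: workflow granted
print(gate.needs_approval("click", "other-task"))      # True: grant is scoped
```

Keying the grant to a workflow rather than to a tool is what keeps it scoped. Approving the Calculator task doesn't silently approve clicks in some other task.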

Frequently asked questions

Can an AI control my Mac offline?

Yes. With screen perception from Vision 35B and macOS Accessibility permission, the agent can see the screen and click or type, fully locally.

How good is local computer use versus cloud?

It works, but it trails on visual reasoning for crowded or unusual interfaces, and each look-decide-act loop takes a few seconds.

Is computer use safe?

Every action with side effects is gated behind an approval prompt by default; you can scope-grant a single workflow to run uninterrupted.

Try Outlier free

Free: Nano + Lite. Local, private, no account. Pro at $20/mo or $149/yr adds everything (Plus 397B, Marathon mode, Computer use, Deep Research v3, long context to 128K). Lifetime Pro from $99 (Founding 200, first 200 seats) or $200 (Founders 500). Apple Silicon only.
