Computer use / browser use

An AI capability in which the model views a screenshot and operates the computer through mouse clicks, keyboard input, and navigation, the same way a person would.

Most AI tool use works through APIs: the model sends a structured request, and your code runs it. Computer use works differently. The model looks at a screenshot of a browser or desktop, decides where to click or what to type, and sends those actions to the system, which carries them out and captures a fresh screenshot for the next step. This lets AI automate software that has no API at all: filling out legacy web forms, navigating enterprise dashboards, clicking through multi-step approval workflows, or extracting data from a screen that only a human could normally read.
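
The look, decide, act cycle described above can be sketched as a small loop. This is a minimal illustration, not any vendor's API: `FakeScreen`, `fake_model`, `Action`, and `run_loop` are hypothetical names, the "screenshot" is a plain dictionary rather than an image, and the "model" is a hard-coded rule standing in for a real model call.

```python
from dataclasses import dataclass


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeScreen:
    """Hypothetical stand-in for a real browser or desktop."""

    def __init__(self) -> None:
        self.field_value = ""
        self.submitted = False

    def screenshot(self) -> dict:
        # A real system would return image bytes; a dict keeps the sketch runnable.
        return {"field_value": self.field_value, "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.field_value += action.text
        elif action.kind == "click":
            self.submitted = True


def fake_model(screenshot: dict, goal: str) -> Action:
    """Hypothetical stand-in for the model: inspect the screen, pick the next action."""
    if screenshot["submitted"]:
        return Action(kind="done")
    if screenshot["field_value"] != goal:
        return Action(kind="type", text=goal)
    return Action(kind="click", x=200, y=340)  # pretend location of a Submit button


def run_loop(screen: FakeScreen, goal: str, decide=fake_model, max_steps: int = 10) -> list:
    """Repeat screenshot -> decide -> act until the model reports it is done."""
    history = []
    for _ in range(max_steps):  # hard cap so a confused model cannot loop forever
        action = decide(screen.screenshot(), goal)
        history.append(action)
        if action.kind == "done":
            break
        screen.apply(action)
    return history


screen = FakeScreen()
history = run_loop(screen, "INV-1042")
print([a.kind for a in history])
```

The step cap is a design choice worth keeping in a real system: because the model acts on what it sees rather than on a structured response, a misread screen can otherwise produce an endless cycle of clicks.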

Builder example

Computer use unlocks automation for any software your team can see on screen, which is especially valuable for legacy systems, internal tools with no API, and workflows that span multiple applications. The tradeoff is that the attack surface is much wider: anything that appears on screen (ads, pop-ups, phishing emails, manipulated web content) can influence what the model sees and does. You need tighter guardrails than with a structured API.
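
One way to picture "tighter guardrails" is a review step that sits between the model's proposed action and the system that executes it. The sketch below is an assumption-laden illustration: the allowlisted host, the set of sensitive action names, and the `ProposedAction` and `review` names are all hypothetical, and a real deployment would likely need more checks than these two.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical policy values, chosen only for illustration.
ALLOWED_HOSTS = {"intranet.example.com"}
NEEDS_CONFIRMATION = {"submit_payment", "delete_record"}


@dataclass
class ProposedAction:
    kind: str   # what the model wants to do
    url: str    # page the action would run on


def review(action: ProposedAction) -> str:
    """Return "block", "confirm", or "allow" for a model-proposed action."""
    host = urlparse(action.url).hostname or ""
    if host not in ALLOWED_HOSTS:
        return "block"      # the model was steered somewhere unexpected
    if action.kind in NEEDS_CONFIRMATION:
        return "confirm"    # pause and ask a human before executing
    return "allow"


print(review(ProposedAction("click", "https://intranet.example.com/forms")))
print(review(ProposedAction("click", "https://evil.example.net/login")))
print(review(ProposedAction("submit_payment", "https://intranet.example.com/pay")))
```

Checks like these run in your own code, outside the model, so on-screen content that misleads the model still cannot widen what the system is permitted to do.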

Common confusion: Computer use looks like a universal automation solution, but it is slower, more brittle, and more vulnerable to manipulation than API-based tool use. Screens change layout, pop-ups interrupt flows, and the model can misread visual elements. Use APIs when they exist; use computer use when they do not.