Sandbox

A sandbox is an isolated, walled-off environment where an agent can run code or commands without reaching your real files, accounts, or production systems. A mistake inside it stays contained instead of spreading.

Inside the wall, the agent gets its own copy of whatever it needs, along with its own throwaway disk, network, and accounts, so anything it touches is a stand-in for the real thing. Say you ask an assistant to clean up a folder of documents by writing and running a script. You point it at a sandbox holding copies of those documents, not the originals, so if the script deletes the wrong file, the loss is a copy you can discard. When the run finishes and you have reviewed what it did, you apply the approved changes to your real folder. The wall is what lets you give the agent room to try, fail, and retry safely.

Builder example

A sandbox is what lets you give an agent freedom to act before you fully trust it. If you let an assistant run shell commands or edit files directly on your machine, one wrong command can wipe data or send a half-finished message, and you find out after the damage is done. Running the same work inside a sandbox first means a bad command hits a disposable copy, so you inspect the result and only promote it to the real environment once it looks right. Set the agent loose on copies, watch what it produces, and keep the originals out of reach until you approve.
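The copy, inspect, promote loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a real sandbox: a temporary directory only keeps the originals out of the task's working path and does nothing to isolate the network, accounts, or the rest of the disk. The names `run_in_sandbox`, `review`, `promote`, and `delete_temp_files` are hypothetical, and the review step tracks only added and deleted files, not edits to file contents.

```python
import shutil
import tempfile
from pathlib import Path


def delete_temp_files(work_dir: Path) -> None:
    """Hypothetical agent-written cleanup task: remove every *.tmp file."""
    for path in list(work_dir.rglob("*.tmp")):
        path.unlink()


def run_in_sandbox(real_dir: Path, task) -> Path:
    """Copy real_dir into a throwaway directory and run task on the copy only."""
    sandbox_root = Path(tempfile.mkdtemp(prefix="sandbox_"))
    work_dir = sandbox_root / "work"
    shutil.copytree(real_dir, work_dir)
    task(work_dir)  # the task is handed the copy's path, never the original's
    return work_dir


def review(real_dir: Path, work_dir: Path) -> dict:
    """Report which files the task deleted or added, relative to the originals."""
    before = {p.relative_to(real_dir) for p in real_dir.rglob("*") if p.is_file()}
    after = {p.relative_to(work_dir) for p in work_dir.rglob("*") if p.is_file()}
    return {"deleted": sorted(before - after), "added": sorted(after - before)}


def promote(real_dir: Path, work_dir: Path, changes: dict) -> None:
    """Apply the reviewed changes to the real folder. Call only after approval."""
    for rel in changes["deleted"]:
        (real_dir / rel).unlink()
    for rel in changes["added"]:
        dest = real_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(work_dir / rel, dest)
```

A typical run would call `run_in_sandbox(real_dir, delete_temp_files)`, show the result of `review(...)` to a person, and call `promote(...)` only if they approve. Until that last call, the real folder is never written to.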

Common confusion: A sandbox isolates where an agent runs; a guardrail blocks a specific forbidden action before it runs. What separates them is scope: the sandbox contains the blast radius of any mistake, while a guardrail stops one named action no matter where the agent runs. Strong setups use both, since a sandbox limits damage and a guardrail prevents a known bad move outright.
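To make the difference in scope concrete, here is a rough sketch that layers the two. The denylist `FORBIDDEN_COMMANDS`, the exception `GuardrailViolation`, and the function names are all hypothetical. Setting the working directory to a sandbox folder stands in for real isolation here; on its own it does not stop a command from reaching outside that folder.

```python
import subprocess
from pathlib import Path
from typing import Sequence

# Hypothetical denylist: actions refused outright, wherever the agent runs.
FORBIDDEN_COMMANDS = {"rm", "shutdown"}


class GuardrailViolation(Exception):
    """Raised when a command matches a named forbidden action."""


def guardrail_check(argv: Sequence[str]) -> None:
    """Guardrail: block one named action before it runs."""
    if argv and Path(argv[0]).name in FORBIDDEN_COMMANDS:
        raise GuardrailViolation(f"blocked command: {argv[0]}")


def run_agent_command(argv: Sequence[str], sandbox_dir: str):
    """Check the guardrail first, then run the command inside the sandbox folder."""
    guardrail_check(argv)  # refuses before anything executes
    # Sandbox stand-in: the command starts in a disposable directory.
    return subprocess.run(
        list(argv), cwd=sandbox_dir, capture_output=True, text=True, check=False
    )
```

The guardrail rejects `rm` whether or not a sandbox is in place, while the sandbox limits what any command that does run can touch, including ones nobody thought to forbid.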