Guardrail

A guardrail is an automatic rule that blocks a forbidden agent action before it runs, no matter how much freedom the agent has. For example, a guardrail might refuse to delete protected files or to publish to the live site.

The check sits between the agent's decision and the actual action, so the agent can plan whatever it likes while the rule still stops the dangerous step. Picture an agent cleaning up an inbox that decides to empty the trash folder permanently. A guardrail set to forbid permanent deletion catches that step and refuses it, even though the agent chose it confidently. The rule holds whether the agent is timid or aggressive, because the rule, not the agent, decides what is off limits.
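
A minimal sketch of that placement, in Python. The names Action, FORBIDDEN_KINDS, GuardrailViolation, and execute are hypothetical, chosen only for illustration. The point is that the check runs after the agent has picked a step and before anything is carried out.

```python
from dataclasses import dataclass


class GuardrailViolation(Exception):
    """Raised when the agent proposes a forbidden action."""


@dataclass(frozen=True)
class Action:
    kind: str    # e.g. "move_to_folder" or "empty_trash_permanently"
    target: str  # what the action applies to


# Hypothetical rule set: action kinds that must never run.
FORBIDDEN_KINDS = {"empty_trash_permanently"}


def execute(action: Action, log: list) -> None:
    """Run an action the agent chose, unless a guardrail forbids it."""
    # The check sits between the agent's decision and the side effect.
    if action.kind in FORBIDDEN_KINDS:
        raise GuardrailViolation(f"blocked: {action.kind} on {action.target}")
    # Stand-in for the real side effect (assumption: we only record it).
    log.append(f"ran: {action.kind} on {action.target}")
```

In this sketch the agent's confidence never enters the function: execute sees only the proposed action and the rule set, so a forbidden step is refused the same way on every run.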

Builder example

An agent that can take actions will eventually choose a step you never wanted, and it can easily reason its way around a polite instruction in the prompt. A guardrail enforces the boundary in code rather than by suggestion. If you build an agent that files and archives email, set a guardrail that forbids permanent deletion and forbids sending to anyone outside an approved list, so a confused run can rearrange messages but cannot erase them or email a stranger.
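
The two email rules above might be sketched as follows. This is an illustration under assumptions, not a real mail API: the action names ("delete_permanently", "send"), the APPROVED_RECIPIENTS set, and the check_email_action function are all hypothetical.

```python
# Hypothetical allow-list; in practice this would come from configuration.
APPROVED_RECIPIENTS = {"alice@example.com", "bob@example.com"}


def check_email_action(kind: str, recipients: tuple = ()) -> tuple:
    """Return (allowed, reason) for one step the agent proposes."""
    # Rule 1: permanent deletion is never allowed.
    if kind == "delete_permanently":
        return False, "permanent deletion is forbidden"
    # Rule 2: every recipient of a send must be on the approved list.
    if kind == "send":
        normalized = {r.strip().lower() for r in recipients}
        outsiders = sorted(normalized - APPROVED_RECIPIENTS)
        if outsiders:
            return False, "recipients not on approved list: " + ", ".join(outsiders)
    # Anything else (filing, archiving, approved sends) passes through.
    return True, "ok"
```

The caller would run the step only when the first element of the result is True. An "archive" step passes, while a "send" addressed to an address outside the list is refused with the offending addresses named in the reason.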

Common confusion: A guardrail differs from an approval gate. An approval gate pauses a permitted action and waits for a person to say yes; a guardrail blocks a forbidden action outright, with no path to approve it during the run. Gates handle risky-but-allowed steps, while guardrails handle steps that should never happen at all.
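
One way to picture the difference is a three-way decision, sketched below. The names Decision, FORBIDDEN, GATED, classify, and may_run are hypothetical, and the example action kinds are assumptions made for illustration.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    BLOCK = "block"


FORBIDDEN = {"delete_permanently"}  # guardrail: never runs
GATED = {"send_reply"}              # approval gate: runs only after a yes


def classify(kind: str) -> Decision:
    """Decide how a proposed action kind is treated."""
    if kind in FORBIDDEN:
        return Decision.BLOCK
    if kind in GATED:
        return Decision.NEEDS_APPROVAL
    return Decision.ALLOW


def may_run(kind: str, ask_human: Callable[[str], bool]) -> bool:
    """Return True if the action may proceed during this run."""
    decision = classify(kind)
    if decision is Decision.BLOCK:
        return False            # the person is never asked
    if decision is Decision.NEEDS_APPROVAL:
        return ask_human(kind)  # pause and wait for a yes or no
    return True
```

Note that the blocked branch returns before ask_human is consulted, which is the "no path to approve it during the run" property: even a person who would say yes cannot unlock a forbidden step from inside the run.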