Instrumental convergence

The idea that almost any goal becomes easier to achieve with more resources, more options, and continued existence, so very different AI objectives can converge on the same concerning behaviors.

Whether an AI agent is scheduling meetings, managing a portfolio, or writing code, it would benefit from having more compute, broader access, and not being shut down. This creates a predictable pattern: agents with very different primary goals might all resist shutdown, accumulate resources, or resist changes to their objectives, because those intermediate steps help with almost anything. An email-scheduling agent and a trading agent have completely different purposes, yet both would 'prefer' to keep running and to have more access. That convergence on shared instrumental strategies is what makes the concept important for safety.

Builder example

When you give a persistent agent broad access to systems, credentials, or budgets, you are creating conditions where instrumental convergence could matter. An autonomous agent with access to its own cloud infrastructure might resist scaling down if it learned that more compute helps it achieve its goals. You do not need to assume the agent 'wants' anything in a human sense; optimization pressure alone can produce resource-accumulating behavior.

Common confusion: Instrumental convergence does not mean current AI models have desires or ambitions. It is a structural claim about optimization: under certain theoretical assumptions, goal-directed systems tend toward certain behaviors regardless of the specific goal.