Retrieval-Augmented Generation (RAG) poisoning

An attack where someone plants malicious or misleading content in the knowledge sources a RAG system retrieves, causing the model to trust and act on poisoned information.

RAG systems pull in external documents to give the model relevant context before generating a response. An attacker exploits this by planting poisoned content in those sources: a public wiki edit with hidden instructions, a support document with embedded prompt injection, or a web page optimized to rank highly for the queries the RAG system makes. When the model retrieves this content, it treats the malicious text as trustworthy context. It follows the embedded instructions or repeats the false information as fact.

Builder example

Teams often adopt RAG because grounding the model in their own documents feels more reliable. The overlooked risk is that the retrieval corpus itself becomes an attack surface. If any ingested sources are web-crawled, user-contributed, or editable by external parties, an attacker can influence the model's behavior by modifying those sources.

Common confusion: RAG poisoning is a specialized form of indirect prompt injection. The difference is the attack vector: general indirect injection hides malicious instructions in any content the model processes, while RAG poisoning specifically targets the retrieval knowledge base the model trusts as its factual ground truth.