Retrieval-Augmented Generation (RAG)

A technique where the system searches your documents for relevant information at question time, then feeds that information to the model so it can answer based on your actual data.

RAG works in a few steps. First, your documents are split into smaller chunks, and each chunk is converted into an embedding: a list of numbers that represents its meaning, so that chunks about similar topics end up close together. When a user asks a question, the system embeds the question the same way, finds the chunks whose embeddings are closest to it, and passes those chunks to the model along with the question. The model answers using your documents as its source material. This lets an AI assistant answer questions about your company's policies, product specs, or research notes without being retrained on that data.
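The steps above can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, chunking splits on a fixed word count, and the call to the language model is left out, so `build_prompt` only assembles the text that would be sent. All function names and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents: list[str], max_words: int = 50) -> list[tuple[str, Counter]]:
    """Chunk every document and store each chunk alongside its embedding."""
    return [(chunk, embed(chunk)) for doc in documents for chunk in chunk_text(doc, max_words)]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Embed the question the same way and return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble the retrieved chunks and the question into one model prompt."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return ("Answer using only the sources below. Cite sources by number.\n\n"
            f"Sources:\n{context}\n\nQuestion: {question}")


docs = [
    "Employees may work remotely up to three days per week with manager approval.",
    "The X200 router supports Wi-Fi 6 and has four gigabit Ethernet ports.",
]
index = build_index(docs, max_words=20)
question = "How many days per week can employees work remotely?"
top = retrieve(index, question, k=1)
print(build_prompt(question, top))
```

A production system would replace `embed` with a real embedding model and keep the vectors in a search index, but the flow (chunk, embed, rank by similarity, assemble a prompt) is the same.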

Builder example

RAG is a common way for teams to give an AI access to private or frequently changing information. A support bot can search your help center articles at question time and answer with current information, even if those articles were updated yesterday. The weakest link is usually search quality. If the search pulls the wrong document or an outdated version, the model will typically present that wrong information confidently, as fact. The search layer deserves as much testing as the model layer.
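One way to test the search layer on its own is to measure how often retrieval returns the expected document for a set of labeled questions (often called hit rate, or recall at k). The sketch below assumes a hypothetical `retrieve(question, k)` function that returns document IDs; the three documents, the test questions, and the keyword-overlap retriever are made up for illustration.

```python
import re
from typing import Callable

# Made-up corpus: two versions of the same policy plus an unrelated document.
DOCS = {
    "leave-2024": "Parental leave policy 2024: 12 weeks of paid leave.",
    "leave-2025": "Parental leave policy 2025: 16 weeks of paid leave.",
    "expenses": "Submit expense reports within 30 days of purchase.",
}

# Labeled test set: (question, ID of the document that should be retrieved).
LABELED = [
    ("How many weeks of parental leave do I get?", "leave-2025"),
    ("When are expense reports due?", "expenses"),
]


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_retrieve(question: str, k: int) -> list[str]:
    """Toy retriever: rank document IDs by words shared with the question (ties keep dict order)."""
    q = tokens(question)
    ranked = sorted(DOCS, key=lambda doc_id: len(q & tokens(DOCS[doc_id])), reverse=True)
    return ranked[:k]


def hit_rate_at_k(retrieve: Callable[[str, int], list[str]],
                  labeled: list[tuple[str, str]], k: int) -> float:
    """Fraction of questions whose expected document appears in the top-k results."""
    hits = sum(1 for question, expected in labeled if expected in retrieve(question, k))
    return hits / len(labeled)


print(hit_rate_at_k(keyword_retrieve, LABELED, k=1))  # 0.5: the 2024 policy ties and ranks first
print(hit_rate_at_k(keyword_retrieve, LABELED, k=2))  # 1.0
```

Here the 2024 and 2025 leave policies share the same words with the question, so at k=1 the outdated version ranks first and the hit rate drops to 0.5. A labeled test set like this surfaces that failure before users run into it.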

An employee asks about parental leave. The system retrieves last year's handbook section because it is semantically similar to the current one, and the model gives a confident wrong answer.

Filter retrieved content by document freshness, treat the current policy as the source of record, and cite the specific section so the user can verify the answer.
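The remedy above can be sketched as a filter applied to retrieved chunks. This assumes each chunk carries hypothetical metadata (`policy_id`, `section`, and an `effective` date); the filter keeps only the newest version of each policy that is already in effect and labels each chunk with a citation the model can pass on to the user.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Chunk:
    policy_id: str   # hypothetical: identifies the same policy across versions
    section: str     # section number to cite, e.g. "4.2"
    effective: date  # date this version of the policy took effect
    text: str


def current_versions(chunks: list[Chunk], today: date) -> list[Chunk]:
    """Keep only chunks from the newest version of each policy already in effect on `today`."""
    latest: dict[str, date] = {}
    for c in chunks:
        if c.effective <= today and c.effective > latest.get(c.policy_id, date.min):
            latest[c.policy_id] = c.effective
    return [c for c in chunks if latest.get(c.policy_id) == c.effective]


def with_citations(chunks: list[Chunk]) -> str:
    """Prefix each chunk with its source so the answer can cite the exact section."""
    return "\n".join(
        f"[{c.policy_id} section {c.section}, effective {c.effective.isoformat()}] {c.text}"
        for c in chunks
    )


retrieved = [
    Chunk("handbook", "4.2", date(2024, 1, 1), "Parental leave is 12 weeks."),
    Chunk("handbook", "4.2", date(2025, 1, 1), "Parental leave is 16 weeks."),
]
fresh = current_versions(retrieved, today=date(2025, 6, 1))
print(with_citations(fresh))
# [handbook section 4.2, effective 2025-01-01] Parental leave is 16 weeks.
```

Filtering after retrieval only helps if the current version was retrieved at all; applying the same metadata filter inside the search query, where the search system supports it, is the stronger option.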

Common confusion: RAG does not guarantee accuracy. The model answers based on whatever the search returns. If the search misses the right document or returns a misleading near-match, the answer is likely to be wrong, even though the process appears to be working.