
Retrieval-Augmented Generation (RAG)

An LLM pattern where the prompt is augmented with documents retrieved from a vector store at query time; the retriever and the corpus are the new attack surfaces.

Pattern


Retrieval-Augmented Generation (RAG) is a widely used production pattern for grounding LLM responses in domain-specific knowledge. The user query is converted to an embedding, the embedding is matched against a vector store of pre-indexed document chunks, the top-matching chunks are injected into the prompt as context, and the model generates a response that cites or paraphrases them.
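The flow described above (embed, match, inject, generate) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory `INDEX` list stands in for a vector store, and the function names and sample chunks are hypothetical. The final generation call is omitted because it depends on the model provider.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Inject the retrieved chunks into the prompt as numbered context."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical pre-indexed corpus: each chunk is stored next to its vector.
CHUNKS = [
    "Refunds are issued within 14 days of a return request.",
    "Support is available on weekdays from 9 to 17.",
    "Shipping to EU countries takes 3 to 5 business days.",
]
INDEX = [(c, embed(c)) for c in CHUNKS]

query = "how long do refunds take"
prompt = build_prompt(query, retrieve(query, INDEX, k=1))
```

Whatever sits in `INDEX` ends up in the prompt verbatim, which is why the retriever and the corpus become part of the attack surface.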

RAG introduces two new attack surfaces beyond the LLM itself: the retriever (the vector store, similarity scoring, and chunking pipeline) and the corpus (the documents being indexed). Corpus-tainting attacks plant adversarial content in indexed documents so that the retriever surfaces it when a triggering query arrives. Cross-tenant retrieval errors leak data when a vector store hosts indexes for multiple customers or business units without strict namespace isolation.
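One way to picture strict namespace isolation is a retrieval function that filters by tenant before ranking, so another tenant's chunks can never occupy a top-k slot. The sketch below is illustrative only: the `Chunk` type, the `tenant_id` field, and `retrieve_for_tenant` are hypothetical names, and similarity scores are assumed to be precomputed by the vector store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str   # namespace the chunk was indexed under (assumed field)
    text: str
    score: float     # similarity to the current query, assumed precomputed


def retrieve_for_tenant(candidates: list[Chunk], tenant_id: str, k: int = 3) -> list[Chunk]:
    """Filter to the caller's tenant first, then rank by similarity."""
    allowed = [c for c in candidates if c.tenant_id == tenant_id]
    return sorted(allowed, key=lambda c: c.score, reverse=True)[:k]


candidates = [
    Chunk("tenant-a", "Tenant A onboarding guide.", 0.61),
    Chunk("tenant-b", "Tenant B payroll export.", 0.93),  # best match, wrong tenant
    Chunk("tenant-a", "Tenant A holiday policy.", 0.48),
]

results = retrieve_for_tenant(candidates, "tenant-a", k=2)
```

The design point is that `tenant_id` should come from the authenticated session rather than from the user's query text or the model's output; otherwise the filter itself can be steered by an attacker.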

RAG security testing covers thirteen automated patterns including canary tokens, cross-tenant probes, embedding-space adversarial inputs, and metadata filter integrity.
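As a rough illustration of how a canary token and a cross-tenant probe fit together, the sketch below plants a unique marker in one tenant's documents and then checks whether queries issued as a different tenant ever retrieve it. All names here (`make_canary`, `cross_tenant_probe`, the toy retrievers, the tenant labels) are hypothetical, and the retriever interface is an assumption rather than any specific product's API.

```python
import secrets
from typing import Callable

# Assumed interface: (tenant_id, query) -> list of retrieved chunk texts.
Retriever = Callable[[str, str], list[str]]


def make_canary() -> str:
    """Generate a unique marker that should never appear outside its tenant."""
    return f"CANARY-{secrets.token_hex(8)}"


def cross_tenant_probe(
    retriever: Retriever, attacker_tenant: str, canary: str, queries: list[str]
) -> list[str]:
    """Return the queries for which the canary leaked to the attacker tenant."""
    return [
        q
        for q in queries
        if any(canary in chunk for chunk in retriever(attacker_tenant, q))
    ]


def build_store(canary: str) -> list[tuple[str, str]]:
    """Toy corpus of (owner_tenant, text); the canary is planted in tenant-b."""
    return [
        ("tenant-a", "Tenant A onboarding guide."),
        ("tenant-b", f"Tenant B payroll notes {canary}"),
    ]


def isolated_retriever(store: list[tuple[str, str]]) -> Retriever:
    """Toy retriever that only returns the caller's own chunks."""
    def retrieve(tenant_id: str, query: str) -> list[str]:
        return [text for owner, text in store if owner == tenant_id]
    return retrieve


def leaky_retriever(store: list[tuple[str, str]]) -> Retriever:
    """Deliberately broken toy retriever that ignores the tenant filter."""
    def retrieve(tenant_id: str, query: str) -> list[str]:
        return [text for _, text in store]
    return retrieve


canary = make_canary()
store = build_store(canary)
probes = ["payroll notes", "onboarding"]
safe_leaks = cross_tenant_probe(isolated_retriever(store), "tenant-a", canary, probes)
bad_leaks = cross_tenant_probe(leaky_retriever(store), "tenant-a", canary, probes)
```

An empty result from the probe means only that these particular queries did not surface the canary; it is evidence of isolation, not proof.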

Primary sources

Where to read the canonical definition.

  • OWASP LLM Top 10 (RAG-relevant entries)

See Retrieval-Augmented Generation (RAG) in production.

The Penaxtra platform implements the controls and assessments described above as part of its AI-SPM programme.
