Prompt injection is the OWASP LLM Top 10 number-one risk (LLM01). The attacker plants text that the model is likely to read (a document the user uploads, a webpage the agent crawls, a chat message in a multi-tenant transcript, the metadata of a retrieved RAG chunk) such that the language model treats the planted text as an instruction rather than as data.
Two common shapes: direct prompt injection where the attacker sends the prompt themselves through the chat interface; and indirect prompt injection where the attacker writes the prompt into a third-party surface (a document, a webpage, an email) that the application later reads. Indirect prompt injection is harder to defend against because the model sees the malicious content alongside legitimate data the user wants summarised.
Defences fall into three layers: input handling at the gateway (DLP and structural validation), model-side mitigations (instruction-tuning and system-prompt isolation), and detection through adversarial scanning that catches regressions when the model, the prompt template, or the surrounding orchestration changes.