Glossary / prompt-injection

Prompt Injection

An attack that smuggles attacker-controlled instructions into a model prompt to override the developer instructions or extract sensitive data.

AttackOWASP LLM01

← All terms

Prompt injection is the OWASP LLM Top 10 number-one risk (LLM01). The attacker plants text that the model is likely to read (a document the user uploads, a webpage the agent crawls, a chat message in a multi-tenant transcript, the metadata of a retrieved RAG chunk) such that the language model treats the planted text as an instruction rather than as data.

Two common shapes: direct prompt injection where the attacker sends the prompt themselves through the chat interface; and indirect prompt injection where the attacker writes the prompt into a third-party surface (a document, a webpage, an email) that the application later reads. Indirect prompt injection is harder to defend against because the model sees the malicious content alongside legitimate data the user wants summarised.

Defences fall into three layers: input handling at the gateway (DLP and structural validation), model-side mitigations (instruction-tuning and system-prompt isolation), and detection through adversarial scanning that catches regressions when the model, the prompt template, or the surrounding orchestration changes.

Primary sources

Where to read the canonical definition.

See Prompt Injection in production.

The Penaxtra platform implements the controls and assessments described above as part of its AI-SPM programme.

AI-SPM platform overview