Glossary / tool-poisoning

Tool Poisoning

An attack where a malicious or compromised MCP tool returns crafted output designed to manipulate the agent calling it.

AttackOWASP Agentic

← All terms

Tool poisoning is an attack class against tool-calling AI agents and MCP servers. A malicious or compromised tool returns crafted output whose content is designed to manipulate the agent calling it. The agent reads the tool result, treats it as trusted context, and emits a follow-up action that the attacker chose.

Tool poisoning is the agentic-system equivalent of indirect prompt injection. The defence pattern is the same in shape: never treat tool output as instruction, redact known-bad patterns before re-injection into the model context, and run scheduled adversarial probes that simulate poisoned tool returns to confirm the agent's reasoning is resilient.

OWASP Agentic Top 10 covers tool poisoning under multiple entries; tool-call chain detection in AI-SPM platforms catches the multi-step exploitation pattern where a poisoned tool result triggers a secondary tool call.

Primary sources

Where to read the canonical definition.

See Tool Poisoning in production.

The Penaxtra platform implements the controls and assessments described above as part of its AI-SPM programme.

AI-SPM platform overview