Glossary / adversarial-scan

Adversarial Scan

A scheduled execution of probe templates against an LLM endpoint, agent, or RAG pipeline, scored to produce control-mapped findings.

Methodology

← All terms

An adversarial scan is a scheduled execution of probe templates against an LLM endpoint, an AI agent, or a RAG pipeline. Each probe is a deliberately crafted input designed to elicit a known failure mode: prompt injection, sensitive-information disclosure, jailbreak, tool overuse, overreliance, fairness regression, and so on. The response is scored and the result becomes a finding.

Adversarial scans differ from a one-shot pentest in two ways: cadence (daily or weekly rather than annual) and structure (the probe catalogue is versioned, mapped to control identifiers, and reproducible). Scheduled cadence is required to satisfy the post-market monitoring obligations under EU AI Act Article 72 and the continuous improvement loop in ISO/IEC 42001.

A finding from an adversarial scan ships with: the probe identifier, the model verdict from each judge (in a multi-judge consensus pipeline), the framework control identifiers it maps to, and a remediation pointer.

Primary sources

Where to read the canonical definition.

  • OWASP LLM Top 10 (probe-relevant entries) open →

See Adversarial Scan in production.

The Penaxtra platform implements the controls and assessments described above as part of its AI-SPM programme.

AI-SPM platform overview