Three-Judge Consensus - AI Glossary


Three-judge consensus is a scoring methodology for adversarial scans of LLM applications. Each finding is graded by three independent frontier LLM judges (typically chosen from different model families to reduce correlated bias), and a fourth meta-judge resolves disagreement between them.
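The flow described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `Verdict` type, the label set, the `three_judge_consensus` function, and the rule that unanimous verdicts skip the meta-judge are all assumptions made for the example. The judges are plain callables so that any model client could sit behind them.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Verdict:
    label: str         # hypothetical label set, e.g. "pass" or "fail"
    confidence: float  # judge's self-reported confidence in [0, 1]


# Hypothetical signatures: a primary judge grades one finding; the
# meta-judge sees the finding plus the three primary verdicts.
Judge = Callable[[str], Verdict]
MetaJudge = Callable[[str, Sequence[Verdict]], Verdict]


def three_judge_consensus(
    finding: str,
    judges: Sequence[Judge],
    meta_judge: MetaJudge,
) -> Verdict:
    """Grade a finding with three judges; escalate splits to the meta-judge."""
    if len(judges) != 3:
        raise ValueError("expected exactly three primary judges")

    verdicts = [judge(finding) for judge in judges]
    labels = {v.label for v in verdicts}

    if len(labels) == 1:
        # Unanimous: keep the shared label and report the mean confidence.
        mean_confidence = sum(v.confidence for v in verdicts) / len(verdicts)
        return Verdict(verdicts[0].label, mean_confidence)

    # Any disagreement (assumed rule): the fourth judge decides.
    return meta_judge(finding, verdicts)
```

Calling the meta-judge only on disagreement keeps the escalation rule itself deterministic, even though each judge is a model call.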

The pattern addresses a known issue with single-judge scanning: a single LLM grader inherits the biases of its training data, so failures that the grader's own model family is prone to produce may be systematically under-scored. Three independent judges with a deterministic resolution layer give the model-risk committee a defensible basis for each score.

The disagreement metric itself is a leading indicator of scan quality. Sustained disagreement on a class of findings suggests that the judging prompt or the probe is ambiguous and needs revision. Separately, a finding on which the judges agree but report low confidence is routed to a human review queue.
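One way to picture the two signals is the sketch below. It assumes that disagreement is measured as the share of findings in a class where the three judge labels are not unanimous, and that "low confidence" means a mean confidence under a threshold. The function names, the route names, and the 0.6 default are hypothetical choices for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def disagreement_rate_by_class(
    records: Iterable[Tuple[str, Sequence[str]]],
) -> Dict[str, float]:
    """Share of findings per class where the judges' labels differ.

    Each record is (finding_class, labels), with one label per judge.
    """
    totals: Dict[str, int] = defaultdict(int)
    splits: Dict[str, int] = defaultdict(int)
    for finding_class, labels in records:
        totals[finding_class] += 1
        if len(set(labels)) > 1:
            splits[finding_class] += 1
    return {cls: splits[cls] / totals[cls] for cls in totals}


def route(
    labels: Sequence[str],
    confidences: Sequence[float],
    min_confidence: float = 0.6,  # assumed threshold, tune per deployment
) -> str:
    """Pick the next step for one finding (route names are hypothetical)."""
    if len(set(labels)) > 1:
        return "meta_judge"        # judges disagree
    mean_confidence = sum(confidences) / len(confidences)
    if mean_confidence < min_confidence:
        return "human_review"      # agreement, but low confidence
    return "accept"
```

A class whose rate stays high across successive scans would be the candidate for revising the judging prompt or the probe.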

Other entries in this neighbourhood.

Meta-Judge: A higher-capability LLM judge that resolves disagreement between primary judges in a multi-judge consensus pipeline.

Adversarial Scan: A scheduled execution of probe templates against an LLM endpoint, agent, or RAG pipeline, scored to produce control-mapped findings.
