Glossary / three-judge-consensus

Three-Judge Consensus

An adversarial-scan scoring pattern where three independent frontier LLMs grade each finding, and a fourth meta-judge resolves disagreement.

Methodology

← All terms

Three-judge consensus is a scoring methodology for adversarial scans of LLM applications. Each finding is graded by three independent frontier LLM judges (typically chosen from different model families to reduce correlated bias), and a fourth meta-judge resolves disagreement between them.

The pattern addresses a known issue with single-judge scanning: a single LLM grader inherits the biases of its training data, and a finding that the grader's own family is prone to produce may be systematically under-scored. Three independent judges with a deterministic resolution layer give the model-risk committee something defensible to point at.

The disagreement metric itself is a leading indicator. Sustained disagreement on a class of findings suggests the underlying judging prompt or the probe is ambiguous and needs revision; high agreement plus low confidence routes the finding to a human review queue.

See Three-Judge Consensus in production.

The Penaxtra platform implements the controls and assessments described above as part of its AI-SPM programme.

AI-SPM platform overview