Three-Judge plus Meta-Judge Consensus Methodology


The gap Judge consensus closes

A single LLM judge grading another LLM can share that model's blind spots, and nothing cross-checks its verdict. False negatives compound, especially on agentic and indirect-injection probes, where the failure mode is subtle.

How Penaxtra delivers Judge consensus

Every adversarial response is scored by three judges from independent providers (Anthropic, OpenAI, Google). A meta-judge resolves disagreement. Findings with meta-judge confidence below 0.7 are routed to a human review queue and do not ship as confirmed without review.
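The routing rule above can be sketched in a few lines of Python. This is a minimal illustration, not Penaxtra's implementation: the names `MetaVerdict` and `route_finding` and the status strings are hypothetical, and only the 0.7 threshold comes from the text.

```python
from dataclasses import dataclass

# Threshold taken from the text; every name in this sketch is hypothetical.
REVIEW_THRESHOLD = 0.7


@dataclass(frozen=True)
class MetaVerdict:
    verdict: str       # label assumed for illustration, e.g. "vulnerable"
    confidence: float  # meta-judge confidence in the range 0.0 to 1.0
    rationale: str


def route_finding(meta: MetaVerdict, threshold: float = REVIEW_THRESHOLD) -> str:
    """Return the queue a finding goes to, based on meta-judge confidence."""
    if not 0.0 <= meta.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Strictly below the threshold goes to a human reviewer;
    # a confidence equal to the threshold counts as confirmed.
    return "human_review" if meta.confidence < threshold else "confirmed"
```

Passing a higher `threshold` sends more findings to human review, which is one way a stricter setting could behave.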

Judge consensus capabilities

Judge selection: three independent judges from three independent providers (Anthropic, OpenAI, Google)

The pool is reviewed quarterly and rotated when a new model release materially changes accuracy on the validation set.

Independence assumption: judges share no provider, no finetune, and ideally no overlapping training-data lineage

We do not claim full independence; we claim diversity.

Meta-judge: a separate Anthropic model receives the three judges' verdicts plus the original probe and response

It returns a verdict, confidence, and rationale.

Low-confidence threshold: meta-judge confidence under 0.7 routes the finding to a human review queue

Customers can configure a stricter threshold per workspace.

Citations: each judge attaches citations to its rationale (probe text reference, response excerpt reference) so reviewers can trace the verdict

PII redaction: judge rationales are redacted before persistence so sensitive content from a probed endpoint is never stored verbatim
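As an illustration of the citation and redaction capabilities listed above, the sketch below stores citations as character offsets into the probe or response and redacts the rationale before it is persisted. Everything here is an assumption made for the example: the record names, the offset-based citation format, and the email-only pattern, which stands in for a much broader PII detector.

```python
import re
from dataclasses import dataclass, replace

# Illustrative pattern only: real PII redaction covers far more than emails.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass(frozen=True)
class Citation:
    source: str  # "probe" or "response" (hypothetical labels)
    start: int   # character offset where the cited excerpt begins
    end: int     # character offset where it ends (exclusive)


@dataclass(frozen=True)
class JudgeVerdict:
    provider: str
    verdict: str
    rationale: str
    citations: tuple = ()  # tuple of Citation records


def redact(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


def prepare_for_storage(v: JudgeVerdict) -> JudgeVerdict:
    """Return a copy of the verdict whose rationale is safe to persist."""
    return replace(v, rationale=redact(v.rationale))


def resolve_excerpt(c: Citation, probe: str, response: str) -> str:
    """Look up the cited excerpt at review time from the live texts."""
    text = probe if c.source == "probe" else response
    return text[c.start:c.end]
```

Keeping citations as offsets rather than copied excerpts is one possible way to let reviewers trace a verdict without storing endpoint content verbatim.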

Judge consensus compliance mapping

NIST AI 600-1 MEASURE 2.3 (test misuse resistance), MEASURE 2.5 (evaluation validity); EU AI Act Article 15 (accuracy, robustness, cybersecurity) where the testing system itself must demonstrate validity.

Frequently asked

What happens when all three judges agree?

High-confidence consensus ships as a confirmed finding. The meta-judge still records a confidence score; if confidence is below 0.7 the finding still routes to human review, even on agreement.

Why three judges and not five or seven?

Three captures the modal-vs-minority signal we need without exceeding the cost envelope. A five-judge panel adds about 30 percent to cost for only a small accuracy lift on the validation set.
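A minimal sketch of the modal-vs-minority split with three judges follows. The function name and the verdict labels are hypothetical, and a plain majority count is assumed here; the text does not specify how the signal is computed.

```python
from collections import Counter
from typing import Optional


def split_modal_minority(verdicts: dict) -> tuple:
    """Map of provider -> verdict in; (modal verdict or None, dissenters) out."""
    if not verdicts:
        raise ValueError("at least one verdict is required")
    counts = Counter(verdicts.values())
    top, top_n = counts.most_common(1)[0]
    # No strict majority (e.g. a three-way split): every judge is in dispute.
    if top_n * 2 <= len(verdicts):
        return None, sorted(verdicts)
    dissenters = sorted(p for p, v in verdicts.items() if v != top)
    return top, dissenters
```

With three judges, a 2-1 split yields one modal verdict and one named dissenter, which is the kind of disagreement a meta-judge would then be asked to resolve.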

