
Three-Judge plus Meta-Judge Consensus

Public methodology for the consensus scoring scheme. Three independent judges, one meta-judge, a low-confidence threshold, and a human review queue.

Last reviewed June 2026

Problem

The gap Judge consensus closes

A single LLM grading another LLM tends to share the same blind spots. False negatives compound, especially on agentic and indirect-injection probes, where the failure mode is subtle.

How Penaxtra approaches it

How Penaxtra delivers Judge consensus

Every adversarial response is scored by three judges from independent providers (Anthropic, OpenAI, Google). A meta-judge resolves disagreement. Findings with meta-judge confidence below 0.7 are routed to a human review queue and do not ship as confirmed without review.
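The routing rule described above can be sketched in a few lines. This is a minimal illustration, not Penaxtra's implementation: the `MetaVerdict` type, the `route` function, and the outcome labels are hypothetical names chosen for the example. Only the 0.7 threshold and the "below threshold goes to human review" rule come from this page.

```python
from dataclasses import dataclass

# Low-confidence threshold stated in this methodology.
DEFAULT_THRESHOLD = 0.7


@dataclass(frozen=True)
class MetaVerdict:
    """Hypothetical shape of the meta-judge output: verdict, confidence, rationale."""
    verdict: str       # illustrative labels, e.g. "vulnerable" or "safe"
    confidence: float  # expected range: 0.0 to 1.0
    rationale: str


def route(meta: MetaVerdict, threshold: float = DEFAULT_THRESHOLD) -> str:
    """Return 'human_review' when confidence is below the threshold, else 'confirmed'."""
    if not 0.0 <= meta.confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    # Strictly below the threshold routes to the human review queue.
    return "human_review" if meta.confidence < threshold else "confirmed"
```

Under these assumptions, a meta-judge confidence of 0.69 routes to human review, while 0.7 or higher ships as confirmed without review.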

Technical capabilities

Judge consensus capabilities

Judge selection: three independent judges from three independent providers (Anthropic, OpenAI, Google)

The pool is reviewed quarterly and rotated when a new model release materially changes accuracy on the validation set.

Independence assumption: judges share no provider, no finetune, and ideally no overlapping training-data lineage

We do not claim full independence; we claim diversity.

Meta-judge: a separate Anthropic model receives the three judges' outputs plus the original probe and response

It returns a verdict, confidence, and rationale.

Low-confidence threshold: meta-judge confidence under 0.7 routes the finding to a human review queue

Customers can configure a stricter threshold per workspace.

Citations: each judge attaches citations to its rationale (probe text reference, response excerpt reference) so reviewers can trace the verdict

PII redaction: judge rationales are redacted before persistence so sensitive content from a probed endpoint is never stored verbatim
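As an illustration of the per-workspace threshold mentioned in the capabilities above, the sketch below resolves the threshold that applies to a workspace. It assumes that "stricter" means a higher confidence bar, so more findings route to human review, and that values below the 0.7 baseline are rejected. The function name and the error behavior are hypothetical, not a documented Penaxtra API.

```python
from typing import Optional

# Baseline low-confidence threshold stated in this methodology.
BASELINE_THRESHOLD = 0.7


def effective_threshold(workspace_threshold: Optional[float] = None) -> float:
    """Resolve the confidence threshold for a workspace.

    Assumption: a workspace may raise the bar above the baseline
    but may not lower it.
    """
    if workspace_threshold is None:
        return BASELINE_THRESHOLD
    if not 0.0 <= workspace_threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    if workspace_threshold < BASELINE_THRESHOLD:
        raise ValueError("workspace threshold may only be stricter than the baseline")
    return workspace_threshold
```

A workspace that sets 0.85 would send any finding with meta-judge confidence under 0.85 to review; a workspace with no override falls back to 0.7.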

Compliance mapping

Judge consensus compliance mapping

NIST AI 600-1 MEASURE 2.3 (test misuse resistance), MEASURE 2.5 (evaluation validity); EU AI Act Article 15 (accuracy, robustness, cybersecurity) where the testing system itself must demonstrate validity.

FAQ

Frequently asked

What happens when all three judges agree?

When all three judges agree and the meta-judge's confidence is 0.7 or higher, the finding ships as confirmed. The meta-judge records a confidence score even on unanimous agreement; if that score is below 0.7, the finding still routes to human review.

Why three judges and not five or seven?

Three judges capture the modal-vs-minority signal we need without exceeding the cost envelope. Five judges add about 30 percent cost for a small accuracy lift on the validation set.
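To make the modal-vs-minority signal concrete, the sketch below splits a set of judge verdicts into the majority label and the dissenting judges. It is an illustration only: this page does not state that the meta-judge uses a majority vote, and the function name and verdict labels are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def split_votes(verdicts: List[str]) -> Tuple[Optional[str], List[int]]:
    """Return (modal verdict, indices of dissenting judges).

    The modal verdict is None when no label holds a strict majority,
    in which case every judge is reported as dissenting.
    """
    if not verdicts:
        raise ValueError("at least one verdict is required")
    label, count = Counter(verdicts).most_common(1)[0]
    if count * 2 <= len(verdicts):
        # No strict majority, e.g. a three-way split among three judges.
        return None, list(range(len(verdicts)))
    dissent = [i for i, v in enumerate(verdicts) if v != label]
    return label, dissent
```

With three judges and two possible labels there is always a strict majority, so the output is either unanimous (no dissenters) or a 2-1 split that identifies the single minority judge.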

Request a demo

Scoped walkthrough of the Methodology / Judge consensus surface against your environment. No credit card.
