Judge selection: three independent judges from three independent providers (Anthropic, OpenAI, Google)
The pool is reviewed quarterly and rotated when a new model release materially changes accuracy on the validation set..
Public methodology for the consensus scoring scheme. Three independent judges, one meta-judge, a low-confidence threshold, and a human review queue.
Last reviewed June 2026
Using a single LLM to grade another LLM inherits the same blind spots. False negatives compound, especially on agentic and indirect-injection probes where the failure mode is subtle.
Every adversarial response is scored by three judges from independent providers (Anthropic, OpenAI, Google). A meta-judge resolves disagreement. Findings with meta-judge confidence below 0.7 are routed to a human review queue and do not ship as confirmed without review.
The pool is reviewed quarterly and rotated when a new model release materially changes accuracy on the validation set..
We do not claim full independence; we claim diversity..
It returns a verdict, confidence, and rationale..
7 routes the finding to a human review queue. Customers can configure a stricter threshold per workspace..
.
.
NIST AI 600-1 MEASURE 2.3 (test misuse resistance), MEASURE 2.5 (evaluation validity); EU AI Act Article 15 (accuracy, robustness, cybersecurity) where the testing system itself must demonstrate validity.
High-confidence consensus ships as a confirmed finding. The meta-judge still records a confidence score; if confidence is below 0.7 the finding still routes to human review, even on agreement.
Three captures the modal-vs-minority signal we need without crossing the cost envelope. Five judges adds about 30 percent cost for a small accuracy lift on the validation set.
Scoped walkthrough of the Methodology / Judge consensus surface against your environment. No credit card.