AI-SPM Performance Methodology

Public methodology for the performance claims published on the architecture, product, and pricing pages. It covers scope, test setup, what is measured, what is not measured, and known limitations.

Last reviewed June 2026

Problem

The gap Performance closes

Performance numbers in security marketing routinely conflate best-case microbenchmarks with end-to-end behaviour. Procurement teams need to know which figure applies in their setting.

How Penaxtra approaches it

How Penaxtra delivers Performance

Every published number on the public site falls into one of three buckets: gateway overhead, scan-run cost, and scan-run wall time. Each bucket has its own measurement protocol, repeatability requirement, and review cadence.

Technical capabilities

Performance capabilities

Gateway overhead (under 1 ms P95): measured at the agent on commodity Linux x86-64 hardware (4 vCPU, 8 GiB RAM) with a synthetic 1 KiB prompt and 4 KiB response. The reported number excludes upstream LLM latency.

Scan-run wall time (hours, not days): measured per probe template against a reference chat-completion-compatible endpoint hosted in the same region. Excludes time spent in a customer human-review queue.

Judging cost: aggregated across the Anthropic, OpenAI, and Google judges with prompt caching enabled and Batch API used where SLA allows. Excludes infrastructure cost, support cost, and amortised fixed costs.

Test environment: detailed in the latest scan-engineering changelog entry; environment hash plus tool versions logged on every result.

Repeatability: every published figure is measured across at least three independent runs and the reported value is the median.

What is not measured: third-party LLM provider latency or rate limits, customer-network round-trip time, and customer infrastructure noise.
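
The repeatability and test-environment rules above can be sketched in a few lines of Python. This is a minimal illustration, not Penaxtra's measurement harness: the function names (`p95`, `published_value`, `environment_hash`), the nearest-rank percentile definition, the choice to take the median of per-run P95 values, and the fields fed into the hash are all assumptions made for the example.

```python
import hashlib
import json
import platform
import statistics


def p95(samples_ms):
    """Nearest-rank 95th percentile of latency samples (milliseconds)."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # 1-based nearest-rank index: ceil(0.95 * n), computed with integers.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def published_value(runs):
    """Median of the per-run P95 values (assumed aggregation rule)."""
    if len(runs) < 3:
        raise ValueError("at least three independent runs are required")
    return statistics.median([p95(run) for run in runs])


def environment_hash(tool_versions):
    """SHA-256 over a canonical JSON description of the test environment.

    The fields below are hypothetical; a real harness would decide which
    environment attributes belong in the hash.
    """
    description = {
        "machine": platform.machine(),
        "system": platform.system(),
        "tools": tool_versions,
    }
    canonical = json.dumps(description, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()
```

A result record would then pair `published_value(runs)` with `environment_hash(...)` and the tool versions, so that any two published figures can be compared only when their environment hashes match.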

Compliance mapping

Performance compliance mapping

NIST AI 600-1 MEASURE 1.1 (define metrics) and MEASURE 1.3 (track over time); ISO/IEC 42001 A.9.4 (performance monitoring).

FAQ

Frequently asked

Why is gateway overhead reported as a P95 and not an average?

Average overhead is dominated by cache hits and underweights the worst-case path. P95 is what determines whether the gateway is acceptable on a customer-facing endpoint at peak load.
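
A small numeric sketch shows why the two statistics diverge. The samples below are made up for illustration (90 fast cache-hit requests and 10 slower cache-miss requests) and are not Penaxtra measurements; `nearest_rank_p95` is a hypothetical helper using the nearest-rank percentile definition.

```python
import statistics


def nearest_rank_p95(samples_ms):
    """Nearest-rank 95th percentile of latency samples (milliseconds)."""
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


# Illustrative, invented samples: 90 cache hits and 10 cache misses.
samples_ms = [0.1] * 90 + [0.9] * 10

mean_ms = statistics.mean(samples_ms)    # pulled down by the cache hits
p95_ms = nearest_rank_p95(samples_ms)    # lands on the cache-miss path

print(f"mean={mean_ms:.2f} ms, p95={p95_ms:.2f} ms")
```

With this distribution the mean sits close to the cache-hit latency, while the P95 reports the slower path that one request in ten actually takes.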

Does the under-EUR-0.10 cost include support and human review?

No. The figure is judge-execution cost at scale (Batch API, cache enabled). Human review is a separate workflow with its own SLA and cost model.
