AI-SPM Performance Methodology


The gap Performance closes

Performance numbers in security marketing routinely conflate best-case microbenchmarks with end-to-end behaviour. Procurement teams need to know which figure applies in their setting.

How Penaxtra delivers Performance

Every published number on the public site falls into one of three buckets: gateway overhead, scan-run cost, scan-run wall time. Each bucket has its own measurement protocol, repeatability requirement, and review cadence.

Performance capabilities

Gateway overhead (under 1 ms P95): measured at the agent on commodity Linux x86-64 hardware (4 vCPU, 8 GiB RAM) with a synthetic 1 KiB prompt and 4 KiB response

The reported number excludes upstream LLM latency.
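As an illustration of what "excludes upstream LLM latency" can mean in practice, here is a minimal sketch that subtracts upstream wait time from end-to-end time per request and takes the P95 of the remainder. The function names, the nearest-rank percentile definition, and the sample values are assumptions made for this example; the source does not specify how the percentile is computed.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def gateway_overhead_ms(total_ms, upstream_ms):
    """Per-request overhead: end-to-end time minus time spent waiting on the upstream LLM."""
    if len(total_ms) != len(upstream_ms):
        raise ValueError("sample lists must have the same length")
    return [t - u for t, u in zip(total_ms, upstream_ms)]


# Hypothetical timings in milliseconds, not measured data.
total = [120.4, 98.7, 150.9, 101.2, 110.6]
upstream = [120.0, 98.1, 150.2, 100.9, 109.7]

overhead = gateway_overhead_ms(total, upstream)
p95 = percentile_nearest_rank(overhead, 95)
print(f"gateway overhead P95: {p95:.2f} ms")
```

With only five samples the nearest-rank P95 is simply the largest overhead observed; a real measurement would use far more requests.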

Scan-run wall time (hours, not days): measured per probe template against a reference chat-completion-compatible endpoint hosted in the same region

Excludes time spent in a customer human-review queue.

Judging cost: aggregated across the Anthropic, OpenAI, and Google judges with prompt caching enabled and Batch API used where SLA allows

Excludes infrastructure cost, support cost, and amortised fixed costs.

Test environment: detailed in the latest scan-engineering changelog entry; environment hash plus tool versions logged on every result
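One way an environment hash could be derived and attached to a result is sketched below. The choice of SHA-256 over canonical JSON, the field names, and the tool names and versions are all assumptions for illustration; the source does not describe how the hash is built.

```python
import hashlib
import json


def environment_hash(tool_versions):
    """Hash a tool-version mapping so the same environment always yields the same digest."""
    # Sorting keys makes the digest independent of dictionary insertion order.
    canonical = json.dumps(tool_versions, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def stamp_result(value, unit, tool_versions):
    """Attach the environment hash and tool versions to a single measured result."""
    return {
        "value": value,
        "unit": unit,
        "environment_hash": environment_hash(tool_versions),
        "tool_versions": dict(tool_versions),
    }


# Hypothetical tool names and versions.
tools = {"kernel": "6.8", "python": "3.12.1"}
record = stamp_result(0.9, "ms", tools)
print(record["environment_hash"])
```

Two results carrying the same hash were produced with the same declared tool versions, which makes a changed environment visible when figures are compared over time.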

Repeatability: every published figure is measured across at least three independent runs and the reported value is the median
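The repeatability rule can be expressed as a small check: refuse to report a figure from fewer than three runs, otherwise report the median. This is a sketch under that reading; the function name and the sample values are hypothetical.

```python
import statistics

MIN_RUNS = 3  # the "at least three independent runs" requirement stated above


def published_value(run_results):
    """Return the median of the runs, or raise if there are too few runs to publish."""
    if len(run_results) < MIN_RUNS:
        raise ValueError(
            f"need at least {MIN_RUNS} independent runs, got {len(run_results)}"
        )
    return statistics.median(run_results)


# Hypothetical P95 overhead values (ms) from three independent runs.
runs = [0.82, 0.95, 0.79]
reported = published_value(runs)
print(f"reported value: {reported} ms")
```

The median is less sensitive than the mean to a single unusually slow or fast run, which is a common reason to prefer it when the number of runs is small.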

What is not measured: third-party LLM provider latency or rate limits, customer-network round-trip time, and customer infrastructure noise

Performance compliance mapping

NIST AI 600-1 MEASURE 1.1 (define metrics) and MEASURE 1.3 (track over time); ISO/IEC 42001 A.9.4 (performance monitoring).

Frequently asked

Why is gateway overhead reported as a P95 and not an average?

Average overhead is dominated by cache hits and underweights the worst-case path. P95 is what determines whether the gateway is acceptable on a customer-facing endpoint at peak load.
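A small synthetic example shows the effect. The distribution is invented for illustration: 90 requests on a fast cached path and 10 on a slow path. The 0.1 ms and 2.0 ms values are assumptions, not measurements.

```python
import statistics

# Hypothetical overhead samples in milliseconds: 90 cache hits, 10 slow-path requests.
samples = [0.1] * 90 + [2.0] * 10

mean_ms = statistics.mean(samples)
# quantiles(..., n=100) returns 99 cut points; index 94 is the 95th percentile.
p95_ms = statistics.quantiles(samples, n=100, method="inclusive")[94]

print(f"mean: {mean_ms:.2f} ms, P95: {p95_ms:.2f} ms")
```

Here the mean comes out well under 1 ms while the P95 sits at the slow-path value, so an average alone would hide the latency that one request in ten actually experiences.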

Does the under-EUR-0.10 cost include support and human review?

No. The figure is judge-execution cost at scale (Batch API, cache enabled). Human review is a separate workflow with its own SLA and cost model.

