Deploy the Penaxtra Runtime AI Gateway as a sidecar container or as a cluster-internal Deployment + Service. Both patterns keep prompt content inside the customer VPC.
Pattern A: cluster-internal Deployment + Service
Recommended for shared use across multiple LLM client applications.
apiVersion: v1
kind: Secret
metadata:
  name: penaxtra-enroll
  namespace: ai-platform
type: Opaque
stringData:
  enroll-token: "<paste from workspace>"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: penaxtra-gateway
  namespace: ai-platform
spec:
  replicas: 2
  selector:
    matchLabels: { app: penaxtra-gateway }
  template:
    metadata:
      labels: { app: penaxtra-gateway }
    spec:
      containers:
        - name: gateway
          image: registry.penaxtra.com/runtime-gateway:latest
          args:
            - "--listen=0.0.0.0:8443"
            - "--upstream=https://api.upstream-llm.example"
          env:
            - name: PNX_ENROLL_TOKEN
              valueFrom:
                secretKeyRef:
                  name: penaxtra-enroll
                  key: enroll-token
          resources:
            requests: { cpu: "100m", memory: "128Mi" }
            limits: { cpu: "500m", memory: "256Mi" }
          readinessProbe:
            httpGet: { path: /healthz, port: 8443 }
            initialDelaySeconds: 3
---
apiVersion: v1
kind: Service
metadata:
  name: penaxtra-gateway
  namespace: ai-platform
spec:
  selector: { app: penaxtra-gateway }
  ports:
    - port: 443
      targetPort: 8443
LLM client applications inside the cluster point to https://penaxtra-gateway.ai-platform.svc.cluster.local.
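As an illustration of the client side, the fragment below passes the gateway URL to an application container through an environment variable. The container name, image, and the LLM_BASE_URL variable are hypothetical; use whatever setting your client library reads for its base URL.

```yaml
# Fragment of a client application's Pod template (all names are illustrative).
containers:
  - name: llm-client                            # hypothetical container name
    image: registry.example.com/llm-client:1.0  # hypothetical image
    env:
      - name: LLM_BASE_URL                      # hypothetical variable read by the client
        value: "https://penaxtra-gateway.ai-platform.svc.cluster.local"
```

The Service listens on port 443, so the URL needs no explicit port.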
Pattern B: sidecar container
Useful when an application owns its own gateway lifecycle. Add the gateway container to the application Pod spec with the same args. Containers in a Pod share a network namespace, so the application reaches the gateway at localhost:8443.
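A minimal sketch of the sidecar layout, reusing the gateway container from Pattern A. The Pod name, namespace, application image, and the LLM_BASE_URL variable are hypothetical, and the https scheme on localhost is assumed from the Pattern A URL.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app                  # hypothetical
  namespace: my-team            # hypothetical
spec:
  containers:
    - name: app
      image: registry.example.com/my-app:1.0   # hypothetical application image
      env:
        - name: LLM_BASE_URL                    # hypothetical variable read by the app
          value: "https://localhost:8443"       # scheme assumed to match Pattern A
    - name: gateway
      image: registry.penaxtra.com/runtime-gateway:latest
      args:
        - "--listen=0.0.0.0:8443"
        - "--upstream=https://api.upstream-llm.example"
      env:
        - name: PNX_ENROLL_TOKEN
          valueFrom:
            secretKeyRef:
              name: penaxtra-enroll
              key: enroll-token
```

Secrets are namespace-scoped, so the penaxtra-enroll Secret must exist in the application's namespace for this Pod to start.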
NetworkPolicy
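Restrict egress so that only the gateway Pods can reach the upstream LLM domain. NetworkPolicy matches IP ranges rather than domain names, so the policy below allows the gateway Pods outbound TCP 443 to any non-private address. It does not by itself block other Pods; those need their own egress policy.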
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: penaxtra-gateway-egress
  namespace: ai-platform
spec:
  podSelector: { matchLabels: { app: penaxtra-gateway } }
  policyTypes: ["Egress"]
  egress:
    - to:
        - ipBlock: { cidr: 0.0.0.0/0, except: [10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16] }
      ports:
        - { protocol: TCP, port: 443 }
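Because the policy above excludes private ranges and allows only TCP 443, it also blocks lookups against a cluster DNS service on a private address, which the gateway likely needs to resolve the upstream hostname. A possible companion policy is sketched below. The policy name is hypothetical, and the k8s-app: kube-dns label and kube-system namespace are the common defaults for CoreDNS, not something this document confirms for your cluster.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: penaxtra-gateway-dns-egress   # hypothetical name
  namespace: ai-platform
spec:
  podSelector: { matchLabels: { app: penaxtra-gateway } }
  policyTypes: ["Egress"]
  egress:
    - to:
        # Both selectors in one entry: DNS Pods inside kube-system only.
        - namespaceSelector:
            matchLabels: { kubernetes.io/metadata.name: kube-system }
          podSelector:
            matchLabels: { k8s-app: kube-dns }   # assumed DNS Pod label
      ports:
        - { protocol: UDP, port: 53 }
        - { protocol: TCP, port: 53 }
```

NetworkPolicies are additive, so this rule extends the egress allowed by penaxtra-gateway-egress rather than replacing it.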
Security notes
- The enrollment token is stored as a Kubernetes Secret. Rotate via a Job that calls the workspace API after each rotation.
- Use readOnlyRootFilesystem: true and a non-root securityContext (runAsNonRoot: true, runAsUser: 65532).
- For multi-tenant cluster sharing, deploy one gateway per tenant namespace to keep policy isolated.
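The securityContext settings named above could be applied to the gateway container as in this fragment. It is a sketch only: whether the gateway needs a writable scratch directory (for example an emptyDir volume) when the root filesystem is read-only is not stated here and should be verified.

```yaml
# Fragment of the gateway Pod template from Pattern A.
containers:
  - name: gateway
    image: registry.penaxtra.com/runtime-gateway:latest
    securityContext:
      runAsNonRoot: true            # refuse to start if the image would run as UID 0
      runAsUser: 65532              # UID given in the security notes
      readOnlyRootFilesystem: true  # container cannot write to its image filesystem
```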
Last reviewed: 2026-06-13. Reviewed by: Engineering. Content type: Developer documentation. Reach the maintainers: [email protected].