How to Build an AI Asset Inventory That Survives the First Audit

Short version: An AI asset inventory needs eight categories, not the three most teams start with. The auditor will not accept "we mostly use OpenAI"; they will ask for the embedding model behind the RAG retriever and the MCP servers the agent can call. Here is the shape of a real one.

When we started building inventories for design-partner customers in late 2025, almost everyone began the same way: a single sheet listing "the chatbot endpoint" and "the OpenAI account". Six months and a few audit findings later, those sheets had grown to eight categories and a navigation pane. This post is the shape we converged on, with the failure modes that pushed each addition.

The eight categories we ended up with

It turns out you cannot fit an LLM application into one row. The supporting stack matters as much as the model.

LLM endpoints - the chat-completions URLs the application hits. One row per (provider, model, deployment).
AI agents and MCP servers - everything that calls tools on the model's behalf. Each agent needs its tool catalogue alongside.
RAG pipelines - retriever URL, indexed corpus name, refresh cadence. RAG is not a single asset; it is at least three (retriever, corpus, vector store) glued together.
Vector databases - the storage layer behind the RAG retriever. We list the engine, region, and namespace count separately because tenant isolation lives at the namespace boundary.
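Embedding models - the model that produced the vectors in the vector database. Swap it and the stored vectors are no longer comparable with new query embeddings, so the whole index has to be re-embedded; security people care because embedding inversion attacks can recover much of the original document content from the vectors.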
Fine-tuned and self-hosted models - your own weights, the licence under which the parent model was tuned, where the weights physically live.
Prompt gateways and runtime AI gateways - any in-line proxy that sits between the application and the upstream LLM provider. The configuration file (rules, DLP patterns, signed blob version) is part of the asset.
Managed cloud AI services - the foundation-model platform accounts on the major cloud providers, with the IAM role used to access them.

The first time an auditor read this list back to us, they asked a question that became the standard test for completeness: "show me a single row that maps a customer complaint about a hallucinated answer to the specific embedding model, the corpus chunk it retrieved, and the agent tool the chain triggered." If the inventory cannot produce that row, the inventory is not finished yet.
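One way to make that test mechanical is to store explicit links between rows and walk them. The sketch below is a minimal illustration, not our production schema: the `Asset` class, the `links` field, and every asset id are hypothetical. It covers the asset-level part of the auditor's question only; tying a complaint to a specific corpus chunk also needs per-request logs, which are out of scope here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    LLM_ENDPOINT = "llm_endpoint"
    AGENT_OR_MCP_SERVER = "agent_or_mcp_server"
    RAG_PIPELINE = "rag_pipeline"
    VECTOR_DATABASE = "vector_database"
    EMBEDDING_MODEL = "embedding_model"
    FINE_TUNED_OR_SELF_HOSTED_MODEL = "fine_tuned_or_self_hosted_model"
    GATEWAY = "gateway"
    MANAGED_CLOUD_AI_SERVICE = "managed_cloud_ai_service"


@dataclass
class Asset:
    asset_id: str
    category: Category
    # Ids of the assets this one depends on (hypothetical field).
    links: list[str] = field(default_factory=list)


def trace(inventory: dict[str, Asset], start_id: str) -> list[Asset]:
    """Return every asset reachable from start_id, depth-first.

    Links that point at ids missing from the inventory are skipped.
    """
    seen: set[str] = set()
    ordered: list[Asset] = []
    stack = [start_id]
    while stack:
        asset_id = stack.pop()
        if asset_id in seen or asset_id not in inventory:
            continue
        seen.add(asset_id)
        asset = inventory[asset_id]
        ordered.append(asset)
        stack.extend(reversed(asset.links))
    return ordered


def missing_categories(chain: list[Asset], required: set[Category]) -> set[Category]:
    """Categories the auditor asked about that the chain never reaches."""
    return required - {asset.category for asset in chain}


# Illustrative inventory; all ids are made up.
inventory = {
    "support-agent": Asset("support-agent", Category.AGENT_OR_MCP_SERVER,
                           ["chat-endpoint", "kb-rag"]),
    "chat-endpoint": Asset("chat-endpoint", Category.LLM_ENDPOINT),
    "kb-rag": Asset("kb-rag", Category.RAG_PIPELINE, ["kb-vectors", "kb-embedder"]),
    "kb-vectors": Asset("kb-vectors", Category.VECTOR_DATABASE),
    "kb-embedder": Asset("kb-embedder", Category.EMBEDDING_MODEL),
}
chain = trace(inventory, "support-agent")
```

If `missing_categories(chain, {...})` comes back non-empty for the categories named in the auditor's question, the inventory is not finished yet.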

Two failure modes we keep seeing

There are two ways the inventory still ends up wrong even with all eight categories. One is shadow AI. The other is fossilised assets.

Shadow AI is what you find when you compare the inventory to the actual outbound DNS traffic. We have run this exercise with five customers; every time, at least one team has a production endpoint nobody on the platform side has heard of. The pattern is usually the same: a small team prototypes with a hosted API, the prototype ends up in front of customers, it outgrows its original budget and gets a real domain, and nobody updates the asset inventory along the way. The fix is not heavier process; the fix is a runtime gateway that reports upstream LLM hosts seen in production traffic and an alert when a new host appears.
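The comparison itself is a set difference. A minimal sketch, assuming the gateway can export the upstream hosts it has seen as plain strings; the function name and the example hosts are hypothetical:

```python
def normalise_host(host: str) -> str:
    """Lower-case the host and drop whitespace and a trailing DNS dot."""
    return host.strip().lower().rstrip(".")


def new_upstream_hosts(observed_hosts, inventoried_hosts):
    """Hosts seen in production traffic that no inventory row accounts for."""
    known = {normalise_host(h) for h in inventoried_hosts}
    observed = {normalise_host(h) for h in observed_hosts}
    return sorted(observed - known)


# Placeholder hosts, not real providers.
observed = ["API.provider-a.example.", "api.provider-b.example", "api.provider-a.example"]
inventoried = ["api.provider-a.example"]
unknown = new_upstream_hosts(observed, inventoried)
```

Anything left in `unknown` is a candidate for the new-host alert.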

Fossilised assets are the opposite problem: rows in the inventory that no longer correspond to anything live. A model gets decommissioned, but its row stays in the sheet for two years. An agent's MCP server moves to a new tool catalogue, but the inventory still describes the old one. We caught this by adding a last_seen_at timestamp to every asset, populated by the same gateway telemetry. Anything that has not been seen in thirty days flips to a retirement-review state.
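The thirty-day rule can be sketched as a pure function over `last_seen_at`. The state names and the constant below are illustrative assumptions, not a fixed schema:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the thirty-day window from the text, as a constant.
REVIEW_AFTER = timedelta(days=30)


def lifecycle_state(last_seen_at: datetime, now: datetime) -> str:
    """Flip an asset to retirement review once it has gone unseen for over thirty days."""
    if now - last_seen_at > REVIEW_AFTER:
        return "retirement_review"
    return "active"


now = datetime(2026, 3, 31, tzinfo=timezone.utc)
recent = lifecycle_state(datetime(2026, 3, 15, tzinfo=timezone.utc), now)
stale = lifecycle_state(datetime(2026, 2, 1, tzinfo=timezone.utc), now)
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the rule easy to test and to replay against historical telemetry.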

The script we actually use

Inventories that depend on humans remembering to update them rot. The pattern that worked: derive what you can derive, ask for the rest.

Derivable:

LLM endpoint URLs from gateway telemetry plus webhook delivery logs.
Cloud AI service accounts from read-only IAM scans.
Vector store engines and regions from the same cloud-account scan.
Agent tool catalogues from MCP server introspection.
Embedding model linkage from the RAG application configuration files.

Asked-for-once-then-cached:

Owner (a real person rather than a team alias).
Business purpose in one sentence.
Whether the asset processes special-category data under GDPR or KVKK.

Reviewed quarterly:

Owner still right.
Business purpose still accurate.
Special-category flag still accurate.

Nothing in the asked-for list is hard to answer the first time. The discipline is in the quarterly review; without it, the owner column drifts out of date and the business-purpose column becomes a one-line history of every previous owner's pet project.
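The split between derived and asked-for fields can be enforced in code, so that a scan never overwrites a human answer. This is a hedged sketch: the field names, the `last_reviewed_on` key, and the 90-day approximation of "quarterly" are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical field names for the two groups described above.
DERIVED_FIELDS = {"endpoint_url", "cloud_account", "vector_engine",
                  "region", "tool_catalogue", "embedding_model"}
ASKED_FIELDS = {"owner", "business_purpose", "special_category_data"}
REVIEW_INTERVAL = timedelta(days=90)  # assumption: "quarterly" taken as 90 days


def refresh_row(row: dict, scan: dict) -> dict:
    """Overwrite derived fields from the latest scan; leave asked-for fields alone."""
    updated = dict(row)
    for key, value in scan.items():
        if key in DERIVED_FIELDS:
            updated[key] = value
    return updated


def needs_attention(row: dict, today: date) -> list[str]:
    """List asked-for fields that are unanswered or overdue for review."""
    # `is None` rather than falsiness: False is a valid special-category answer.
    missing = sorted(f for f in ASKED_FIELDS if row.get(f) is None)
    reviewed = row.get("last_reviewed_on")
    if reviewed is None or today - reviewed > REVIEW_INTERVAL:
        overdue = sorted(ASKED_FIELDS - set(missing))
    else:
        overdue = []
    return [f"ask:{f}" for f in missing] + [f"review:{f}" for f in overdue]


row = {"owner": "a.person", "business_purpose": "Answers support tickets.",
       "special_category_data": None, "last_reviewed_on": date(2026, 1, 5)}
scan = {"endpoint_url": "https://llm.example/v1", "owner": "scanner-bot"}
refreshed = refresh_row(row, scan)
todo = needs_attention(refreshed, date(2026, 5, 1))
```

In this example the scan's attempt to set `owner` is ignored, and `todo` lists one unanswered question plus the two answers that are past their review date.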

What the auditor actually does with this

Two patterns we have observed.

The first is that the auditor opens the inventory, picks one row at random, and asks for the evidence trail behind it: the last scan result, the framework controls the asset is tested against, the policy bundle the runtime gateway is currently loading for it. If the inventory can produce all three in under five minutes, the meeting moves on. If it cannot, the inventory becomes the audit finding rather than the asset.
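It is cheap to rehearse this spot check before the auditor does. A minimal sketch, assuming each row carries three evidence fields under the hypothetical keys shown:

```python
import random

# Hypothetical keys for the three pieces of evidence named above.
REQUIRED_EVIDENCE = ("last_scan_result", "framework_controls", "policy_bundle")


def missing_evidence(row: dict) -> list[str]:
    """Evidence fields that are absent or empty on this row."""
    return [key for key in REQUIRED_EVIDENCE if not row.get(key)]


def spot_check(rows: list[dict], rng: random.Random) -> tuple[dict, list[str]]:
    """Pick one row at random, as the auditor would, and report its gaps."""
    row = rng.choice(rows)
    return row, missing_evidence(row)


rows = [
    {"asset_id": "chat-endpoint", "last_scan_result": "scan-0412",
     "framework_controls": ["control-a"], "policy_bundle": "bundle-v7"},
    {"asset_id": "kb-rag", "last_scan_result": "scan-0398",
     "framework_controls": [], "policy_bundle": None},
]
gaps = {row["asset_id"]: missing_evidence(row) for row in rows}
```

Running `missing_evidence` over every row, as in `gaps`, shows which rows would fail the random pick before the meeting happens.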

The second is that the auditor asks for the difference between this quarter's inventory and last quarter's. We were not expecting that one. The answer needs to include both additions (new assets, with the date discovered) and removals (decommissioned assets, with the date retired). Without it, the auditor cannot tell whether the inventory is a snapshot or a process.
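A quarter-over-quarter diff is straightforward if retired rows stay in the snapshot with a retirement date instead of being deleted. The sketch below assumes each snapshot maps an asset id to a row with `discovered_on` and `retired_on` keys; those names, and the "unexplained" bucket, are illustrative choices rather than a standard.

```python
from datetime import date


def quarterly_diff(previous: dict, current: dict) -> dict:
    """Compare two snapshots keyed by asset id."""
    added = sorted(current.keys() - previous.keys())
    # Retired this quarter: has a retirement date now, had none before.
    retired = sorted(
        asset_id for asset_id, row in current.items()
        if row.get("retired_on") is not None
        and previous.get(asset_id, {}).get("retired_on") is None
    )
    # Rows that vanished without a retirement date are a process gap.
    unexplained = sorted(previous.keys() - current.keys())
    return {
        "added": [(a, current[a].get("discovered_on")) for a in added],
        "retired": [(a, current[a]["retired_on"]) for a in retired],
        "unexplained": unexplained,
    }


last_quarter = {
    "chat-endpoint": {"discovered_on": date(2025, 11, 3), "retired_on": None},
    "old-model": {"discovered_on": date(2025, 10, 1), "retired_on": None},
    "legacy-bot": {"discovered_on": date(2025, 9, 12), "retired_on": None},
}
this_quarter = {
    "chat-endpoint": {"discovered_on": date(2025, 11, 3), "retired_on": None},
    "old-model": {"discovered_on": date(2025, 10, 1), "retired_on": date(2026, 2, 14)},
    "kb-rag": {"discovered_on": date(2026, 1, 20), "retired_on": None},
}
diff = quarterly_diff(last_quarter, this_quarter)
```

An empty `unexplained` list is what lets the auditor read the inventory as a process: every row that left did so with a recorded date.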

Where to start

If you do not have an inventory yet, do not start with a spreadsheet. Start by pointing a runtime gateway at one of your production LLM endpoints and reading what comes back over a week. The list of hosts the gateway sees will be longer than you expect, and the surface area you build the inventory against will be honest from day one.

For the AI-SPM lens on this, see the platform overview. The AI-BOM glossary entry covers the document-level framing.
