Vector Database Tenant Isolation - The Quiet Failure Mode We Keep Finding
Cross-tenant retrieval in shared vector stores is the failure mode we have found most often in the RAG deployments we scanned this quarter. The cause is almost never the vector store itself; it is the application code that builds the query. Here is the test that catches it and the fix that works.
I want to start with the observation that prompted this post. Over a six-week window, we ran RAG security tests against fourteen production RAG deployments at design-partner customers. In ten of the fourteen, we found at least one scenario where the retriever could be persuaded to return documents from a tenant other than the caller's. In four of those ten, the failure was on the default code path; no clever query needed.
The teams running these deployments are not careless. They had configured namespace separation in the vector store. They had reviewed the retrieval code. The failure was somewhere in between.
The shape of the failure
It usually looks like this. The vector store supports namespaces; each customer's documents live in a separate namespace; the application is supposed to query only the namespace that matches the calling customer's tenant. The application code looks like:
    results = vector_store.query(
        embedding=embed(user_prompt),
        namespace=tenant_id_from_session,
        top_k=8,
    )
If tenant_id_from_session is set correctly, the query is scoped correctly. The vulnerability is what happens when tenant_id_from_session is empty, null, or otherwise unset. The vector store SDKs we have looked at vary in their behaviour: some treat the empty string as "default namespace", some treat it as "all namespaces", some throw, and in some cases the same SDK has changed behaviour between minor versions.
An application that handles an explicit-tenant query correctly may quietly fail open on an empty-tenant query, and that failure case is exactly the one a session-hijack or session-misroute attack produces.
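To make the fail-open case concrete, here is a minimal sketch using a toy in-memory store. ToyVectorStore and its "falsy namespace means search everything" rule are assumptions made for illustration; they do not describe any particular SDK, only one of the behaviours listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyVectorStore:
    """Hypothetical in-memory stand-in for a namespaced vector store."""

    docs: Dict[str, List[str]] = field(default_factory=dict)

    def query(self, namespace: Optional[str], top_k: int = 8) -> List[str]:
        # Assumed fail-open behaviour: an empty or None namespace
        # is interpreted as "search every namespace".
        if not namespace:
            hits = [chunk for chunks in self.docs.values() for chunk in chunks]
        else:
            hits = self.docs.get(namespace, [])
        return hits[:top_k]


store = ToyVectorStore({"tenant-a": ["a-invoice"], "tenant-b": ["b-contract"]})

# Correctly bound session: only tenant-a's chunks come back.
scoped = store.query(namespace="tenant-a")

# Misrouted session with an empty tenant: every tenant's chunks come back.
leaked = store.query(namespace="")
```

The application code is identical in both calls; only the value of the tenant identifier differs, which is why code review of the query itself tends not to catch this.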
The second variant is when the namespace is correct but the metadata filter is the security boundary. The query says "only return chunks where customer_id matches". If the application builds that filter from a request parameter without sanitisation, an attacker can inject a wildcard that matches every customer.
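A sketch of the metadata-filter variant, under stated assumptions: the filter is a plain dict, "*" is the wildcard syntax, and matches, unsafe_filter, and safe_filter are hypothetical names. Real filter languages differ, but the shape of the mistake is the same: the tenant field is taken from the request instead of the session.

```python
from typing import Dict, List

CHUNKS = [
    {"customer_id": "acme", "text": "acme pricing"},
    {"customer_id": "globex", "text": "globex pricing"},
]


def matches(chunk: Dict[str, str], flt: Dict[str, str]) -> bool:
    """Toy filter evaluator; a value of '*' matches anything (assumed syntax)."""
    return all(v == "*" or chunk.get(k) == v for k, v in flt.items())


def search(flt: Dict[str, str]) -> List[str]:
    return [c["text"] for c in CHUNKS if matches(c, flt)]


def unsafe_filter(request_params: Dict[str, str]) -> Dict[str, str]:
    # The security boundary is read from attacker-controlled input.
    return {"customer_id": request_params["customer_id"]}


def safe_filter(session_tenant: str, request_params: Dict[str, str]) -> Dict[str, str]:
    # Caller-supplied fields may narrow the search, but the tenant field
    # is always overwritten with the value bound to the session.
    extra = {k: v for k, v in request_params.items() if k != "customer_id"}
    return {**extra, "customer_id": session_tenant}


attack = {"customer_id": "*"}
leaked = search(unsafe_filter(attack))
scoped = search(safe_filter("acme", attack))
```

In this sketch the safe variant does not try to detect wildcards at all; it simply never lets the request decide the tenant.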
The test that catches both
Two probes, run on every RAG security scan:
- Empty-tenant probe. Set the session tenant to the empty string and send a query that should match a document in any tenant. Assert that the response contains zero retrieved chunks. Most failures cluster here.
- Wildcard-metadata probe. If the application accepts a customer-supplied filter parameter, send a value designed to short-circuit the filter ("*", "%", a NOT clause). Assert the retrieved chunks belong only to the caller's tenant.
Both probes need a seed corpus you control. We seed two canary documents per scanned RAG system: one in the caller's tenant with a known-good marker, one in a sibling tenant with a known-bad marker. The wildcard-metadata probe passes when the response contains the good marker and never the bad marker; the empty-tenant probe passes only when the response contains neither, since it should retrieve nothing at all.
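The two probes can be sketched as plain functions over a retrieve callable. Everything named here is an assumption for illustration: the marker strings, the payload list, the (session_tenant, query, customer_filter) signature, and the two fake applications. In a real scan, retrieve would wrap a call to the application under test rather than a local function.

```python
from typing import Callable, List, Optional

GOOD_MARKER = "CANARY-GOOD-7f3a"  # seeded in the caller's tenant
BAD_MARKER = "CANARY-BAD-91c2"    # seeded in a sibling tenant
WILDCARD_PAYLOADS = ("*", "%")    # illustrative; extend per filter language

# (session_tenant, query, customer_filter) -> retrieved chunks
Retrieve = Callable[[str, str, Optional[str]], List[str]]


def empty_tenant_probe(retrieve: Retrieve) -> bool:
    """Pass only if an empty session tenant yields zero chunks."""
    return retrieve("", "canary", None) == []


def wildcard_metadata_probe(retrieve: Retrieve, caller_tenant: str) -> bool:
    """Pass if every payload returns the good marker and never the bad one."""
    for payload in WILDCARD_PAYLOADS:
        text = " ".join(retrieve(caller_tenant, "canary", payload))
        if GOOD_MARKER not in text or BAD_MARKER in text:
            return False
    return True


# Two fake applications over the same seed corpus, for demonstration only.
SEED = {"tenant-a": [GOOD_MARKER + " doc"], "tenant-b": [BAD_MARKER + " doc"]}


def leaky_app(tenant: str, query: str, customer_filter: Optional[str]) -> List[str]:
    # Fails open on an empty tenant and honours a caller-supplied wildcard.
    if not tenant or customer_filter in WILDCARD_PAYLOADS:
        return [c for chunks in SEED.values() for c in chunks]
    return SEED.get(tenant, [])


def strict_app(tenant: str, query: str, customer_filter: Optional[str]) -> List[str]:
    # Ignores the caller-supplied filter and returns nothing without a tenant.
    if not tenant:
        return []
    return SEED.get(tenant, [])
```

Both probes fail against leaky_app and pass against strict_app, which is the property a scan relies on: the probe result changes only when the tenant boundary does.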
The interesting bit: we initially shipped only the wildcard probe and missed three of the four "default code path" failures from the sample above. The empty-tenant probe is more important than it sounds because it covers the case where the tenant binding is broken upstream, not just the case where the filter is bypassed.
The fix that actually works
Three layers, in order of leverage.
Layer 1: explicit assertion. Before issuing the vector store query, assert that the tenant identifier is non-empty and matches the session. This sounds trivial, yet it catches more failures than any other intervention. The pattern we recommend is a request-scoped tenant context object that throws on first read if the field is unset.
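One possible shape for that context object, as a minimal sketch. TenantContext, MissingTenantError, and scoped_query are hypothetical names, and query_fn stands in for whatever SDK call the application uses; the keyword arguments mirror the snippet earlier in this post.

```python
from typing import Any, Callable, List, Optional


class MissingTenantError(RuntimeError):
    """Raised when code reads the tenant before one has been bound."""


class TenantContext:
    """Request-scoped holder for the tenant taken from the session."""

    def __init__(self, tenant_id: Optional[str]) -> None:
        self._tenant_id = tenant_id

    @property
    def tenant_id(self) -> str:
        # Fail closed: empty, whitespace-only, and None are all rejected
        # at the moment of first read, before any query is built.
        if not self._tenant_id or not self._tenant_id.strip():
            raise MissingTenantError("no tenant bound to this request")
        return self._tenant_id


def scoped_query(
    query_fn: Callable[..., List[Any]],
    ctx: TenantContext,
    embedding: List[float],
    top_k: int = 8,
) -> List[Any]:
    # Reading ctx.tenant_id raises before query_fn is ever called
    # if the tenant is unset.
    return query_fn(embedding=embedding, namespace=ctx.tenant_id, top_k=top_k)
```

The design choice is that retrieval code never sees a raw tenant string. It only sees the context object, so an unset tenant becomes an exception rather than a value the SDK gets to interpret.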
Layer 2: defence in depth at the vector store. Configure the vector store to reject queries with empty or wildcard namespace if the engine supports it. Most managed vector store engines support a policy here; few teams have set it because the default is permissive.
Layer 3: continuous probe in production. Even if layers 1 and 2 are in place today, they are one merge away from regression. The probes from the previous section run on every scheduled scan and fail the build if a canary document crosses the tenant boundary.
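A sketch of the "fail the build" half of this layer, assuming each probe is exposed as a zero-argument callable that returns True on pass. run_probes and main are hypothetical names; how the job is scheduled is left to whatever CI system is in use.

```python
import sys
from typing import Callable, Dict, List


def run_probes(probes: Dict[str, Callable[[], bool]]) -> List[str]:
    """Return the names of failed probes; an exception counts as a failure."""
    failed = []
    for name, probe in probes.items():
        try:
            ok = probe()
        except Exception:
            # A probe that cannot run proves nothing, so treat it as failed.
            ok = False
        if not ok:
            failed.append(name)
    return failed


def main(probes: Dict[str, Callable[[], bool]]) -> int:
    """Report failures on stderr and return a process exit code."""
    failed = run_probes(probes)
    for name in failed:
        print(f"tenant isolation probe failed: {name}", file=sys.stderr)
    return 1 if failed else 0
```

A scheduled job would end with something like sys.exit(main(probes)), so that a non-zero exit code marks the run as failed.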
The cost of layer 3 is about three minutes per RAG system per scan run. The cost of finding the failure in production is whichever customer's documents end up in another customer's response.
What surprised us
Two things, briefly.
First: the failure is much more often the application code than the vector store itself. Every vector store we tested can be configured securely; the boundary that fails is somewhere between the session and the SDK call. This means the test that catches the failure has to run against the application's HTTP surface, not the vector store API directly.
Second: the teams running these deployments had thought about tenant isolation. The reviews, the threat models, the design documents all addressed it. The failure was almost always in the path that was added later, after the original review, for a feature that did not exist when the threat model was drafted. We are increasingly convinced that the "did you think about this in the threat model" question is the wrong question, because the answer is usually yes. The right question is "what is the test that runs on every deployment".