Embedding Inversion - AI Glossary

Embedding inversion is a privacy attack in which the attacker reconstructs the original input text from a stored embedding vector. Modern embedding models preserve more information about their input than intuition suggests; published research has demonstrated that significant portions of the original text can be recovered given only the embedding and access to a similar embedding model.
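To make the attack concrete, here is a minimal toy sketch of its simplest form: guess candidate texts, embed them, and keep the one closest to the leaked vector. The `toy_embed` function is a hypothetical stand-in built from hashed character trigrams, not a real embedding model, and `invert_by_candidate_search` is an illustrative name. Published attacks are more capable and typically train a model to generate text from the vector rather than searching a fixed candidate list.

```python
import hashlib

import numpy as np


def toy_embed(text: str, dim: int = 256) -> np.ndarray:
    """Hypothetical stand-in for an embedding model.

    Counts hashed character trigrams into a fixed-size vector and
    L2-normalises it. Real models are learned, not hashed.
    """
    vec = np.zeros(dim)
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        digest = hashlib.md5(padded[i:i + 3].encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def invert_by_candidate_search(target, candidates, embed=toy_embed):
    """Return the candidate whose embedding has the highest cosine
    similarity to the target vector, together with that similarity."""
    scores = [float(np.dot(target, embed(c))) for c in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]


# The attacker only ever sees the vector, never the text behind it.
secret = "patient John Doe diagnosed with type 2 diabetes"
leaked_vector = toy_embed(secret)

# Candidate texts the attacker considers plausible (assumed for illustration).
candidates = [
    "quarterly revenue grew by eight percent",
    "patient John Doe diagnosed with type 2 diabetes",
    "patient Jane Roe diagnosed with asthma",
    "the server was restarted at midnight",
]

recovered, score = invert_by_candidate_search(leaked_vector, candidates)
print(recovered, round(score, 3))
```

The sketch only works because the attacker can run the same embedding function over their own guesses, which is why access to a similar model matters in practice.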

The implication for AI security posture management (AI-SPM) is that a vector store containing embeddings of sensitive documents is itself a sensitive store, even if the original documents are kept elsewhere. Treating an embedding store as a low-sensitivity index is a recurring audit finding.

Defences include using embedding models with reduced inversion fidelity, applying noise during indexing (with the trade-off of degraded retrieval quality), and treating the vector store with the same access controls as the source documents.
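The noise trade-off can be illustrated with a short sketch. It assumes unit-normalised embeddings and isotropic Gaussian noise. The function name `add_indexing_noise`, the `sigma` values, and the random vectors standing in for real embeddings are all illustrative assumptions, not a recommended configuration.

```python
import numpy as np


def add_indexing_noise(embeddings: np.ndarray, sigma: float,
                       rng: np.random.Generator) -> np.ndarray:
    """Add per-dimension Gaussian noise, then re-normalise to unit length.

    sigma is the noise standard deviation per dimension; its effect
    grows with the embedding dimension, so it must be tuned per model.
    """
    noisy = embeddings + rng.normal(0.0, sigma, size=embeddings.shape)
    norms = np.linalg.norm(noisy, axis=1, keepdims=True)
    return noisy / np.clip(norms, 1e-12, None)


rng = np.random.default_rng(0)

# Random unit vectors stand in for real document embeddings (assumption).
clean = rng.normal(size=(1000, 128))
clean /= np.linalg.norm(clean, axis=1, keepdims=True)

# Mean cosine similarity between each clean vector and its noisy copy:
# a rough proxy for how much signal survives for retrieval and for an attacker.
results = {}
for sigma in (0.0, 0.05, 0.2):
    noisy = add_indexing_noise(clean, sigma, rng)
    results[sigma] = float(np.mean(np.sum(clean * noisy, axis=1)))
    print(f"sigma={sigma}: mean cosine to original = {results[sigma]:.3f}")
```

As `sigma` grows, the stored vectors drift away from the originals. This lowers inversion fidelity but also degrades retrieval, so any noise level would need to be validated against retrieval quality on the real corpus.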

Other entries in this neighbourhood.

Embedding: A dense numeric vector representation of text, image, or audio produced by an embedding model and used for similarity search and clustering.

Vector Store: A database optimised for similarity search over high-dimensional embedding vectors; the canonical storage layer for RAG.

Retrieval-Augmented Generation (RAG): An LLM pattern where the prompt is augmented with documents retrieved from a vector store at query time; the retriever and the corpus are the new attack surfaces.

See Embedding Inversion in production.

The Penaxtra platform implements the controls and assessments described above as part of its AI-SPM programme.
