
Embedding

A dense numeric vector representation of text, image, or audio produced by an embedding model and used for similarity search and clustering.

Component


An embedding is a dense numeric vector representation of an input (text, an image, or audio) produced by an embedding model. Embeddings live in a high-dimensional space where inputs with similar meaning map to nearby vectors, with proximity typically measured by cosine similarity or Euclidean distance.
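
Cosine similarity compares the direction of two vectors and ignores their length. The sketch below uses hypothetical 4-dimensional vectors chosen by hand for illustration; real embedding models output hundreds or thousands of dimensions, and these values do not come from any model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Hypothetical hand-made "embeddings" (not produced by a real model).
cat = [0.9, 0.1, 0.3, 0.0]
kitten = [0.8, 0.2, 0.35, 0.05]
invoice = [0.0, 0.9, 0.0, 0.6]

print(round(cosine_similarity(cat, kitten), 3))   # close to 1: similar direction
print(round(cosine_similarity(cat, invoice), 3))  # close to 0: unrelated direction
```

The related pair scores near 1 and the unrelated pair near 0, which is the numeric proximity that similarity search relies on.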

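In a retrieval-augmented generation (RAG) pipeline, each indexed document chunk is converted to an embedding and stored in a vector store. At query time, the user query is embedded with the same model, and the system retrieves the chunks whose embeddings are closest to the query embedding and passes them to the language model as context.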
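
The indexing and retrieval steps can be sketched with a brute-force in-memory store. This is a minimal sketch under loud assumptions: `toy_embed` is a hypothetical stand-in for a real embedding model (a hashed bag of words, so it matches shared words rather than meaning), and `InMemoryVectorStore` is illustrative only, not the API of any particular vector database.

```python
import math
import re
import zlib

DIM = 1024  # hypothetical dimension for this toy embedder


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words, L2-normalised.
    It captures word overlap only, not semantics."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class InMemoryVectorStore:
    """Brute-force nearest-neighbour search over stored chunk embeddings."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, list[float]]] = []

    def add(self, chunk: str) -> None:
        # Indexing: embed each chunk once and keep the vector next to its text.
        self._rows.append((chunk, toy_embed(chunk)))

    def query(self, text: str, k: int = 2) -> list[str]:
        # Retrieval: embed the query with the same embedder used at indexing time.
        q = toy_embed(text)
        # Vectors are unit-length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), chunk) for chunk, v in self._rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for _, chunk in scored[:k]]


CHUNKS = [
    "Rotate API keys every 90 days from the key management console.",
    "The cafeteria opens at 8am on weekdays.",
    "Embeddings are written to the vector store after chunking.",
]

store = InMemoryVectorStore()
for chunk in CHUNKS:
    store.add(chunk)

top = store.query("How do I rotate API keys?", k=1)
print(top)
```

Production vector stores replace the linear scan with an approximate nearest-neighbour index, but the contract is the same: embed the query with the model used at indexing time, then return the closest stored vectors.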

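The choice of embedding model also matters for security. Each model defines its own vector space, so vectors produced by different models are not comparable, even when their dimensions happen to match. Swapping the embedding model after a corpus has been indexed therefore invalidates every stored vector, and the corpus must be re-embedded. Separately, embedding inversion attacks can recover substantial parts of the original document content from stored embeddings, which means the vectors can be as sensitive as the text they encode. For both reasons, AI-SPM (AI security posture management) tracks the link between each vector store and the embedding model that populated it as a first-class asset.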
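
One way to make the model linkage explicit is to record the embedding model's identifier alongside the stored vectors and reject any vector produced by a different model. The sketch below assumes a hypothetical `IndexedCorpus` class and hypothetical model labels (`embed-model-v1`, `embed-model-v2`); it illustrates the check itself, not how any AI-SPM product stores this linkage.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedCorpus:
    """Vector store that remembers which embedding model produced its vectors.
    Model identifiers used here are hypothetical labels, not real model names."""
    embedding_model: str
    dimension: int
    vectors: dict[str, list[float]] = field(default_factory=dict)

    def add(self, doc_id: str, vector: list[float], model: str) -> None:
        self._check(model, vector)
        self.vectors[doc_id] = vector

    def check_query(self, vector: list[float], model: str) -> None:
        self._check(model, vector)

    def _check(self, model: str, vector: list[float]) -> None:
        # Vectors from a different model live in a different space, so comparing
        # them is meaningless even if the dimensions happen to match.
        if model != self.embedding_model:
            raise ValueError(
                f"index built with {self.embedding_model!r}, got vector from {model!r}; "
                "re-embed the corpus before querying"
            )
        if len(vector) != self.dimension:
            raise ValueError(f"expected dimension {self.dimension}, got {len(vector)}")


corpus = IndexedCorpus(embedding_model="embed-model-v1", dimension=3)
corpus.add("doc-1", [0.1, 0.2, 0.3], model="embed-model-v1")

try:
    # Same dimension, different model: rejected rather than silently compared.
    corpus.check_query([0.3, 0.2, 0.1], model="embed-model-v2")
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```

Checking the model identifier, not just the vector dimension, is the point of the design: a dimension check alone would let vectors from two same-sized but incompatible models be compared silently.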

See Embedding in production.

The Penaxtra platform implements the controls and assessments described above as part of its AI-SPM programme.
