An embedding is a dense numeric vector representation of an input (text, image, audio) produced by an embedding model. Embeddings live in a high-dimensional space where semantic similarity between inputs corresponds to numeric proximity between vectors.
In a RAG pipeline, each indexed document chunk is converted to an embedding and stored in a vector store. At query time, the user query is embedded with the same model, and the system returns the chunks whose embeddings are closest to the query embedding under a similarity metric such as cosine similarity.
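The retrieval step described above can be sketched in a few lines. This is a minimal illustration, not a production vector store: the three-dimensional vectors, the chunk IDs, and the `top_k` helper are all hypothetical, and cosine similarity is assumed as the distance metric (real systems may use dot product or Euclidean distance, and typically an approximate nearest-neighbor index).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the IDs of the k stored chunks most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk_id)
        for chunk_id, vec in store.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [chunk_id for _, chunk_id in scored[:k]]


# Hypothetical toy embeddings; real models emit hundreds or thousands of dimensions.
store = {
    "chunk-a": [0.9, 0.1, 0.0],
    "chunk-b": [0.0, 1.0, 0.0],
    "chunk-c": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

print(top_k(query, store, k=2))
```

In this toy store, `chunk-a` and `chunk-c` point in nearly the same direction as the query, so they rank ahead of `chunk-b`, which is orthogonal to it.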
The choice of embedding model matters for security. Different embedding models, for example models trained on different corpora, produce different and mutually incompatible vector spaces, so swapping the embedding model after a corpus has been indexed makes all stored vectors unusable for retrieval until the corpus is re-embedded. Stored embeddings are also sensitive in their own right: embedding inversion attacks can recover some or all of the original document content from them. For both reasons, AI-SPM tracks the embedding-model linkage as a first-class asset.
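One way to make the embedding-model linkage explicit is to record the model identifier next to the vectors and reject any write that comes from a different model. The sketch below is an assumption about how such a check could look, not a description of any particular AI-SPM product or vector store: the `VectorIndex` class, the `model_id` format, and the dimension check are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VectorIndex:
    """Toy index that remembers which embedding model produced its vectors."""

    model_id: str   # hypothetical identifier, e.g. model name plus version
    dimension: int  # vector length the recorded model is expected to emit
    vectors: dict = field(default_factory=dict)

    def _check(self, model_id, vector):
        # Vectors from another model live in a different space, so refuse them.
        if model_id != self.model_id:
            raise ValueError(
                f"index was built with {self.model_id!r}, got {model_id!r}"
            )
        if len(vector) != self.dimension:
            raise ValueError(
                f"expected {self.dimension} dimensions, got {len(vector)}"
            )

    def add(self, chunk_id, vector, model_id):
        """Store a chunk embedding after verifying its provenance."""
        self._check(model_id, vector)
        self.vectors[chunk_id] = vector


index = VectorIndex(model_id="model-a@1", dimension=3)
index.add("chunk-a", [0.9, 0.1, 0.0], model_id="model-a@1")

try:
    # Simulates a model swap after indexing: the write is rejected.
    index.add("chunk-b", [0.2, 0.7, 0.1], model_id="model-b@1")
except ValueError as err:
    print("rejected:", err)
```

The same check would apply to query vectors before a search. Keeping the model identifier as index metadata also gives an inventory tool a concrete field to track when a model is replaced and the corpus needs re-embedding.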