Embedding - AI Glossary


An embedding is a dense numeric vector representation of an input (text, image, audio) produced by an embedding model. Embeddings live in a high-dimensional space where semantic similarity between inputs corresponds to numeric proximity between vectors.
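As a rough illustration of "numeric proximity", the sketch below computes cosine similarity, one common proximity measure, between hand-written toy vectors. The three vectors and their labels are assumptions made up for this example; they are not the output of any real embedding model, whose vectors typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hand-written toy vectors (assumption): NOT produced by a real model.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

sim_close = cosine_similarity(cat, kitten)   # related inputs
sim_far = cosine_similarity(cat, invoice)    # unrelated inputs
```

With these toy values, sim_close comes out much larger than sim_far, which is the behaviour an embedding model is trained to produce for semantically related inputs.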

In a RAG pipeline, each indexed document chunk is converted to an embedding and stored in a vector store. At query time the user query is embedded with the same model, and the system returns the chunks whose embeddings are closest to the query embedding, typically measured by cosine similarity or dot product.
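The index-then-retrieve flow described here can be sketched in a few lines. Everything named below is hypothetical: toy_embed is a word-counting stand-in for a real embedding model, VOCAB is an invented vocabulary, and a plain Python list plays the role of the vector store. A production system would call an actual model and a real vector database instead.

```python
import math

# Hypothetical toy vocabulary; a real model learns its own representation.
VOCAB = ["password", "reset", "invoice", "refund", "login"]


def toy_embed(text):
    """Stand-in for an embedding model: counts vocabulary words in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as matching nothing.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Embed every chunk once, at indexing time."""
    return [(chunk, toy_embed(chunk)) for chunk in chunks]


def retrieve(index, query, k=2):
    """Embed the query and return the k chunks closest to it."""
    query_vec = toy_embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


chunks = [
    "to reset your password open the login page",
    "refund requests need the original invoice",
    "invoice totals are shown on the billing page",
]
index = build_index(chunks)
top = retrieve(index, "how do I reset my password", k=1)
```

The sketch scans every stored vector, which is fine for a handful of chunks; vector stores usually rely on approximate nearest-neighbour indexes to avoid that full scan.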

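The choice of embedding model matters for security. Two models trained on different corpora produce different vector spaces, so swapping the embedding model after a corpus has been indexed invalidates all stored vectors: queries embedded with the new model can no longer be compared meaningfully against them, and the corpus has to be re-embedded. Embedding inversion attacks can reconstruct some or all of the original document content from stored embeddings, so the vectors deserve the same protection as the source text. Both risks are why AI-SPM (AI security posture management) tracks the embedding-model linkage as a first-class asset.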
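One way to make the embedding-model linkage explicit is to record the model identifier alongside the vectors and refuse anything embedded by a different model. The sketch below is a minimal illustration under that assumption; VectorIndex, check_query and the model names are hypothetical and do not describe any particular vector store or AI-SPM product.

```python
from dataclasses import dataclass, field


@dataclass
class VectorIndex:
    """Hypothetical in-memory index that remembers which model filled it."""
    model_id: str    # identifier of the embedding model used at indexing time
    dimension: int   # vector length that model produces
    vectors: list = field(default_factory=list)

    def _check(self, vector, model_id):
        # Matching dimensions alone would not prove compatibility: two
        # different models can emit vectors of the same length.
        if model_id != self.model_id:
            raise ValueError(
                f"index was built with {self.model_id!r}, got {model_id!r}; "
                "re-embed the corpus before switching models"
            )
        if len(vector) != self.dimension:
            raise ValueError("unexpected vector dimension")

    def add(self, vector, model_id):
        self._check(vector, model_id)
        self.vectors.append(list(vector))

    def check_query(self, vector, model_id):
        """Raise unless the query vector comes from the index's own model."""
        self._check(vector, model_id)


index = VectorIndex(model_id="model-a-v1", dimension=3)  # made-up model name
index.add([0.1, 0.2, 0.3], model_id="model-a-v1")

try:
    index.check_query([0.3, 0.2, 0.1], model_id="model-b-v2")
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```

Failing loudly on a mismatch is a design choice: comparing vectors from two different models would still return results, just meaningless ones, so a silent fallback would hide the problem.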

Other entries in this neighbourhood.

Vector Store: A database optimised for similarity search over high-dimensional embedding vectors; the canonical storage layer for RAG.

Embedding Inversion: A privacy attack that reconstructs the original input text from a stored embedding vector.

Retrieval-Augmented Generation (RAG): An LLM pattern where the prompt is augmented with documents retrieved from a vector store at query time; the retriever and the corpus are the new attack surfaces.
