15 May 2025 · 5 min read

RAG: giving an AI a document memory

RAG lets an AI answer questions from real documents rather than being limited to its internal knowledge.

AI · RAG · Vector Database

The problem of approximate answers

An LLM has a knowledge cut-off date and knows nothing about your internal documents. As a result, it either hallucinates or falls back on generic answers. RAG (Retrieval-Augmented Generation) addresses this by retrieving relevant document excerpts at query time and injecting them into the model's context.
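To make "injecting excerpts into the context" concrete, here is a minimal sketch of the prompt-building step. The function name `buildRagPrompt`, the instruction wording and the sample excerpts are assumptions made for illustration; they do not come from any particular library.

```typescript
// Hypothetical helper: builds a prompt that asks the model to answer
// from the supplied excerpts only.
function buildRagPrompt(question: string, excerpts: string[]): string {
  // Number each excerpt so the model (and the reader) can refer back to it.
  const context = excerpts
    .map((excerpt, index) => `[${index + 1}] ${excerpt}`)
    .join("\n");

  return [
    "Answer the question using only the excerpts below.",
    "If the excerpts do not contain the answer, say that you do not know.",
    "",
    "Excerpts:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

// Made-up excerpts standing in for what the retrieval step would return.
const examplePrompt = buildRagPrompt(
  "How many days of remote work are allowed per week?",
  [
    "Employees may work remotely up to three days per week.",
    "Remote work requests are approved by the team lead.",
  ],
);
console.log(examplePrompt);
```

The resulting string is what would be sent to the model in place of the bare question. Telling the model to admit when the excerpts do not contain the answer is one common way to limit hallucination, not a guarantee against it.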

The role of embeddings

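An embedding is a vector representation of text in a high-dimensional space. Texts with similar meanings map to nearby vectors, with closeness usually measured by cosine similarity or a related distance. Each document and each user query is encoded as a vector with the same embedding model, and retrieval returns the documents whose vectors are closest to the query's.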
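One common way to quantify how close two vectors are is cosine similarity. The sketch below uses toy 3-dimensional vectors invented for the example; real embeddings come from a model and typically have hundreds or thousands of dimensions.

```typescript
// Cosine similarity between two vectors of equal length.
// The result lies in [-1, 1]; higher means the vectors point in closer directions.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // Convention chosen for this sketch: a zero vector matches nothing.
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors, made up for illustration only.
const queryVector = [0.9, 0.1, 0.0];
const invoiceVector = [0.8, 0.2, 0.1];
const holidayVector = [0.0, 0.1, 0.9];

console.log(
  cosineSimilarity(queryVector, invoiceVector) >
    cosineSimilarity(queryVector, holidayVector),
); // true
```

Because cosine similarity ignores vector length and only compares direction, a long document and a short query can still score as close.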

Vector search

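Vector databases such as Pinecone and Weaviate, or the pgvector extension for PostgreSQL, store millions of vectors and run similarity searches in milliseconds, typically by using approximate nearest-neighbor indexes rather than comparing the query against every stored vector. That is the engine behind the document memory of a RAG system.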
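To show what such a search does conceptually, here is a small in-memory store that scores every document against the query. The class `InMemoryVectorStore`, its methods and the sample vectors are hypothetical and written for this article; they are not the API of Pinecone, Weaviate or pgvector, and an exhaustive scan like this would not scale to millions of vectors.

```typescript
interface StoredDocument {
  content: string;
  vector: number[]; // stored at unit length
}

// Scale a vector to unit length so that a dot product equals cosine similarity.
function normalize(vector: number[]): number[] {
  const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? vector : vector.map((x) => x / norm);
}

// Hypothetical toy store, for illustration only.
class InMemoryVectorStore {
  private documents: StoredDocument[] = [];

  add(content: string, embedding: number[]): void {
    this.documents.push({ content, vector: normalize(embedding) });
  }

  // Exhaustive search: score every document, sort by score, keep the topK best.
  search(params: { vector: number[]; topK: number }): { content: string; score: number }[] {
    const q = normalize(params.vector);
    return this.documents
      .map((doc) => ({
        content: doc.content,
        score: doc.vector.reduce((sum, x, i) => sum + x * q[i], 0),
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, params.topK);
  }
}

// Made-up documents and 3-dimensional vectors.
const store = new InMemoryVectorStore();
store.add("Refund policy", [0.9, 0.1, 0.0]);
store.add("Office holiday calendar", [0.0, 0.2, 0.9]);
store.add("Invoice template", [0.7, 0.3, 0.1]);

const hits = store.search({ vector: [1, 0, 0], topK: 2 });
console.log(hits.map((hit) => hit.content)); // [ 'Refund policy', 'Invoice template' ]
```

Normalizing vectors once at insert time means each query only needs dot products. Dedicated vector databases go further and avoid scanning every vector by relying on approximate nearest-neighbor indexes.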

RAG pipeline example

A simplified semantic search in a vector database:

typescript

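// Placeholders (assumptions, not a real SDK): supply these from your
// embedding provider and your vector database client.
declare function createEmbedding(text: string): Promise<number[]>;
declare const vectorDatabase: {
  search(params: { vector: number[]; topK: number }): Promise<{ content: string }[]>;
};

async function searchDocuments(query: string): Promise<string[]> {
  // Encode the query with the same embedding model used for the documents.
  const embedding = await createEmbedding(query);

  // Retrieve the 5 documents whose vectors are closest to the query vector.
  const results = await vectorDatabase.search({
    vector: embedding,
    topK: 5,
  });

  // Keep only the text, ready to be injected into the model's context.
  return results.map((doc) => doc.content);
}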
