RAG

RAG (Retrieval-Augmented Generation) retrieves relevant chunks before generation—reducing hallucinations and grounding answers in your data. LangChain covers loaders, splitters, embeddings, vector stores, and retrievers.


Pipeline

flowchart LR
  D[Documents] --> L[Loader]
  L --> S[Splitter]
  S --> E[Embedding]
  E --> V[VectorStore]
  Q[Question] --> R[Retriever]
  V --> R
  R --> P[Prompt + Context]
  P --> M[LLM / Agent]

Indexing (offline)

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

loader = TextLoader("docs/guide.txt", encoding="utf-8")
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(chunks, embeddings, persist_directory="./chroma_db")
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

Tune chunk_size / overlap; pick Chroma, Pinecone, pgvector, Qdrant for production.


Querying (online)

Runnable chain

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "Answer using context:\n{context}\n\nQuestion: {question}"
)

chain = (
    {"context": retriever | format_docs, "question": lambda x: x["question"]}
    | prompt
    | ChatOpenAI(model="gpt-4.1-mini")
)

RAG agent (flexible)

@tool
def search_docs(query: str) -> str:
    """Search internal documentation."""
    docs = retriever.invoke(query)
    return "\n---\n".join(d.page_content for d in docs)

agent = create_agent(
    "openai:gpt-4.1-mini",
    tools=[search_docs],
    system_prompt="Use search_docs for facts. Admit insufficient context.",
)

Agentic RAG for multi-step retrieval; fixed chains for simple Q&A.


Quality tips

Hybrid search, rerankers, metadata filters, parent-document retrieval, LangSmith eval datasets.


Middleware

Implicit RAG in before_model vs explicit tool-based RAG (often preferred for auditability).


Troubleshooting

Empty retrieval → embedding/language/chunk settings. Off-topic answers → raise k, rerank, stricter prompts. Latency → cache embeddings, async indexing.


Next steps