RAG
RAG (Retrieval-Augmented Generation) retrieves relevant chunks before generation—reducing hallucinations and grounding answers in your data. LangChain covers loaders, splitters, embeddings, vector stores, and retrievers.
Pipeline
flowchart LR
D[Documents] --> L[Loader]
L --> S[Splitter]
S --> E[Embedding]
E --> V[VectorStore]
Q[Question] --> R[Retriever]
V --> R
R --> P[Prompt + Context]
P --> M[LLM / Agent]
Indexing (offline)
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
loader = TextLoader("docs/guide.txt", encoding="utf-8")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(chunks, embeddings, persist_directory="./chroma_db")
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
Tune chunk_size / overlap; pick Chroma, Pinecone, pgvector, Qdrant for production.
Querying (online)
Runnable chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
prompt = ChatPromptTemplate.from_template(
"Answer using context:\n{context}\n\nQuestion: {question}"
)
chain = (
{"context": retriever | format_docs, "question": lambda x: x["question"]}
| prompt
| ChatOpenAI(model="gpt-4.1-mini")
)
RAG agent (flexible)
@tool
def search_docs(query: str) -> str:
"""Search internal documentation."""
docs = retriever.invoke(query)
return "\n---\n".join(d.page_content for d in docs)
agent = create_agent(
"openai:gpt-4.1-mini",
tools=[search_docs],
system_prompt="Use search_docs for facts. Admit insufficient context.",
)
Agentic RAG for multi-step retrieval; fixed chains for simple Q&A.
Quality tips
Hybrid search, rerankers, metadata filters, parent-document retrieval, LangSmith eval datasets.
Middleware
Implicit RAG in before_model vs explicit tool-based RAG (often preferred for auditability).
Troubleshooting
Empty retrieval → embedding/language/chunk settings. Off-topic answers → raise k, rerank, stricter prompts. Latency → cache embeddings, async indexing.
Next steps