RAG (retrieval-augmented generation) augments an LLM's answers with your own material. Local RAG runs entirely on your own hardware: thousands of documents, millisecond semantic search, and not a byte sent to the cloud.
How it's built
Stack: an open embedding model (e.g. bge-m3), sqlite-vec or LanceDB as the vector database, and SQLite for metadata. Indexing runs in the background. When you ask a question, the assistant retrieves the most relevant chunks from your material and hands them to the model as context.
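The loop above can be sketched with the standard library alone. The toy `embed()` below (a normalized bag-of-words vector) is a stand-in for a real embedding model like bge-m3, and the brute-force cosine scan stands in for the indexed search that sqlite-vec or LanceDB provides; chunk texts and the query are invented for illustration:

```python
import json
import math
import sqlite3

def embed(text: str) -> dict[str, float]:
    """Toy embedding: a normalized bag-of-words vector.
    A real system would call an embedding model such as bge-m3 here."""
    vec: dict[str, float] = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0.0) + 1.0
    norm = math.sqrt(sum(x * x for x in vec.values())) or 1.0
    return {w: x / norm for w, x in vec.items()}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, embedding TEXT)")

def index_chunk(text: str) -> None:
    # Indexing: embed the chunk once and persist it alongside its text.
    db.execute("INSERT INTO chunks (text, embedding) VALUES (?, ?)",
               (text, json.dumps(embed(text))))

def search(query: str, k: int = 3) -> list[str]:
    # Retrieval: score every stored chunk by cosine similarity to the query.
    # A real vector database replaces this brute-force scan with an index.
    q = embed(query)
    scored = []
    for text, stored in db.execute("SELECT text, embedding FROM chunks"):
        v = json.loads(stored)
        score = sum(q[w] * v[w] for w in q.keys() & v.keys())
        scored.append((score, text))
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]

index_chunk("The contract renewal deadline is March 31.")
index_chunk("Meeting notes: budget approved for Q2.")
index_chunk("The embedding model runs fully offline.")

context = search("When does the contract renew?", k=1)
# `context` is what gets prepended to the prompt before calling the local model.
```

Everything stays in one SQLite file, which is exactly why this pattern works well offline.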
What kinds of material?
PDF papers, Word documents, emails, meeting notes, audio transcripts, source code, contracts, board materials. Typical scale: 10,000–1,000,000 pages.
Frequently asked
- How long does indexing take? On an M-series Mac, 10,000 PDFs index overnight. For larger collections, a workstation (Framework Desktop, DGX Spark) speeds things up significantly.
Updated 2026-04-21