---
title: "AI basics – RAG systems"
description: "RAG (Retrieval-Augmented Generation): why add retrieval to LLMs, pipeline stages, chunking, naive vs advanced RAG"
date: "2026-03-25"
slug: "ai-basics-rag"
tags:
  - Artificial Intelligence
  - Machine Learning
  - Basics
image: cover.jpg
---

A "Bare minimum" article on **retrieval-augmented generation**: how to ground a language model in external documents and up-to-date knowledge.

### What RAG is {.toc-heading-only}

<details class="post-accordion">
<summary style="cursor: pointer; font-weight: 600;">What RAG is</summary>
<div style="margin-top: 0.75em;">

<p><strong>Retrieval-Augmented Generation</strong> means the model answers not only from weights learned at training time but also from <strong>fragments retrieved</strong> from an external store.</p>

<p><strong>The LLM-only issue:</strong> knowledge is largely <strong>static</strong> after training — the model does not automatically learn what happened next, <strong>does not self-update</strong>, and reflects the world <strong>as of the training cutoff</strong>. That is a poor fit for fresh facts, internal playbooks, or a personal paper library.</p>

<ul>
<li>RAG is the <strong>technology of wiring external sources</strong> into the generation step.</li>
<li>It <strong>mitigates stale knowledge</strong> by pulling relevant chunks from a current corpus.</li>
<li>The model is <strong>augmented with document context</strong>, not just pre-trained parameters.</li>
</ul>

<p>RAG does not eliminate hallucinations, but it grounds answers in retrievable snippets that humans (or rules) can verify more easily.</p>

</div>
</details>

### Stages of a RAG pipeline {.toc-heading-only}

<details class="post-accordion">
<summary style="cursor: pointer; font-weight: 600;">Stages of a RAG pipeline</summary>
<div style="margin-top: 0.75em;">

<p>Think of two phases: offline indexing and the online user query.</p>

<ul>
<li><strong>Ingestion and chunking.</strong> Documents enter the system and are split into pieces sized for the index and the context window.</li>
<li><strong>Vector index build.</strong> Each chunk gets an embedding; similarity search finds chunks “close” to the query in meaning.</li>
<li><strong>Retrieval.</strong> For a user question, the system selects the most relevant chunks from the index.</li>
<li><strong>Generation.</strong> Those chunks are placed in the prompt (with instructions and the question), and the LLM produces an answer conditioned on that context.</li>
</ul>

<p>Concrete choices (embeddings, vector DB, how many chunks to inject) strongly affect quality, but the pattern <em>retrieve → inject → generate</em> stays the same.</p>
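
<p>To make the loop concrete, here is a minimal, self-contained sketch of <em>retrieve → inject → generate</em>. The bag-of-words <code>embed</code> is a toy stand-in for a real embedding model, and the final prompt would go to whatever LLM client you use; both are assumptions for illustration, not a recommended stack.</p>

```python
import math
from collections import Counter

# Toy embedding: bag-of-words term counts. A real pipeline would call an
# embedding model here; the toy keeps the script runnable end to end.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Offline phase: embed and index the (already chunked) corpus.
chunks = [
    "RAG retrieves document chunks and feeds them to the model.",
    "Chunking splits documents into units for indexing and search.",
    "Embeddings map text to vectors; similar texts land close together.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Online phase: retrieve the top-k chunks for a query...
def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# ...then inject them into the prompt that goes to the LLM.
def build_prompt(query: str) -> str:
    context = "\n".join(f"- {c}" for c in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How does chunking work for search?"))
```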

</div>
</details>

### Chunking {.toc-heading-only}

<details class="post-accordion">
<summary style="cursor: pointer; font-weight: 600;">Chunking</summary>
<div style="margin-top: 0.75em;">

<p><strong>Chunking</strong> splits documents into smaller segments. Those segments are the <strong>basic units of indexing and search</strong> — what you embed and what you pass to the model.</p>

<ul>
<li>The LLM sees <strong>fragments, not whole documents</strong> at once — full docs rarely fit the window, and search needs granular matches.</li>
<li><strong>Chunk quality drives whether facts are reachable:</strong> split a coherent block in the wrong place and retrieval may miss it; make chunks huge and noise dilutes the signal.</li>
</ul>

<p>In practice people tune chunk size, overlap between neighbors, and sometimes structure-aware splits (headings, paragraphs) rather than relying on fixed character counts alone.</p>
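
<p>As a baseline to tune against, here is a minimal fixed-size chunker with overlap; the size and overlap values are illustrative defaults, not recommendations, and production splitters often respect sentence or heading boundaries instead.</p>

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap.

    Overlap repeats the tail of each chunk at the head of the next one,
    so a fact cut at a boundary still appears whole in some chunk.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

doc = " ".join(f"This is sentence number {i}." for i in range(40))
pieces = chunk_text(doc, size=120, overlap=30)
print(len(pieces), repr(pieces[0][-30:]), repr(pieces[1][:30]))
# The tail of pieces[0] reappears at the head of pieces[1].
```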

</div>
</details>

### Types of RAG systems {.toc-heading-only}

<details class="post-accordion">
<summary style="cursor: pointer; font-weight: 600;">Types of RAG systems</summary>
<div style="margin-top: 0.75em;">

<p>People loosely contrast “naive” and “advanced” pipelines — the boundary is fuzzy, but the labels help navigate complexity.</p>

<ul>
<li><strong>Naive RAG:</strong> query → nearest-chunk search → chunks go straight into the model <strong>without extra processing</strong>. Easy to ship; quality hinges on corpus, chunks, and embeddings.</li>
<li><strong>Advanced RAG:</strong> adds steps around retrieval and generation: <strong>query rewriting or expansion</strong>, <strong>reranking</strong> with a cross-encoder or another model, <strong>deduplication</strong> of overlapping hits, sometimes metadata filters. The goal: sharper, cleaner context for the LLM.</li>
</ul>

<p>For coursework or a research prototype, naive RAG is a common start; you add sophistication where you see failure modes such as retrieving the wrong paragraph, duplicate hits, or a vocabulary mismatch between query and docs.</p>
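
<p>To show what such a post-retrieval step looks like, here is a toy refinement pass: crude word-overlap deduplication followed by reranking. <code>rerank_score</code> is a hypothetical stand-in for a real cross-encoder, and the 0.7 threshold is arbitrary; everything here sketches the shape of the step, not a production recipe.</p>

```python
# Toy "advanced RAG" refinement: dedupe near-identical hits, then rerank.
# rerank_score stands in for a cross-encoder; it just counts shared words
# with the query so the example runs without any model.

def rerank_score(query: str, chunk: str) -> float:
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / (len(q) or 1)

def dedupe(chunks: list[str], threshold: float = 0.7) -> list[str]:
    """Keep a chunk only if its word overlap with every kept chunk is low."""
    kept: list[str] = []
    for chunk in chunks:
        words = set(chunk.lower().split())
        is_dup = any(
            len(words & set(k.lower().split()))
            / max(len(words | set(k.lower().split())), 1) >= threshold
            for k in kept
        )
        if not is_dup:
            kept.append(chunk)
    return kept

def refine(query: str, candidates: list[str], k: int = 2) -> list[str]:
    unique = dedupe(candidates)
    return sorted(unique, key=lambda c: rerank_score(query, c), reverse=True)[:k]

candidates = [
    "Reranking reorders retrieved chunks with a stronger model.",
    "Reranking reorders retrieved chunks with a stronger model!",  # near-dup
    "Query rewriting rephrases the question before retrieval.",
]
print(refine("what is reranking", candidates))
```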

</div>
</details>

### Who uses it {.toc-heading-only}

<details class="post-accordion">
<summary style="cursor: pointer; font-weight: 600;">Who uses it</summary>
<div style="margin-top: 0.75em;">

<p>Teams adopt RAG when answers must be grounded in a <strong>chosen document set</strong> — internal, customer-facing, or personal — rather than only in the model’s training-time knowledge.</p>

<ul>
<li><strong>Enterprises.</strong> Knowledge bases, policies, support playbooks: employees or customers ask questions, the system retrieves relevant snippets, and the model answers with reference to up-to-date org text.</li>
<li><strong>Developers and product teams.</strong> Assistants over docs, wikis, tickets: less guessing about APIs from the open web — a controlled corpus sets the boundary.</li>
<li><strong>Education and research.</strong> Working with a curated stack of papers, notes, and PDFs: ask questions over course materials or a literature review without replacing source checks.</li>
<li><strong>Regulated or expert domains.</strong> Legal, clinical, finance, and similar settings where tying answers to company or regulatory text matters — always with human verification and data-access policies.</li>
</ul>

</div>
</details>