RAG Codebase Assistant

A local RAG assistant for asking natural-language questions about a codebase or documentation folder.

The MVP uses:

Ollama for local embeddings and chat
Weaviate in Docker for vector search
Python CLIs for ingestion and chat
FastAPI for the first API surface

How It Works

The application has two main workflows: indexing and asking.

During indexing, ingest.py receives a local repo or docs folder path. The walker scans that folder, skips noisy directories like .git, .venv, node_modules, dist, and build, and keeps useful source/documentation files. The chunker then splits those files into smaller records. Python files are split around top-level functions and classes, Markdown files are split by headings, JavaScript and TypeScript files are split around common declarations, and other supported text files are split by size.

Each chunk is sent to Ollama's embedding API using the nomic-embed-text model. The resulting vector plus metadata such as filepath, language, chunk type, line range, and symbol name are stored in Weaviate. Weaviate is configured with vectorizer: none, which means this app supplies its own embeddings instead of asking Weaviate to generate them.

During question answering, chat.py or the FastAPI /ask endpoint receives a user question. The question is embedded with the same Ollama embedding model, then Weaviate searches for the closest stored chunks using vector similarity. Those chunks are formatted into a grounded prompt and sent to Ollama's chat model, currently llama3.2. The final answer is returned with source references so the user can inspect which files supported the response.

In junior-dev terms: the app turns code files into searchable numeric fingerprints, finds the fingerprints most similar to your question, and gives only those relevant snippets to the LLM so the answer is based on the indexed repo instead of generic guesses.

Interview Summary

This project is a local RAG pipeline for querying codebases and documentation in natural language. I built the ingestion, retrieval, and answer-generation flow manually in Python so the architecture is easy to inspect and explain. It uses Ollama for private local embeddings and chat inference, Weaviate for vector storage/search, and FastAPI for a simple API layer.

The key design decision is separating the system into clear stages: file discovery, chunking, embedding, indexing, retrieval, prompt construction, and answer generation. That keeps the app modular and makes it straightforward to improve retrieval quality, add more file parsers, swap models, or build a frontend later.

Setup

Run these from C:\src\projects\rag-codebase-assistant.

uv sync
docker compose up -d
ollama list

If the models are not listed yet:

ollama pull nomic-embed-text
ollama pull llama3.2

Index A Repo

Point the ingester at any local repo or docs folder. It does not need to live inside this project.

uv run python ingest.py C:\src\projects\some-other-repo --reset

Use --reset when you want to clear the existing Weaviate collection before indexing.

Chat

uv run python chat.py

Retrieve more or fewer chunks per answer:

uv run python chat.py --top-k 10

API

uv run uvicorn rag_assistant.api.main:app --reload

Then call:

Invoke-RestMethod `
  -Method Post `
  -Uri http://127.0.0.1:8000/ask `
  -ContentType application/json `
  -Body '{"question":"How does authentication work?","top_k":5}'

The frontend uses the newline-delimited JSON stream at /ask/events.

Frontend

In a second terminal:

cd frontend
npm.cmd install
npm.cmd run dev

Then open http://localhost:3000.

If your API is not running on http://127.0.0.1:8000, copy frontend/.env.example to frontend/.env.local and update NEXT_PUBLIC_API_URL.

Configuration

Copy .env.example to .env if you want to customize endpoints or model names.

WEAVIATE_URL=http://localhost:8080
OLLAMA_URL=http://localhost:11434
OLLAMA_EMBED_MODEL=nomic-embed-text
OLLAMA_CHAT_MODEL=llama3.2
WEAVIATE_CLASS=CodeChunk

Tests

uv run python -m unittest discover -s tests

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
frontend		frontend
rag_assistant		rag_assistant
tests		tests
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
chat.py		chat.py
docker-compose.yml		docker-compose.yml
ingest.py		ingest.py
pyproject.toml		pyproject.toml
rag_codebase_assistant_proposal.md		rag_codebase_assistant_proposal.md
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RAG Codebase Assistant

How It Works

Interview Summary

Setup

Index A Repo

Chat

API

Frontend

Configuration

Tests

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

RAG Codebase Assistant

How It Works

Interview Summary

Setup

Index A Repo

Chat

API

Frontend

Configuration

Tests

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages