Project Proposal: Codebase & Docs Q&A Assistant (RAG)

Overview

A locally-runnable RAG (Retrieval-Augmented Generation) chatbot that lets you ask natural language questions about any codebase or documentation folder. You point it at a repo or docs directory, it indexes everything, and you can ask things like:

  • "How does authentication work in this project?"
  • "Where is the database connection initialized?"
  • "What does the processPayment function do and where is it called?"

This mirrors exactly what NPX does with their proposal generation tool — ingest a corpus of documents, embed them into a vector store, and use an LLM to answer questions grounded in that content.


Tech Stack

| Layer | Tool | Why |
| --- | --- | --- |
| Embeddings + LLM | Ollama (local) | Free, private, matches NPX's stack exactly |
| Vector Database | Weaviate (Docker) | NPX's actual stack |
| Ingestion + orchestration | Python (LangChain or manual) | Simple, you already know it |
| Frontend (optional) | React + Next.js | NPX's stack, makes it demo-able |
| File parsing | Python (ast, pathlib, tiktoken) | Parse code + docs into chunks |

You can swap Ollama for the OpenAI API if you want faster or higher-quality responses during development; keep the client behind the same interface so the rest of the pipeline does not change.


Architecture

[ Codebase / Docs Folder ]
        |
        v
[ Ingestion Pipeline ]  <-- Python script
  - Walk directory tree
  - Parse .py, .ts, .md, .txt, .json files
  - Chunk by file / function / heading
  - Generate embeddings (Ollama: nomic-embed-text)
        |
        v
[ Weaviate Vector Store ]  <-- Docker container
  - Store chunks + metadata (filename, line range, language)
        |
        v
[ Query Pipeline ]  <-- Python / API
  - Take user question
  - Embed the question
  - Retrieve top-k relevant chunks from Weaviate
  - Build prompt: "Given this context: {chunks} — answer: {question}"
  - Send to LLM (Ollama: llama3 or mistral)
        |
        v
[ Response ]  <-- streamed answer with source file references
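
To make the "retrieve top-k relevant chunks" step concrete, here is a minimal pure-Python sketch that ranks stored vectors by cosine similarity to the question vector. In the actual pipeline Weaviate performs this search with its own index; the function names, file paths, and toy three-dimensional vectors below are illustrative assumptions, not part of the project.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(question_vec: list[float], chunks: list[dict], k: int = 5) -> list[dict]:
    """Return the k chunks whose 'vector' is most similar to question_vec."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(question_vec, c["vector"]),
        reverse=True,
    )
    return ranked[:k]


# Toy data: hypothetical chunks with made-up 3-dimensional embeddings.
chunks = [
    {"filepath": "auth/login.py", "vector": [0.9, 0.1, 0.0]},
    {"filepath": "db/connect.py", "vector": [0.0, 1.0, 0.0]},
    {"filepath": "README.md", "vector": [0.5, 0.5, 0.0]},
]
best = top_k([1.0, 0.0, 0.0], chunks, k=2)
```

Real embeddings have hundreds of dimensions, and a brute-force scan like this stops scaling well as the corpus grows, which is the reason to delegate the search to a vector store.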

Project Structure

rag-codebase-assistant/
├── ingestion/
│   ├── walker.py          # Recursively walk and filter files
│   ├── chunker.py         # Split files into meaningful chunks
│   ├── embedder.py        # Generate embeddings via Ollama
│   └── indexer.py         # Push chunks + embeddings into Weaviate
├── retrieval/
│   ├── query.py           # Embed question, query Weaviate, return top-k
│   └── prompt.py          # Build prompt with retrieved context
├── llm/
│   └── ollama_client.py   # Wrapper for Ollama chat completions
├── api/
│   └── main.py            # FastAPI server exposing /ask endpoint
├── frontend/              # Optional React/Next.js chat UI
│   └── ...
├── docker-compose.yml     # Weaviate + optional Ollama container
├── ingest.py              # CLI entrypoint: python ingest.py ./my-repo
├── chat.py                # CLI entrypoint: python chat.py
└── README.md

Implementation Plan

Phase 1 — Ingestion Pipeline

  1. Set up Weaviate locally via Docker (docker-compose up)
  2. Write walker.py to recursively collect files, keeping allowed extensions (.py, .ts, .md, .txt) and skipping node_modules, .git, and build directories
  3. Write chunker.py — split files into chunks:
    • For code: chunk by function/class using AST parsing (Python) or regex (TS)
    • For markdown/docs: chunk by heading sections
    • Max chunk size: ~500 tokens with ~50 token overlap
  4. Write embedder.py — call Ollama's embedding endpoint (nomic-embed-text model)
  5. Write indexer.py — create Weaviate schema and upsert chunks with metadata
  6. Wire together in ingest.py CLI
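
The first two steps of Phase 1 can be sketched with the standard library alone. This is one possible shape, not a finished implementation: the names `walk_files` and `chunk_python_source`, the exact ignore set, and the choice to chunk only top-level definitions are assumptions made for illustration. The returned dict keys follow the ones the proposal asks for in chunker.py.

```python
import ast
import os
from pathlib import Path

# Assumed defaults; adjust to the repo being indexed.
ALLOWED_EXTENSIONS = {".py", ".ts", ".md", ".txt"}
IGNORED_DIRS = {"node_modules", ".git", "__pycache__", "dist", "build"}


def walk_files(root: str) -> list[Path]:
    """Collect files under root whose extension is in the allowlist."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix in ALLOWED_EXTENSIONS:
                found.append(path)
    return sorted(found)


def chunk_python_source(source: str) -> list[dict]:
    """Split Python source into one chunk per top-level function or class."""
    lines = source.splitlines()
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunk_type = "class" if isinstance(node, ast.ClassDef) else "function"
            chunks.append({
                # lineno/end_lineno are 1-based and inclusive.
                "content": "\n".join(lines[node.lineno - 1:node.end_lineno]),
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                "chunk_type": chunk_type,
            })
    return chunks
```

Two known gaps in this sketch: decorators above a definition are not included in the chunk, and module-level code outside functions and classes is dropped. A fuller chunker would also split any chunk that exceeds the ~500-token limit.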

Phase 2 — Query + Answer Pipeline

  1. Write query.py — embed incoming question, query Weaviate for top 5 chunks by cosine similarity
  2. Write prompt.py — build a prompt like:
    You are a helpful assistant for a software codebase.
    Use only the following context to answer the question.
    If the answer isn't in the context, say so.
    
    Context:
    {retrieved_chunks}
    
    Question: {user_question}
    Answer:
    
  3. Write ollama_client.py — call Ollama chat endpoint, stream response
  4. Wire together in chat.py CLI with a simple input loop
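
A possible shape for prompt.py, using the template from step 2. The helper names `format_chunk` and `build_prompt` are hypothetical, and the sketch assumes each retrieved chunk is a dict carrying the schema's `content`, `filepath`, `startLine`, and `endLine` fields. Prefixing every chunk with its file and line range is one way to let the model cite sources in its answer.

```python
PROMPT_TEMPLATE = """You are a helpful assistant for a software codebase.
Use only the following context to answer the question.
If the answer isn't in the context, say so.

Context:
{retrieved_chunks}

Question: {user_question}
Answer:"""


def format_chunk(chunk: dict) -> str:
    """Label a chunk with its source location so the model can cite it."""
    header = f"[{chunk['filepath']}:{chunk['startLine']}-{chunk['endLine']}]"
    return f"{header}\n{chunk['content']}"


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Fill the template with the retrieved chunks and the user question."""
    context = "\n\n".join(format_chunk(c) for c in chunks)
    # str.format only parses the template, so braces inside code chunks are safe.
    return PROMPT_TEMPLATE.format(retrieved_chunks=context, user_question=question)
```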

Phase 3 — API + Frontend (makes it demo-able)

  1. Wrap query pipeline in a FastAPI /ask endpoint
  2. Build a minimal React chat UI (Next.js):
    • Text input for question
    • Streamed response display
    • Source file references shown under each answer
  3. Connect frontend to FastAPI backend

Phase 4 — Polish for Portfolio

  • Add a README with setup instructions and a demo GIF
  • Test it against a real open source repo (e.g. your FindIT project)
  • Add a --repo flag that auto-clones a GitHub URL and ingests it
  • Deploy Weaviate + API to Azure (matches NPX's cloud stack)

Weaviate Schema

schema = {
    "class": "CodeChunk",
    "properties": [
        {"name": "content", "dataType": ["text"]},       # the actual code/text
        {"name": "filepath", "dataType": ["text"]},      # relative file path
        {"name": "language", "dataType": ["text"]},      # python, typescript, markdown
        {"name": "chunkType", "dataType": ["text"]},     # function, class, section, file
        {"name": "startLine", "dataType": ["int"]},      # line number start
        {"name": "endLine", "dataType": ["int"]},        # line number end
    ],
    "vectorizer": "none"  # we supply our own embeddings
}
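
The chunker is specified to return snake_case keys (start_line, end_line, chunk_type), while the schema above uses camelCase property names, so the indexer needs a small mapping step. The sketch below shows one way to do that, plus a deterministic ID so that re-ingesting the same chunk overwrites it instead of duplicating it. The function names and the ID scheme are assumptions for illustration; Weaviate object IDs are UUIDs, but nothing in the proposal prescribes how to derive them.

```python
import uuid


def to_properties(chunk: dict, filepath: str, language: str) -> dict:
    """Map a chunker dict onto the CodeChunk property names."""
    return {
        "content": chunk["content"],
        "filepath": filepath,
        "language": language,
        "chunkType": chunk["chunk_type"],
        "startLine": chunk["start_line"],
        "endLine": chunk["end_line"],
    }


def chunk_id(filepath: str, start_line: int, end_line: int) -> str:
    """Derive a stable UUID from the chunk's location (assumed scheme)."""
    key = f"{filepath}:{start_line}-{end_line}"
    return str(uuid.uuid5(uuid.NAMESPACE_URL, key))
```

One consequence of keying the ID on line ranges: if a file is edited and its functions shift, stale chunks remain under their old IDs, so a re-ingest should delete a file's existing chunks first.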

Docker Compose

version: '3.8'
services:
  weaviate:
    image: semitechnologies/weaviate:latest
    ports:
      - "8080:8080"
    environment:
      QUERY_DEFAULTS_LIMIT: 20
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
    volumes:
      - weaviate_data:/var/lib/weaviate

  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama

volumes:
  weaviate_data:
  ollama_data:

Key Prompts for Codex

Use these when working through implementation:

  • "Implement walker.py: recursively walk a directory, return all files with extensions in a given allowlist, skip common ignore patterns like node_modules, .git, __pycache__, dist"
  • "Implement chunker.py — for Python files use the ast module to split by function and class definitions. For markdown split by ## headings. For other text files split by character count with overlap. Return a list of dicts with keys: content, start_line, end_line, chunk_type"
  • "Implement embedder.py — call Ollama's POST /api/embeddings endpoint with model nomic-embed-text and return the embedding vector"
  • "Implement indexer.py — connect to Weaviate at localhost:8080, create the CodeChunk schema if it doesn't exist, batch upsert a list of chunk dicts with their embedding vectors"
  • "Implement query.py — embed a question string using Ollama, query Weaviate for the top 5 nearest CodeChunk objects by vector similarity, return their content and filepath"
  • "Implement a FastAPI app in api/main.py with a POST /ask endpoint that accepts a JSON body with a 'question' field and returns a streamed response"
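
As a reference point for reviewing what the embedder.py prompt produces, here is a dependency-free sketch of the Ollama embedding call. It assumes Ollama is reachable at localhost:11434 (the port mapped in the compose file) and that `/api/embeddings` takes a JSON body with `model` and `prompt` and responds with an `embedding` array; check this against the Ollama version you run. The function names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: port from docker-compose.yml


def build_embedding_request(text: str, model: str = "nomic-embed-text") -> urllib.request.Request:
    """Build the POST request without sending it, which keeps it easy to test."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str) -> list[float]:
    """Send the request and return the vector (requires a running Ollama)."""
    with urllib.request.urlopen(build_embedding_request(text), timeout=60) as resp:
        return json.loads(resp.read())["embedding"]
```

The same `embed` function should be used for both ingestion and queries, since vectors from different embedding models are not comparable.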

What to Say About This Project

In an interview at NPX:

"I built a RAG pipeline that lets you query any codebase or documentation in natural language. It uses Weaviate for vector storage and Ollama for local LLM inference — which I chose specifically because they're in your stack. The core idea is the same as your proposal generation tool: ingest a document corpus, embed it, and use retrieval to ground the LLM's answers in real content rather than hallucinations."

That's a pitch that will land.