|
2 | 2 |
|
3 | 3 | A web application that provides an AI-powered chatbot interface for dataset discovery, using Google Gemini API on the backend and a React-based frontend. |
4 | 4 |
|
5 | | -## Architecture Overview |
6 | | - |
7 | | -The KnowledgeSpace AI Agent is a Retrieval-Augmented Generation (RAG) system designed for neuroscience dataset discovery. It combines keyword search, semantic retrieval, and large language model reasoning to surface relevant datasets across heterogeneous sources. |
8 | | - |
9 | | -The system consists of the following core components: |
10 | | - |
11 | | -- **React Frontend** |
12 | | - - Provides a chat-based interface for interacting with the dataset discovery agent. |
13 | | - |
14 | | -- **FastAPI Backend** |
15 | | - - Orchestrates LLM-based reasoning (Gemini), keyword search (Elasticsearch), and semantic search (Vertex AI Matching Engine). |
16 | | - |
17 | | -- **Data Processing Pipeline** |
18 | | - - Scrapes and normalizes neuroscience metadata, stores structured records in BigQuery, and indexes vector embeddings in Vertex AI for retrieval. |
19 | | - |
20 | | -## System Flow (High-Level) |
21 | | - |
22 | | -The following diagram shows the high-level request and data flow through the system: |
23 | | - |
24 | | -```mermaid |
25 | | -flowchart LR |
26 | | - User --> Frontend |
27 | | - Frontend --> Backend |
28 | | - Backend -->|Keyword Search| Elasticsearch |
29 | | - Backend -->|Semantic Search| VertexAI |
30 | | - Backend -->|LLM Reasoning| Gemini |
31 | | - Elasticsearch --> Backend |
32 | | - VertexAI --> Backend |
33 | | - Gemini --> Backend |
34 | | - Backend --> Frontend |
35 | | -``` |
36 | | - |
37 | 5 | ## Table of Contents |
38 | 6 |
|
39 | 7 | - [Prerequisites](#prerequisites) |
@@ -97,8 +65,6 @@ Create a file named `.env` in the project root based on `.env.template`. You can |
97 | 65 | **Option 1: Google API Key (Recommended for development)** |
98 | 66 |
|
99 | 67 | - Set `GOOGLE_API_KEY` in your `.env` file |
100 | | -> You can generate a Gemini API key from **Google AI Studio**: |
101 | | -> https://aistudio.google.com/app/apikey |
102 | 68 |
|
103 | 69 | **Option 2: Vertex AI (Recommended for production)** |
104 | 70 |
|
@@ -173,10 +139,6 @@ The backend requires specific environment variables to connect to **Google Cloud |
173 | 139 |
|
174 | 140 |
|
175 | 141 | ## Running the Application |
176 | | -> ⚠️ **Note on Port Configuration** |
177 | | -> |
178 | | -> - Local development: The React development server runs on port **5000** |
179 | | -> - Docker / Nginx deployment: The containerized frontend is exposed on port **3000** |
180 | 142 |
|
181 | 143 | #### Backend (port 8000) |
182 | 144 |
|
|
0 commit comments