An AI-powered application for querying Central Goods and Services Tax (CGST) documentation. Built with AWS Bedrock Knowledge Base, Claude Sonnet 4.5, and React, this application provides intelligent answers to GST-related questions with PDF source citations and automatic document highlighting.
- 🤖 AI-Powered Chat: Natural language queries powered by Claude Sonnet 4.5
- 📚 Knowledge Base: Semantic search across CGST documentation using AWS Bedrock
- 📄 PDF Highlighting: Automatic fuzzy text highlighting in source documents
- 🔐 Secure Authentication: AWS Cognito with managed login UI
- ⚡ Real-time Streaming: Server-sent events stream AI responses as they are generated
- 🎯 Citation Tracking: Transparent sourcing with relevance scores
- ☁️ Serverless Architecture: Fully managed AWS infrastructure
- 🌐 Custom Domains: Production deployment with CloudFront CDN
```
┌────────────────────────────────────────────────────────────────────┐
│                            User Browser                            │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │                React SPA (TypeScript + Vite)                 │  │
│  │         - ChatInterface  - PDFViewer  - AuthContext          │  │
│  └──────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────┬─────────────────────────────────┘
                                   │
                                   ↓
┌────────────────────────────────────────────────────────────────────┐
│                           CloudFront CDN                           │
│          - Custom Domain: accountant.sourish-banerjee.com          │
│            - S3 Static Hosting  - Origin Access Control            │
└──────────────────────────────────┬─────────────────────────────────┘
                                   │
       ┌─────────────────┬─────────┴───────┬─────────────────┐
       ↓                 ↓                 ↓                 ↓
┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ Cognito      │  │ API Gateway  │  │ Streaming    │  │ S3           │
│              │  │ (Custom      │  │ Lambda       │  │ PDFs         │
│ - User Pool  │  │  Domain)     │  │(Function URL)│  │              │
│ - Identity   │  │              │  │              │  │              │
│   Pool       │  │ /presign     │  │ Claude Sonnet│  │              │
│ - OAuth      │  │ /retrieve    │  │ 4.5          │  │              │
└──────────────┘  └──────┬───────┘  └──────────────┘  └──────────────┘
                         │
              ┌──────────┴──────────┐
              ↓                     ↓
      ┌────────────────┐   ┌──────────────────┐
      │ Presign        │   │ Retrieve         │
      │ Lambda         │   │ Lambda           │
      │                │   │                  │
      │ Generate       │   │ Query KB         │
      │ S3 URLs        │   │                  │
      └────────────────┘   └────────┬─────────┘
                                    ↓
                        ┌──────────────────────┐
                        │ Bedrock Knowledge    │
                        │ Base                 │
                        │                      │
                        │ - Titan Embeddings   │
                        │ - OpenSearch         │
                        │   Serverless         │
                        └──────────────────────┘
```
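The Retrieve Lambda in the diagram is essentially a thin wrapper around the Bedrock Knowledge Base `Retrieve` API. Below is a minimal sketch of such a handler. It assumes an API Gateway proxy event whose JSON body carries a `query` field. The client is injected so the logic can be exercised with a stub; in Lambda it would be `boto3.client("bedrock-agent-runtime")`. `make_handler`, the request shape, and the response fields are illustrative assumptions, not the project's actual handler.

```python
import json
from typing import Any, Callable, Dict


def make_handler(client: Any, kb_id: str, max_results: int = 5) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Build a Lambda handler around a bedrock-agent-runtime client (hypothetical wiring)."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        body = json.loads(event.get("body") or "{}")
        query = (body.get("query") or "").strip()
        if not query:
            return {"statusCode": 400, "body": json.dumps({"error": "query is required"})}

        # Semantic (vector) search over the embedded CGST chunks
        response = client.retrieve(
            knowledgeBaseId=kb_id,
            retrievalQuery={"text": query},
            retrievalConfiguration={
                "vectorSearchConfiguration": {"numberOfResults": max_results}
            },
        )

        # Keep only what the frontend needs: chunk text, source PDF, relevance score
        results = [
            {
                "text": r["content"]["text"],
                "uri": r.get("location", {}).get("s3Location", {}).get("uri"),
                "score": r.get("score"),
            }
            for r in response.get("retrievalResults", [])
        ]
        return {"statusCode": 200, "body": json.dumps({"results": results})}

    return handler
```

In a deployment the wiring might look like `handler = make_handler(boto3.client("bedrock-agent-runtime"), kb_id)`, with the Knowledge Base ID read from an environment variable that the CDK stack sets (the variable name is up to the stack).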
```
accountant/
├── backend/ # Python backend (AWS CDK + Lambda)
│ ├── cdk/ # CDK infrastructure
│ │ ├── app.py # CDK app entry point
│ │ └── stacks/ # Stack definitions
│ │ ├── bedrock_stack.py # Knowledge Base + OpenSearch
│ │ ├── auth_stack.py # Cognito authentication
│ │ ├── streaming_stack.py # Claude streaming Lambda
│ │ ├── api_stack.py # API Gateway + integrations
│ │ └── web_stack.py # CloudFront + S3 hosting
│ ├── lambda/ # Lambda functions
│ │ ├── presign/ # S3 presigned URL generation
│ │ ├── retrieve/ # Bedrock KB retrieval
│ │ └── stream/ # Claude streaming (Docker)
│ ├── scripts/ # Utility scripts
│ │ └── generate_metadata.py # KB metadata generation
│ └── copilot/ # Scratch workspace (gitignored)
│ ├── kb_source/ # Knowledge Base PDFs
│ └── summaries/ # Task documentation
├── frontend/ # React frontend (TypeScript + Vite)
│ ├── src/
│ │ ├── components/ # React components
│ │ │ ├── ChatInterface.tsx # Main chat UI
│ │ │ ├── PDFViewer.tsx # PDF rendering with highlighting
│ │ │ ├── LoginPage.tsx # Cognito authentication UI
│ │ │ └── ...
│ │ ├── contexts/ # React contexts
│ │ │ └── AuthContext.tsx # Authentication state
│ │ ├── utils/ # Utility functions
│ │ │ ├── bedrock.ts # KB retrieval API
│ │ │ ├── anthropic.ts # Claude streaming
│ │ │ ├── s3.ts # S3 presigned URLs
│ │ │ └── auth.ts # AWS SigV4 signing
│ │ ├── aws-config.ts # AWS Amplify config
│ │ └── main.tsx # App entry point
│ ├── .env.dev # Development config (gitignored)
│ ├── .env.prod # Production config (gitignored)
│ └── copilot/ # Scratch workspace (gitignored)
└── README.md # This file
```
Frontend:
- React 18 - UI framework
- TypeScript - Type-safe JavaScript
- Vite - Build tool and dev server
- AWS Amplify - Authentication integration
- react-pdf - PDF rendering
- pdfjs-dist - PDF parsing and text extraction

Backend and infrastructure:
- AWS CDK - Infrastructure as code (Python)
- AWS Lambda - Serverless compute
- API Gateway - RESTful API with custom domain
- Bedrock Knowledge Base - Semantic document search
- Bedrock Runtime - Claude Sonnet 4.5 integration
- OpenSearch Serverless - Vector database (1024-dim embeddings)
- Cognito - User authentication and authorization
- CloudFront - CDN and static hosting
- S3 - Object storage for PDFs and website
- Secrets Manager - API key management
- Route53 - DNS and custom domains
- ACM - SSL/TLS certificates

Runtime and tooling:
- Python 3.13 - Backend runtime
- uv - Fast Python package manager
- Docker - Container runtime for the streaming Lambda
- Lambda Web Adapter - Runs a standard HTTP server inside Lambda
- FastAPI - Python web framework for streaming
- Uvicorn - ASGI server
Prerequisites:
- Python 3.13+ with uv installed
- Node.js 18+ and npm
- AWS CLI configured with credentials
- AWS CDK CLI installed globally (`npm install -g aws-cdk`)
- Docker, for building the streaming Lambda container
- Anthropic API key for Claude access
Local development setup:

- Clone the repository:
  ```bash
  git clone <repository-url>
  cd accountant
  ```
- Install backend dependencies:
  ```bash
  cd backend
  uv sync
  ```
- Install frontend dependencies:
  ```bash
  cd ../frontend
  npm install
  ```
- Create the Anthropic API key secret:
  ```bash
  aws secretsmanager create-secret \
    --name anthropic-api-key \
    --secret-string "your-anthropic-api-key"
  ```
- Bootstrap CDK (one-time per AWS account/region):
  ```bash
  cd ../backend/cdk
  uv run cdk bootstrap
  ```
- Deploy the infrastructure:
  ```bash
  uv run cdk deploy --all
  ```
- Create the frontend environment file:
  ```bash
  cd ../../frontend
  cp .env.example .env.dev
  # Edit .env.dev with the CDK stack outputs
  ```
- Start the development server:
  ```bash
  npm run dev
  ```
  Then visit http://localhost:5173.
Production deployment:

- Configure custom domains in `backend/cdk/app.py`:
  ```python
  DOMAIN_NAME = "accountant.sourish-banerjee.com"
  API_DOMAIN_NAME = "api.accountant.sourish-banerjee.com"
  ```
- Deploy the infrastructure:
  ```bash
  cd backend/cdk
  uv run cdk deploy --all
  ```
- Upload the Knowledge Base PDFs:
  ```bash
  cd ../copilot/kb_source
  # Add PDF files here
  cd ../..
  python scripts/generate_metadata.py
  # Upload to S3 (get the bucket name from the BedrockStack outputs)
  aws s3 sync copilot/kb_source/ s3://<kb-bucket-name>/
  # Then sync the data source in the Bedrock console
  ```
- Build and deploy the frontend:
  ```bash
  cd ../frontend
  # Update .env.prod with the stack outputs
  npm run build
  cd ../backend/cdk
  uv run cdk deploy AccountantWebStack
  ```
- Access the application at your custom domain.
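Bedrock Knowledge Bases can read per-document metadata from a sidecar file named `<document>.metadata.json` stored next to each source file in S3, with the attributes under a `metadataAttributes` key. What `scripts/generate_metadata.py` actually records is not described here. The sketch below is a hypothetical version that derives a `source_file` attribute and, when the file name contains a four-digit year, a `year` attribute. Both attribute names are assumptions.

```python
import json
import re
from pathlib import Path
from typing import Dict, List, Union


def build_attributes(pdf: Path) -> Dict[str, Union[str, int]]:
    """Hypothetical attributes derived from the file name only."""
    attrs: Dict[str, Union[str, int]] = {"source_file": pdf.name}
    match = re.search(r"(19|20)\d{2}", pdf.stem)  # e.g. "cgst-act-2017" -> 2017
    if match:
        attrs["year"] = int(match.group(0))
    return attrs


def write_metadata(source_dir: Path) -> List[Path]:
    """Write <name>.pdf.metadata.json next to every PDF in source_dir."""
    written = []
    for pdf in sorted(source_dir.glob("*.pdf")):
        meta_path = pdf.with_name(pdf.name + ".metadata.json")
        payload = {"metadataAttributes": build_attributes(pdf)}
        meta_path.write_text(json.dumps(payload, indent=2))
        written.append(meta_path)
    return written
```

Run from `backend/` as `write_metadata(Path("copilot/kb_source"))` before the `aws s3 sync`, so the sidecar files are uploaded alongside the PDFs.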
Frontend environment file (production values shown):

```bash
# Cognito Configuration
VITE_USER_POOL_ID=us-east-1_XXXXXXXXX
VITE_USER_POOL_CLIENT_ID=xxxxxxxxxxxxxxxxxxxx
VITE_IDENTITY_POOL_ID=us-east-1:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
VITE_OAUTH_DOMAIN=accountant-XXXXX.auth.us-east-1.amazoncognito.com
# OAuth URLs
VITE_REDIRECT_SIGN_IN=https://accountant.sourish-banerjee.com
VITE_REDIRECT_SIGN_OUT=https://accountant.sourish-banerjee.com
# API Configuration
VITE_API_BASE_URL=https://api.accountant.sourish-banerjee.com
VITE_STREAMING_LAMBDA_URL=https://xxxxxx.lambda-url.us-east-1.on.aws
# AWS Region
VITE_AWS_REGION=us-east-1
```

Lambda environment variables are configured by the CDK stacks; no manual configuration is needed.
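A missing or placeholder value in the frontend file can surface later as a confusing authentication or network error in the browser. The helper below is a hypothetical pre-build check and is not part of the repository. It parses simple `KEY=VALUE` lines and lists the required keys that are absent or still contain `XXXX` placeholders like the sample values. The placeholder test is a heuristic.

```python
from typing import Dict, List

# Keys taken from the example frontend environment file
REQUIRED_KEYS = [
    "VITE_USER_POOL_ID", "VITE_USER_POOL_CLIENT_ID", "VITE_IDENTITY_POOL_ID",
    "VITE_OAUTH_DOMAIN", "VITE_REDIRECT_SIGN_IN", "VITE_REDIRECT_SIGN_OUT",
    "VITE_API_BASE_URL", "VITE_STREAMING_LAMBDA_URL", "VITE_AWS_REGION",
]


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str) -> List[str]:
    """Required keys that are absent, empty, or still hold X-style placeholders."""
    values = parse_env(text)
    return [k for k in REQUIRED_KEYS if not values.get(k) or "XXXX" in values[k].upper()]
```

For example, `missing_keys(Path("frontend/.env.prod").read_text())` returns an empty list once every value has been filled in from the stack outputs.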
To use custom domains:
- Create a Route53 hosted zone for your domain
- Update the domain names in `backend/cdk/app.py`
- Deploy the stacks; CDK creates the ACM certificates automatically
- Wait for DNS propagation (this can take 30+ minutes)

To deploy without custom domains:

```python
# In backend/cdk/app.py
DOMAIN_NAME = None
API_DOMAIN_NAME = None
```

Related documentation:
- Backend README - CDK infrastructure and Lambda functions
- Frontend README - React application and features
- CDK README - Detailed CDK deployment guide
- Presign Lambda - S3 presigned URL generation
- Retrieve Lambda - Bedrock KB retrieval
- Stream Lambda - Claude streaming responses
Infrastructure changes:

```bash
cd backend/cdk
uv run cdk diff          # Review changes
uv run cdk deploy --all  # Deploy infrastructure
```

Frontend changes:

```bash
cd frontend
npm run dev                           # Test locally
npm run build                         # Build for production
cd ../backend/cdk
uv run cdk deploy AccountantWebStack  # Deploy to CloudFront
```

Lambda code changes are detected automatically by CDK:

```bash
# After editing lambda/*/handler.py
cd backend/cdk
uv run cdk deploy AccountantApiStack  # Deploys the updated Lambda (the stream Lambda lives in AccountantStreamingStack)
```

Updating the Knowledge Base:

```bash
# Add new PDFs to copilot/kb_source/
cd backend
python scripts/generate_metadata.py
# Upload to S3
aws s3 sync copilot/kb_source/ s3://<kb-bucket-name>/
# Sync the data source in the Bedrock console (manual step)
```

Monthly costs for production deployment:
| Service | Cost | Notes |
|---|---|---|
| OpenSearch Serverless | $700 | 2 OCU minimum (fixed cost) |
| CloudFront | $20-50 | Varies with traffic |
| Lambda | $5-10 | Based on invocations |
| API Gateway | $3-5 | Based on requests |
| Bedrock KB Queries | $10-20 | ~$0.001 per query |
| Anthropic API | $20-100 | Based on usage |
| Cognito | $0-5 | Free tier: 50,000 MAU |
| S3 Storage | $1-3 | PDFs + static assets |
| Secrets Manager | $0.40 | Per secret |
| Route53 | $0.50 | Per hosted zone |
| Total | ~$760-894 | Per month |
Note: OpenSearch Serverless is the primary cost driver. For lower-traffic applications, consider another vector store supported by Bedrock Knowledge Bases, such as Aurora PostgreSQL with pgvector or Pinecone.
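Summing the rows of the cost table is a quick sanity check on the total and shows how much of the bill is the fixed OpenSearch Serverless charge. The ranges below are copied from the table; single-value rows are treated as a fixed (low, high) pair.

```python
# Monthly cost ranges (low, high) in USD, copied from the cost table
COSTS = {
    "OpenSearch Serverless": (700, 700),
    "CloudFront": (20, 50),
    "Lambda": (5, 10),
    "API Gateway": (3, 5),
    "Bedrock KB Queries": (10, 20),
    "Anthropic API": (20, 100),
    "Cognito": (0, 5),
    "S3 Storage": (1, 3),
    "Secrets Manager": (0.40, 0.40),
    "Route53": (0.50, 0.50),
}

low = sum(lo for lo, _ in COSTS.values())
high = sum(hi for _, hi in COSTS.values())
share = COSTS["OpenSearch Serverless"][0] / low  # fixed cost as a fraction of the low-end total

print(f"Total: ${low:,.2f}-${high:,.2f} per month")
print(f"OpenSearch Serverless share at the low end: {share:.0%}")
```

This prints a total of about $760-$894 per month, with OpenSearch Serverless at roughly 92% of the low-end figure.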
- Authentication: AWS Cognito with SRP protocol
- Authorization: Cognito User Pool authorizer on API Gateway
- API Security: IAM authentication with SigV4 signing
- Data Encryption: At rest (S3, OpenSearch) and in transit (HTTPS)
- Secret Management: AWS Secrets Manager for API keys
- Network Security: Private S3 buckets with Origin Access Control
- CORS: Strict origin validation
- PDF Access: Time-limited presigned URLs (1-hour expiration)
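The 1-hour presigned URLs above come from the Presign Lambda. Below is a minimal sketch of that step. It uses boto3's `generate_presigned_url("get_object", ...)` with `ExpiresIn=3600` on an injected S3 client (in Lambda, `boto3.client("s3")`). The `.pdf`-only guard and the response shape are hypothetical, not the project's actual handler.

```python
import json
from typing import Any, Dict

URL_TTL_SECONDS = 3600  # 1-hour expiration, matching the security notes


def presign_pdf(s3_client: Any, bucket: str, key: str) -> Dict[str, Any]:
    """Return an API Gateway-style response containing a time-limited GET URL."""
    # Hypothetical guard: only hand out URLs for PDF objects
    if not key.lower().endswith(".pdf"):
        return {"statusCode": 400, "body": json.dumps({"error": "invalid key"})}
    url = s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=URL_TTL_SECONDS,
    )
    return {"statusCode": 200, "body": json.dumps({"url": url, "expiresIn": URL_TTL_SECONDS})}
```

Because the Lambda signs with temporary role credentials, a URL can stop working before `ExpiresIn` elapses if those credentials expire first. This is one more reason to re-request a URL when a PDF fails to load.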
PDF highlighting locates the cited passage in the document text using:
- Exact match with special character cleanup
- Levenshtein distance for fuzzy matching
- Multi-page search with automatic navigation
- Handles OCR artifacts and extraction inconsistencies
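The project's matcher lives in `PDFViewer.tsx` and is written in TypeScript. The Python sketch below only illustrates the same idea under stated assumptions. It normalizes both strings, tries an exact substring match, and then falls back to the window with the smallest Levenshtein distance under a threshold (the 20% default is an arbitrary choice). Offsets refer to the normalized text; mapping them back to text-layer spans is left out. Every window is scored, at O(m²) per window for an m-character citation, so this is illustrative rather than tuned for large documents.

```python
import re
from typing import Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase, replace non-alphanumeric characters with spaces, collapse whitespace."""
    text = re.sub(r"[^0-9a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance computed with two rolling rows."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


def find_best_match(needle: str, page_text: str, max_ratio: float = 0.2) -> Optional[Tuple[int, int]]:
    """Return (start, end) in the normalized page text, or None if nothing is close enough."""
    n, hay = normalize(needle), normalize(page_text)
    if not n:
        return None
    exact = hay.find(n)
    if exact != -1:
        return exact, exact + len(n)
    best: Optional[Tuple[int, int]] = None
    best_dist = int(len(n) * max_ratio) + 1  # a window must beat this distance to count
    for start in range(0, max(1, len(hay) - len(n) + 1)):
        dist = levenshtein(n, hay[start:start + len(n)])
        if dist < best_dist:
            best, best_dist = (start, start + len(n)), dist
    return best
```

For multi-page search, the same function would be called page by page, and the viewer would navigate to the first page that returns a match.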
Real-time AI responses using:
- Lambda Function URL with response streaming
- Lambda Web Adapter for HTTP server
- Server-sent events (SSE) protocol
- FastAPI + Uvicorn for async streaming
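On the wire, server-sent events are plain text. Each event consists of `data:` lines, optionally preceded by an `event:` line, and ends with a blank line. The sketch below shows that framing with a generator of the kind FastAPI's `StreamingResponse(..., media_type="text/event-stream")` can stream. It also includes a minimal client-side parser. The JSON payload shape and the `delta`/`done` event names are assumptions for illustration; the project's actual messages may differ.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def sse_event(data: Dict[str, Any], event: str = "") -> str:
    """Encode one server-sent event; the trailing blank line terminates it."""
    lines = [f"event: {event}"] if event else []
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def stream_answer(deltas: Iterable[str]) -> Iterator[str]:
    """Wrap model text deltas as SSE frames, then signal completion."""
    for text in deltas:
        yield sse_event({"type": "delta", "text": text})
    yield sse_event({"type": "done"}, event="done")


def parse_sse(raw: str) -> List[Dict[str, Any]]:
    """Minimal parser: split on blank lines and decode the data: fields as JSON."""
    events = []
    for block in raw.split("\n\n"):
        data_lines = [line[len("data:"):].lstrip() for line in block.splitlines() if line.startswith("data:")]
        if data_lines:
            events.append(json.loads("\n".join(data_lines)))
    return events
```

Keeping each payload on a single `data:` line (the default, unindented `json.dumps` output) keeps the client parser trivial.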
Transparent sourcing:
- JSON-formatted citations from Claude
- Relevance scores from Knowledge Base
- Clickable citations load PDFs
- Automatic text highlighting in documents
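On the Knowledge Base side, each entry in a `Retrieve` response carries the chunk text, its S3 location, and a relevance score. A minimal sketch of turning those results into clickable citations is shown below. It keeps the best-scoring chunk per PDF, ranks documents by that score, and splits the `s3://` URI into the bucket and key a presign request would need. The deduplication policy and the citation fields are hypothetical. This covers only the retrieval half; the JSON citations that Claude emits are not modeled here.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """'s3://bucket/path/doc.pdf' -> ('bucket', 'path/doc.pdf')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def build_citations(retrieval_results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the highest-scoring chunk per source PDF, best documents first."""
    best: Dict[str, Dict[str, Any]] = {}
    for result in retrieval_results:
        uri = result["location"]["s3Location"]["uri"]
        score = result.get("score", 0.0)
        if uri not in best or score > best[uri]["score"]:
            bucket, key = split_s3_uri(uri)
            # "quote" is the passage the PDF viewer would try to highlight
            best[uri] = {"bucket": bucket, "key": key, "score": score,
                         "quote": result["content"]["text"]}
    return sorted(best.values(), key=lambda c: c["score"], reverse=True)
```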
Lambda Cold Starts:
- Expected 1-3 second delay on first request
- Consider provisioned concurrency for production
CORS Errors:
- Verify allowed origins in Lambda/API Gateway
- Check request includes correct origin header
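Most CORS failures come down to an exact-string mismatch between the browser's `Origin` header and the allowlist. Scheme, host, and port must all match, and an origin never has a trailing slash. A minimal sketch of strict origin checking in a Lambda response is shown below. The allowlist (production domain plus the Vite dev server) and the header set are assumptions. The `X-Amz-*` headers are listed because SigV4-signed requests send them.

```python
from typing import Dict, Optional

# Assumed allowlist: the production site plus the local Vite dev server
ALLOWED_ORIGINS = {
    "https://accountant.sourish-banerjee.com",
    "http://localhost:5173",
}


def cors_headers(origin: Optional[str]) -> Dict[str, str]:
    """Echo the Origin back only when it matches the allowlist exactly."""
    if origin not in ALLOWED_ORIGINS:
        return {}  # no CORS headers, so the browser blocks the response
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Headers": "Authorization, Content-Type, X-Amz-Date, X-Amz-Security-Token",
        "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
        "Vary": "Origin",  # responses differ per origin, so caches must key on it
    }
```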
PDF Not Loading:
- Check S3 bucket permissions
- Verify presigned URL hasn't expired
- Ensure object exists in S3
Knowledge Base No Results:
- Verify PDFs uploaded to S3
- Trigger data source sync in Bedrock console
- Wait 5-10 minutes for indexing
Authentication Issues:
- Check Cognito User Pool configuration
- Verify OAuth redirect URLs match exactly
- Clear browser cache and cookies
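View CloudWatch logs (`aws logs tail` needs the exact log group name, so look it up by prefix):

```bash
# API Lambda logs
aws logs tail "$(aws logs describe-log-groups --log-group-name-prefix /aws/lambda/AccountantApiStack-RetrieveFunction --query 'logGroups[0].logGroupName' --output text)" --follow
# Streaming Lambda logs
aws logs tail "$(aws logs describe-log-groups --log-group-name-prefix /aws/lambda/AccountantStreamingStack-StreamingFunction --query 'logGroups[0].logGroupName' --output text)" --follow
```

This is a private project. For questions or issues, contact the repository owner.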
Development tips:
- Test Lambda functions locally before deploying
- Use CDK watch to redeploy automatically on file changes: `uv run cdk watch`
- Deploy a single stack instead of all of them: `uv run cdk deploy AccountantApiStack`
- Cache presigned URLs in the frontend (for less than their 1-hour lifetime) to reduce API calls
- CloudWatch Logs - Check Lambda execution logs
- CDK Diff - Review infrastructure changes before deploying
- Browser DevTools - Network tab for API debugging
- React DevTools - Component state inspection
- Backend: Follow Python type hints and docstrings
- Frontend: Use TypeScript strict mode
- Lambda: Keep functions small and focused
- CDK: Use stack outputs for cross-stack references
Future enhancements:
- Multi-user support with user-specific data
- Conversation history persistence
- Export chat transcripts to PDF
- Advanced filtering by document type/year
- Notification system for new CGST updates
- Mobile-responsive UI improvements
- Cost optimization with caching layer
- Batch document upload interface
Private project - All rights reserved
- AWS Bedrock - Knowledge Base and Claude integration
- Anthropic - Claude Sonnet 4.5 AI model
- AWS Lambda Web Adapter - HTTP streaming in Lambda
- react-pdf - PDF rendering library
- AWS CDK - Infrastructure as code framework
Built with ❤️ for Indian GST professionals