This project is an Indian Sign Language (ISL) translation system designed to bridge the communication gap for the deaf and hard-of-hearing community. It captures ISL gestures in real time from a webcam, translates them into text with a deep learning model, and then speaks the text aloud using a neural text-to-speech voice.
- Project Environment Setup: Created a virtual environment and requirements.txt.
- Voice Module: Configured voice_config.py and implemented speaker.py using edge-tts.
- Preprocessing Pipeline: Developed keypoint_utils.py and preprocess.py to handle .pose files and sequence generation.
- Model Architecture: Designed a Bidirectional LSTM with an Attention Mechanism in model.py.
- Training & Evaluation: Implemented train.py, dataset.py, and evaluate.py.
- Real-Time Interface: Developed live_detect.py for webcam-based inference and voice output.
- Project Structure & Documentation: Set up .gitignore and README.md.
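The list above mentions a Bidirectional LSTM with attention in model.py, but that file is not reproduced here. As a hedged sketch only, one common way to build such a classifier in PyTorch is to run a bidirectional nn.LSTM over the frame sequence, score each frame with a linear layer, softmax the scores over time, and classify the attention-weighted sum of hidden states. The class name BiLSTMAttention, the hidden size, and the 126-dimensional input (assumed to be 2 hands × 21 landmarks × 3 coordinates) are illustrative assumptions, not the project's actual settings.

```python
import torch
import torch.nn as nn


class BiLSTMAttention(nn.Module):
    """Hypothetical Bi-LSTM + attention classifier for keypoint sequences."""

    def __init__(self, input_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        # batch_first=True -> inputs are shaped (batch, frames, features)
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True, bidirectional=True)
        # One scalar score per frame, computed from the concatenated forward/backward states
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        outputs, _ = self.lstm(x)                     # (batch, frames, 2 * hidden)
        scores = self.attn(outputs).squeeze(-1)       # (batch, frames)
        weights = torch.softmax(scores, dim=1)        # weights sum to 1 over the frames
        context = (weights.unsqueeze(-1) * outputs).sum(dim=1)  # (batch, 2 * hidden)
        return self.classifier(context), weights


if __name__ == "__main__":
    # Assumed shapes: batch of 4 clips, 30 frames each, 126 features per frame
    model = BiLSTMAttention(input_dim=126, hidden_dim=64, num_classes=10)
    logits, weights = model(torch.randn(4, 30, 126))
    print(logits.shape, weights.shape)  # torch.Size([4, 10]) torch.Size([4, 30])
```

Returning the attention weights alongside the logits makes it possible to inspect which frames of a sign the model relied on.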
- MediaPipe Integration: Uses Google's MediaPipe Hands to extract 21 3D landmarks per hand from ordinary 2D webcam frames in real time.
- Bidirectional LSTM (Bi-LSTM): Captures temporal dependencies by processing sign language sequences in both forward and backward directions.
- Attention Mechanism: Learns a weight for each frame of a gesture, allowing the model to focus on the most informative moments of a sign.
- Edge-TTS Integration: Uses Microsoft's Neural Text-to-Speech engine for high-quality, natural-sounding Indian English voices (en-IN-NeerjaNeural).
- Sequential Smoothing: Maintains a rolling buffer of the last 30 frames, so each prediction is made over a sliding window and real-time output stays stable.
- Confidence Filtering: Uses a threshold (default 90%) to ensure only high-confidence translations are spoken.
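The smoothing and confidence-filtering behavior described above can be sketched as follows. This is not taken from live_detect.py: the RollingPredictor class, the predict_fn callback (any function that maps a 30-frame window to class probabilities, for example a softmax over the model's logits), and the rule that suppresses speaking the same word twice in a row are assumptions for illustration.

```python
from collections import deque
from typing import Callable, Optional, Sequence

SEQUENCE_LENGTH = 30         # frames per prediction window (from the README)
CONFIDENCE_THRESHOLD = 0.90  # default threshold stated in the README


class RollingPredictor:
    """Hypothetical helper: buffers per-frame keypoints and gates predictions by confidence."""

    def __init__(self, predict_fn: Callable[[list], Sequence[float]], labels: Sequence[str],
                 seq_len: int = SEQUENCE_LENGTH, threshold: float = CONFIDENCE_THRESHOLD):
        self.predict_fn = predict_fn          # maps a window of frames to class probabilities
        self.labels = labels
        self.threshold = threshold
        self.buffer = deque(maxlen=seq_len)   # the oldest frame drops out automatically
        self.last_spoken: Optional[str] = None

    def push(self, keypoints) -> Optional[str]:
        """Add one frame; return a label only when it should be spoken."""
        self.buffer.append(keypoints)
        if len(self.buffer) < self.buffer.maxlen:
            return None  # not enough frames collected yet
        probs = self.predict_fn(list(self.buffer))
        best = max(range(len(probs)), key=lambda i: probs[i])
        if probs[best] < self.threshold:
            return None  # low confidence: stay silent
        label = self.labels[best]
        if label == self.last_spoken:
            return None  # assumed rule: do not repeat the same word back-to-back
        self.last_spoken = label
        return label


if __name__ == "__main__":
    # Stand-in model that is always 95% sure the sign is "thanks"
    predictor = RollingPredictor(lambda window: [0.05, 0.95], ["hello", "thanks"])
    results = [predictor.push([0.0] * 126) for _ in range(31)]
    print(results[29], results[30])  # thanks None
```

Using deque(maxlen=...) keeps the window update O(1) per frame, and gating on the top probability means uncertain windows produce no speech at all rather than a wrong word.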
D:\ISL_PROJECT\
├── data/
│ ├── raw/ # Raw .pose and CSV files (iSign v1.1)
│ └── processed/ # Preprocessed .npy and encoder files
├── models/ # Saved model weights (.pth) and metrics
├── src/
│ ├── preprocess.py # Data cleaning and sequence building
│ ├── dataset.py # Custom PyTorch Dataset
│ ├── model.py # Bidirectional LSTM + Attention model
│ ├── train.py # Model training pipeline
│ ├── evaluate.py # Testing and performance reports
│ └── live_detect.py # Real-time inference and voice output
├── utils/
│ └── keypoint_utils.py # Keypoint extraction and normalization
├── voice/
│ ├── speaker.py # TTS synthesis and playback
│ ├── voice_config.py # Voice personality and settings
│ └── audio_cache/ # Cached MP3 phrases for fast playback
├── requirements.txt # Project dependencies
└── README.md
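utils/keypoint_utils.py and src/preprocess.py handle keypoint normalization and sequence building, but the README does not say how. A minimal NumPy sketch, assuming MediaPipe-style hand landmarks (21 points with x, y, z, where index 0 is the wrist): each frame is translated so the wrist sits at the origin and scaled so the farthest landmark is at unit distance, and each clip is zero-padded or uniformly subsampled to 30 frames. The function names and both normalization choices are hypothetical.

```python
import numpy as np

NUM_LANDMARKS = 21  # MediaPipe Hands returns 21 landmarks per hand
SEQ_LEN = 30        # window length used by the live detector


def normalize_hand(landmarks: np.ndarray) -> np.ndarray:
    """Hypothetical normalization: (21, 3) landmarks -> wrist-relative, scale-invariant, flattened."""
    if not landmarks.any():
        return np.zeros(NUM_LANDMARKS * 3, dtype=np.float32)  # missing hand -> all zeros
    centered = landmarks - landmarks[0]  # landmark 0 is the wrist in MediaPipe Hands
    scale = np.max(np.linalg.norm(centered, axis=1))
    if scale > 0:
        centered = centered / scale      # farthest landmark ends up at distance 1
    return centered.astype(np.float32).ravel()


def to_fixed_length(frames: np.ndarray, seq_len: int = SEQ_LEN) -> np.ndarray:
    """Zero-pad or uniformly subsample a (T, F) sequence to shape (seq_len, F)."""
    t = frames.shape[0]
    if t >= seq_len:
        # Evenly spaced frame indices that always keep the first and last frame
        idx = np.linspace(0, t - 1, seq_len).round().astype(int)
        return frames[idx]
    pad = np.zeros((seq_len - t, frames.shape[1]), dtype=frames.dtype)
    return np.concatenate([frames, pad], axis=0)
```

Wrist-relative coordinates make the features independent of where the hand is in the frame, and scaling removes most of the effect of distance from the camera.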
- Python 3.10+
- Webcam for real-time detection
- Internet connection (for initial voice synthesis and pre-caching)
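The internet connection is only needed the first time a phrase is synthesized; after that it can be played from voice/audio_cache/. A minimal sketch of such a cache, assuming each MP3 is keyed by a hash of the voice and text; the function names, hashing scheme, and file layout are hypothetical, and playback itself is not shown. The synthesis step uses edge-tts's Communicate(text, voice).save(path) coroutine.

```python
import hashlib
from pathlib import Path
from typing import Awaitable, Callable, Iterable

VOICE = "en-IN-NeerjaNeural"
CACHE_DIR = Path("voice/audio_cache")


def cache_path(text: str, voice: str = VOICE, cache_dir: Path = CACHE_DIR) -> Path:
    """Stable file name per (voice, text) pair, so repeated phrases skip the network."""
    digest = hashlib.sha1(f"{voice}|{text}".encode("utf-8")).hexdigest()[:16]
    return cache_dir / f"{digest}.mp3"


async def edge_tts_synthesize(text: str, voice: str, out_path: Path) -> None:
    """Synthesize speech with edge-tts (requires an internet connection)."""
    import edge_tts  # imported lazily: only needed when a phrase is not cached yet

    await edge_tts.Communicate(text, voice).save(str(out_path))


async def get_audio(text: str, voice: str = VOICE, cache_dir: Path = CACHE_DIR,
                    synthesize: Callable[[str, str, Path], Awaitable[None]] = edge_tts_synthesize) -> Path:
    """Return the cached MP3 for a phrase, synthesizing it only on a cache miss."""
    path = cache_path(text, voice, cache_dir)
    if not path.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        await synthesize(text, voice, path)
    return path


async def precache(phrases: Iterable[str], voice: str = VOICE, cache_dir: Path = CACHE_DIR) -> None:
    """Fill the cache ahead of time, e.g. for every label the model can predict."""
    for phrase in phrases:
        await get_audio(phrase, voice, cache_dir)
```

For example, asyncio.run(precache(["hello", "thank you"])) would download both phrases once, after which get_audio returns the local files without contacting the TTS service.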
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt
pip install pose-format
Ensure your .pose files are in data/raw/poses/ and the iSign_v1.1.csv file is in data/raw/.
Then run preprocessing, training, and live detection in order:
python src/preprocess.py
python src/train.py
python src/live_detect.py

MIT License