███╗ ███╗██╗ ██╗██╗ ██╗██████╗
████╗ ████║██║ ██║██║ ██║██╔══██╗
██╔████╔██║██║ ██║██║ ██║██║ ██║
██║╚██╔╝██║██║ ██║██║ ██║██║ ██║
██║ ╚═╝ ██║╚██████╔╝╚██████╔╝██████╔╝
╚═╝ ╚═╝ ╚═════╝ ╚═════╝ ╚═════╝
A desktop application that uses CNN-based genre classification, hybrid emotion regression (valence/arousal), and fuzzy fusion scoring to analyze music and recommend similar songs.
Built with TensorFlow/Keras, librosa, and a modern glassmorphism CustomTkinter GUI.
- Features
- Project Structure
- Setup & Installation
- Datasets
- Training the Models
- Running the Desktop App
- Architecture Overview
- Soft Computing Techniques
- Development Log
- License
| Feature | Details |
|---|---|
| Genre Classification | 10-class CNN trained on FMA-medium (~68 % val accuracy). Softmax outputs used as fuzzy membership degrees. Multi-segment averaging + temperature scaling. Hybrid label when top-2 genres are within 0.10. |
| Emotion Regression | Hybrid CNN + handcrafted features (tempo, spectral centroid, RMS, ZCR) → continuous Valence & Arousal on the DEAM 1–9 scale. Multi-segment averaging, V/A spread transform, and RMS-based arousal energy boost. |
| Fuzzy Fusion Scoring | Weighted fusion: 0.7 × genre_similarity + 0.3 × emotion_similarity. Graded genre similarity matrix captures inter-genre relationships. |
| Recommendation Engine | Fetches tracks natively via the Spotify API based on genre; falls back to a local database. Results are dynamically displayed in an interactive hero banner carousel with circular album art. |
| Live Microphone Mode | Continuous streaming from the default microphone with a rolling spectrogram display and inference every ~7 s. Full analysis results update live. |
| 5 s Microphone Recording | Quick "REC 5 s" button — records, auto-analyzes, then cleans up the temp file. |
| Explainability Panel | Collapsible panel showing fusion formula, genre membership bar chart, emotion similarity breakdown, and intermediate computation values. |
| V-A Visualisation | Embedded matplotlib scatter plot of Russell's circumplex model with quadrant labels. Marker size and glow intensity scale with genre confidence. |
| Top-3 Genre Probabilities | Results panel shows the top-3 predicted genres with percentage probabilities before the full bar chart. |
| Temporal Smoothing | Live mic genre predictions are smoothed over a rolling buffer of 5 inference windows to reduce label flickering. |
| Singleton Model Loading | ModelRegistry singleton loads all Keras models once at startup with warm-up passes — no reloading on repeated analyses. Results cached by file path. |
| Glassmorphism UI | Dark navy theme, neon accents, CustomTkinter elements, translucent cards, circular album art, and an animated hero carousel. |
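The fuzzy fusion scoring in the table above can be sketched as follows. This is a minimal illustration, not the exact code in `engine/fusion.py`; the function names and the distance-based emotion similarity are assumptions:

```python
import numpy as np

# Hypothetical sketch of the weighted fuzzy fusion described above.
# The real helpers live in engine/fusion.py; names here are illustrative.

def emotion_similarity(va_a, va_b, max_dist=np.sqrt(2) * 8.0):
    """Similarity from Euclidean distance on the 1-9 valence/arousal plane."""
    dist = np.linalg.norm(np.asarray(va_a, float) - np.asarray(va_b, float))
    return 1.0 - dist / max_dist  # 1.0 = identical mood, 0.0 = opposite corners

def fused_score(genre_sim, va_a, va_b, w_genre=0.7, w_emotion=0.3):
    """Weighted fusion: 0.7 * genre similarity + 0.3 * emotion similarity."""
    return w_genre * genre_sim + w_emotion * emotion_similarity(va_a, va_b)

# Example: same genre, slightly different mood
print(fused_score(1.0, (6.2, 5.8), (6.0, 5.5)))
```

In the real engine, `genre_sim` comes from the graded inter-genre similarity matrix rather than a hard 0/1 match, which is what lets cross-genre pairs (e.g. related genres) still score well.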
Muud/
├── main.py # Entry point — launches desktop app
├── requirements.txt # Python dependencies
├── .gitignore
│
├── engine/ # Core ML + inference logic
│ ├── __init__.py
│ ├── feature_extraction.py # load_audio, mel spectrogram & handcrafted
│ │ feature extraction (segmented)
│ ├── genre_classifier.py # GenreClassifier — predict, predict_averaged,
│ │ predict_averaged_smoothed (live), temperature
│ │ scaling, hybrid genre labeling, adaptive
│ │ segmentation for short / long clips
│ ├── emotion_regressor.py # EmotionRegressor — predict, predict_averaged,
│ │ V/A spread transform, RMS energy boost,
│ │ mood quadrant with ±1.0 neutral zone
│ ├── fusion.py # Weighted fuzzy fusion (genre + emotion),
│ │ emotion_similarity, genre_similarity helpers
│ ├── model_registry.py # ModelRegistry singleton — thread-safe model
│ │ loading + warmup
│ └── recommender.py # MusicRecommender — analyze (file), analyze_signal
│ (live), recommend, graded genre similarity matrix
│
├── ui/ # Desktop GUI
│ ├── __init__.py
│ └── desktop_app.py # MuudApp CustomTkinter app — glassmorphism UI, hero
│ carousel, V-A plot, live spectrogram, live mic streaming,
│ 5 s recording, analysis/recommend/explain panels
│
├── models/ # Trained model weights (Tracked in Git <100MB)
│ ├── best_genre_crnn.keras # FMA 10-class genre CRNN (tracked)
│ ├── emotion_hybrid_model.keras # DEAM hybrid emotion regressor (tracked)
│ └── genre_labels.json # Genre index → name mapping (tracked)
│
├── data/ # Datasets & song database
│ ├── song_db.csv # 60-song database with V/A annotations (tracked)
│ ├── DEAM/ # DEAM dataset (git-ignored — download separately)
│ ├── GTZAN/ # GTZAN dataset (git-ignored)
│ └── FMA/ # FMA-medium dataset (git-ignored)
│
├── training/ # Jupyter / Colab notebooks
│ ├── genre_mel_training.ipynb # GTZAN genre CNN training (initial model)
│ ├── train_hybrid_emotion.ipynb # DEAM hybrid emotion model training
│ ├── fma_genre_clean.ipynb # FMA-medium genre label cleaning / CSV prep
│ ├── fma_dataset_inspection.ipynb # FMA-medium dataset analysis & genre selection
│ ├── fusion_inference.ipynb # End-to-end inference pipeline test
│ ├── kaggle/ # Scripts & notebooks run on Kaggle GPUs
│ │ ├── genre_cnn_transformer_train.ipynb # CNN+Transformer genre model (Kaggle T4)
│ │ ├── genre_crnn_model.py # CRNN architecture definition
│ │ └── genre_crnn_train.py # CRNN training script
│ └── reports/ # Saved figures from training runs
│
├── inference/ # Standalone test scripts
│ ├── test_genre.py # Genre classifier sanity check
│ ├── test_emotion.py # Emotion regressor sanity check
│ ├── test_fusion.py # Fusion pipeline test
│ └── test_recommend.py # Full recommendation pipeline test
│
└── test_audio/ # Sample audio for quick testing (git-ignored)
- Python 3.10+
- Conda (recommended) or virtualenv
- A working microphone (optional — for live mic / recording features)
```
conda create -n emotioncnn python=3.10
conda activate emotioncnn
pip install -r requirements.txt
```

Or manually:

```
pip install tensorflow librosa numpy pandas matplotlib seaborn scikit-learn
pip install sounddevice scipy customtkinter spotipy pillow python-dotenv
```

See Datasets below for download links and placement instructions. Datasets are only needed if you plan to retrain the models.
The trained `.keras` models are all safely under GitHub's 100 MB file limit, so they are now tracked in the repository by default.
You no longer need to train networks or download weights to run the desktop app — it launches straight out of the box. (If you train new models larger than 100 MB, make sure to exclude them from Git.)
```
python main.py
```

All datasets are git-ignored due to their size. Download and place them manually.
- Used for: Emotion regression model (Valence / Arousal)
- Size: ~12 GB (1 802 songs + annotations)
- Download: DEAM on Kaggle
- Placement: Extract to `data/DEAM/`, preserving the original folder structure:

```
data/DEAM/
├── DEAM_Annotations/
│   └── annotations/
│       ├── annotations averaged per song/
│       └── annotations per each rater/
├── DEAM_audio/
│   └── MEMD_audio/
└── features/
    └── features/
```
- Used for: Genre classification CNN (10-class)
- Size: ~22 GB audio + ~350 MB metadata
- Downloads:
- Audio: fma_medium.zip
- Metadata: fma_metadata.zip
- GitHub: mdeff/fma
- Placement: Extract to `data/FMA/fma_medium/` and `data/FMA/fma_metadata/`.
- Used for: Initial genre model training (superseded by FMA)
- Size: ~1.2 GB (1 000 clips × 30 s, 10 genres)
- Download: GTZAN on Kaggle
- Placement: Extract to `data/GTZAN/` so that the genre folders (`blues/`, `classical/`, etc.) are directly inside.
Run the notebooks from an activated `emotioncnn` environment, or adapt them for Google Colab with a GPU.
Notebooks: training/fma_dataset_inspection.ipynb → training/fma_genre_clean.ipynb → training/kaggle/genre_cnn_transformer_train.ipynb
- `fma_dataset_inspection.ipynb` — downloads & inspects FMA-medium metadata, selects the top-10 genres
- `fma_genre_clean.ipynb` — cleans genre labels, creates train/val CSV splits
- `training/kaggle/genre_cnn_transformer_train.ipynb` — trains the CNN + Transformer model on Kaggle (T4 GPU, mixed precision, class-weighted) → saves `models/best_genre_cnn_trans.keras`
Genre classes (10): Classical, Electronic, Experimental, Folk, Hip-Hop, Instrumental, International, Old-Time / Historic, Pop, Rock
Notebook: training/train_hybrid_emotion.ipynb
- Loads DEAM audio + per-song V/A annotations
- Splits into 3 s segments; extracts 128-bin Mel spectrogram + 4 handcrafted features (tempo, spectral centroid, RMS, ZCR)
- Trains a hybrid CNN (Mel branch + dense branch) → 2 regression outputs (valence, arousal)
- Saves → `models/emotion_hybrid_model.keras`
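Two of the four handcrafted features can be sketched with plain NumPy. This is an illustrative sketch only: the actual pipeline uses librosa (which also provides tempo and spectral centroid), and the frame/hop sizes below are assumed defaults:

```python
import numpy as np

# Sketch of two of the four handcrafted features (RMS energy and
# zero-crossing rate) computed with plain NumPy. The real pipeline
# uses librosa; frame_len/hop here are illustrative.

def frame_signal(y, frame_len=2048, hop=512):
    """Slice a 1-D signal into overlapping frames."""
    n = 1 + max(0, (len(y) - frame_len) // hop)
    return np.stack([y[i * hop : i * hop + frame_len] for i in range(n)])

def rms_energy(y):
    """Mean per-frame root-mean-square energy."""
    frames = frame_signal(y)
    return float(np.mean(np.sqrt(np.mean(frames**2, axis=1))))

def zero_crossing_rate(y):
    """Mean fraction of adjacent-sample sign changes per frame."""
    frames = frame_signal(y)
    crossings = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return float(np.mean(crossings))

# 3 s of a 440 Hz tone at 22 050 Hz, matching the emotion segment length
sr = 22050
t = np.linspace(0, 3, 3 * sr, endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 440 * t)
print(rms_energy(y), zero_crossing_rate(y))
```

For a pure sine at amplitude 0.5 the RMS should land near 0.5/√2 ≈ 0.354, and the ZCR near 2 × 440 / 22050 ≈ 0.04 — a quick sanity check that framed features behave as expected.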
Notebook: training/fusion_inference.ipynb — end-to-end pipeline test (feature extraction → genre → emotion → fusion → recommendation)
```
conda activate emotioncnn
python main.py
```

On launch:

- The `ModelRegistry` singleton loads both Keras models from `models/`
- Warm-up forward passes compile the TF graphs (first launch is slightly slower)
- The modern glassmorphism CustomTkinter window opens
| Button | Action |
|---|---|
| BROWSE | Select a .wav / .mp3 / .flac / .ogg file |
| ANALYZE | Run genre + emotion analysis on the selected file |
| RECOMMEND | Get top-5 similar songs from the song database |
| EXPLAIN | Toggle explainability panel (fusion breakdown) |
| REC 5 s | Record 5 seconds from microphone → auto-analyze |
| 🎤 LIVE MIC | Toggle continuous microphone streaming with live spectrogram and rolling inference |
- Results — Top-3 genre probabilities, full fuzzy membership bar chart, mood quadrant, valence/arousal scores
- V-A Plot — Russell's circumplex scatter; marker size scales with confidence
- Recommendations — Interactive hero carousel featuring 30-sec previews, Spotify links, and fused score ranks
- Explainability — Intermediate fusion computation values
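The mood quadrant shown in the results panel can be sketched as a simple rule on the 1–9 scale midpoint with the ±1.0 neutral zone. The quadrant labels below are illustrative Russell-circumplex names, not necessarily the app's exact strings:

```python
# Illustrative sketch of the mood-quadrant rule with the ±1.0 neutral zone
# around the 1-9 scale midpoint (5.0). The real logic lives in
# engine/emotion_regressor.py; names here are assumptions.

def mood_quadrant(valence, arousal, mid=5.0, neutral=1.0):
    """Map a (valence, arousal) pair to a circumplex quadrant or 'Neutral'."""
    if abs(valence - mid) <= neutral and abs(arousal - mid) <= neutral:
        return "Neutral"
    if valence >= mid:
        return "Happy/Excited" if arousal >= mid else "Calm/Content"
    return "Angry/Tense" if arousal >= mid else "Sad/Depressed"

print(mood_quadrant(7.2, 6.8))  # high valence, high arousal
print(mood_quadrant(5.3, 4.8))  # inside the neutral zone
```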
┌──────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Audio File │────▶│ Feature Extract │────▶│ Genre CNN │──▶ Fuzzy memberships
│ or Live Mic │ │ (Mel + Stats) │ │ (10-class FMA) │ (softmax probs)
└──────────────┘ │ × N segments │ └─────────────────┘
│ │ ┌─────────────────┐
│ │────▶│ Emotion Hybrid │──▶ Valence, Arousal
└─────────────────┘ │ (CNN + Dense) │ (1–9 scale)
└─────────────────┘
│
┌────────▼────────┐
│ Fuzzy Fusion │
│ (weighted sim) │──▶ Ranked recommendations
└─────────────────┘
- File analysis: Audio loaded at 22 050 Hz, split into segments (10 s for genre, 3 s for emotion). Predictions averaged across segments.
- Live mic: Continuous 22 050 Hz stream, rolling 30 s buffer, inference every ~7 s with temporal smoothing (5-window average).
- Short clips (< 10 s): Single padded segment for genre; normal 3 s segments for emotion.
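The segmentation-and-averaging flow above can be sketched as follows; `predict_segment` stands in for the Keras model call, and the helper names are illustrative rather than the exact API of `engine/genre_classifier.py`:

```python
import numpy as np

# Sketch of multi-segment softmax averaging: split the audio into
# fixed-length segments, predict each, then average the probability
# vectors into one fuzzy membership vector.

def segment_audio(y, sr=22050, seg_seconds=10):
    """Split audio into fixed-length segments; pad a single segment for short clips."""
    seg_len = seg_seconds * sr
    if len(y) < seg_len:
        return [np.pad(y, (0, seg_len - len(y)))]
    return [y[i * seg_len : (i + 1) * seg_len] for i in range(len(y) // seg_len)]

def predict_averaged(y, predict_segment, sr=22050):
    """Average per-segment softmax outputs across all segments."""
    probs = [predict_segment(seg) for seg in segment_audio(y, sr)]
    return np.mean(probs, axis=0)

# Usage with a dummy 2-class predictor on 25 s of silence (-> 2 segments)
dummy = lambda seg: np.array([0.8, 0.2])
print(predict_averaged(np.zeros(25 * 22050), dummy))
```

Averaging softmax vectors rather than hard labels keeps the output a valid fuzzy membership distribution, which is what the fusion stage consumes.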
This project integrates multiple soft computing paradigms:
| Technique | Where Used | Details |
|---|---|---|
| Fuzzy Logic | Genre classification | Softmax probabilities treated as fuzzy membership degrees across all 10 genres; hybrid labelling when membership gap < 0.10 |
| Fuzzy Fusion | Recommendation scoring | Weighted combination: 0.7 × genre_similarity + 0.3 × emotion_similarity. Genre similarity uses a graded inter-genre similarity matrix. |
| Neural Network (CNN) | Genre classifier | Convolutional Neural Network on 128-bin Mel spectrograms (FMA-medium, 10 classes) |
| Hybrid Neural Network | Emotion regressor | CNN branch (Mel spectrogram) + Dense branch (handcrafted features) → multi-output regression |
| Temperature Scaling | Genre softmax | softmax(log(p) / T) post-hoc calibration to sharpen/soften probability distributions |
| Temporal Smoothing | Live mic predictions | Rolling average over 5 inference windows reduces noise in real-time genre predictions |
| V/A Spread Transform | Emotion post-processing | Linear transform expands clustered valence/arousal predictions across the full 1–9 scale |
| Energy-based Arousal Boost | Emotion post-processing | RMS energy injected directly into arousal prediction to improve sensitivity to dynamic range |
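The temperature scaling row above corresponds to a small post-hoc transform on the softmax vector; a minimal sketch:

```python
import numpy as np

# Sketch of post-hoc temperature scaling on a softmax vector:
# softmax(log(p) / T). T < 1 sharpens the distribution, T > 1 softens it.

def temperature_scale(p, T):
    logits = np.log(np.clip(p, 1e-12, None)) / T
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

p = np.array([0.5, 0.3, 0.2])
print(temperature_scale(p, 0.5))  # sharper: top class gains mass
print(temperature_scale(p, 2.0))  # softer: closer to uniform
```

Because the transform is monotone in `p`, the top-1 genre never changes; only the membership degrees used for hybrid labelling and fusion are reshaped.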
| # | What was done |
|---|---|
| 1 | Fixed DEAM training pipeline — path resolution, iterrows() float cast, librosa tempo array squeeze |
| 2 | Created hybrid emotion training notebook — CNN + handcrafted features → V/A regression (18 020 segments from 1 802 songs) |
| 3 | Built modular engine/ package — feature_extraction, genre_classifier, emotion_regressor, fusion, recommender |
| 4 | Created retro arcade Tkinter GUI — dark navy theme, neon accents, custom NeonButton canvas widgets |
| 5 | Added embedded V-A scatter plot (matplotlib FigureCanvasTkAgg) with quadrant labels and glowing dot |
| 6 | Refactored recommendations into a sortable ttk.Treeview table (7 columns, neon-green rank-1 highlight) |
| 7 | Added collapsible explainability panel — fusion formula, genre bars, emotion similarity breakdown |
| 8 | Added microphone recording — 5 s at 22 050 Hz via sounddevice, temp WAV, auto-analyze, blinking indicator |
| 9 | Optimised startup — ModelRegistry singleton loads models once, warm-up passes, analysis caching |
| 10 | Multi-segment genre averaging — split audio into N segments, average softmax vectors |
| 11 | Improved mood classification — neutral buffer zone (±1.0 from midpoint) |
| 12 | Multi-segment emotion averaging — average V/A across segments for stability |
| 13 | Hybrid genre labelling — "Hybrid: X / Y" when top-2 gap < 0.10 |
| 14 | Adaptive fusion weights — graded genre similarity matrix for cross-genre relationships |
| 15 | Temperature scaling — softmax(log(p)/T) on genre probabilities |
| 16 | FMA-medium dataset inspection — download, extract, analyse genre distribution |
| 17 | Trained FMA-medium genre CNN (10 classes, ~68 % val accuracy), replacing GTZAN model |
| 18 | Adaptive segmentation — short clips (< 10 s) use a single padded segment; 10-30 s clips use proportional segments |
| 19 | V/A spread transform — linear expansion of clustered predictions across full 1–9 range |
| 20 | RMS energy arousal boost — injects RMS energy into arousal to improve dynamic sensitivity |
| 21 | Temporal smoothing for live mic — rolling 5-window average of genre probabilities |
| 22 | Top-3 genre probabilities in results panel with percentages |
| 23 | Confidence-scaled V-A plot — marker size and glow intensity proportional to genre confidence |
| 24 | Live microphone mode — continuous streaming, rolling spectrogram, ~7 s inference cycle |
| 25 | Live mic → analysis panel — live inference results populate the full analysis text and V-A plot |
| 26 | Codebase cleanup — updated README, .gitignore, requirements.txt; removed unused files |
| 27 | Migrated entire UI to CustomTkinter glassmorphism design — added dynamic Hero Carousel, Spotify API album art integration, and smooth track navigation animations. |
This project is for academic/educational purposes.