███╗ ███╗██╗ ██╗██╗ ██╗██████╗
████╗ ████║██║ ██║██║ ██║██╔══██╗
██╔████╔██║██║ ██║██║ ██║██║ ██║
██║╚██╔╝██║██║ ██║██║ ██║██║ ██║
██║ ╚═╝ ██║╚██████╔╝╚██████╔╝██████╔╝
╚═╝ ╚═╝ ╚═════╝ ╚═════╝ ╚═════╝
A desktop application that uses CNN-based genre classification, hybrid emotion regression (valence/arousal), and fuzzy fusion scoring to analyze music and recommend similar songs.
Built with TensorFlow/Keras, librosa, and a modern glassmorphism CustomTkinter GUI.
- Features
- Project Structure
- Setup & Installation
- Datasets
- Training the Models
- Running the Desktop App
- Architecture Overview
- Soft Computing Techniques
- Development Log
- License
| Feature | Details |
|---|---|
| Genre Classification | 10-class CNN trained on FMA-medium (~68 % val accuracy). Softmax outputs used as fuzzy membership degrees. Multi-segment averaging + temperature scaling. Hybrid label when top-2 genres are within 0.10. |
| Emotion Regression | Hybrid CNN + handcrafted features (tempo, spectral centroid, RMS, ZCR) → continuous Valence & Arousal on the DEAM 1–9 scale. Multi-segment averaging, V/A spread transform, and RMS-based arousal energy boost. |
| Fuzzy Fusion Scoring | Weighted fusion: 0.7 × genre_similarity + 0.3 × emotion_similarity. Graded genre similarity matrix captures inter-genre relationships. |
| Recommendation Engine | Fetches tracks natively via the Spotify API based on genre; falls back to a local database. Results are dynamically displayed in an interactive hero banner carousel with circular album art. |
| Live Microphone Mode | Continuous streaming from the default microphone with a rolling spectrogram display and inference every ~7 s. Full analysis results update live. |
| 5 s Microphone Recording | Quick "REC 5 s" button — records, auto-analyzes, then cleans up the temp file. |
| Explainability Panel | Collapsible panel showing fusion formula, genre membership bar chart, emotion similarity breakdown, and intermediate computation values. |
| V-A Visualisation | Embedded matplotlib scatter plot of Russell's circumplex model with quadrant labels. Marker size and glow intensity scale with genre confidence. |
| Top-3 Genre Probabilities | Results panel shows the top-3 predicted genres with percentage probabilities before the full bar chart. |
| Temporal Smoothing | Live mic genre predictions are smoothed over a rolling buffer of 5 inference windows to reduce label flickering. |
| Singleton Model Loading | ModelRegistry singleton loads all Keras models once at startup with warm-up passes — no reloading on repeated analyses. Results cached by file path. |
| Glassmorphism UI | Dark navy theme, neon accents, CustomTkinter elements, translucent cards, circular album art, and an animated hero carousel. |
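The fuzzy fusion scoring in the table above can be sketched as follows. This is a minimal illustration, not the exact code in `engine/fusion.py`; the function names and the distance-based emotion similarity are assumptions:

```python
import numpy as np

# Hypothetical sketch of the weighted fuzzy fusion described above.
# The real helpers live in engine/fusion.py; names here are illustrative.

def emotion_similarity(va_a, va_b, max_dist=np.sqrt(2) * 8.0):
    """Similarity from Euclidean distance on the 1-9 valence/arousal plane."""
    dist = np.linalg.norm(np.asarray(va_a, float) - np.asarray(va_b, float))
    return 1.0 - dist / max_dist  # 1.0 = identical mood, 0.0 = opposite corners

def fused_score(genre_sim, va_a, va_b, w_genre=0.7, w_emotion=0.3):
    """Weighted fusion: 0.7 * genre similarity + 0.3 * emotion similarity."""
    return w_genre * genre_sim + w_emotion * emotion_similarity(va_a, va_b)

# Example: same genre, slightly different mood
print(fused_score(1.0, (6.2, 5.8), (6.0, 5.5)))
```

In the real engine, `genre_sim` comes from the graded inter-genre similarity matrix rather than a hard 0/1 match, which is what lets cross-genre pairs (e.g. related genres) still score well.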
Muud/
├── main.py # Entry point — launches desktop app
├── requirements.txt # Python dependencies
├── .gitignore
│
├── engine/ # Core ML + inference logic
│ ├── __init__.py
│ ├── feature_extraction.py # load_audio, mel spectrogram & handcrafted
│ │ feature extraction (segmented)
│ ├── genre_classifier.py # GenreClassifier — predict, predict_averaged,
│ │ predict_averaged_smoothed (live), temperature
│ │ scaling, hybrid genre labeling, adaptive
│ │ segmentation for short / long clips
│ ├── emotion_regressor.py # EmotionRegressor — predict, predict_averaged,
│ │ V/A spread transform, RMS energy boost,
│ │ mood quadrant with ±1.0 neutral zone
│ ├── fusion.py # Weighted fuzzy fusion (genre + emotion),
│ │ emotion_similarity, genre_similarity helpers
│ ├── model_registry.py # ModelRegistry singleton — thread-safe model
│ │ loading + warmup
│ └── recommender.py # MusicRecommender — analyze (file), analyze_signal
│ (live), recommend, graded genre similarity matrix
│
├── ui/ # Desktop GUI
│ ├── __init__.py
│ └── desktop_app.py # MuudApp CustomTkinter app — glassmorphism UI, hero
│ carousel, V-A plot, live spectrogram, live mic streaming,
│ 5 s recording, analysis/recommend/explain panels
│
├── models/ # Trained model weights (Tracked in Git <100MB)
│ ├── best_genre_crnn.keras # FMA 10-class genre CRNN (tracked)
│ ├── emotion_hybrid_model.keras # DEAM hybrid emotion regressor (tracked)
│ └── genre_labels.json # Genre index → name mapping (tracked)
│
├── data/ # Datasets & song database
│ ├── song_db.csv # 60-song database with V/A annotations (tracked)
│ ├── DEAM/ # DEAM dataset (git-ignored — download separately)
│ ├── GTZAN/ # GTZAN dataset (git-ignored)
│ └── FMA/ # FMA-medium dataset (git-ignored)
│
├── training/ # Jupyter / Colab notebooks
│ ├── genre_mel_training.ipynb # GTZAN genre CNN training (initial model)
│ ├── train_hybrid_emotion.ipynb # DEAM hybrid emotion model training
│ ├── fma_genre_clean.ipynb # FMA-medium genre label cleaning / CSV prep
│ ├── fma_dataset_inspection.ipynb # FMA-medium dataset analysis & genre selection
│ ├── fusion_inference.ipynb # End-to-end inference pipeline test
│ ├── kaggle/ # Scripts & notebooks run on Kaggle GPUs
│ │ ├── genre_cnn_transformer_train.ipynb # CNN+Transformer genre model (Kaggle T4)
│ │ ├── genre_crnn_model.py # CRNN architecture definition
│ │ └── genre_crnn_train.py # CRNN training script
│ └── reports/ # Saved figures from training runs
│
├── inference/ # Standalone test scripts
│ ├── test_genre.py # Genre classifier sanity check
│ ├── test_emotion.py # Emotion regressor sanity check
│ ├── test_fusion.py # Fusion pipeline test
│ └── test_recommend.py # Full recommendation pipeline test
│
└── test_audio/ # Sample audio for quick testing (git-ignored)
- Python 3.10+
- Conda (recommended) or virtualenv
- A working microphone (optional — for live mic / recording features)
```
conda create -n emotioncnn python=3.10
conda activate emotioncnn
pip install -r requirements.txt
```

Or manually:

```
pip install tensorflow librosa numpy pandas matplotlib seaborn scikit-learn
pip install sounddevice scipy customtkinter spotipy pillow python-dotenv
```

See Datasets below for download links and placement instructions. Datasets are only needed if you plan to retrain the models.
The trained `.keras` models are all safely under GitHub's 100 MB file limit, so they are now tracked in the repository by default.
You no longer need to train networks or download weights to run the desktop app — it launches straight out of the box. (If you train new models larger than 100 MB, make sure to exclude them from Git.)
```
python main.py
```

All datasets are git-ignored due to their size. Download and place them manually.
- Used for: Emotion regression model (Valence / Arousal)
- Size: ~12 GB (1 802 songs + annotations)
- Download: DEAM on Kaggle
- Placement: Extract to `data/DEAM/`, preserving the original folder structure:

```
data/DEAM/
├── DEAM_Annotations/
│   └── annotations/
│       ├── annotations averaged per song/
│       └── annotations per each rater/
├── DEAM_audio/
│   └── MEMD_audio/
└── features/
    └── features/
```
- Used for: Genre classification CNN (10-class)
- Size: ~22 GB audio + ~350 MB metadata
- Downloads:
- Audio: fma_medium.zip
- Metadata: fma_metadata.zip
- GitHub: mdeff/fma
- Placement: Extract to `data/FMA/fma_medium/` and `data/FMA/fma_metadata/`.
- Used for: Initial genre model training (superseded by FMA)
- Size: ~1.2 GB (1 000 clips × 30 s, 10 genres)
- Download: GTZAN on Kaggle
- Placement: Extract to `data/GTZAN/` so that the genre folders (`blues/`, `classical/`, etc.) are directly inside.
Run the notebooks from an activated `emotioncnn` environment, or adapt them for Google Colab with a GPU.
Notebooks: training/fma_dataset_inspection.ipynb → training/fma_genre_clean.ipynb → training/kaggle/genre_cnn_transformer_train.ipynb
- `fma_dataset_inspection.ipynb` — downloads & inspects FMA-medium metadata, selects the top-10 genres
- `fma_genre_clean.ipynb` — cleans genre labels, creates train/val CSV splits
- `training/kaggle/genre_cnn_transformer_train.ipynb` — trains the CNN + Transformer model on Kaggle (T4 GPU, mixed precision, class-weighted) → saves `models/best_genre_cnn_trans.keras`
Genre classes (10): Classical, Electronic, Experimental, Folk, Hip-Hop, Instrumental, International, Old-Time / Historic, Pop, Rock
Notebook: training/train_hybrid_emotion.ipynb
- Loads DEAM audio + per-song V/A annotations
- Splits into 3 s segments; extracts 128-bin Mel spectrogram + 4 handcrafted features (tempo, spectral centroid, RMS, ZCR)
- Trains a hybrid CNN (Mel branch + dense branch) → 2 regression outputs (valence, arousal)
- Saves → `models/emotion_hybrid_model.keras`
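Two of the four handcrafted features can be sketched with plain NumPy. This is an illustrative sketch only: the actual pipeline uses librosa (which also provides tempo and spectral centroid), and the frame/hop sizes below are assumed defaults:

```python
import numpy as np

# Sketch of two of the four handcrafted features (RMS energy and
# zero-crossing rate) computed with plain NumPy. The real pipeline
# uses librosa; frame_len/hop here are illustrative.

def frame_signal(y, frame_len=2048, hop=512):
    """Slice a 1-D signal into overlapping frames."""
    n = 1 + max(0, (len(y) - frame_len) // hop)
    return np.stack([y[i * hop : i * hop + frame_len] for i in range(n)])

def rms_energy(y):
    """Mean per-frame root-mean-square energy."""
    frames = frame_signal(y)
    return float(np.mean(np.sqrt(np.mean(frames**2, axis=1))))

def zero_crossing_rate(y):
    """Mean fraction of adjacent-sample sign changes per frame."""
    frames = frame_signal(y)
    crossings = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return float(np.mean(crossings))

# 3 s of a 440 Hz tone at 22 050 Hz, matching the emotion segment length
sr = 22050
t = np.linspace(0, 3, 3 * sr, endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 440 * t)
print(rms_energy(y), zero_crossing_rate(y))
```

For a pure sine at amplitude 0.5 the RMS should land near 0.5/√2 ≈ 0.354, and the ZCR near 2 × 440 / 22050 ≈ 0.04 — a quick sanity check that framed features behave as expected.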
Notebook: training/fusion_inference.ipynb — end-to-end pipeline test (feature extraction → genre → emotion → fusion → recommendation)
```
conda activate emotioncnn
python main.py
```

On launch:

- The `ModelRegistry` singleton loads both Keras models from `models/`
- Warm-up forward passes compile the TF graphs (first launch is slightly slower)
- The modern glassmorphism CustomTkinter window opens
| Button | Action |
|---|---|
| BROWSE | Select a .wav / .mp3 / .flac / .ogg file |
| ANALYZE | Run genre + emotion analysis on the selected file |
| RECOMMEND | Get top-5 similar songs from the song database |
| EXPLAIN | Toggle explainability panel (fusion breakdown) |
| REC 5 s | Record 5 seconds from microphone → auto-analyze |
| 🎤 LIVE MIC | Toggle continuous microphone streaming with live spectrogram and rolling inference |
- Results — Top-3 genre probabilities, full fuzzy membership bar chart, mood quadrant, valence/arousal scores
- V-A Plot — Russell's circumplex scatter; marker size scales with confidence
- Recommendations — Interactive hero carousel featuring 30-sec previews, Spotify links, and fused score ranks
- Explainability — Intermediate fusion computation values
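The mood quadrant shown in the results panel can be sketched as a simple rule on the 1–9 scale midpoint with the ±1.0 neutral zone. The quadrant labels below are illustrative Russell-circumplex names, not necessarily the app's exact strings:

```python
# Illustrative sketch of the mood-quadrant rule with the ±1.0 neutral zone
# around the 1-9 scale midpoint (5.0). The real logic lives in
# engine/emotion_regressor.py; names here are assumptions.

def mood_quadrant(valence, arousal, mid=5.0, neutral=1.0):
    """Map a (valence, arousal) pair to a circumplex quadrant or 'Neutral'."""
    if abs(valence - mid) <= neutral and abs(arousal - mid) <= neutral:
        return "Neutral"
    if valence >= mid:
        return "Happy/Excited" if arousal >= mid else "Calm/Content"
    return "Angry/Tense" if arousal >= mid else "Sad/Depressed"

print(mood_quadrant(7.2, 6.8))  # high valence, high arousal
print(mood_quadrant(5.3, 4.8))  # inside the neutral zone
```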
┌──────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Audio File │────▶│ Feature Extract │────▶│ Genre CNN │──▶ Fuzzy memberships
│ or Live Mic │ │ (Mel + Stats) │ │ (10-class FMA) │ (softmax probs)
└──────────────┘ │ × N segments │ └─────────────────┘
│ │ ┌─────────────────┐
│ │────▶│ Emotion Hybrid │──▶ Valence, Arousal
└─────────────────┘ │ (CNN + Dense) │ (1–9 scale)
└─────────────────┘
│
┌────────▼────────┐
│ Fuzzy Fusion │
│ (weighted sim) │──▶ Ranked recommendations
└─────────────────┘
- File analysis: Audio loaded at 22 050 Hz, split into segments (10 s for genre, 3 s for emotion). Predictions averaged across segments.
- Live mic: Continuous 22 050 Hz stream, rolling 30 s buffer, inference every ~7 s with temporal smoothing (5-window average).
- Short clips (< 10 s): Single padded segment for genre; normal 3 s segments for emotion.
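The segmentation-and-averaging flow above can be sketched as follows; `predict_segment` stands in for the Keras model call, and the helper names are illustrative rather than the exact API of `engine/genre_classifier.py`:

```python
import numpy as np

# Sketch of multi-segment softmax averaging: split the audio into
# fixed-length segments, predict each, then average the probability
# vectors into one fuzzy membership vector.

def segment_audio(y, sr=22050, seg_seconds=10):
    """Split audio into fixed-length segments; pad a single segment for short clips."""
    seg_len = seg_seconds * sr
    if len(y) < seg_len:
        return [np.pad(y, (0, seg_len - len(y)))]
    return [y[i * seg_len : (i + 1) * seg_len] for i in range(len(y) // seg_len)]

def predict_averaged(y, predict_segment, sr=22050):
    """Average per-segment softmax outputs across all segments."""
    probs = [predict_segment(seg) for seg in segment_audio(y, sr)]
    return np.mean(probs, axis=0)

# Usage with a dummy 2-class predictor on 25 s of silence (-> 2 segments)
dummy = lambda seg: np.array([0.8, 0.2])
print(predict_averaged(np.zeros(25 * 22050), dummy))
```

Averaging softmax vectors rather than hard labels keeps the output a valid fuzzy membership distribution, which is what the fusion stage consumes.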
This project integrates multiple soft computing paradigms:
| Technique | Where Used | Details |
|---|---|---|
| Fuzzy Logic | Genre classification | Softmax probabilities treated as fuzzy membership degrees across all 10 genres; hybrid labelling when membership gap < 0.10 |
| Fuzzy Fusion | Recommendation scoring | Weighted combination: 0.7 × genre_similarity + 0.3 × emotion_similarity. Genre similarity uses a graded inter-genre similarity matrix. |
| Neural Network (CNN) | Genre classifier | Convolutional Neural Network on 128-bin Mel spectrograms (FMA-medium, 10 classes) |
| Hybrid Neural Network | Emotion regressor | CNN branch (Mel spectrogram) + Dense branch (handcrafted features) → multi-output regression |
| Temperature Scaling | Genre softmax | softmax(log(p) / T) post-hoc calibration to sharpen/soften probability distributions |
| Temporal Smoothing | Live mic predictions | Rolling average over 5 inference windows reduces noise in real-time genre predictions |
| V/A Spread Transform | Emotion post-processing | Linear transform expands clustered valence/arousal predictions across the full 1–9 scale |
| Energy-based Arousal Boost | Emotion post-processing | RMS energy injected directly into arousal prediction to improve sensitivity to dynamic range |
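The temperature scaling row above corresponds to a small post-hoc transform on the softmax vector; a minimal sketch:

```python
import numpy as np

# Sketch of post-hoc temperature scaling on a softmax vector:
# softmax(log(p) / T). T < 1 sharpens the distribution, T > 1 softens it.

def temperature_scale(p, T):
    logits = np.log(np.clip(p, 1e-12, None)) / T
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

p = np.array([0.5, 0.3, 0.2])
print(temperature_scale(p, 0.5))  # sharper: top class gains mass
print(temperature_scale(p, 2.0))  # softer: closer to uniform
```

Because the transform is monotone in `p`, the top-1 genre never changes; only the membership degrees used for hybrid labelling and fusion are reshaped.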
| # | What was done |
|---|---|
| 1 | Fixed DEAM training pipeline — path resolution, iterrows() float cast, librosa tempo array squeeze |
| 2 | Created hybrid emotion training notebook — CNN + handcrafted features → V/A regression (18 020 segments from 1 802 songs) |
| 3 | Built modular engine/ package — feature_extraction, genre_classifier, emotion_regressor, fusion, recommender |
| 4 | Created retro arcade Tkinter GUI — dark navy theme, neon accents, custom NeonButton canvas widgets |
| 5 | Added embedded V-A scatter plot (matplotlib FigureCanvasTkAgg) with quadrant labels and glowing dot |
| 6 | Refactored recommendations into a sortable ttk.Treeview table (7 columns, neon-green rank-1 highlight) |
| 7 | Added collapsible explainability panel — fusion formula, genre bars, emotion similarity breakdown |
| 8 | Added microphone recording — 5 s at 22 050 Hz via sounddevice, temp WAV, auto-analyze, blinking indicator |
| 9 | Optimised startup — ModelRegistry singleton loads models once, warm-up passes, analysis caching |
| 10 | Multi-segment genre averaging — split audio into N segments, average softmax vectors |
| 11 | Improved mood classification — neutral buffer zone (±1.0 from midpoint) |
| 12 | Multi-segment emotion averaging — average V/A across segments for stability |
| 13 | Hybrid genre labelling — "Hybrid: X / Y" when top-2 gap < 0.10 |
| 14 | Adaptive fusion weights — graded genre similarity matrix for cross-genre relationships |
| 15 | Temperature scaling — softmax(log(p)/T) on genre probabilities |
| 16 | FMA-medium dataset inspection — download, extract, analyse genre distribution |
| 17 | Trained FMA-medium genre CNN (10 classes, ~68 % val accuracy), replacing GTZAN model |
| 18 | Adaptive segmentation — short clips (< 10 s) use a single padded segment; 10-30 s clips use proportional segments |
| 19 | V/A spread transform — linear expansion of clustered predictions across full 1–9 range |
| 20 | RMS energy arousal boost — injects RMS energy into arousal to improve dynamic sensitivity |
| 21 | Temporal smoothing for live mic — rolling 5-window average of genre probabilities |
| 22 | Top-3 genre probabilities in results panel with percentages |
| 23 | Confidence-scaled V-A plot — marker size and glow intensity proportional to genre confidence |
| 24 | Live microphone mode — continuous streaming, rolling spectrogram, ~7 s inference cycle |
| 25 | Live mic → analysis panel — live inference results populate the full analysis text and V-A plot |
| 26 | Codebase cleanup — updated README, .gitignore, requirements.txt; removed unused files |
| 27 | Migrated entire UI to CustomTkinter glassmorphism design — added dynamic Hero Carousel, Spotify API album art integration, and smooth track navigation animations. |
This project is for academic/educational purposes.