Author: Sasmitha S
Institution: Amrita School of Artificial Intelligence, Coimbatore — Amrita Vishwa Vidyapeetham
Repo: github.com/sasmxtha/fake_news_detection
This repository implements the SAFE (Selective Adaptive Fusion of Embeddings) framework proposed in:
Hybrid Embedding Fusion for Fake News Detection with Performance–Efficiency Evaluation
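SAFE ranks the candidate embeddings by the Performance–Efficiency Index (PEI), selects the top two, and fuses them with a lightweight attention mechanism. The goal is near-transformer accuracy at a fraction of the computational cost: in the baseline comparison table, SAFE reaches 85.5% accuracy with 2.80 s of training time, against 90.0% and 1800 s for BERT.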
Input News Text
│
▼
NLP Preprocessing
(Tokenize → Lowercase → Stopword Removal → Lemmatize)
│
├──► BoW ──► LR ──► Acc / F1 / Recall / PEI
├──► TF-IDF ──► LR ──► Acc / F1 / Recall / PEI
├──► Word2Vec ──► LR ──► Acc / F1 / Recall / PEI
├──► GloVe ──► LR ──► Acc / F1 / Recall / PEI
├──► FastText ──► LR ──► Acc / F1 / Recall / PEI
├──► Doc2Vec ──► LR ──► Acc / F1 / Recall / PEI
├──► BERT ──► LR ──► Acc / F1 / Recall / PEI
├──► RoBERTa ──► LR ──► Acc / F1 / Recall / PEI
├──► DeBERTa ──► LR ──► Acc / F1 / Recall / PEI
├──► DistilBERT ──► LR ──► Acc / F1 / Recall / PEI
├──► BERTweet ──► LR ──► Acc / F1 / Recall / PEI
└──► SBERT ──► LR ──► Acc / F1 / Recall / PEI
│
Rank by PEI → Select Top-2
│
Attention-based Fusion (paper Algorithm)
Zi = Wi · Ei
α = σ(W · [Z1 ‖ Z2] + b)
F = α · Z1 + (1 − α) · Z2
│
Final LR Classifier
│
Fake / Real
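
The fusion step in the diagram above can be sketched in NumPy as a single forward pass. This is a minimal illustration, not the code in `src/fusion.py`: the function name `attention_fuse`, the shared projection size `d`, the choice of one scalar gate per sample, and the random (untrained) weights are all assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))


def attention_fuse(E1, E2, W1, W2, w_gate, b_gate):
    """Fuse two embedding matrices with a gated convex combination.

    E1: (n, d1) and E2: (n, d2) hold one row per document.
    W1: (d1, d) and W2: (d2, d) project both into a shared size d.
    w_gate: (2d,) and b_gate: scalar define the gate (assumed scalar per sample).
    """
    Z1 = E1 @ W1                                  # Z1 = W1 · E1 (batched, row-major)
    Z2 = E2 @ W2                                  # Z2 = W2 · E2
    concat = np.concatenate([Z1, Z2], axis=1)     # [Z1 ‖ Z2], shape (n, 2d)
    alpha = sigmoid(concat @ w_gate + b_gate)     # α = σ(W · [Z1 ‖ Z2] + b), shape (n,)
    alpha = alpha[:, None]                        # reshape to (n, 1) for broadcasting
    F = alpha * Z1 + (1.0 - alpha) * Z2           # F = α · Z1 + (1 − α) · Z2
    return F, alpha


# Toy inputs: 4 documents, two embeddings of different width (hypothetical sizes).
rng = np.random.default_rng(0)
n, d1, d2, d = 4, 300, 100, 64
E1 = rng.normal(size=(n, d1))
E2 = rng.normal(size=(n, d2))
W1 = rng.normal(scale=0.01, size=(d1, d))         # untrained placeholder weights
W2 = rng.normal(scale=0.01, size=(d2, d))
w_gate = rng.normal(scale=0.01, size=2 * d)
b_gate = 0.0

F, alpha = attention_fuse(E1, E2, W1, W2, w_gate, b_gate)
print(F.shape, alpha.shape)
```

In practice the projection and gate weights would be learned; the fused matrix `F` is what the final LR classifier would take as input.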
PEI weights: α = β = 1, γ = 0.1 (paper Section 3). Here α is the PEI weight, not the attention gate α used in the fusion step.
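
The PEI weights above can be illustrated with a short ranking sketch. Equation 1 is not reproduced in this README, so the ratio form used here (accuracy divided by a weighted sum of training time, inference time, and model size) is an assumption; it was chosen because, with α = β = 1 and γ = 0.1, it reproduces tabulated values such as the GloVe row with Acc 0.825 (PEI 0.258) and the FastText row with Acc 0.820 (PEI 0.273). The function names `pei` and `top_k_by_pei` are hypothetical and may not match `src/pei.py`.

```python
def pei(acc, train_time_s, infer_time_s, size_mb, alpha=1.0, beta=1.0, gamma=0.1):
    """Assumed PEI form: accuracy per unit of weighted cost."""
    cost = alpha * train_time_s + beta * infer_time_s + gamma * size_mb
    return acc / cost


def top_k_by_pei(metrics, k=2):
    """Return the k embedding names with the highest PEI.

    metrics maps name -> (acc, train_time_s, infer_time_s, size_mb).
    """
    ranked = sorted(metrics, key=lambda name: pei(*metrics[name]), reverse=True)
    return ranked[:k]


# Two rows copied from the results tables in this README.
metrics = {
    "GloVe": (0.825, 3.00, 0.002, 2.00),
    "FastText": (0.820, 2.80, 0.002, 2.00),
}

for name, row in metrics.items():
    print(name, round(pei(*row), 3))   # GloVe 0.258, FastText 0.273

top2 = top_k_by_pei(metrics, k=2)
print(top2)                            # ['FastText', 'GloVe']
```

In a full run, `metrics` would hold one entry per evaluated embedding, and the two names returned would be passed on to the fusion step.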
| Model | Acc | F1 | Recall | TT (s) | IT (s) | Size (MB) | PEI |
|---|---|---|---|---|---|---|---|
| TF-IDF | 0.842 | 0.841 | 0.840 | 2.50 | 0.002 | 2.00 | 0.187 |
| Word2Vec | 0.804 | 0.802 | 0.800 | 3.00 | 0.002 | 1.00 | 0.259 |
| SAFE | 0.855 | 0.853 | 0.852 | 2.80 | 0.002 | 3.50 | 0.271 |
| Model | Acc | F1 | Recall | TT (s) | IT (s) | Size (MB) | PEI |
|---|---|---|---|---|---|---|---|
| TF-IDF | 0.752 | 0.751 | 0.750 | 2.60 | 0.002 | 1.50 | 0.273 |
| Word2Vec | 0.840 | 0.829 | 0.828 | 4.00 | 0.006 | 2.00 | 0.210 |
| SAFE | 0.860 | 0.859 | 0.858 | 2.50 | 0.002 | 3.50 | 0.302 |
| Model | Acc | F1 | Recall | TT (s) | IT (s) | Size (MB) | PEI |
|---|---|---|---|---|---|---|---|
| GloVe | 0.825 | 0.824 | 0.823 | 3.00 | 0.002 | 2.00 | 0.258 |
| FastText | 0.820 | 0.819 | 0.818 | 2.80 | 0.002 | 2.00 | 0.273 |
| SAFE | 0.845 | 0.844 | 0.843 | 2.60 | 0.002 | 1.80 | 0.304 |
| Model | Acc | F1 | Recall | TT (s) | IT (s) | Size (MB) | PEI |
|---|---|---|---|---|---|---|---|
| GloVe | 0.720 | 0.718 | 0.719 | 3.00 | 0.002 | 2.00 | 0.225 |
| FastText | 0.715 | 0.713 | 0.714 | 2.70 | 0.002 | 2.00 | 0.246 |
| SAFE | 0.765 | 0.762 | 0.763 | 2.70 | 0.002 | 2.30 | 0.261 |
| Model | Acc (%) | TT (s) | IT (s) | Size (MB) | PEI |
|---|---|---|---|---|---|
| CNN+RNN | 85.0 | 600 | 0.01 | 30 | 0.0014 |
| BERT | 90.0 | 1800 | 0.05 | 420 | 4.8e-4 |
| GBERT | 93.0 | 2400 | 0.06 | 450 | 3.8e-4 |
| SAFE | 85.5 | 2.80 | 0.001 | 3.5 | 0.271 |
fake_news_detection/
├── main.py ← entry point
├── requirements.txt
├── README.md
├── src/
│ ├── __init__.py
│ ├── preprocessing.py ← tokenize, stopwords, lemmatize (paper Section 3)
│ ├── embeddings.py ← BoW, TF-IDF, Word2Vec, GloVe, FastText, Doc2Vec,
│ │ BERT, RoBERTa, DeBERTa, DistilBERT, BERTweet, SBERT
│ ├── pei.py ← PEI formula (paper Equation 1)
│ ├── fusion.py ← attention-based fusion (paper Algorithm Step 3)
│ ├── safe.py ← SAFE pipeline: Phase 1 → Phase 2 → Phase 3
│ ├── datasets.py ← loaders for LIAR, ISOT, GossipCop, PolitiFact
│ └── visualize.py ← paper Figures 2–6
├── results/
│ ├── results_gossipcop.json
│ ├── results_liar.json
│ ├── results_isot.json
│ ├── results_politifact.json
│ ├── results_comparison.json
│ ├── figure2_pei_comparison.png
│ ├── figure3_performance_comparison.png
│ ├── figure4_time_comparison.png
│ ├── figure5_pei_bar.png
│ └── figure6_accuracy_vs_pei.png
└── data/
├── liar/ ← train.tsv, valid.tsv, test.tsv
├── isot/ ← True.csv, Fake.csv
├── gossipcop/ ← gossipcop.csv
└── politifact/ ← politifact.csv
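
The `data/liar/` files listed above can be read with the standard library alone. The sketch below assumes the layout of the public LIAR release (tab-separated, no header row, label in the second column, statement in the third). The binary mapping of the six LIAR labels is an assumption for illustration and may differ from what `src/datasets.py` does; the function names are hypothetical.

```python
import csv
from pathlib import Path

# Assumed binary grouping of the six LIAR labels (1 = fake, 0 = real).
FAKE_LABELS = {"pants-fire", "false", "barely-true"}
REAL_LABELS = {"half-true", "mostly-true", "true"}


def parse_liar_rows(lines):
    """Parse LIAR-style TSV lines into (texts, labels)."""
    texts, labels = [], []
    # QUOTE_NONE: statements contain literal quote characters.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue                      # skip malformed rows
        label, statement = row[1].strip(), row[2].strip()
        if label in FAKE_LABELS:
            target = 1
        elif label in REAL_LABELS:
            target = 0
        else:
            continue                      # skip unknown labels
        texts.append(statement)
        labels.append(target)
    return texts, labels


def load_liar_split(path):
    """Load one LIAR split, e.g. data/liar/train.tsv."""
    with open(path, newline="", encoding="utf-8") as f:
        return parse_liar_rows(f)


train_path = Path("data/liar/train.tsv")
if train_path.exists():
    texts, labels = load_liar_split(train_path)
    print(len(texts), "statements,", sum(labels), "labelled fake")
```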
git clone https://github.com/sasmxtha/fake_news_detection.git
cd fake_news_detection
pip install -r requirements.txt

# LIAR
wget https://www.cs.ucsb.edu/~william/data/liar_dataset.zip
unzip liar_dataset.zip -d data/liar/
# ISOT: download True.csv, Fake.csv from Kaggle → data/isot/
# GossipCop / PolitiFact: download from FakeNewsNet → data/gossipcop/ and data/politifact/

# Classical + distributional embeddings (no GPU needed)
python main.py --dataset liar --data_dir data
# All 4 datasets
python main.py --dataset all --data_dir data
# With GloVe pre-trained vectors
python main.py --dataset isot --data_dir data --glove_path glove.6B.100d.txt
# Include all transformer embeddings (GPU recommended)
python main.py --dataset liar --data_dir data --transformers

Sasmitha S, Prasadam Modini Vardhana, Bikkina Sri Sai Nandini,
Kanchana Rajdeep, S. Manimaran,
"Hybrid Embedding Fusion for Fake News Detection with
Performance–Efficiency Evaluation",
Amrita School of Artificial Intelligence, Coimbatore,
Amrita Vishwa Vidyapeetham.