Hybrid Embedding Fusion for Fake News Detection

SAFE: Selective Adaptive Fusion of Embeddings

Author: Sasmitha S
Institution: Amrita School of Artificial Intelligence, Coimbatore — Amrita Vishwa Vidyapeetham
Repo: github.com/sasmxtha/fake_news_detection

Overview

This repository implements the SAFE (Selective Adaptive Fusion of Embeddings) framework proposed in:

Hybrid Embedding Fusion for Fake News Detection with Performance–Efficiency Evaluation

SAFE ranks the candidate embeddings by the Performance–Efficiency Index (PEI), selects the top two, and fuses them with a lightweight attention mechanism. The fused representation approaches transformer-level accuracy at a fraction of the computational cost.

Framework (Paper Figure 1)

Input News Text
      │
      ▼
NLP Preprocessing
(Tokenize → Lowercase → Stopword Removal → Lemmatize)
      │
      ├──► BoW        ──► LR ──► Acc / F1 / Recall / PEI
      ├──► TF-IDF     ──► LR ──► Acc / F1 / Recall / PEI
      ├──► Word2Vec   ──► LR ──► Acc / F1 / Recall / PEI
      ├──► GloVe      ──► LR ──► Acc / F1 / Recall / PEI
      ├──► FastText   ──► LR ──► Acc / F1 / Recall / PEI
      ├──► Doc2Vec    ──► LR ──► Acc / F1 / Recall / PEI
      ├──► BERT       ──► LR ──► Acc / F1 / Recall / PEI
      ├──► RoBERTa    ──► LR ──► Acc / F1 / Recall / PEI
      ├──► DeBERTa    ──► LR ──► Acc / F1 / Recall / PEI
      ├──► DistilBERT ──► LR ──► Acc / F1 / Recall / PEI
      ├──► BERTweet   ──► LR ──► Acc / F1 / Recall / PEI
      └──► SBERT      ──► LR ──► Acc / F1 / Recall / PEI
                                        │
                               Rank by PEI → Select Top-2
                                        │
                          Attention-based Fusion (paper Algorithm)
                            Zi = Wi · Ei
                            α  = σ(W · [Z1 ‖ Z2] + b)
                            F  = α · Z1 + (1 − α) · Z2
                                        │
                             Final LR Classifier
                                        │
                                  Fake / Real
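
The three fusion equations in the diagram can be sketched in plain Python. This is a minimal illustration, not the code in src/fusion.py: the function names (`matvec`, `sigmoid`, `fuse`) are hypothetical, the gate α is assumed to be a single scalar per sample, and the weights are fixed by hand here, whereas in a trained model they would be learned.

```python
import math


def matvec(w, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def sigmoid(t):
    """Logistic function; adequate for the small inputs used in this sketch."""
    return 1.0 / (1.0 + math.exp(-t))


def fuse(e1, e2, w1, w2, w_gate, b_gate):
    """Fuse two embeddings with a scalar gate (assumption: alpha is a scalar).

    z_i   = W_i * e_i                              project both to dimension d
    alpha = sigmoid(w_gate . [z1 || z2] + b_gate)  gate computed from both
    f     = alpha * z1 + (1 - alpha) * z2          convex combination
    """
    z1 = matvec(w1, e1)
    z2 = matvec(w2, e2)
    concat = z1 + z2  # list concatenation plays the role of [Z1 || Z2]
    alpha = sigmoid(sum(w * c for w, c in zip(w_gate, concat)) + b_gate)
    fused = [alpha * a + (1.0 - alpha) * b for a, b in zip(z1, z2)]
    return fused, alpha


# Illustrative inputs: e1 has 3 dimensions, e2 has 2; both are projected to d = 2.
w1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # 2x3 projection, drops the third component
w2 = [[1.0, 0.0], [0.0, 1.0]]            # 2x2 identity
w_gate = [0.0, 0.0, 0.0, 0.0]            # zero gate weights give alpha = 0.5
fused, alpha = fuse([2.0, 4.0, 9.0], [0.0, 0.0], w1, w2, w_gate, 0.0)
print(alpha)  # 0.5
print(fused)  # [1.0, 2.0]
```

With zero gate weights the gate sits at 0.5 and the fusion reduces to a plain average of the two projections. Non-zero gate weights let the model lean toward one embedding or the other on a per-sample basis.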

Performance–Efficiency Index (paper Eq. 1)

$$\text{PEI} = \frac{\text{Accuracy}}{\alpha \cdot T_{train} + \beta \cdot T_{infer} + \gamma \cdot M_{size}}$$

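Weights: α = β = 1, γ = 0.1 (paper Section 3). T_train is the training time in seconds (TT in the tables below), T_infer is the inference time in seconds (IT), and M_size is the model size in MB (Size). Accuracy enters as a fraction (for example 0.855, not 85.5). These cost weights α, β, γ are unrelated to the attention gate α used in the fusion step.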
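
As a worked check of the formula, the sketch below computes PEI for two rows of the ISOT results table and ranks them. The function names `pei` and `top_k_by_pei` are hypothetical and are not taken from src/pei.py. A full run would pass all twelve candidate embeddings rather than two.

```python
def pei(accuracy, t_train, t_infer, m_size, alpha=1.0, beta=1.0, gamma=0.1):
    """Performance-Efficiency Index: accuracy divided by the weighted cost."""
    cost = alpha * t_train + beta * t_infer + gamma * m_size
    if cost <= 0:
        raise ValueError("weighted cost must be positive")
    return accuracy / cost


def top_k_by_pei(rows, k=2):
    """Return the names of the k rows with the highest PEI, best first."""
    scored = {name: pei(*metrics) for name, metrics in rows.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]


# ISOT rows from the Results section: (Acc, TT in s, IT in s, Size in MB).
isot_rows = {
    "GloVe": (0.825, 3.00, 0.002, 2.00),
    "FastText": (0.820, 2.80, 0.002, 2.00),
}

print(round(pei(*isot_rows["GloVe"]), 3))     # 0.258
print(round(pei(*isot_rows["FastText"]), 3))  # 0.273
print(top_k_by_pei(isot_rows))                # ['FastText', 'GloVe']
```

Because training time dominates the denominator at these scales, FastText ranks above GloVe despite its slightly lower accuracy.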

Results

GossipCop Dataset (Top-2 PEI: TF-IDF + Word2Vec)

Model	Acc	F1	Recall	TT (s)	IT (s)	Size (MB)	PEI
TF-IDF	0.842	0.841	0.840	2.50	0.002	2.00	0.187
Word2Vec	0.804	0.802	0.800	3.00	0.002	1.00	0.259
SAFE	0.855	0.853	0.852	2.80	0.002	3.50	0.271

LIAR Dataset (Top-2 PEI: TF-IDF + Word2Vec)

Model	Acc	F1	Recall	TT (s)	IT (s)	Size (MB)	PEI
TF-IDF	0.752	0.751	0.750	2.60	0.002	1.50	0.273
Word2Vec	0.840	0.829	0.828	4.00	0.006	2.00	0.210
SAFE	0.860	0.859	0.858	2.50	0.002	3.50	0.302

ISOT Dataset (Top-2 PEI: GloVe + FastText)

Model	Acc	F1	Recall	TT (s)	IT (s)	Size (MB)	PEI
GloVe	0.825	0.824	0.823	3.00	0.002	2.00	0.258
FastText	0.820	0.819	0.818	2.80	0.002	2.00	0.273
SAFE	0.845	0.844	0.843	2.60	0.002	1.80	0.304

PolitiFact Dataset (Top-2 PEI: GloVe + FastText)

Model	Acc	F1	Recall	TT (s)	IT (s)	Size (MB)	PEI
GloVe	0.720	0.718	0.719	3.00	0.002	2.00	0.225
FastText	0.715	0.713	0.714	2.70	0.002	2.00	0.246
SAFE	0.765	0.762	0.763	2.70	0.002	2.30	0.261

Comparison with Prior Work (Table 5)

Model	Acc (%)	TT (s)	IT (s)	Size (MB)	PEI
CNN+RNN	85.0	600	0.01	30	0.0014
BERT	90.0	1800	0.05	420	4.8e-4
GBERT	93.0	2400	0.06	450	3.8e-4
SAFE	85.5	2.80	0.001	3.5	0.271

Project Structure

fake_news_detection/
├── main.py                  ← entry point
├── requirements.txt
├── README.md
├── src/
│   ├── __init__.py
│   ├── preprocessing.py     ← tokenize, stopwords, lemmatize (paper Section 3)
│   ├── embeddings.py        ← BoW, TF-IDF, Word2Vec, GloVe, FastText, Doc2Vec,
│   │                           BERT, RoBERTa, DeBERTa, DistilBERT, BERTweet, SBERT
│   ├── pei.py               ← PEI formula (paper Equation 1)
│   ├── fusion.py            ← attention-based fusion (paper Algorithm Step 3)
│   ├── safe.py              ← SAFE pipeline: Phase 1 → Phase 2 → Phase 3
│   ├── datasets.py          ← loaders for LIAR, ISOT, GossipCop, PolitiFact
│   └── visualize.py         ← paper Figures 2–6
├── results/
│   ├── results_gossipcop.json
│   ├── results_liar.json
│   ├── results_isot.json
│   ├── results_politifact.json
│   ├── results_comparison.json
│   ├── figure2_pei_comparison.png
│   ├── figure3_performance_comparison.png
│   ├── figure4_time_comparison.png
│   ├── figure5_pei_bar.png
│   └── figure6_accuracy_vs_pei.png
└── data/
    ├── liar/                ← train.tsv, valid.tsv, test.tsv
    ├── isot/                ← True.csv, Fake.csv
    ├── gossipcop/           ← gossipcop.csv
    └── politifact/          ← politifact.csv

Installation

git clone https://github.com/sasmxtha/fake_news_detection.git
cd fake_news_detection
pip install -r requirements.txt

Dataset Setup

# LIAR
wget https://www.cs.ucsb.edu/~william/data/liar_dataset.zip
unzip liar_dataset.zip -d data/liar/

# ISOT: download True.csv and Fake.csv from Kaggle and place them in data/isot/

# GossipCop / PolitiFact: obtain from FakeNewsNet and place gossipcop.csv in data/gossipcop/ and politifact.csv in data/politifact/

Running

# Classical + distributional embeddings (no GPU needed)
python main.py --dataset liar --data_dir data

# All 4 datasets
python main.py --dataset all --data_dir data

# With GloVe pre-trained vectors
python main.py --dataset isot --data_dir data --glove_path glove.6B.100d.txt

# Include all transformer embeddings (GPU recommended)
python main.py --dataset liar --data_dir data --transformers
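
For orientation, the flags used above could be declared with argparse roughly as follows. This is a sketch under assumptions, not the parser in main.py: the defaults, the `choices` list, and the function name `build_parser` are guesses based only on the commands shown.

```python
import argparse


def build_parser():
    """Hypothetical parser covering the four flags used in the commands above."""
    parser = argparse.ArgumentParser(description="SAFE pipeline runner (sketch)")
    parser.add_argument(
        "--dataset",
        default="liar",  # assumed default
        choices=["liar", "isot", "gossipcop", "politifact", "all"],  # assumed set
        help="dataset to evaluate, or 'all' for every dataset",
    )
    parser.add_argument("--data_dir", default="data", help="root folder of the datasets")
    parser.add_argument("--glove_path", default=None, help="path to pre-trained GloVe vectors")
    parser.add_argument(
        "--transformers",
        action="store_true",
        help="also evaluate the transformer embeddings (GPU recommended)",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(
    ["--dataset", "isot", "--data_dir", "data", "--glove_path", "glove.6B.100d.txt"]
)
print(args.dataset, args.glove_path, args.transformers)  # isot glove.6B.100d.txt False
```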

Citation

Sasmitha S, Prasadam Modini Vardhana, Bikkina Sri Sai Nandini,
Kanchana Rajdeep, S. Manimaran,
"Hybrid Embedding Fusion for Fake News Detection with
 Performance–Efficiency Evaluation",
Amrita School of Artificial Intelligence, Coimbatore,
Amrita Vishwa Vidyapeetham.
