Hybrid Embedding Fusion for Fake News Detection

SAFE: Selective Adaptive Fusion of Embeddings

Author: Sasmitha S
Institution: Amrita School of Artificial Intelligence, Coimbatore — Amrita Vishwa Vidyapeetham
Repo: github.com/sasmxtha/fake_news_detection


Overview

This repository implements the SAFE (Selective Adaptive Fusion of Embeddings) framework proposed in:

Hybrid Embedding Fusion for Fake News Detection with Performance–Efficiency Evaluation

SAFE ranks the candidate embeddings by the Performance–Efficiency Index (PEI), selects the top two, and fuses them with a lightweight attention mechanism. In the comparison with prior work (Table 5), SAFE reaches 85.5% accuracy with 2.80 s of training time, against 90.0% and 1800 s for BERT.


Framework (Paper Figure 1)

Input News Text
      │
      ▼
NLP Preprocessing
(Tokenize → Lowercase → Stopword Removal → Lemmatize)
      │
      ├──► BoW        ──► LR ──► Acc / F1 / Recall / PEI
      ├──► TF-IDF     ──► LR ──► Acc / F1 / Recall / PEI
      ├──► Word2Vec   ──► LR ──► Acc / F1 / Recall / PEI
      ├──► GloVe      ──► LR ──► Acc / F1 / Recall / PEI
      ├──► FastText   ──► LR ──► Acc / F1 / Recall / PEI
      ├──► Doc2Vec    ──► LR ──► Acc / F1 / Recall / PEI
      ├──► BERT       ──► LR ──► Acc / F1 / Recall / PEI
      ├──► RoBERTa    ──► LR ──► Acc / F1 / Recall / PEI
      ├──► DeBERTa    ──► LR ──► Acc / F1 / Recall / PEI
      ├──► DistilBERT ──► LR ──► Acc / F1 / Recall / PEI
      ├──► BERTweet   ──► LR ──► Acc / F1 / Recall / PEI
      └──► SBERT      ──► LR ──► Acc / F1 / Recall / PEI
                                        │
                               Rank by PEI → Select Top-2
                                        │
                          Attention-based Fusion (paper Algorithm)
                            Zi = Wi · Ei
                            α  = σ(W · [Z1 ‖ Z2] + b)
                            F  = α · Z1 + (1 − α) · Z2
                                        │
                             Final LR Classifier
                                        │
                                  Fake / Real
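The three fusion equations in the diagram can be sketched in NumPy as follows. This is a minimal illustration, not the repository's `fusion.py`: the function name `attention_fuse`, the projection size of 32, the scalar gate per sample, and the random weights (standing in for learned parameters) are all assumptions made here for the example.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, used for the attention gate."""
    return 1.0 / (1.0 + np.exp(-x))


def attention_fuse(e1, e2, w1, w2, w_gate, b_gate):
    """Fuse two embedding matrices with a gated weighted sum.

    e1: (n, d1) and e2: (n, d2) are the two selected embeddings.
    w1: (d, d1) and w2: (d, d2) project both into a shared d-dim space.
    w_gate: (2d,) and b_gate: scalar define the gate (assumed scalar per sample).
    """
    z1 = e1 @ w1.T                                # Z1 = W1 · E1
    z2 = e2 @ w2.T                                # Z2 = W2 · E2
    gate_input = np.concatenate([z1, z2], axis=1)  # [Z1 ‖ Z2]
    alpha = sigmoid(gate_input @ w_gate + b_gate)  # α = σ(W · [Z1 ‖ Z2] + b)
    alpha = alpha[:, None]                         # broadcast over features
    return alpha * z1 + (1.0 - alpha) * z2         # F = α · Z1 + (1 − α) · Z2


# Random stand-ins for two embeddings of 4 documents (illustrative only).
rng = np.random.default_rng(0)
e1 = rng.normal(size=(4, 50))
e2 = rng.normal(size=(4, 100))
w1 = rng.normal(size=(32, 50))
w2 = rng.normal(size=(32, 100))
w_gate = rng.normal(size=64)

fused = attention_fuse(e1, e2, w1, w2, w_gate, b_gate=0.0)
print(fused.shape)  # (4, 32)
```

The fused matrix would then be passed to the final LR classifier. In a trained model the projections and the gate would be learned jointly rather than sampled at random.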

Performance–Efficiency Index (paper Eq. 1)

$$\text{PEI} = \frac{\text{Accuracy}}{\alpha \cdot T_{train} + \beta \cdot T_{infer} + \gamma \cdot M_{size}}$$

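Here $T_{train}$ and $T_{infer}$ are the training and inference times in seconds and $M_{size}$ is the model size in MB, matching the time and size columns of the result tables. The weights are α = β = 1 and γ = 0.1 (paper Section 3). The weight α here is unrelated to the attention gate α in the fusion step.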
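As a sketch of how Equation 1 and the top-2 selection might be computed, the snippet below evaluates PEI for the Word2Vec row of the GossipCop table and ranks a set of scores. The function names `pei` and `top_k_by_pei` are hypothetical and need not match `src/pei.py`, and the scores in the ranking example are made-up placeholders, not results from the paper.

```python
def pei(accuracy, train_time_s, infer_time_s, size_mb,
        alpha=1.0, beta=1.0, gamma=0.1):
    """Performance-Efficiency Index: accuracy divided by weighted cost."""
    cost = alpha * train_time_s + beta * infer_time_s + gamma * size_mb
    if cost <= 0:
        raise ValueError("weighted cost must be positive")
    return accuracy / cost


def top_k_by_pei(scores, k=2):
    """Return the k embedding names with the highest PEI."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Word2Vec on GossipCop: Acc 0.804, TT 3.00 s, IT 0.002 s, Size 1.00 MB.
word2vec_pei = pei(0.804, 3.00, 0.002, 1.00)
print(round(word2vec_pei, 3))  # 0.259

# Placeholder scores, for illustration only.
example_scores = {"emb_a": 0.30, "emb_b": 0.10, "emb_c": 0.20}
print(top_k_by_pei(example_scores))  # ['emb_a', 'emb_c']
```

Because γ = 0.1, model size in MB counts for a tenth of a second of training or inference time per MB in the denominator.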


Results

GossipCop Dataset (Top-2 PEI: TF-IDF + Word2Vec)

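| Model | Acc | F1 | Recall | Train time (s) | Inference time (s) | Size (MB) | PEI |
|---|---|---|---|---|---|---|---|
| TF-IDF | 0.842 | 0.841 | 0.840 | 2.50 | 0.002 | 2.00 | 0.187 |
| Word2Vec | 0.804 | 0.802 | 0.800 | 3.00 | 0.002 | 1.00 | 0.259 |
| SAFE | 0.855 | 0.853 | 0.852 | 2.80 | 0.002 | 3.50 | 0.271 |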

LIAR Dataset (Top-2 PEI: TF-IDF + Word2Vec)

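| Model | Acc | F1 | Recall | Train time (s) | Inference time (s) | Size (MB) | PEI |
|---|---|---|---|---|---|---|---|
| TF-IDF | 0.752 | 0.751 | 0.750 | 2.60 | 0.002 | 1.50 | 0.273 |
| Word2Vec | 0.840 | 0.829 | 0.828 | 4.00 | 0.006 | 2.00 | 0.210 |
| SAFE | 0.860 | 0.859 | 0.858 | 2.50 | 0.002 | 3.50 | 0.302 |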

ISOT Dataset (Top-2 PEI: GloVe + FastText)

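| Model | Acc | F1 | Recall | Train time (s) | Inference time (s) | Size (MB) | PEI |
|---|---|---|---|---|---|---|---|
| GloVe | 0.825 | 0.824 | 0.823 | 3.00 | 0.002 | 2.00 | 0.258 |
| FastText | 0.820 | 0.819 | 0.818 | 2.80 | 0.002 | 2.00 | 0.273 |
| SAFE | 0.845 | 0.844 | 0.843 | 2.60 | 0.002 | 1.80 | 0.304 |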

PolitiFact Dataset (Top-2 PEI: GloVe + FastText)

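| Model | Acc | F1 | Recall | Train time (s) | Inference time (s) | Size (MB) | PEI |
|---|---|---|---|---|---|---|---|
| GloVe | 0.720 | 0.718 | 0.719 | 3.00 | 0.002 | 2.00 | 0.225 |
| FastText | 0.715 | 0.713 | 0.714 | 2.70 | 0.002 | 2.00 | 0.246 |
| SAFE | 0.765 | 0.762 | 0.763 | 2.70 | 0.002 | 2.30 | 0.261 |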
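Each row in the tables above combines accuracy, F1, recall, training time, inference time, and model size. The sketch below shows one way such a row might be measured for TF-IDF with logistic regression using scikit-learn. It is an assumption-laden illustration: the function name `evaluate_embedding`, macro averaging for F1 and recall, measuring size as the pickled pipeline in MB, and the four toy sentences are all choices made here, not details confirmed by the paper or the repository.

```python
import pickle
import time

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.pipeline import make_pipeline


def evaluate_embedding(train_texts, train_labels, test_texts, test_labels):
    """Train TF-IDF + LR and collect the metrics used in the result tables."""
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

    start = time.perf_counter()
    model.fit(train_texts, train_labels)
    train_time = time.perf_counter() - start

    start = time.perf_counter()
    predictions = model.predict(test_texts)
    infer_time = time.perf_counter() - start

    # Assumed size measure: bytes of the pickled pipeline, in MB (10^6 bytes).
    size_mb = len(pickle.dumps(model)) / 1e6

    return {
        "accuracy": accuracy_score(test_labels, predictions),
        "f1": f1_score(test_labels, predictions, average="macro"),
        "recall": recall_score(test_labels, predictions, average="macro"),
        "train_time_s": train_time,
        "infer_time_s": infer_time,
        "size_mb": size_mb,
    }


# Toy data for illustration only (1 = fake, 0 = real).
train_texts = [
    "aliens endorse candidate in secret meeting",
    "city council approves new budget",
    "miracle pill cures all diseases overnight",
    "central bank holds interest rates steady",
]
train_labels = [1, 0, 1, 0]
test_texts = ["secret miracle cure revealed", "council holds budget meeting"]
test_labels = [1, 0]

row = evaluate_embedding(train_texts, train_labels, test_texts, test_labels)
print(sorted(row))
```

The timing and size values from such a run feed directly into the PEI denominator, so they depend on the hardware and on how model size is defined.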

Comparison with Prior Work (Table 5)

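| Model | Acc (%) | Train time (s) | Inference time (s) | Size (MB) | PEI |
|---|---|---|---|---|---|
| CNN+RNN | 85.0 | 600 | 0.01 | 30 | 0.0014 |
| BERT | 90.0 | 1800 | 0.05 | 420 | 4.8e-4 |
| GBERT | 93.0 | 2400 | 0.06 | 450 | 3.8e-4 |
| SAFE | 85.5 | 2.80 | 0.001 | 3.5 | 0.271 |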
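As a worked check of how the PEI column follows from Equation 1, the CNN+RNN and SAFE rows can be recomputed with α = β = 1 and γ = 0.1. This assumes accuracy enters the formula as a fraction rather than a percentage, which is the reading that reproduces the tabulated values:

$$\text{PEI}_{\text{CNN+RNN}} = \frac{0.85}{1 \cdot 600 + 1 \cdot 0.01 + 0.1 \cdot 30} = \frac{0.85}{603.01} \approx 0.0014$$

$$\text{PEI}_{\text{SAFE}} = \frac{0.855}{1 \cdot 2.80 + 1 \cdot 0.001 + 0.1 \cdot 3.5} = \frac{0.855}{3.151} \approx 0.271$$

The gap of roughly two orders of magnitude comes almost entirely from the training-time term in the denominator.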

Project Structure

fake_news_detection/
├── main.py                  ← entry point
├── requirements.txt
├── README.md
├── src/
│   ├── __init__.py
│   ├── preprocessing.py     ← tokenize, stopwords, lemmatize (paper Section 3)
│   ├── embeddings.py        ← BoW, TF-IDF, Word2Vec, GloVe, FastText, Doc2Vec,
│   │                           BERT, RoBERTa, DeBERTa, DistilBERT, BERTweet, SBERT
│   ├── pei.py               ← PEI formula (paper Equation 1)
│   ├── fusion.py            ← attention-based fusion (paper Algorithm Step 3)
│   ├── safe.py              ← SAFE pipeline: Phase 1 → Phase 2 → Phase 3
│   ├── datasets.py          ← loaders for LIAR, ISOT, GossipCop, PolitiFact
│   └── visualize.py         ← paper Figures 2–6
├── results/
│   ├── results_gossipcop.json
│   ├── results_liar.json
│   ├── results_isot.json
│   ├── results_politifact.json
│   ├── results_comparison.json
│   ├── figure2_pei_comparison.png
│   ├── figure3_performance_comparison.png
│   ├── figure4_time_comparison.png
│   ├── figure5_pei_bar.png
│   └── figure6_accuracy_vs_pei.png
└── data/
    ├── liar/                ← train.tsv, valid.tsv, test.tsv
    ├── isot/                ← True.csv, Fake.csv
    ├── gossipcop/           ← gossipcop.csv
    └── politifact/          ← politifact.csv

Installation

git clone https://github.com/sasmxtha/fake_news_detection.git
cd fake_news_detection
pip install -r requirements.txt

Dataset Setup

# LIAR
wget https://www.cs.ucsb.edu/~william/data/liar_dataset.zip
unzip liar_dataset.zip -d data/liar/

# ISOT — download True.csv, Fake.csv from Kaggle → data/isot/

# GossipCop / PolitiFact — from FakeNewsNet → data/gossipcop/ / data/politifact/
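Once the files are in place, a loader for the ISOT layout could look roughly like the sketch below. It is not the repository's `datasets.py`: the function name `load_isot`, the label convention (1 = fake, 0 = real), and the `title` and `text` column names are assumptions to check against the downloaded CSVs.

```python
import csv
from pathlib import Path


def load_isot(isot_dir):
    """Read True.csv and Fake.csv and return a list of (text, label) pairs.

    Assumed convention: label 0 for True.csv (real), 1 for Fake.csv (fake).
    Assumed columns: 'title' and 'text'; missing columns are treated as empty.
    """
    samples = []
    for filename, label in (("True.csv", 0), ("Fake.csv", 1)):
        path = Path(isot_dir) / filename
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                text = f"{row.get('title', '')} {row.get('text', '')}".strip()
                if text:  # skip rows with no usable text
                    samples.append((text, label))
    return samples
```

With the directory layout from the project structure, a call such as `load_isot("data/isot")` would return the combined real and fake articles ready for preprocessing.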

Running

# Classical + distributional embeddings (no GPU needed)
python main.py --dataset liar --data_dir data

# All 4 datasets
python main.py --dataset all --data_dir data

# With GloVe pre-trained vectors
python main.py --dataset isot --data_dir data --glove_path glove.6B.100d.txt

# Include all transformer embeddings (GPU recommended)
python main.py --dataset liar --data_dir data --transformers
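The flags used in the commands above suggest a command-line interface along the following lines. This is a hypothetical sketch of how `main.py` might parse them with `argparse`; the defaults, the set of accepted dataset names, and whether `--dataset` is required are assumptions, not taken from the repository.

```python
import argparse

# Dataset names as they appear in the commands and the data/ directory.
DATASETS = ("liar", "isot", "gossipcop", "politifact", "all")


def build_parser():
    """Build a parser for the flags shown in the Running section."""
    parser = argparse.ArgumentParser(description="Run the SAFE pipeline.")
    parser.add_argument("--dataset", choices=DATASETS, required=True,
                        help="dataset to evaluate, or 'all' for every dataset")
    parser.add_argument("--data_dir", default="data",
                        help="root directory containing the dataset folders")
    parser.add_argument("--glove_path", default=None,
                        help="path to pre-trained GloVe vectors (optional)")
    parser.add_argument("--transformers", action="store_true",
                        help="also evaluate transformer embeddings")
    return parser


args = build_parser().parse_args(
    ["--dataset", "isot", "--data_dir", "data",
     "--glove_path", "glove.6B.100d.txt"]
)
print(args.dataset, args.glove_path, args.transformers)
# isot glove.6B.100d.txt False
```

Making `--transformers` a boolean switch keeps the default run on classical and distributional embeddings, which matches the note that no GPU is needed for that mode.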

Citation

Sasmitha S, Prasadam Modini Vardhana, Bikkina Sri Sai Nandini,
Kanchana Rajdeep, S. Manimaran,
"Hybrid Embedding Fusion for Fake News Detection with
 Performance–Efficiency Evaluation",
Amrita School of Artificial Intelligence, Coimbatore,
Amrita Vishwa Vidyapeetham.
