BhargavKumarNath/README.md


LinkedIn · Email · Portfolio · Profile Views



Open to Opportunities

🎯 About Me

class MLEngineer:
    def __init__(self):
        self.name             = "Bhargav Kumar Nath"
        self.role             = "ML Engineer & Systems Researcher"
        self.education        = {
            "masters"  : "MSc Data Science & Analytics @ University of Leeds (2024–2025)",
            "bachelors": "BTech Computer Science @ Assam Don Bosco University (2020–2024)",
        }
        self.location         = "Leeds, UK 🇬🇧"
        self.projects_shipped = 11
        self.seeking          = "Full-time ML / Data Science roles"

    def current_focus(self):
        return {
            "systems_engineering": [
                "LLM Inference Optimization: PagedAttention in Rust + CUDA",
                "Agentic RAG Pipelines: LangGraph, Qdrant, RAGAS",
                "High-Throughput Signal Intelligence for Quant Finance",
                "100M+ Event Pipelines with DuckDB + Polars",
            ],
            "research": [
                "Hardware-Aware Neural Architecture Search (NAS)",
                "Mixed-Precision LLM Quantization & Model Compression",
                "Causal ML & Heterogeneous Treatment Effects",
                "Graph Neural Networks for Molecular Property Prediction",
            ],
        }

    def philosophy(self) -> str:
        return (
            "A model is a mathematical fantasy, but an ML system is a living entity.\n"
            "I design for the shifting reality of the human world,\n"
            "not the static perfection of a laboratory."
        )

📖 My Journey in ML

My path started with hands-on data engineering work and grew into a deep obsession with the boundary between research and production. I ship things that work in real data centres, on commodity hardware, under real latency constraints.

What drives me:

  • ⚡ Systems Performance: Pushing hardware to its limits with PagedAttention in Rust, CUDA kernels, and KV-cache optimization, achieving 8–32× throughput gains
  • 🤖 Agentic AI: Building reliable LLM pipelines that reason, retrieve, and act, with verifiable faithfulness scores (0.91 on RAGAS)
  • 🎯 Causal Intelligence: Moving beyond A/B testing to treatment effect estimation, identifying the micro-segments that drive 70% of total uplift
  • 🔬 Scientific ML: Applying GNNs and hybrid architectures to accelerate material science and drug discovery
  • 📊 Quantitative Finance: Designing signal intelligence platforms for real-time algorithmic trading decisions

Three Core Principles:

  1. Escape the State-of-the-Art Trap: Leaderboard victories rarely survive reality. Establish honest baselines first.
  2. Data Over Algorithms: Architectures come and go; long-term success depends on data quality and distribution understanding.
  3. Deployment as the Starting Line: A shipped model needs continuous monitoring to stay reliable. Production is where the real work begins.

Currently completing my MSc at the University of Leeds, specializing in advanced ML, big data architecture, and MLOps. Actively seeking full-time opportunities to build impactful ML systems.


🔥 Featured Projects

🏆 Flagship Engineering & Research Work

Systems Achievement: Memory management system for LLM inference that implements PagedAttention, achieving an 8–32× throughput improvement (424 sequences/GB vs. 53) by eliminating up to 90% of the VRAM wasted by pre-allocated KV caches.

Key Innovations:

  • 🦀 PagedAttention paging engine written in Rust with Python bindings via PyO3
  • ⚡ CUDA kernels via CuPy for high-performance attention computation
  • 📐 Non-contiguous block layout eliminating memory fragmentation
  • 🔄 Zero-copy interface for streaming decode

Tech Stack:
Rust · CUDA · Python · PyO3 · CuPy · PagedAttention

Impact: Enables serving larger batch sizes on constrained hardware, bridging research-grade LLMs and edge deployment.
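
To make the non-contiguous block layout concrete, here is a minimal Python sketch of a paged KV-cache block table. It is an illustration of the general PagedAttention idea, not the project's Rust engine: the names `BlockAllocator`, `Sequence`, and `BLOCK_SIZE`, the block size of 16 tokens, and the 2048-token pre-allocation figure are all assumptions.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 16  # tokens per KV block (assumed value, for illustration only)


@dataclass
class BlockAllocator:
    """Hands out fixed-size physical KV-cache blocks from a free list."""
    num_blocks: int
    free: list[int] = field(init=False)

    def __post_init__(self) -> None:
        self.free = list(range(self.num_blocks))

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("KV cache exhausted")
        return self.free.pop()

    def release(self, blocks: list[int]) -> None:
        # Freed blocks can be reused by any sequence, in any order.
        self.free.extend(blocks)


@dataclass
class Sequence:
    """One request: its token count and its logical-to-physical block table."""
    num_tokens: int = 0
    block_table: list[int] = field(default_factory=list)

    def append_token(self, allocator: BlockAllocator) -> None:
        # A new physical block is needed only when the last one is full.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(allocator.allocate())
        self.num_tokens += 1

    def locate(self, token_index: int) -> tuple[int, int]:
        """Map a logical token position to (physical block id, offset)."""
        block = self.block_table[token_index // BLOCK_SIZE]
        return block, token_index % BLOCK_SIZE


allocator = BlockAllocator(num_blocks=64)
seq = Sequence()
for _ in range(40):  # decode 40 tokens
    seq.append_token(allocator)

paged_slots = len(seq.block_table) * BLOCK_SIZE  # slots reserved on demand
preallocated_slots = 2048                        # assumed max context length
print(f"paged: {paged_slots} slots, pre-allocated: {preallocated_slots} slots")
```

Because blocks are claimed only as tokens arrive, a 40-token sequence reserves 3 blocks (48 slots) here instead of a full-context buffer, and the physical block ids need not be adjacent.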

GitHub Live

Engineering Achievement: High-throughput signal intelligence platform for quantitative hedge funds, enabling sub-second signal extraction from fragmented, high-velocity alternative data streams for real-time algorithmic trading decisions.

Key Features:

  • 📊 Unified ingestion layer normalizing heterogeneous alternative data streams
  • ⚡ Sub-second latency signal extraction pipeline
  • 🔍 Pattern recognition across multi-source financial signals
  • 🎯 Clean analyst-facing dashboard built with Next.js + React

Tech Stack:
Next.js · React · Signal Processing · Financial Analytics · Alternative Data

Impact: Gives quant analysts a single pane of glass for real-time market intelligence.

GitHub Live

Research Contribution: Hardware-aware NAS framework reducing LLM VRAM by 40% with a 20% throughput gain, compressing evolutionary search time from days to minutes on TinyLlama-1.1B.

Key Innovations:

  • 🧠 Hessian-guided evolutionary optimization for sensitivity-aware quantization
  • ⚡ Mixed-precision search space (FP16/INT8/INT4) with hardware cost modeling
  • 🎯 Multi-objective fitness: accuracy × memory × latency
  • 🔄 Fault-tolerant checkpointing for long-running evolutionary searches

Tech Stack:
PyTorch · CUDA · Genetic Algorithms · Model Compression · Streamlit

Impact: Enables edge deployment of large models on resource-constrained devices.
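
As a rough illustration of how a mixed-precision candidate might be scored, the sketch below computes weight memory for a per-layer precision assignment and combines it with accuracy and latency. The product form of `fitness`, the baseline-relative scaling, and all numbers are assumptions made for this example; the project's actual cost model is not shown in this README.

```python
# Bits per weight for each precision in the FP16/INT8/INT4 search space.
BITS = {"FP16": 16, "INT8": 8, "INT4": 4}


def weight_memory_mb(layer_params: list[int], precisions: list[str]) -> float:
    """Weight memory in MB for a per-layer precision assignment."""
    total_bits = sum(n * BITS[p] for n, p in zip(layer_params, precisions))
    return total_bits / 8 / 1e6


def fitness(accuracy: float, memory_mb: float, latency_ms: float,
            base_memory_mb: float, base_latency_ms: float) -> float:
    """Hypothetical scalarized fitness: accuracy scaled by the relative
    memory and latency savings over an all-FP16 baseline."""
    return accuracy * (base_memory_mb / memory_mb) * (base_latency_ms / latency_ms)


layer_params = [1_000_000, 2_000_000]  # made-up layer sizes
baseline = weight_memory_mb(layer_params, ["FP16", "FP16"])
candidate = weight_memory_mb(layer_params, ["FP16", "INT4"])
print(baseline, candidate)

# Made-up accuracy and latency figures, only to exercise the function.
score = fitness(0.60, candidate, 8.0, baseline, 10.0)
print(score)
```

A scalar product is only one way to combine objectives; Pareto-based selection is a common alternative when the trade-offs should stay explicit.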

GitHub Demo

Engineering Achievement: Production agentic RAG pipeline for financial document analysis, achieving a 0.91 faithfulness score on RAGAS and a +56% F1 improvement over naive retrieval.

Key Innovations:

  • 🔍 Hybrid retrieval: dense (Qdrant) + sparse retrieval for maximum recall
  • 🧠 LangGraph orchestration with strict tool-execution constraints
  • 📊 RAGAS evaluation framework for continuous faithfulness monitoring
  • 🎯 Financial domain reasoning with hallucination guardrails

Tech Stack:
LangGraph · Qdrant · Python · RAGAS · Hybrid Retrieval

Impact: Reduces analyst time on document review with verifiably accurate, grounded answers.
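
One common way to merge dense and sparse result lists is reciprocal rank fusion (RRF). The README does not say which fusion method the pipeline uses, so treat the sketch below as an assumed technique; the function name, the document ids, and the constant `k=60` are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked highly by several retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. from a vector store
sparse_hits = ["doc_c", "doc_a", "doc_d"]  # e.g. from a keyword index
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

RRF needs only ranks, not scores, which sidesteps the problem that dense similarity and sparse keyword scores live on different scales.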

GitHub Demo

Engineering Achievement: Analytics system processing 109.9M event logs on commodity hardware, achieving 97% memory reduction (14.7 GB → 1.9 GB) and a 4.5× conversion lift via propensity-modeled targeting.

Key Innovations:

  • 🦆 DuckDB + Polars in-process analytics replacing heavyweight Spark for ~100M-event workloads
  • 📉 97% memory footprint reduction via columnar storage + lazy evaluation
  • 🎯 LightGBM propensity modeling revealing high-conversion micro-segments
  • 📊 Uplift curves + SHAP attribution for interpretable targeting decisions

Tech Stack:
DuckDB · Polars · LightGBM · Propensity Modeling · Python

Impact: Enterprise-scale behavioral analytics on a laptop, no cluster required.

GitHub Demo

Research Contribution: Unified causal experimentation engine estimating Heterogeneous Treatment Effects (HTE), achieving <1ms inference latency and identifying the micro-segments driving 70% of total uplift (+$0.14/user).

Key Innovations:

  • 🎯 X-Learner & Meta-Learner implementations for CATE estimation
  • 🎰 Thompson Sampling (Multi-Armed Bandit) for adaptive allocation
  • ⚡ Knowledge distillation for sub-millisecond production inference
  • 📈 Uplift curve visualization for treatment effect stratification

Tech Stack:
CausalML · X-Learners · Thompson Sampling · Knowledge Distillation · Python

Impact: Reduces wasted spend by targeting users with highest causal lift.

GitHub Demo

🔍 More Projects

🛒 PricePoint Dynamics: UK Supermarket Intelligence

Competitive intelligence system analyzing 9.5M+ daily prices across 67,000+ products. Achieves an MAE of £0.139 (R² = 0.98) and identifies Aldi as the market price leader with a 4–7 day lead time.

  • Sentence-BERT + FAISS for scalable product matching (20× expansion)
  • LightGBM price forecasting + SHAP strategy analysis

NLP · FAISS · LightGBM · SHAP · Time Series

Demo

🌌 MALLORN: Rare Transient Detection in Astronomy

Multi-channel RNN pipeline detecting rare Tidal Disruption Events at 4.86% class prevalence. +197% F1 improvement over GRU baseline (0.53 F1 score).

  • 6-band photometric processing + tsfresh feature engineering
  • SMOTE-ENN + focal loss for extreme class imbalance

PyTorch · RNN/GRU · tsfresh · LightGBM · Signal Processing
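
For readers unfamiliar with focal loss, the sketch below shows the binary form in plain Python. The defaults `gamma=2.0` and `alpha=0.25` are commonly used values assumed here for illustration; the README does not state which settings the project uses.

```python
import math


def binary_focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss for one prediction.

    p is the predicted probability of the positive class, y is 0 or 1.
    The (1 - p_t) ** gamma factor shrinks the loss of easy, well-classified
    examples so that rare, hard examples dominate the gradient.
    """
    p = min(max(p, 1e-7), 1 - 1e-7)            # clamp to avoid log(0)
    p_t = p if y == 1 else 1 - p               # probability of the true class
    alpha_t = alpha if y == 1 else 1 - alpha   # class-balancing weight
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.9, 1)  # confident and correct
hard = binary_focal_loss(0.1, 1)  # confident and wrong
print(easy, hard)
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary cross-entropy, which is a quick sanity check on any implementation.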

Demo

🎭 Synthetic Intelligence: Privacy-Preserving Data Generation

Generative tabular data framework with +5.1% AUPRC over SMOTE and linear O(N) complexity via model-driven rejection sampling with manifold alignment guarantees.

  • PyTorch autoencoders + CTGAN for high-fidelity synthesis
  • Differential privacy metrics + t-SNE distribution validation

PyTorch · CTGAN · SDV Library · Privacy AI · t-SNE

Demo

🧪 Melting Point Prediction: Hybrid GNN Architecture

GNN + XGBoost fusion for thermodynamic property prediction. 20% MAE reduction vs. pure deep learning, <50ms latency (24.59 K MAE).

  • Message-passing GNN + RDKit descriptor feature fusion
  • Optuna hyperparameter optimization + SHAP interpretability

PyTorch Geometric · RDKit · Optuna · XGBoost

Demo

πŸ‹οΈ Fitness Tracker Production Spark ML Pipeline

Enterprise ML system processing 358K+ records from 1.9K+ users. 98% classification accuracy with 198 FFT-derived temporal features and 98% data compression.

  • PySpark ETL + MLflow experiment tracking + Docker
  • Signal Processing: FFT/PCA for noisy sensor data

Apache Spark Docker MLflow Signal Processing PySpark MLlib

Demo

🧬 Neural Architecture Search: Genetic Algorithms

Evolutionary CNN optimization achieving 97.15% accuracy on medical imaging via custom genetic operators (selection, crossover, and mutation) with fault-tolerant checkpointing.

  • Automated architecture search without gradient information
  • Streamlit deployment for real-time inference

Genetic Algorithms · AutoML · PyTorch · Medical Imaging
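
The three operators named above can be sketched on a toy genome. Everything here is an assumption for illustration: the genome is a list of hypothetical per-layer filter counts, and `fitness` is a stand-in that simply rewards wider layers, whereas a real search would train and validate each candidate CNN.

```python
import random

rng = random.Random(42)
CHOICES = [16, 32, 64, 128]  # hypothetical per-layer filter counts
GENOME_LEN = 6


def fitness(genome: list[int]) -> int:
    # Toy stand-in for validation accuracy: reward wider layers.
    return sum(genome)


def tournament_select(population: list[list[int]], k: int = 3) -> list[int]:
    """Selection: the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)


def crossover(a: list[int], b: list[int]) -> list[int]:
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(genome: list[int], rate: float = 0.1) -> list[int]:
    """Mutation: resample each gene independently with probability `rate`."""
    return [rng.choice(CHOICES) if rng.random() < rate else g for g in genome]


def evolve(generations: int = 30, pop_size: int = 20) -> tuple[list[int], list[int]]:
    population = [[rng.choice(CHOICES) for _ in range(GENOME_LEN)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(population, key=fitness)  # elitism keeps the best so far
        history.append(fitness(elite))
        children = [
            mutate(crossover(tournament_select(population),
                             tournament_select(population)))
            for _ in range(pop_size - 1)
        ]
        population = [elite] + children
    return max(population, key=fitness), history


best, history = evolve()
print(best, history[0], history[-1])
```

Carrying the elite forward unchanged guarantees the best fitness never decreases between generations, which also makes a checkpointed search safe to resume.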

🧠 Deep Learning Lab: Interactive TypeScript Engine

Dependency-free mathematical neural network engine built from scratch in TypeScript for hands-on hyperparameter experimentation with live training noise injection.

  • Zero-dependency backpropagation from first principles
  • Real-time loss visualization + noise injection for robustness testing

TypeScript · Neural Networks · From Scratch · Interactive

📊 UK Supermarket Competitive Intelligence: Extended Analysis

Deep-dive into pricing strategy dynamics across major UK supermarket chains with causal analysis of competitor response patterns and Granger causality testing.

  • Time-series Granger causality for price leadership detection
  • Demand elasticity modeling across product categories

Econometrics · Granger Causality · Demand Modeling · Python


🛠️ Technical Arsenal

Languages:
  Systems:          "Rust · C · Bash/Shell"
  Data Science:     "Python · R · SQL"
  Frontend:         "TypeScript · JavaScript"

Machine Learning:
  Deep Learning:    "PyTorch · Keras | CNN · RNN · Transformers · GNN"
  Classical ML:     "Scikit-Learn · XGBoost · LightGBM | Ensemble Methods"
  Specialized:      "CausalML · Uplift Modeling · NAS · Model Compression · Agentic AI"

High Performance Computing:
  GPU:              "CUDA · CuPy · TensorRT · PyO3 (Rust-Python bindings)"
  Inference:        "PagedAttention · KV-Cache Optimization · Mixed-Precision (FP16/INT8/INT4)"

LLM & Agentic AI:
  Frameworks:       "LangGraph · LangChain · Hugging Face Transformers"
  Vector DBs:       "Qdrant · FAISS | Hybrid Retrieval"
  Evaluation:       "RAGAS · Sentence-BERT | Faithfulness · Relevance · Groundedness"

Data Engineering:
  Big Data:         "Apache Spark (PySpark) · Hadoop · Apache Kafka · Airflow"
  In-Process:       "DuckDB · Polars · Pandas · NumPy"
  Databases:        "PostgreSQL · Redis · MySQL"
  Formats:          "Parquet · Arrow · JSON"

MLOps & Cloud:
  Containerization: "Docker · Kubernetes"
  Tracking:         "MLflow · Weights & Biases"
  Cloud:            "AWS · GCP"
  Serving:          "FastAPI · Streamlit · Next.js · React"
  CI/CD:            "GitHub Actions"

Specialized:
  Cheminformatics:  "RDKit · PyTorch Geometric · OpenCV"
  Optimization:     "Optuna · Ray Tune · Genetic Algorithms · Hessian Analysis"
  Statistics:       "Statsmodels · SciPy · Bayesian Inference · Hypothesis Testing"

💼 Professional Experience

📊 Data Analyst

M/S Sanjog Trading
Jul 2020 – Nov 2021 · Guwahati, India

  • 🏗️ Architected end-to-end ETL infrastructure with Pandas & NumPy for operational data pipelines
  • 📈 Developed statistical time-series forecasting models for sales optimization
  • 📊 Engineered interactive Streamlit dashboards for KPI visualization and pricing strategy

Impact: Built first production data pipelines, translating raw business data into actionable pricing intelligence

💻 Data Engineering Intern

IIT Guwahati
Jul 2022 – Aug 2022 · Guwahati, India

  • 📋 Designed normalized MySQL schemas with ACID-compliant optimization for academic data systems
  • 🔍 Built heuristic constraint-satisfaction algorithm for automated timetable generation
  • 📊 Conducted quantitative user research across institutions and EdTech competitive analysis

Impact: Reduced scheduling conflicts and improved operational efficiency for academic planning systems

📡 Data Analyst Intern

Airports Authority of India
Jul 2023 – Aug 2023 · NER Regional HQ, India

  • 🔍 Analyzed lifecycle data across 1,053 IT assets, identifying failure patterns to optimize maintenance
  • 🗺️ Mapped enterprise network infrastructure: MPLS/ILL load balancing, core switching, firewalls
  • 🎯 Assessed data quality for SAP ERP integration covering 19,000+ employee records

Impact: Predictive maintenance insights enabling proactive asset lifecycle management


🎯 Research Frontiers

%%{init: {
  "theme": "base",
  "themeVariables": {
    "primaryColor": "#7aa2f7",
    "primaryTextColor": "#ffffff",
    "primaryBorderColor": "#7dcfff",
    "lineColor": "#e0af68",
    "secondaryColor": "#9ece6a",
    "tertiaryColor": "#bb9af7",
    "fontSize": "18px"
  }
}}%%
mindmap
  root((ML Research))
    LLM Systems
      PagedAttention & CUDA
      Inference Optimization
      KV-Cache Management
    Agentic AI
      LangGraph Orchestration
      Hybrid RAG Pipelines
      RAGAS Evaluation
    Neural Architecture Search
      Hardware-Aware NAS
      Mixed-Precision Quantization
      Multi-Objective Optimization
    Causal ML
      Uplift Modeling
      Treatment Effect Estimation
      Counterfactual Reasoning
    Scientific ML
      Graph Neural Networks
      Molecular Property Prediction
      Drug Discovery
    Quantitative Finance
      Signal Intelligence
      Alternative Data Processing
      Real-time Analytics

⚡ LLM Systems & Inference

Current Focus:

  • PagedAttention & KV-cache paging
  • Mixed-precision kernel design
  • Speculative decoding techniques

Status: 🟢 Active Engineering
Goal: Sub-linear memory scaling for long-context LLM serving

🤖 Agentic AI & RAG

Current Focus:

  • Multi-agent orchestration patterns
  • Retrieval faithfulness guarantees
  • Tool-use with verification loops

Status: 🟢 Active Engineering
Goal: Production-grade agentic pipelines with measurable reliability

🔬 Scientific & Causal ML

Current Focus:

  • Physics-informed neural networks
  • Drug-target interaction prediction
  • Counterfactual policy evaluation

Status: 🟡 Exploration Phase
Goal: Accelerate scientific discovery and causal decision systems


🎓 Research & Writing

📝 Technical Publications: LeedsFINsights

A comprehensive journey through AI's transformation from rule-based expert systems to modern neural architectures. Traces the paradigm shifts that enabled today's breakthroughs.

🏷️ AI History · Deep Learning · Neural Networks

Comparing gradient-based methods with evolutionary strategies for escaping local minima. Practical insights from neural architecture search.

🏷️ Optimization · Genetic Algorithms · Gradient Descent

Examining the intersection of AI advancement and environmental, social, and governance accountability in an era of accelerating compute demands.

🏷️ AI Ethics · ESG · Responsible AI


🏆 Achievements & Impact

  • 8–32×: Throughput Gain (LLM Inference · PageForge)
  • 109.9M: Events Processed (Customer Intelligence Platform)
  • 4.5×: Conversion Uplift (Propensity Modeling)
  • 97%: Memory Reduction (DuckDB + Polars Pipeline)
  • 0.91: Faithfulness Score (Agentic RAG · FinSight)
  • 40%: VRAM Reduction (LLM Quantization · EMPAS)
  • <1ms: Inference Latency (Experimentation Engine)
  • +197%: F1 Improvement (Rare Transient Detection)
  • 11: Projects Shipped (End-to-end ML Systems)
  • 9.5M+: Price Records Analyzed (Market Intelligence System)
  • +56%: F1 vs. Naive RAG (FinSight-Alpha)
  • 97.15%: CNN Accuracy (Genetic Algorithm NAS)
