Ritwij Aryan Parmar — LLM Inference, Robotics Perception, High-Throughput Backend Systems

AI Systems Engineer

LLM inference and serving runtimes · robotics perception · high-throughput backend infrastructure

MS Computer Science, University at Buffalo · Available full-time immediately

LinkedIn · Email · GitHub

Focus

I build systems that make AI products usable under real constraints: latency, throughput, observability, deployment reliability, and evaluation quality. My strongest work sits at the intersection of LLM infrastructure, cloud-native backend services, and robotics/autonomy pipelines.

Featured Systems

HelixServe: LLM serving runtime on GCP (NVIDIA L4 GPU) with paged KV cache, continuous batching, prefix caching, CUDA Graph decode, and benchmark instrumentation.

Language: Python · Stars: 0

Signals: Paged KV Cache · Continuous Batching · CUDA Graph Decode

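
The card above names paged KV cache without showing how one works. The Python sketch below illustrates the general technique, which vLLM's PagedAttention popularized. It is a minimal block-table allocator. Each sequence's KV entries live in fixed-size blocks drawn from a shared pool, so memory is claimed one block at a time rather than reserved for a maximum length up front. `PagedKVAllocator`, `OutOfBlocks`, and their methods are hypothetical names, not HelixServe's API. The sketch tracks slot indices only, not real GPU tensors.

```python
class OutOfBlocks(Exception):
    """Raised when every physical KV block is already assigned."""


class PagedKVAllocator:
    """Hypothetical block-table allocator: maps each sequence's logical token
    positions onto fixed-size physical blocks drawn from a shared pool."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size                            # tokens per block
        self.free_blocks = list(range(num_blocks - 1, -1, -1))  # stack; block 0 is popped first
        self.block_tables = {}                                  # seq_id -> [physical block ids]
        self.seq_lens = {}                                      # seq_id -> tokens stored so far

    def append_token(self, seq_id):
        """Reserve the KV slot for one new token; returns (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        n = self.seq_lens.get(seq_id, 0)
        if n % self.block_size == 0:  # last block is full, or the sequence has none yet
            if not self.free_blocks:
                raise OutOfBlocks(seq_id)
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def free(self, seq_id):
        """Return a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)
```

A sequence wastes at most `block_size - 1` slots, all in its last block. Blocks released by `free` can be handed to any other sequence right away.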

ManoVarta: Controller-led multilingual mental-health GenAI system that performs item-level PHQ-9/GAD-7 assessment in English, Hindi, and Hinglish, with evidence extraction and safety routing.

Language: Python · Stars: 1

Signals: Evidence Extraction · PHQ/GAD Scoring · Safety Routing

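
Once each item has a 0 to 3 answer, item-level PHQ-9 and GAD-7 scoring is simple arithmetic. PHQ-9 sums nine items into a 0 to 27 total, and GAD-7 sums seven items into a 0 to 21 total. The standard severity cut-offs fall at 5, 10, and 15, plus 20 for PHQ-9. The sketch below shows that scoring together with a safety-routing hook. The function names and the escalate-on-item-9 rule are illustrative assumptions, not ManoVarta's controller logic. The hard part the system describes, extracting item scores and evidence from multilingual conversation, is not shown.

```python
# Standard severity cut-offs for the summed scores (upper bound inclusive).
PHQ9_BANDS = [(4, "minimal"), (9, "mild"), (14, "moderate"),
              (19, "moderately severe"), (27, "severe")]
GAD7_BANDS = [(4, "minimal"), (9, "mild"), (14, "moderate"), (21, "severe")]


def score(items, n_items, bands):
    """Sum item-level scores (each 0-3) and map the total to a severity band."""
    if len(items) != n_items or any(s not in (0, 1, 2, 3) for s in items):
        raise ValueError(f"expected {n_items} item scores, each in 0..3")
    total = sum(items)
    severity = next(label for upper, label in bands if total <= upper)
    return total, severity


def assess(phq9_items, gad7_items):
    """Score both questionnaires and pick a (hypothetical) follow-up route."""
    phq_total, phq_sev = score(phq9_items, 9, PHQ9_BANDS)
    gad_total, gad_sev = score(gad7_items, 7, GAD7_BANDS)
    # Hypothetical routing rule: PHQ-9 item 9 asks about thoughts of being better
    # off dead or of self-harm, so any non-zero answer escalates regardless of total.
    escalate = phq9_items[8] > 0
    return {
        "phq9": (phq_total, phq_sev),
        "gad7": (gad_total, gad_sev),
        "route": "escalate_to_human" if escalate else "standard_followup",
    }
```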

SRE-Nidaan: Incident response copilot built with Next.js, FastAPI, and vLLM, featuring telemetry grounding, runbook retrieval, remediation gating, and analyst feedback loops.

Language: Python · Stars: 0

Signals: Telemetry Grounding · Remediation Gates · RLHF Pipeline

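
The README does not describe how SRE-Nidaan gates remediations. The sketch below shows one way such a gate could work, under assumed risk tiers and an assumed confidence threshold. It lets read-only actions and high-confidence reversible actions through automatically. Everything else needs analyst approval. Any suggestion that cites no telemetry or runbook evidence is blocked outright. `Risk`, `Proposal`, `gate`, and the 0.7 default are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Risk(IntEnum):
    READ_ONLY = 0    # e.g. fetch logs, describe a resource
    REVERSIBLE = 1   # e.g. scale replicas up or down
    DESTRUCTIVE = 2  # e.g. delete data, force a rollback


@dataclass
class Proposal:
    action: str
    risk: Risk
    confidence: float        # model-reported confidence in [0, 1]
    evidence_ids: list[str]  # telemetry or runbook items the suggestion cites


def gate(p: Proposal, analyst_approved: bool = False, min_confidence: float = 0.7) -> str:
    """Decide whether a proposed remediation may run.

    Returns 'auto', 'approved', 'needs_approval', or 'blocked'.
    """
    if not p.evidence_ids:
        return "blocked"  # ungrounded suggestions never execute
    if p.risk == Risk.READ_ONLY:
        return "auto"
    if p.risk == Risk.REVERSIBLE and p.confidence >= min_confidence:
        return "auto"
    # Low-confidence reversible actions and all destructive actions need a human.
    return "approved" if analyst_approved else "needs_approval"
```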

nervaflow-intelligence: Cloud-native decision engine for supply operations using Vertex AI Search, conversational APIs, BigQuery pipelines, operational traces, and cost attribution.

Language: Python · Stars: 2

Signals: Vertex AI Search · BigQuery Cost Attribution · Playbook Tracing
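
Cost attribution can be sketched minimally as follows, though this is an assumed approach, not nervaflow-intelligence's code. Tag each BigQuery job with a label, then roll up billed bytes per label value. On-demand queries are billed by bytes processed, but the rate depends on region and pricing model, so the function takes it as a parameter instead of hard-coding it. The `playbook` label key and the flattened job-record shape are hypothetical.

```python
from collections import defaultdict

TIB = 2 ** 40  # bytes per tebibyte


def attribute_costs(jobs, usd_per_tib, label_key="playbook"):
    """Estimate on-demand query cost per value of one job label.

    `jobs` are dicts shaped loosely like rows of BigQuery's INFORMATION_SCHEMA.JOBS
    view, with labels already flattened into a plain dict (an assumption).
    """
    totals = defaultdict(float)
    for job in jobs:
        owner = job.get("labels", {}).get(label_key, "unattributed")
        billed = job.get("total_bytes_billed") or 0  # treat a missing value as zero
        totals[owner] += billed / TIB * usd_per_tib
    return dict(totals)
```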

Technical Experience

Distributed Robotics and Networked Embedded Sensing Lab — Research Aide, Robotics Systems Engineering

  • Reduced mapping drift to under 1.2% over 500 m of GPS-denied environments by building a ROS2 visual SLAM pipeline with Gaussian Splatting integration for Boston Dynamics Spot.
  • Improved LiDAR and VIO fusion for low-texture subterranean navigation, reducing trajectory estimation error by 30% while sustaining 20 Hz real-time state estimation.
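
The bullet does not say how the LiDAR and VIO estimates are combined. As a simplified illustration only: the textbook building block is inverse-variance weighting of two independent estimates, which is the scalar form of a Kalman update. The `vio_variance` heuristic down-weights VIO when few visual features are tracked. It is a hypothetical stand-in for the low-texture handling the bullet mentions, not the lab's pipeline.

```python
def vio_variance(base_var, tracked_features, min_features=50):
    """Hypothetical heuristic: inflate VIO variance when few visual features are
    tracked, as happens on low-texture walls and floors."""
    return base_var * max(1.0, min_features / max(tracked_features, 1))


def fuse(x_lidar, var_lidar, x_vio, var_vio):
    """Inverse-variance weighted fusion of two independent scalar estimates.

    Returns the fused estimate and its variance, which is never larger than
    the smaller of the two input variances.
    """
    w_l, w_v = 1.0 / var_lidar, 1.0 / var_vio
    x = (w_l * x_lidar + w_v * x_vio) / (w_l + w_v)
    return x, 1.0 / (w_l + w_v)
```

With plenty of texture, both sources get equal weight. With only 10 tracked features, the VIO variance grows fivefold and the fused estimate moves toward the LiDAR value.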

Tata Elxsi — Software Engineer Intern

  • Migrated autonomous vehicle software from ROS1 to ROS2 and tuned DDS QoS policies for the CARLA simulation setup, reducing inter-module latency by 40% and achieving sub-100 ms communication latency.
  • Built an Extended Kalman Filter-based vehicle state estimator and safety-constrained route planner with deterministic state transitions, bounded-latency path generation, and validation hooks.
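
For readers unfamiliar with Extended Kalman Filters, the sketch below is a generic textbook example, not the Tata Elxsi estimator. It is a two-state [position, velocity] filter with a constant-velocity motion model. Its measurement is a nonlinear range to a roadside beacon, linearized through its Jacobian at each update. The beacon sensor, the noise values, and the `ekf_step` signature are all assumptions, chosen to keep the math in pure Python.

```python
import math


def ekf_step(x, P, z, dt, d, q=(0.01, 0.1), r=0.25):
    """One predict + update cycle of a two-state extended Kalman filter.

    x: [position, velocity] along the lane; P: 2x2 covariance as nested lists;
    z: measured range to a roadside beacon at lateral offset d > 0 (hypothetical
    sensor); q: diagonal process noise; r: range-measurement noise variance.
    """
    # Predict with a constant-velocity model: F = [[1, dt], [0, 1]].
    p, v = x[0] + x[1] * dt, x[1]
    (a, b), (c, e) = P
    # P = F P F^T + Q, expanded by hand for the 2x2 case.
    P = [[a + dt * (b + c) + dt * dt * e + q[0], b + dt * e],
         [c + dt * e, e + q[1]]]

    # Update: the range h(x) = sqrt(p^2 + d^2) is nonlinear in p, so linearize it.
    rng = math.hypot(p, d)
    h0 = p / rng                                  # Jacobian H = [dh/dp, dh/dv] = [h0, 0]
    s = h0 * h0 * P[0][0] + r                     # innovation variance H P H^T + R
    k0, k1 = P[0][0] * h0 / s, P[1][0] * h0 / s  # Kalman gain K = P H^T / S
    innov = z - rng
    x_new = [p + k0 * innov, v + k1 * innov]
    # Covariance update P = (I - K H) P.
    P_new = [[(1 - k0 * h0) * P[0][0], (1 - k0 * h0) * P[0][1]],
             [P[1][0] - k1 * h0 * P[0][0], P[1][1] - k1 * h0 * P[0][1]]]
    return x_new, P_new
```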

LLMate.ai — Backend Engineer Intern

  • Built asynchronous backend services with Spring Boot and RabbitMQ for production data workflows, reducing p95 response latency by 40%.
  • Deployed a GPT-3.5-based text-to-SQL workflow over 50,000 structured records and set up Docker/GitHub Actions CI/CD.

Stack

  • Languages: Python, C++, Java, SQL
  • AI systems: LLM inference, model serving, vLLM, Vertex AI, QLoRA, RLHF, CUDA Graphs, prompt evaluation
  • Backend and cloud: FastAPI, Spring Boot, Docker, GCP, Cloud Run, BigQuery, RabbitMQ, distributed systems, observability, CI/CD
  • Robotics and autonomy: ROS2, SLAM, LiDAR/VIO sensor fusion, CARLA, state estimation

Current Direction

I am looking for AI infrastructure, backend/product engineering, ML systems, or robotics/autonomy roles where I can own systems end to end: from low-level performance and reliability work to shipped user-facing demos.

Pinned

  1. nervaflow-intelligence

    Google Cloud-native decision engine for supply operations. Uses Vertex AI Search + conversational APIs for grounded GenAI responses, BigQuery pipelines for scenario and signal aggregation, and Clou…

    Python · Stars: 2

  2. HelixServe

    A runtime-first LLM serving engine built to show how modern inference systems actually scale. It combines paged KV-cache allocation, continuous batching, chunked prefill, prefix caching, CUDA Graph…

    Python

  3. SRE-Nidaan

    Production-style causal incident response copilot that helps teams identify what broke first, choose safer next actions, and avoid risky interventions using grounded LLM reasoning, MCP-style tool r…

    Python

  4. ManoVarta

    ManoVarta: Multilingual Conversational AI Chatbot for Mental Health Screening

    Python · Stars: 1
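
The HelixServe entry lists continuous batching. Its core idea is iteration-level scheduling. At every decode step, finished sequences leave the batch and waiting requests join it, instead of the whole batch draining before new work starts. The code below is a simplified simulation under stated assumptions: one token per step, no prefill cost, and no KV-memory limit. `Request` and `run_continuous_batching` are hypothetical names, not HelixServe's scheduler.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: str
    max_new_tokens: int
    generated: int = 0


def run_continuous_batching(requests, max_batch):
    """Simulate iteration-level scheduling; returns (step, rid) completion events."""
    waiting = deque(requests)
    running, finished, step = [], [], 0
    while waiting or running:
        # Fill free batch slots at every step, not only when the batch is empty.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        # One decode step: each running sequence emits one token
        # (a stand-in for a single model forward pass over the batch).
        for req in running:
            req.generated += 1
        step += 1
        finished.extend((step, r.rid) for r in running if r.generated >= r.max_new_tokens)
        running = [r for r in running if r.generated < r.max_new_tokens]
    return finished
```

Take `max_batch=2` and three requests: A needs 1 token, B needs 3, and C needs 1. C is admitted at step 2, as soon as A finishes, and it completes at step 2. Static batching would hold C until B finished.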