Available for new opportunities

I'm Thillai, an ML Engineer building scalable AI systems.

Thillai Chithambaram

About Me

I'm an ML Engineer specializing in LLM systems, scalable inference, and production-scale GenAI infrastructure. I am currently pursuing my Master's in Data Science at Stony Brook University.

My work centers on optimizing LLM inference pipelines, from FlashAttention and speculative decoding to KV cache optimization and CUDA kernel profiling. I build systems that serve models faster and more efficiently at scale.

I'm an active open source contributor to vLLM, llm-d, and EasyEdit, working on the infrastructure that powers LLM serving for thousands of developers. Previously, I shipped production AI systems at Zideas LLC and conducted research at ISRO and IIT Tirupati.

What I bring to the table:

LLM Inference & Optimization
Deep expertise in LLM serving: vLLM, SGLang, FlashAttention, speculative decoding, quantization, KV cache optimization, CUDA profiling with Nsight and nvprof, and Triton kernel development.

Scalable AI Systems
Building production-grade AI infrastructure: distributed training (FSDP, DDP), agentic RAG architectures, multi-agent systems, and end-to-end ML pipelines on AWS, GCP, and Kubernetes.

Open Source & Research
Active contributor to vLLM (52k+ ★), llm-d, and EasyEdit. Published research in computer vision and deep learning across IEEE, MDPI, and MICCAI venues.

Work Experience

Dec 2025 - Present

Research Assistant

Stony Brook University - New York, USA
  • Researching LLM knowledge editing methods and unsafe compliance behavior, using measurable evidence from the pretraining corpus to investigate how models produce unsafe content in response to unsafe requests.
  • Developing evaluation frameworks to trace model safety failures back to pretraining data, enabling targeted interventions for improving LLM alignment and safety.

May 2025 - Aug 2025

Applied AI Engineer Intern

Zideas LLC - New York, USA
  • Built a production-grade LLM document intelligence system to autonomously crawl, parse, and validate KYC artifacts across multiple regulatory sources.
  • Designed an agentic hybrid RAG + vector indexing architecture with optimized LLM inference via prompt compression and caching.

Jan 2024 - Aug 2024

Computer Vision Researcher

ISRO - Liquid Propulsion Systems Centre, Bengaluru
  • Developed a visual defect detection pipeline for X-ray radiography analysis of welded aerospace components using deep learning.
  • Designed a SegFormer-based segmentation model integrated with Kubeflow pipelines for automated quality inspection workflows.

May 2023 - Jul 2023

Research Intern

Indian Institute of Technology, Tirupati
  • Implemented a UniFormer transformer model for liver lesion diagnosis from multi-phase MRI scans.
  • Ranked among the top 15 teams globally in the MICCAI Liver Lesion Diagnosis Challenge.

Dec 2022 - Apr 2023

Machine Learning Engineer

BillOK
  • Built an OCR model integrated with a language model to process invoices and extract essential fields for financial operations.
  • Implemented an automation pipeline linking the system with WhatsApp and email for large-scale invoice processing.

Contributions

vLLM (52k+ ★)

vllm-project

A high-throughput and memory-efficient inference and serving engine for large language models. Contributed to core infrastructure, improving serving performance and developer experience.

LLM Inference, Python, CUDA, Performance

llm-d (2.8k+ ★)

llm-d

Distributed LLM serving infrastructure designed for Kubernetes-native deployments. Contributed to the disaggregated serving architecture and deployment tooling for scalable LLM inference.

Kubernetes, Distributed Systems, Go, Infrastructure

EasyEdit (ACL 2024, 2.7k+ ★)

zjunlp

An easy-to-use knowledge editing framework for large language models. Contributed to improving model editing capabilities and extending the framework's support for new editing methods.

Knowledge Editing, LLMs, Python, Research

Academic Background

M.S. in Data Science

Stony Brook University

Expected Graduation: May 2026

B.Tech. in Computer Science and Engineering

Vellore Institute of Technology

Graduated: May 2024

Technical Skills

Languages

Python, C++, CUDA, Triton, Java, Bash, SQL, Rust

ML & Inference

PyTorch, TensorFlow, JAX, vLLM, SGLang, Hugging Face, ONNX Runtime, Triton Kernels, FlashAttention, Speculative Decoding, Quantization, KV Cache Optimization, FSDP / DDP, MLflow, W&B

Cloud, DevOps & Agents

AWS, GCP, Azure, Docker, Kubernetes, Spark, Kafka, Linux, Git, CI/CD, LangChain, LangGraph, LlamaIndex, AutoGen, MCP

Featured Highlights

HPAIR Conference

Harvard Project for Asian and International Relations

Delegate for HPAIR Asia Conference 2022

Selected as a delegate for the prestigious HPAIR Asia Conference 2022 in New Delhi, presenting on AI solutions for global crises and climate change.

Nuclear Fusion Research

Research Paper - IEEE

Deep Learning-driven Detection of Nuclear Fusion Ignition

Investigated three deep learning architectures (Transformers, LSTM, and ResNet50) for nuclear fusion event detection. Transformers achieved the highest accuracy.

Martian Terrain

Research Paper - IEEE

Martian Terrain Classification through Federated Learning

Developed a novel federated learning approach for multi-class Martian terrain classification using the DenseNet-121 architecture while preserving data privacy.

ACM Research Head

Association for Computing Machinery (ACM)

Research and Development Head of ACM-VIT Chapter

Served as R&D Head in 2023, fostering a research-oriented culture through Data Science workshops and mentoring aspiring researchers.

Huntington's Disease Research

Review Article - MDPI

Exploring Huntington's Disease Diagnosis via AI Models

Comprehensive review of AI-powered algorithms for Huntington's Disease diagnosis, analyzing clinical, genetic, and neuroimaging data.


Latest Projects

Agentic Research Assistant

Multi-Agent AI

Multi-agent AI system that automates academic research, literature review, and research paper generation using advanced LLM agents.

Vision Language Driving Perception

Vision-Language Models

VLM fine-tuning pipeline for autonomous driving with distributed training, TensorRT optimization, and custom evaluation metrics.

CBT-Copilot

Mental Health AI

Fine-tuned Llama-3.2-3B-Instruct for compassionate CBT-style therapeutic conversations while maintaining professional boundaries.

Flash AI Search

Generative AI

Flash AI Search Engine

AI-powered search engine using Gemini 2.0 Flash with live web search results for fast, precise, source-backed answers.

LLM Benchmark

Generative AI

Dynamic Benchmarking Framework

Dynamic benchmarking framework evaluating LLM accuracy using real-time, location-specific data from WeatherAPI.

LIGO Glitch Detection

Astroinformatics

Continual LIGO Glitch Detection

Continual learning architecture for LIGO glitch detection using a Vision Transformer, achieving 93.4% accuracy in glitch classification.

MediQuill LLM

Generative AI

Fine-tuned Llama-2 7B on curated medical Q&A data for accurate diagnoses, treatment recommendations, and drug information.

Astronomical Image Denoiser

Astroinformatics

Super Resolution Astronomical Denoiser

SRGAN for galaxy image denoising, improving PSNR by 32.7% and SSIM by 19.8% using transfer learning techniques.

AI Fitness Trainer

Fitness Analytics

AI-powered Virtual Fitness Trainer

Real-time exercise tracking using MediaPipe for body landmark detection, angle calculation, and form correction feedback.

Python, PyTorch, TensorFlow, LangChain, Hugging Face, Docker, Kubernetes, AWS, GCP, JAX, vLLM, React, Go, Rust, Apache Spark, Kafka