INTRODUCTION
I'm Thillai, a
Data Science Student
AI Engineer
Software Engineer
I'm pursuing my Master's in Data Science at Stony Brook University, with experience building agentic RAG systems, LLM applications, and vision-language models. I've worked across healthcare, finance, and astronomy, publishing research in deep learning and computer vision. Passionate about Generative AI and agentic architectures, I focus on creating scalable AI systems that deliver real-world impact.


Education
M.S. in Data Science
Stony Brook University
Expected Graduation Date: May 2026
B.Tech. in Computer Science and Engineering
Vellore Institute of Technology
Graduation Date: May 2024
Experience
Generative AI Intern
Zideas LLC
June 2025 — August 2025
- Developed an autonomous LLM-powered web crawler to extract and validate KYC documents from multiple regulatory websites.
- Implemented an agentic RAG architecture with scalable storage and optimized retrieval for efficient access to compliance information.
Computer Vision Researcher
ISRO – Indian Space Research Organization
May 2023 — Aug 2024
- Conducted research on X-ray radiography images from welding inspections, developing deep learning models to detect fusion flaws with high precision.
- Worked alongside engineers to streamline inspection workflows using MLOps pipelines and enhance the reliability of quality checks.
Research Intern
Indian Institute of Technology, Tirupati
May 2023 — Jul 2023
- Researched advanced transformer architectures for medical imaging, implementing UniFormer to analyze multi-phase MRI scans for liver lesion diagnosis.
- Applied specialized loss functions to address class imbalance and ranked among the top 15 teams globally in the MICCAI Liver Lesion Challenge.
Machine Learning Engineer
BillOK
Dec 2022 — Apr 2023
- Built an OCR model integrated with a language model to process invoices and extract essential fields with high accuracy, minimizing manual effort in financial operations.
- Implemented an automation pipeline linking the system with WhatsApp and email, enabling seamless large-scale invoice processing.
Competencies
Technical Competencies
These are the technical skills I have acquired so far in my computer science career.
Languages
- Python
- Java
- C++
- JavaScript
- TypeScript
- Go
- Rust
- SQL
- Bash
- C
- R
- MATLAB
Frameworks/Tools
- PyTorch
- TensorFlow
- JAX
- LangChain
- Hugging Face
- Docker
- Kubernetes
- AWS
- GCP
- Azure
- Git
- Linux
- Apache Spark
- Kafka
Research Interests
- Machine Learning
- Deep Learning
- Computer Vision
- Natural Language Processing
- Data Science
- Astroinformatics
- Quantum Computing
- Generative AI
- Large Language Models
- Reinforcement Learning
- Agentic AI
- Vision-Language Models
Highlights
Featured Highlights
Here are some awards, articles, documents, certificates, and whatever else I am proud of.
Martian Terrain Classification through Federated Learning: A Decentralized Approach for Understanding the Mars
Developed a novel federated learning approach for classifying Martian terrain into seven categories using the DenseNet-121 architecture. The system preserves data privacy while training across distributed sources, aiding in landing site selection and safe mission planning. Extensive experimentation with the HiRISE dataset demonstrated robust performance, contributing to future Mars exploration missions.
Research and Development Head of ACM-VIT Chapter
Served as Research and Development Head of ACM-VIT chapter in 2023, fostering a research-oriented culture through Data Science workshops and events. Organized innovative research projects and provided hands-on learning opportunities for students. Mentored aspiring juniors and cultivated a vibrant research environment that encouraged academic and professional growth.
Exploring Huntington’s Disease Diagnosis via Artificial Intelligence Models: A Comprehensive Review
Comprehensive review of AI-powered algorithms for Huntington's Disease diagnosis, analyzing clinical, genetic, and neuroimaging data. Systematically examined existing literature to identify trends, methodologies, and challenges in this emerging field. Discussed limitations, ethical considerations, and future research directions for improving early detection and management of HD.
Delegate for HPAIR Asia Conference 2022
Selected as a delegate for the prestigious HPAIR Asia Conference 2022 in New Delhi, where I presented on AI solutions for global crises and climate change. Engaged with global leaders and innovators, exchanging strategies and gaining new perspectives on sustainable development. Established valuable connections with professionals passionate about creating positive global impact through technology.
Deep Learning-driven Detection of Nuclear Fusion Ignition: Illuminating the Path to Clean and Sustainable Energy
Investigated three deep learning architectures—Transformers, LSTM, and ResNet50—for nuclear fusion event detection. Transformers achieved the highest accuracy, outperforming LSTM and ResNet50. The study analyzed how each model processes data: Transformers use attention weights, LSTM captures temporal dependencies, and ResNet50 learns hierarchical features. These findings advance fusion detection technologies, supporting global efforts toward sustainable energy.
Dynamic Benchmarking Framework for LLM Evaluation
Designed and implemented a dynamic benchmarking framework to evaluate LLM accuracy using real-time, location-specific data from WeatherAPI. Generates structured Q&A pairs from live weather, air quality, and astronomical events for rigorous model assessment. Built to be extendable and model-agnostic, supporting comparative evaluations across multiple LLMs with JSON-formatted outputs.
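As a rough illustration of the idea (not the framework's actual code), the sketch below turns one weather observation into structured Q&A records and checks a model's numeric answer against them. The field names `location`, `temp_c`, and `pm2_5`, the tolerances, and both helper functions are assumptions made for this example. A real run would fill the observation from a live API response instead of a hard-coded sample.

```python
import json


def build_qa_pairs(observation):
    """Turn one weather observation (a dict) into question/answer records."""
    location = observation["location"]
    return [
        {
            "question": f"What is the current temperature in {location} in degrees Celsius?",
            "answer": observation["temp_c"],
            "tolerance": 1.0,  # hypothetical: accept answers within 1 degree
        },
        {
            "question": f"What is the current PM2.5 concentration in {location} in micrograms per cubic meter?",
            "answer": observation["pm2_5"],
            "tolerance": 5.0,  # hypothetical: accept answers within 5 units
        },
    ]


def is_correct(qa, model_answer):
    """Return True if the model's answer parses as a number within tolerance."""
    try:
        value = float(model_answer)
    except (TypeError, ValueError):
        return False
    return abs(value - qa["answer"]) <= qa["tolerance"]


# Hard-coded sample standing in for a live observation.
sample = {"location": "Stony Brook", "temp_c": 21.5, "pm2_5": 8.0}
pairs = build_qa_pairs(sample)
print(json.dumps(pairs, indent=2))
```

Because the reference answers come from data fetched at evaluation time, a model cannot have memorized them, which is the point of making the benchmark dynamic.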
Automated Glitch Detection in LIGO Data Streams Leveraging Deep Learning Architectures
Built a continual learning architecture for LIGO glitch detection using a Vision Transformer. The system adaptively learns from successive data streams while preventing catastrophic forgetting, achieving 93.4% accuracy in glitch classification.
Mediquill Large Language Model (LLM)
Fine-tuned the 7-billion-parameter Llama-2 model on a curated medical Q&A dataset to improve its comprehension of medical inquiries. Enhanced the model's ability to provide accurate diagnoses, treatment recommendations, and medication information. Optimized for complex medical terminology and reliable responses, significantly improving utility for medical professionals and researchers.
Super-Resolution Astronomical Image Denoiser
Developed an SRGAN to remove noise from galaxy images, improving PSNR by 32.7% and reducing RMSE by 37.9%. Applied transfer learning techniques for astronomical data, enhancing celestial feature clarity by 27.3% and SSIM scores by 19.8%. Enables more precise analysis of astronomical imagery for research and space exploration.
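For readers unfamiliar with the metrics, here is a minimal sketch of how RMSE and PSNR are commonly computed between a reference image and a denoised estimate. It is not the project's evaluation code. Images are treated as flat lists of pixel values, and the default `max_value` of 255 assumes 8-bit data.

```python
import math


def rmse(reference, estimate):
    """Root-mean-square error between two equal-length pixel sequences."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    squared_error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return math.sqrt(squared_error / len(reference))


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    error = rmse(reference, estimate)
    if error == 0:
        return math.inf  # identical images
    return 20.0 * math.log10(max_value / error)


reference = [10, 20, 30, 40]
denoised = [12, 18, 32, 38]
print(rmse(reference, denoised), psnr(reference, denoised))
```

The two metrics move in opposite directions: a lower RMSE gives a higher PSNR, since PSNR is the log ratio of the peak pixel value to the RMSE.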
AI-powered Virtual Fitness Trainer
Developed an AI model using the MediaPipe library for real-time exercise tracking and body movement analysis. Detects key body landmarks and calculates joint angles to monitor exercises like pull-ups, push-ups, squats, and sit-ups. Provides real-time feedback and precise rep counting to help maintain proper form and improve workout effectiveness and safety.
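The angle-and-count logic described above can be sketched roughly as follows. This is an illustrative example under stated assumptions, not the project's code: landmarks are plain 2D `(x, y)` tuples rather than MediaPipe landmark objects, and the `RepCounter` class and its 90 and 160 degree thresholds are hypothetical.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at vertex b formed by the 2D points a, b, c."""
    raw = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0]) - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    angle = abs(raw)
    # Fold reflex angles back into the 0-180 degree range.
    return 360.0 - angle if angle > 180.0 else angle


class RepCounter:
    """Counts one rep each time the joint bends below one threshold and then extends past another."""

    def __init__(self, down_threshold=90.0, up_threshold=160.0):
        self.down_threshold = down_threshold  # hypothetical "bent" angle
        self.up_threshold = up_threshold      # hypothetical "extended" angle
        self.stage = None
        self.count = 0

    def update(self, angle):
        if angle < self.down_threshold:
            self.stage = "down"
        elif angle > self.up_threshold and self.stage == "down":
            self.stage = "up"
            self.count += 1
        return self.count


# Example: shoulder, elbow, and wrist positions forming a right angle at the elbow.
print(joint_angle((0, 1), (0, 0), (1, 0)))

counter = RepCounter()
for angle in [170, 80, 170, 85, 165]:
    counter.update(angle)
print(counter.count)
```

Requiring the "down" stage before counting an "up" keeps small jitters around a single threshold from being counted as extra reps.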
Agentic Research Assistant
Multi-agent AI system that automates academic research, literature review, and research paper generation using advanced LLM agents. The system coordinates multiple specialized AI agents to handle different aspects of research workflow, from data collection to synthesis and document generation.
CBT-Copilot
CBT-Copilot is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct, specifically designed to simulate compassionate and supportive dialogues in the style of Cognitive Behavioral Therapy (CBT). This model provides empathetic, structured therapeutic conversations while maintaining professional boundaries.
Flash AI Search Engine
Developed FlashSearch, an AI-powered search engine that combines Google's Gemini 2.0 Flash model with live web search results to give fast, precise, and source-backed answers. Built with a React, Vite, and Tailwind frontend and an Express.js backend, featuring real-time citations and follow-up questions. Demonstrates effective use of LLMs and search APIs to enhance user trust and information reliability.