Unlocking Reasoning in Sub-3B Parameter LLMs
An exploration of techniques to enhance logical capabilities in small language models without increasing model size.
Building intelligent systems with rigorous research and scalable engineering
I'm a Machine Learning Engineer at Google, developing the ML platform for Google Pay. My work spans the full ML lifecycle, from foundational research to production systems at scale. I've been immersed in the generative AI space for quite a while and closely follow its rapid developments.
Previously, at Jio AICoE, I led initiatives to improve reasoning in small language models and built real-time computer vision systems. I worked across the ML spectrum, from training models to deploying them as REST APIs or on edge devices, while building expertise in MLOps and ML infrastructure.
Before Jio, I was a Machine Learning Research Assistant at skit.ai, where I worked on text-to-speech systems. I also interned at Hike Messenger, developing a real-time 3D avatar system, and had stints at a few other startups, each adding something new to my ML toolkit.
I'm an active open source contributor: I've participated in Google Summer of Code as both a student and a mentor, and contributed to Facebook's Pysa as an MLH Fellow. Hackathons have been my creative playground, with wins including the Smart India Hackathon, where I built solutions for the Government of Goa.
I see machine learning as modern alchemy, transforming raw data into intelligence through mathematical transmutation. In this pursuit, I follow the principle of equivalent exchange: meaningful insights require rigorous work and careful thought. Yet I've learned that our models possess emergent behaviors that transcend their mathematical foundations - a kind of computational essence that defies complete explanation. This mysterious element is what transforms mere calculation into something that appears genuinely intelligent, reminding us that even in our most advanced formulas, there remains something we cannot fully quantify (yet).
Research on improving reasoning capabilities in small language models (0.5B-3B parameters) through novel decoding strategies.
Scalable system processing 32 concurrent video streams per GPU for safety monitoring applications.
Lightweight human activity recognition model (0.19 MB) adapted for real-time performance on edge devices.
End-to-end MLOps implementation on Kubernetes using Seldon, Docker, and Azure CI/CD pipelines.
An exploration of techniques to enhance logical capabilities in small language models without increasing model size.
Key insights on building robust, scalable machine learning infrastructure in production environments.