Unlocking Reasoning in Sub-3B Parameter LLMs
An exploration of techniques to enhance logical capabilities in small language models without increasing model size.
Building intelligent systems with rigorous research and scalable engineering
I'm a Machine Learning Engineer at Google, developing the ML platform for Google Pay. My work spans the full ML lifecycle, from foundational research to production systems at scale. I've been immersed in the generative AI space for quite a while and closely follow its rapid developments.
Previously, at Jio AICoE, I led initiatives to improve reasoning in small language models and built real-time computer vision systems. I worked across the ML spectrum, from training models to deploying them as REST APIs or on edge devices, while building expertise in MLOps and ML infrastructure.
Before Jio, I was a Machine Learning Research Assistant at skit.ai, where I worked on text-to-speech systems. I also interned at Hike Messenger, developing a real-time 3D avatar system, and had stints at a few other startups, each adding something new to my ML toolkit.
I'm an active open source contributor: I've participated in Google Summer of Code as both a student and a mentor, and contributed to Facebook's Pysa as an MLH Fellow. Hackathons have been my creative playground, with wins including the Smart India Hackathon, where I built solutions for the Government of Goa.
I see machine learning as modern alchemy, transforming raw data into intelligence through mathematical transmutation. In this pursuit, I follow the principle of equivalent exchange: meaningful insights require rigorous work and careful thought. Yet I've learned that our models possess emergent behaviors that transcend their mathematical foundations - a kind of computational essence that defies complete explanation. This mysterious element is what transforms mere calculation into something that appears genuinely intelligent, reminding us that even in our most advanced formulas, there remains something we cannot fully quantify (yet).
Research on improving reasoning capabilities in small language models (0.5B-3B parameters) through novel decoding strategies.
Scalable system processing 32 concurrent video streams per GPU for safety monitoring applications.
Lightweight human activity recognition model (0.19 MB) adapted for real-time performance on edge devices.
End-to-end MLOps implementation on Kubernetes using Seldon, Docker, and Azure CI/CD pipelines.
An exploration of techniques to enhance logical capabilities in small language models without increasing model size.
Key insights on building robust, scalable machine learning infrastructure in production environments.