Manu Gaur

Hi, I'm a research assistant at CVIT, IIIT-H under Dr. Makarand Tapaswi. I recently graduated from Delhi Technological University with a major in Applied Physics. I am originally from New Delhi, India.

Email  /  Resume  /  Twitter  /  Google Scholar  /  Github

News

Research

I am interested in self-supervised learning, multimodal models, generative modelling, and reinforcement learning. My long-term goal is ...

No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
Manu Gaur*, Darshan Singh, Makarand Tapaswi
Under Review

MLLMs generate generic captions. We outperform MLLMs 30x our size.

D3: Evaluating MLLM's capacity for Finegrained Visual Discrimination through Self-Retrieval
Manu Gaur*, Darshan Singh, Makarand Tapaswi
ECCV EVAL-FoMo Workshop, 2024

Closed-source MLLMs struggle to capture fine-grained visual differences, and open-source models fail to outperform random guessing.