Research

Our research interests lie at the intersection of Computer Vision, Deep Learning, and Natural Language Processing, with a focus on developing Artificial Intelligence (AI) systems that can ‘see’ (i.e., understand the contents of an image: who, what, where, doing what?) and ‘talk’ (i.e., communicate that understanding to humans in free-form natural language).

Research Topics

Below are some example vision-language research topics that interest me:

Culturally aware and geo-diverse vision-language models
Data-efficient adaptation to new tasks
Robust automatic evaluation
Visio-linguistic reasoning (fine-grained, compositional, knowledge-based, etc.)
Generalization to out-of-distribution datasets

For more details, please check out my latest talks.

Recent Talks

Advancing multimodal vision-language learning

Area Chair Workshop @ CVPR (Jun 2024)

Visual-Language Learning

Tutorial on Visual Recognition Beyond the Comfort Zone: Adapting to Unseen Concepts on the Fly @ ICCV (Oct 2023)