My research interests lie at the intersection of Computer Vision, Deep Learning, and Natural Language Processing, with a focus on developing Artificial Intelligence (AI) systems that can ‘see’ (i.e., understand the contents of an image: who, what, where, doing what?) and ‘talk’ (i.e., communicate that understanding to humans in free-form natural language).

Research Topics

Below are some example vision-language research topics that interest me:

  • Culturally aware and geo-diverse vision-language models
  • Data-efficient adaptation to new tasks
  • Robust automatic evaluation
  • Visio-linguistic reasoning (fine-grained, compositional, knowledge-based, etc.)
  • Generalization to out-of-distribution datasets

For more details, please check out my latest talks.

Recent Talks

Advancing multimodal vision-language learning
Area Chair Workshop @ CVPR (Jun 2024)
Visual-Language Learning
Tutorial on Visual Recognition Beyond the Comfort Zone: Adapting to Unseen Concepts on the Fly @ ICCV (Oct 2023)