Hi! I am a Ph.D. student in Computer Science at the University of British Columbia (UBC), advised by Vered Shwartz and Raymond Ng in the Natural Language Processing group. I frequently collaborate with Leonid Sigal in the Computer Vision group.

I work at the intersection of NLP, computer vision, and cognitively inspired reasoning. The goal of my research is to develop robust and safe AI models that move beyond pattern recognition to achieve a deeper, context-sensitive understanding of the world. Specifically, I evaluate and improve critical reasoning capabilities across multiple modalities (text, images, and videos), which are fundamental for safe and effective deployment in real-world applications such as embodied agents and AR/VR.

My focus is on the following interconnected areas:

Robust Spatial and Causal Reasoning in Video-Language Models
I have evaluated models on adapting to surprising or expectation-violating events, which is crucial for real-world safety. My recent work, Black Swan, evaluates models on abductive and defeasible reasoning in unpredictable video settings, testing whether they can revise their beliefs when presented with new evidence. I am currently working on making models resilient to such scenarios. Additionally, my recent work examines whether VLMs can reason about spatial relationships in multi-frame egocentric settings when equipped with 3D cues.

Feedback-Driven Learning
To scale complex reasoning capabilities such as creativity, which do not adhere to a step-by-step reasoning paradigm, I have worked on feedback-driven iterative DPO and shown that it improves performance. I am currently exploring the role of multi-aspect feedback in improving generation.

Grounded Reasoning
I have developed models that infer real-world events and object dynamics using world knowledge. This includes reasoning over interconnected events in text, event coreference, and grounded vision-language understanding models.

Inclusive Models
I have worked on evaluating VLMs on retrieval and grounding in multicultural settings and analyzing the understanding of cultural norms in LLMs (CulturalBench).

If you are interested in any of the above areas and would like to collaborate, feel free to reach out.

Experience

Summer 2024

Research Intern, FAIR Communication and Language

Generating fine-grained facial expressions using semantically meaningful pose tokens, improving predictability and enabling the precise control crucial for controllable video generation.

Summer 2024

Research Intern, Microsoft Research

Evaluating the 3D spatial reasoning abilities of Vision-Language Models in egocentric videos.

Summer 2023

Research Intern, Meta Reality Labs

Iterative DPO and ranking methods for teaching creative abilities to smaller LMs.

2019 - 2021

AI Research Engineer, OpenML & TU/e

I worked as an AI research engineer in the Machine Learning group at Eindhoven University of Technology. Together with an amazing team supervised by Joaquin Vanschoren, I worked on the research and development of open-source software for open and automated machine learning. I also worked with Mykola Pechenizkiy on active learning and the interpretability of NLP models.