Jehanzeb Mirza

MIT, USA.

CV | LinkedIn | Google Scholar | GitHub | Email

Hi, I am Jehanzeb Mirza. I am a Staff Research Scientist at Xero, where I build agentic AI systems for financial applications (tool use, structured reasoning, evaluation, and optimization). Previously, I was a Postdoctoral Researcher at MIT CSAIL in the Spoken Language Systems Group led by Dr. James Glass. I received my Ph.D. in Computer Science (Computer Vision) from TU Graz, Austria, where I was advised by Professor Horst Bischof, and Professor Serge Belongie served as an external referee.

My research spans multimodal foundation models (vision, language, audio) and test-time learning, with an emphasis on robust reasoning and decision-making. I am particularly interested in building reliable AI agents that can interface with tools and operate over complex structured data.

Selected work has been featured by MIT News and CSAIL research spotlights. I’m always happy to connect with student collaborators and researchers working on multimodal learning, LLM/VLM reasoning, and agentic systems—feel free to email me for feedback or collaboration.

Contact

  • jmirza [at] mit.edu
  • Office: 32-G442.
  • MIT, Cambridge, USA.

Education

  • Ph.D. in Computer Vision (2021 - 2024)
    TU Graz, Austria.
  • MS in ETIT (2017 - 2020)
    KIT, Germany.
  • BS in EE (2013 - 2017)
    NUST, Pakistan

Recent News

10/25: Our recent ICCV work was covered by MIT News: Story.
10/25: I have been recognized as NeurIPS 2025 Exceptional Reviewer.
09/25: My recent research was covered by MIT-CSAIL: Blogpost | Video.
09/25: 1 paper accepted at NeurIPS, 2025.
08/25: 1 paper accepted at TMLR, 2025.
07/25: 1 paper accepted at COLM, 2025.
06/25: 2 papers accepted at ICCV, 2025.
04/25: Our workshops "Long Multi-Scene Video Foundations" and "MMFM" got accepted at ICCV 2025.
03/25: Talk at EI Seminar, MIT-CSAIL.
02/25: 2 papers accepted at CVPR, 2025 (workshops).
01/25: 3 papers accepted at ICLR, 2025.
12/24: Our workshop "What's Next in Multi-Modal Foundation Models" got accepted at CVPR 2025.
11/24: I joined MIT CSAIL as a Postdoctoral Researcher.
11/24: 1 paper accepted at 3DV, 2025.
09/24: 1 paper accepted at NeurIPS, 2024.
07/24: 1 paper accepted at BMVC, 2024.
07/24: 2 papers accepted at ECCV, 2024.
04/24: I successfully defended my Ph.D. thesis.
12/23: Our workshop "What's Next in Multi-Modal Foundation Models" got accepted at CVPR 2024.
10/23: Invited talk at Cohere.
10/23: Invited talk at VIS Lab, University of Amsterdam.
09/23: 1 paper accepted at NeurIPS, 2023.
09/23: Invited talk at Center for Robotics, Paris Tech.
07/23: 1 paper accepted at ICCV, 2023.
04/23: I will be attending ICVSS 2023.
03/23: 2 papers accepted at CVPR, 2023.
02/23: Reviewing for CVPR, ICCV, and TPAMI.
03/22: 2 papers accepted at CVPR, 2022.

Experience

  • Staff Research Scientist - Xero (USA): Agentic AI systems for financial applications: tool use, structured reasoning, evaluation, and optimization. (01.26 - Present).
  • Postdoctoral Researcher - MIT CSAIL (Cambridge, USA): Multimodal learning with speech/audio, vision, and language. (11.24 - 12.25).
  • Research Assistant - TU Graz (Graz, Austria): Self-supervised learning, test-time adaptation, and vision-language understanding. (01.21 - 10.24).
  • Research Scientist Internship - Sony AI (Tokyo, Japan): Multimodal learning with vision, language, and audio. (05.24 - 08.24).
  • Internship - Intel (Karlsruhe, Germany): Robustness of 2D/3D perception systems in adverse conditions for autonomous driving. (03.19 - 08.20).
Selected Publications

    TTRV: Test-Time Reinforcement Learning for Vision Language Models
    CVPR 2026
    Teaching VLMs to Localize Specific Objects from In-context Examples
    ICCV 2025
    GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models
    TMLR 2025
    Are Vision Language Models Texture or Shape Biased and Can We Steer Them?
    ICLR 2025
    [Paper]
    Mining your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models
    ICLR 2025
    [Paper]
    ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
    NeurIPS 2024
    Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs
    ECCV 2024
    Towards Multimodal In-Context Learning for Vision & Language Models
    ECCVW 2024
    [Paper]
    LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections
    NeurIPS 2023
    MATE: Masked Autoencoders are Online 3D Test-Time Learners
    *M. Jehanzeb Mirza, *Inkyu Shin, *Wei Lin, Andreas Schriebl, Kunyang Sun, Jaesung Choe, Mateusz Kozinski, Horst Possegger, In So Kweon, Kun-Jin Yoon, Horst Bischof (*Equal Contribution)
    ICCV 2023
    ActMAD: Activation Matching to Align Distributions for Test-Time-Training
    CVPR 2023
    Video Test-Time Adaptation for Action Recognition
    *Wei Lin, *M. Jehanzeb Mirza, Mateusz Kozinski, Horst Possegger, Hilde Kuehne, Horst Bischof (*Equal Contribution)
    CVPR 2023
    [Paper | Code]
    The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization
    CVPR 2022
    [Paper | Code]