Hi, I am Jehanzeb Mirza. I am a Postdoctoral Researcher at MIT CSAIL, in the Spoken Language Systems Group, led by Dr. James Glass. I received my Ph.D. in Computer Science (Computer Vision) from TU Graz, Austria, where I was advised by Professor Horst Bischof, and Professor Serge Belongie served as an external referee. I completed my Master's in Electrical Engineering and Information Technology (ETIT) at KIT, Germany, and received my Bachelor's in Electrical Engineering (EE) from NUST, Pakistan.

I am particularly interested in self-supervised learning for uni-modal models and multi-modal learning for vision-language models, with a focus on improving fine-grained understanding. I am actively looking for collaborators in the area of multi-modal learning. Please do not hesitate to write me an email, even if you just want an opinion on your work! :)

Contact

  • jmirza [at] mit.edu

  • Boston, USA.

Education

  • Ph.D. in Computer Vision (2021 - 2024)

    TU Graz, Austria.

  • MS in ETIT (2017 - 2020)

    KIT, Germany.

  • BS in EE (2013 - 2017)

    NUST, Pakistan.

Recent News

12/24: Our workshop "What's Next in Multi-Modal Foundation Models" got accepted at CVPR 2025.
11/24: I joined MIT CSAIL as a Postdoctoral Researcher.
11/24: 1 paper accepted at 3DV, 2025.
09/24: 1 paper accepted at NeurIPS, 2024.
07/24: 1 paper accepted at BMVC, 2024.
07/24: 2 papers accepted at ECCV, 2024.
04/24: I successfully defended my Ph.D. thesis.
12/23: Our workshop "What's Next in Multi-Modal Foundation Models" got accepted at CVPR 2024.
10/23: Invited talk at Cohere.
10/23: Invited talk at VIS Lab, University of Amsterdam.
09/23: 1 paper accepted at NeurIPS, 2023.
09/23: Invited talk at Center for Robotics, Paris Tech.
07/23: 1 paper accepted at ICCV, 2023.
04/23: I will be attending the International Computer Vision Summer School.
03/23: 2 papers accepted at CVPR, 2023.
02/23: Reviewing for CVPR, ICCV and TPAMI.
03/22: 2 papers accepted at CVPR, 2022.

Experience

  • Postdoctoral Researcher - MIT (Boston, USA): Multi-modal Learning with Speech/Audio, Vision, and Language. (11.24 - Present).
  • Research Assistant - TU Graz (Graz, Austria): Self-supervised learning and vision-language understanding (01.21 - 10.24).
  • Research Scientist Internship - Sony AI (Tokyo, Japan): Multimodal vision-language understanding (05.24 - 08.24).
  • Internship - Intel (Karlsruhe, Germany): Evaluating robustness of object detectors in degrading weather (03.19 - 08.20).

Selected Publications

Supervised Student Works (Selected)

  • Bachelor Thesis: Test-Time Adaptation for Multi-Modal Vision-Language Models (Ongoing).
  • Master Thesis: Online Test-Time Training for 3D point clouds with Masked Autoencoders (Completed) [ paper @ ICCV ]
  • Bachelor Thesis: Online Domain Incremental Learning for Driving in Adverse Weather Conditions (Completed) [ paper @ IEEE IV (oral) ]
  • Master Thesis: How Much are Data Augmentations Worth for Representation Learning in 3D Point Clouds (Completed).