
Muhammad Jehanzeb Mirza

MIT, USA.


CV | LinkedIn | Google Scholar | GitHub | Email


Hi, I am Jehanzeb Mirza. I am a Postdoctoral Researcher at MIT CSAIL in the Spoken Language Systems Group, led by Dr. James Glass. I received my PhD in Computer Science (Computer Vision) from TU Graz, Austria, where I was advised by Professor Horst Bischof; Professor Serge Belongie served as an external referee. I completed my Master's in Electrical Engineering and Information Technology (ETIT) at KIT, Germany, and received my Bachelor's in Electrical Engineering (EE) from NUST, Pakistan.

My PhD research focused mainly on learning from unlabeled data. During my PhD, I worked with various data modalities, including images, point clouds, videos, radar signals, and, most recently, natural language. I am particularly interested in self-supervised learning for uni-modal models and multi-modal learning for vision-language models.

Contact

  • jmirza [at] mit.edu

  • Boston, USA.

Education

  • Ph.D. in Computer Vision (2021 - 2024), TU Graz, Austria.

  • MS in ETIT (2017 - 2020), KIT, Germany.

  • BS in EE (2013 - 2017), NUST, Pakistan.

Recent News

09/24: 1 paper accepted at NeurIPS, 2024.
07/24: 1 paper accepted at BMVC, 2024.
07/24: 2 papers accepted at ECCV, 2024.
04/24: I successfully defended my Ph.D. thesis.
12/23: Our workshop "What's Next in Multi-Modal Foundation Models" was accepted at CVPR 2024.
10/23: Invited talk at Cohere.
10/23: Invited talk at VIS Lab, University of Amsterdam.
09/23: 1 paper accepted at NeurIPS, 2023.
09/23: Invited talk at Center for Robotics, Paris Tech.
07/23: 1 paper accepted at ICCV, 2023.
04/23: I will be attending the International Computer Vision Summer School.
03/23: 2 papers accepted at CVPR, 2023.
02/23: Reviewing for CVPR, ICCV and TPAMI.
03/22: 2 papers accepted at CVPR, 2022.

Experience

  • Postdoctoral Researcher - MIT (Boston, USA): Multi-modal learning with speech/audio, vision, and language (11.24 - Present).
  • Research Assistant - TU Graz (Graz, Austria): Self-supervised learning and vision-language understanding (01.21 - 10.24).
  • Research Scientist Internship - Sony AI (Tokyo, Japan): Multimodal vision-language understanding (05.24 - 08.24).
  • Internship - Intel (Karlsruhe, Germany): Evaluating robustness of object detectors in degrading weather (03.19 - 08.20).

Selected Publications

Supervised Student Works (Selected)

  • Bachelor Thesis: Test-Time Adaptation for Multi-Modal Vision-Language Models (Ongoing).
  • Master Thesis: Online Test-Time Training for 3D point clouds with Masked Autoencoders (Completed) [ paper @ ICCV ]
  • Bachelor Thesis: Online Domain Incremental Learning for Driving in Adverse Weather Conditions (Completed) [ paper @ IEEE IV (oral) ]
  • Master Thesis: How Much are Data Augmentations Worth for Representation Learning in 3D Point Clouds (Completed).