


Google Scholar


Nice to meet you!!! 😊

I am an MS student 🙂 at Nanjing University (2023.09-2026.06, if everything goes as expected), supervised by Prof. Limin Wang. Previously, I received a bachelor's degree from Chongqing University in 2023.06 (major in computer science and technology 🖥️; GPA: 3.9/4.0; overall rank: 1/295). I am also working as a research intern at Shanghai AI Lab (2023.07-present). Before that, I worked as an intern at SenseTime (2022.10-2023.06).

Research Interests

My research interests lie in computer vision and multimodal learning:

  • Vision-Language Representation Learning
  • Vision Foundation Models
  • Multimodal Instruction-following Agents
  • Parameter-Efficient Transfer Learning

I am particularly interested in how the above directions advance video understanding. Currently, I am researching large-scale video-text pre-training and building video foundation models.

News and Updates

  • Jul 2024: Three papers accepted by ECCV 2024.
  • Mar 2024: We present InternVideo2, currently the largest (6B parameters) and most powerful video foundation model.
  • Mar 2024: We present VideoMamba, an efficient video backbone architecture with the potential to serve as an alternative to the video transformer architecture.
  • Jan 2024: One paper accepted by ICLR 2024 (spotlight).
  • Dec 2023: Accepted by 🐼.
  • Jun 2023: Happy to graduate from Chongqing University. Thank you to all my classmates and teachers.