
Email

GitHub

Google Scholar

2024

VideoMamba: State Space Model for Efficient Video Understanding. [Paper][Code]

Kunchang Li, Xinhao Li, Yi Wang, Yinan He, Yali Wang, Limin Wang, Yu Qiao

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding. [Paper][Code]

Yi Wang*, Kunchang Li*, Xinhao Li*, Jiashuo Yu*, Yinan He*, Guo Chen, Baoqi Pei, Rongkun Zheng, Jilan Xu, Zun Wang, Yansong Shi, Tianxiang Jiang, Songze Li, Hongjie Zhang, Yifei Huang, Yu Qiao, Yali Wang, Limin Wang (* Equal contribution)

2023

ZeroI2V: Zero-Cost Adaptation of Pre-Trained Transformers from Image to Video. [Paper][Code]

Xinhao Li, Limin Wang

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation. [Paper][Code]

Yi Wang, Yinan He, Yizhuo Li, Kunchang Li, Jiashuo Yu, Xin Ma, Xinhao Li, Guo Chen, Xinyuan Chen, Yaohui Wang, Ping Luo, Ziwei Liu, Yali Wang, Limin Wang, Yu Qiao (ICLR 2024 Spotlight)