Email

GitHub

Google Scholar

Selected

* equal contribution

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding. [Paper][Code]

Yi Wang*, Kunchang Li*, Xinhao Li*, Jiashuo Yu*, Yinan He*, Guo Chen, Baoqi Pei, Rongkun Zheng, Jilan Xu, Zun Wang, Yansong Shi, Tianxiang Jiang, Songze Li, Hongjie Zhang, Yifei Huang, Yu Qiao, Yali Wang, Limin Wang (ECCV2024)

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling [Paper][Code]

Xinhao Li*, Yi Wang*, Jiashuo Yu*, Xiangyu Zeng, Yuhan Zhu, Haian Huang, Jianfei Gao, Kunchang Li, Yinan He, Chenting Wang, Yu Qiao, Yali Wang, Limin Wang

Online Video Understanding: A Comprehensive Benchmark and Memory-Augmented Method [Paper]

Zhenpeng Huang*, Xinhao Li*, Jiaqi Li*, Jing Wang, Xiangyu Zeng, Cheng Liang, Tao Wu, Xi Chen, Liang Li, Limin Wang (CVPR2025)

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling [Paper][Code]

Yi Wang*, Xinhao Li*, Ziang Yan*, Yinan He*, Jiashuo Yu*, Xiangyu Zeng, Chenting Wang, Changlian Ma, Haian Huang, Jianfei Gao, Min Dou, Kai Chen, Wenhai Wang, Yu Qiao, Yali Wang, Limin Wang

VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model. [Paper][Code]

Xinhao Li, Zhenpeng Huang, Jing Wang, Kunchang Li, Limin Wang

ZeroI2V: Zero-Cost Adaptation of Pre-Trained Transformers from Image to Video. [Paper][Code]

Xinhao Li, Yuhan Zhu, Limin Wang (ECCV2024)

All

2025

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling [Paper][Code]

Xinhao Li*, Yi Wang*, Jiashuo Yu*, Xiangyu Zeng, Yuhan Zhu, Haian Huang, Jianfei Gao, Kunchang Li, Yinan He, Chenting Wang, Yu Qiao, Yali Wang, Limin Wang

Online Video Understanding: A Comprehensive Benchmark and Memory-Augmented Method [Paper]

Zhenpeng Huang*, Xinhao Li*, Jiaqi Li*, Jing Wang, Xiangyu Zeng, Cheng Liang, Tao Wu, Xi Chen, Liang Li, Limin Wang (CVPR2025)

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling [Paper]

Yi Wang*, Xinhao Li*, Ziang Yan*, Yinan He*, Jiashuo Yu*, Xiangyu Zeng, Chenting Wang, Changlian Ma, Haian Huang, Jianfei Gao, Min Dou, Kai Chen, Wenhai Wang, Yu Qiao, Yali Wang, Limin Wang

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment. [Paper][Code]

Ziang Yan, Zhilin Li, Yinan He, Chenting Wang, Kunchang Li, Xinhao Li, Xiangyu Zeng, Zilei Wang, Yali Wang, Yu Qiao, Limin Wang, Yi Wang (CVPR2025)

Fine-grained Video-Text Retrieval: A New Benchmark and Method [Paper]

Yifan Xu, Xinhao Li, Yichun Yang, Rui Huang, Limin Wang

2024

VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model. [Paper][Code]

Xinhao Li, Zhenpeng Huang, Jing Wang, Kunchang Li, Limin Wang

VideoMamba: State Space Model for Efficient Video Understanding. [Paper][Code]

Kunchang Li, Xinhao Li, Yi Wang, Yinan He, Yali Wang, Limin Wang, Yu Qiao (ECCV2024)

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding. [Paper][Code]

Yi Wang*, Kunchang Li*, Xinhao Li*, Jiashuo Yu*, Yinan He*, Guo Chen, Baoqi Pei, Rongkun Zheng, Jilan Xu, Zun Wang, Yansong Shi, Tianxiang Jiang, Songze Li, Hongjie Zhang, Yifei Huang, Yu Qiao, Yali Wang, Limin Wang (ECCV2024)

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning. [Paper]

Xiangyu Zeng, Kunchang Li, Chenting Wang, Xinhao Li, Tianxiang Jiang, Ziang Yan, Songze Li, Yansong Shi, Zhengrong Yue, Yi Wang, Yali Wang, Yu Qiao, Limin Wang (ICLR2025)

2023

ZeroI2V: Zero-Cost Adaptation of Pre-Trained Transformers from Image to Video. [Paper][Code]

Xinhao Li, Yuhan Zhu, Limin Wang (ECCV2024)

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation. [Paper][Code]

Yi Wang, Yinan He, Yizhuo Li, Kunchang Li, Jiashuo Yu, Xin Ma, Xinhao Li, Guo Chen, Xinyuan Chen, Yaohui Wang, Ping Luo, Ziwei Liu, Yali Wang, Limin Wang, Yu Qiao (ICLR2024 spotlight)