
Email

Github

Google Scholar

Selected

* equal contribution

InternVideo2

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding. [Paper][Code]

Yi Wang*, Kunchang Li*, Xinhao Li*, Jiashuo Yu*, Yinan He*, Guo Chen, Baoqi Pei, Rongkun Zheng, Jilan Xu, Zun Wang, Yansong Shi, Tianxiang Jiang, Songze Li, Hongjie Zhang, Yifei Huang, Yu Qiao, Yali Wang, Limin Wang (ECCV2024)

InternVideo2.5

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling. [Paper][Code]

Yi Wang*, Xinhao Li*, Ziang Yan*, Yinan He*, Jiashuo Yu*, Xiangyu Zeng, Chenting Wang, Changlian Ma, Haian Huang, Jianfei Gao, Min Dou, Kai Chen, Wenhai Wang, Yu Qiao, Yali Wang, Limin Wang

VideoChat-Flash

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling. [Paper][Code]

Xinhao Li*, Yi Wang*, Jiashuo Yu*, Xiangyu Zeng, Yuhan Zhu, Haian Huang, Jianfei Gao, Kunchang Li, Yinan He, Chenting Wang, Yu Qiao, Yali Wang, Limin Wang

VideoChat-Online

Online Video Understanding: OVBench and VideoChat-Online [Paper]

Zhenpeng Huang*, Xinhao Li*, Jiaqi Li*, Jing Wang, Xiangyu Zeng, Cheng Liang, Tao Wu, Xi Chen, Liang Li, Limin Wang (CVPR2025)

VideoChat-R1

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning. [Paper][Code]

Xinhao Li*, Ziang Yan*, Desen Meng*, Lu Dong, Xiangyu Zeng, Yinan He, Yali Wang, Yu Qiao, Yi Wang, Limin Wang

VideoChat-R1.5

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception. [Paper][Code]

Ziang Yan*, Xinhao Li*, Yinan He*, Zhengrong Yue, Xiangyu Zeng, Yali Wang, Yu Qiao, Limin Wang, Yi Wang (NeurIPS2025)

VideoEval

VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model. [Paper][Code]

Xinhao Li, Zhenpeng Huang, Jing Wang, Kunchang Li, Limin Wang

ZeroI2V

ZeroI2V: Zero-Cost Adaptation of Pre-Trained Transformers from Image to Video. [Paper][Code]

Xinhao Li, Yuhan Zhu, Limin Wang (ECCV2024)

Kimi-VL

Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities. [Paper][Code]

Kimi Team

All

See my Google Scholar.