Biography

Nice to meet you! 😊 Feel free to contact me! My email is xinhaoli00@outlook.com, and my WeChat ID is lxh18470435117.
I am an MS student 🙂 at Nanjing University (2023.09—2026.06, if everything goes as expected), supervised by Prof. Limin Wang. Previously, I received my bachelor's degree from Chongqing University in 2023.06 (major: Computer Science and Technology 🖥️; GPA: 3.9/4.0; overall rank: 1/295). I am currently working as an intern at ByteDance Seed. Before that, I worked as a research intern at Moonshot AI (2025.04—2026.01), Shanghai AI Lab (2023.07—2025.04), and SenseTime (2022.10—2023.06).
Research Interests
Building and Evaluation of Video Foundation Models 🚀
- Scaling Video-Language Data: InternVid and Video Foundation Models: InternVideo2
- Efficient Model Architecture of Video Foundation Models: ZeroI2V, VideoMamba
- Evaluation of Video Foundation Models: VideoEval
Building and Evaluation of Video Multimodal Large Language Models (MLLMs) 🦜
- I completed the architecture design and training of the Vision Encoder (MoonViT3d) for Kimi K2.5 and constructed the video pre-training and post-training datasets.
- Fine-Grained and Long Video Understanding: VideoChat-Flash, InternVideo2.5, TimeSuite, TPO
- Model and Benchmark for Online Video Understanding: VideoChat-Online/OVBench, StreamForest/ODV-Bench
- Benchmarks for Video Captioning and Retrieval: CaReBench, and for Complex Video Reasoning: VideoReasonBench
Post-Finetuning for Video MLLMs with Reinforcement Learning 💪
- GRPO for Spatial-Temporal Perception: VideoChat-R1, Visual Test-Time Scaling: VideoChat-R1.5/VTTS, and Video Caption: VideoCap-R1.
News and Updates
- Jan 2026: Heartfelt congratulations on the release of Kimi K2.5.
- Jan 2026: Three papers are accepted by ICLR2026.
- Feb 2025: Three papers are accepted by NeurIPS2025, and one is selected as a spotlight.
- Apr 2025: 🔥🔥🔥We present VideoChat-R1, a new attempt at R1-style training for Video MLLMs.
- Feb 2025: One paper is accepted by ICLR2025 and two papers are accepted by CVPR2025.
- Jan 2025: 🔥🔥🔥We present VideoChat-Flash and VideoChat-Online, new video MLLMs and benchmarks for long video understanding.
- Jul 2024: Three papers are accepted by ECCV2024.
- Mar 2024: 🔥🔥🔥We present InternVideo2, currently the largest (6B parameters) and most powerful video foundation model.
- Mar 2024: We present VideoMamba, an efficient video backbone architecture with the potential to serve as an alternative to the video transformer architecture.
- Jan 2024: One paper is accepted by ICLR2024 (spotlight).
- Dec 2023: Accepted by 🐼.
- Jun 2023: Happy to graduate from Chongqing University. Thank you to all my classmates and teachers.