About me

I am an Associate Researcher at the School of Computer Science and Technology, East China Normal University. I received my Ph.D. in Computer Application Technology from Fudan University (Sep. 2019 – Jun. 2024), where I worked in the FVL Lab under the supervision of Prof. Yu-Gang Jiang and Prof. Jingjing Chen.

My recent research focuses on Multimodal Learning, Video Understanding, and Embodied AI. More specifically, I am interested in egocentric and streaming video understanding with MLLMs, with an emphasis on video question answering, spatio-temporal grounding, and embodied agents for real-world interaction, such as vision-language navigation. I am also interested in AI for Science, including emerging topics such as brain signal (e.g., EEG and MEG) decoding.

Open Positions

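Our group recruits Ph.D. students, master's students, and undergraduate interns on a rolling basis. Students who are enthusiastic about multimodal video understanding, embodied AI, and related directions are welcome to join (please attach your CV to your application email). We hope you have some background in AI and programming, or particular strengths in mathematical theory, engineering implementation, or system building. Research has no standard template: what we look for is not "good grades" but curiosity, initiative, and the ability to execute.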

News

  • Apr. 2026

    StreamingEval was accepted to ACL 2026 Findings.

  • Feb. 2026

    Seven papers were accepted to CVPR 2026 (6 Main, 1 Findings), covering egocentric & streaming video understanding, vision-language navigation, and related topics.

  • Jan. 2026

    EgoNight was accepted to ICLR 2026.

  • Jan. 2026

    SeGDP was accepted to ACM ToMM.

  • Nov. 2025

    Our work on EEG-to-text decoding was accepted to Expert Systems with Applications.

  • Oct. 2025

    Look Before You Decide (MARS-Bench) was accepted to ACM MM 2025.

  • Aug. 2025

    EgoCross was accepted to AAAI 2026.

  • Jun. 2025

    Domain-RAG was accepted to NeurIPS 2025.

  • Jun. 2025

    NeighborRetr was accepted to CVPR 2025.

  • Apr. 2025

    HSACNet was accepted to ICME 2025.

  • Jan. 2024

    NuScenes-QA was accepted to AAAI 2024; preprint and code are now online.

  • Jun. 2023

    Our work on long-term video understanding (Locate before Answering) was accepted to IEEE Transactions on Multimedia.

  • Jul. 2022

    ViGA for video moment retrieval was presented at ACM SIGIR 2022.

  • Jun. 2022

    Wrapped up my multimodal research internship at Bilibili AI-Lab.

  • Mar. 2022

    Scene Graph Refinement Network for VQA was published in IEEE Transactions on Multimedia.

Publications

* joint first author, # corresponding author

CLiViS: Unleashing Cognitive Map through Linguistic-Visual Synergy for Embodied Visual Reasoning

Kailing Li, Qi'ao Xu, Tianwen Qian#, Yuqian Fu, Yang Jiao, Xiaoling Wang

CVPR, 2026

Bridging the 2D-3D Gap: A Hierarchical Semantic-Geometric Map for Vision Language Navigation

Kailing Li, Tianwen Qian#, Lijin Yang, Yuqian Fu, Jingyu Gong, Xiaoling Wang, Liang He

CVPR, 2026 (to appear online)

Omni-Supervised Motion Editing: Balancing Change and Invariance through Positive-Negative Learning

Zhenwu Shi, Jingyu Gong, Peiwei Wang, Xingzan Wang, Tianwen Qian, Wenxi Li, Yuan Fang, Jiao Xie, Lizhuang Ma, Shaohui Lin

CVPR, 2026 (to appear online)

Think, Then Verify: A Hypothesis-Verification Multi-Agent Framework for Long Video Understanding

Zheng Wang, Haoran Chen, Haoxuan Qin, Zhipeng Wei, Tianwen Qian, Cong Bai

CVPR, 2026

EgoSound: Benchmarking Sound Understanding in Egocentric Videos

Bingwen Zhu, Yuqian Fu, Qiaole Dong, Guolei Sun, Tianwen Qian, Yuzheng Wu, Danda Pani Paudel, Xiangyang Xue, Yanwei Fu

CVPR, 2026

V2-SAM: Marrying SAM2 with Multi-Prompt Experts for Cross-View Object Correspondence

Jiancheng Pan, Runze Wang, Tianwen Qian, Mohammad Mahdi, Yanwei Fu, Xiangyang Xue, Xiaomeng Huang, Luc Van Gool, Danda Pani Paudel, Yuqian Fu

CVPR, 2026

StreamEQA: Towards Streaming Video Understanding for Embodied Scenarios

Yifei Wang, Zhenkai Li, Tianwen Qian#, Huanran Zheng, Zheng Wang, Yuqian Fu, Xiaoling Wang

CVPR Findings, 2026

EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark

Deheng Zhang, Yuqian Fu, Runyi Yang, Yang Miao, Tianwen Qian, Xu Zheng, Guolei Sun, Ajad Chhatkuli, Xuanjing Huang, Yu-Gang Jiang, Luc Van Gool, Danda Pani Paudel

ICLR, 2026

SeGDP: Source-free Cross-domain Few-shot Learning via Semantic Guided Diversity Prompting

Linhai Zhuo, Zheng Wang, Tianwen Qian, Yuqian Fu

ACM Transactions on Multimedia Computing, Communications, and Applications (ToMM), 2026

EgoCross: Benchmarking Multimodal Large Language Models for Cross-Domain Egocentric Video Question Answering

Yanjun Li, Yuqian Fu, Tianwen Qian#, Qi'ao Xu, Silong Dai, Danda Pani Paudel, Luc Van Gool, Xiaoling Wang

AAAI, 2026

Experiences

East China Normal University, School of Computer Science and Technology

Associate Researcher

Mar. 2025 – Present

Bosch (China) Investment Ltd., Central Research

AI Algorithm Researcher

Jul. 2024 – Nov. 2024

Bilibili AI-Lab

Research intern working on multimodal learning, large-scale video pre-training, and video localization.

Jun. 2021 – Jun. 2022

Dalian University of Technology

Research assistant in the Smart Ocean Lab, working on visual obstacle avoidance for unmanned ships.

Sep. 2018 – Jun. 2019

Education

Fudan University

Ph.D., Computer Application Technology

Sep. 2019 – Jun. 2024

Dalian University of Technology

Bachelor of Engineering

Sep. 2015 – Jun. 2019

Awards

Outstanding Doctoral Dissertation Award, Shanghai Computer Society

Mar. 2025

Outstanding Graduate of Shanghai

Jun. 2024

Academic Services

  • Conference Reviewer for ICML / NeurIPS / CVPR / ECCV / AAAI / ACM MM.
  • Journal Reviewer for TMM / ToMM / PR / Neurocomputing.