Siqi Fan
Researcher @ AIR, Tsinghua University

I'm Siqi Fan (范嗣祺), a researcher at the Institute for AI Industry Research, Tsinghua University (AIR, THU). Previously, I received my M.S. degree from the Institute of Automation, Chinese Academy of Sciences (CASIA) in 2022, and my B.E. degree from Shanghai Jiao Tong University (SJTU) in 2019.

I am broadly interested in Representation Learning in Complex Systems, from the macro physical world to the micro biological world, aiming to create AI agents that perceive like, or beyond, humans. Driven by a dedication to innovation, I aspire to advance the fields of autonomous driving and biomedical discovery, pushing the boundaries of technology to create impactful products.

I love music and visual arts.


Education
  • Shanghai Jiao Tong University (SJTU)
    School of Electronic Information and Electrical Engineering
    B.E. in Automation
    Sep. 2015 - Jul. 2019
  • University of Chinese Academy of Sciences (UCAS)
    Institute of Automation
    M.S. in Automation
    Sep. 2019 - Jul. 2022
Research Experience
  • Autonomous System Group, Intel Labs China (ILC)
    Research Intern
    Aug. 2020 - Dec. 2021
  • Institute of Automation, Chinese Academy of Sciences (CASIA)
    Student Researcher
    Sep. 2019 - Jul. 2022
  • Institute for AI Industry Research, Tsinghua University (AIR, THU)
    Researcher
    Jul. 2022 - Present
Academic Service
Honors & Awards
  • National Scholarship
    2021
  • Pan Deng First-class Scholarship, CAS
    2022
  • Excellent Scholarship, SJTU
    2018
  • China Industrial Intelligence Challenge, State-level Outstanding Award, CAA
    2018
News
2024
Dec 21: Our workshop proposal "Multi-Agent Embodied Intelligent Systems Meet Generative-AI Era: Opportunities, Challenges and Futures" is accepted as a full-day workshop @ CVPR'25.
Oct 23: The AI-agent system project PharmAID is launched @ FUSON PHARMA.
Oct 12: I serve as an Area Chair for the 1st Workshop on Cooperative Intelligence for Embodied AI @ ECCV'24.
2023
Dec 25: We release RCooper, the 1st real-world large-scale dataset for roadside cooperative perception.
Sep 21: Our ChatDD-FM-100B ranks 1st in all four medical disciplines on the C-Eval benchmark and is the only model with an average score above 90.
Aug 18: We release BioMedGPT-10B, the 1st commercial-friendly multimodal biomedical foundation model.
2022
Sep 18: I gave a talk on Traffic Scenes Understanding and Simulation Testing @ ITSC'22.
Selected Publications
BioMedGPT: An Open Multimodal Large Language Model for BioMedicine

Yizhen Luo, Jiahuan Zhang, Siqi Fan, Kai Yang, Massimo Hong, Yushuai Wu, Mu Qiao, Zaiqing Nie

IEEE Journal of Biomedical and Health Informatics (J-BHI) 2024

Recent advances in large language models (LLMs) like ChatGPT have shed light on the development of knowledgeable and versatile AI research assistants in various scientific domains. However, they fall short in biomedical applications due to a lack of proprietary biomedical knowledge and deficiencies in handling biological sequences for molecules and proteins. To address these issues, we present BioMedGPT, a multimodal large language model for assisting biomedical research. We first incorporate domain expertise into LLMs by incremental pre-training on large-scale biomedical literature. Then, we harmonize 2D molecular graphs, protein sequences, and natural language within a unified, parameter-efficient fusion architecture by fine-tuning on multimodal question-answering datasets. Through comprehensive experiments, we show that BioMedGPT performs on par with human experts in comprehending biomedical documents and answering research questions. It also exhibits promising capability in analyzing intricate functions and properties of novel molecules and proteins, surpassing state-of-the-art LLMs by 17.1% and 49.8% absolute gains respectively in ROUGE-L on molecule and protein question-answering.
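
To make the fusion architecture concrete, below is a minimal, illustrative sketch of the parameter-efficient fusion idea, not the released BioMedGPT code: frozen encoders for molecular graphs and protein sequences are mapped into the LLM embedding space by small trainable projectors, so only the projectors are fine-tuned. The encoder output dimensions (300 and 1280) and the HuggingFace-style inputs_embeds call are assumptions.

    # Illustrative sketch only; module names and dimensions are assumptions.
    import torch
    import torch.nn as nn

    class ModalityProjector(nn.Module):
        """Maps a frozen domain encoder's features into the LLM embedding space."""
        def __init__(self, enc_dim: int, llm_dim: int):
            super().__init__()
            self.proj = nn.Sequential(nn.Linear(enc_dim, llm_dim), nn.GELU(),
                                      nn.Linear(llm_dim, llm_dim))

        def forward(self, feats: torch.Tensor) -> torch.Tensor:
            return self.proj(feats)  # (batch, tokens, llm_dim)

    class FusionLM(nn.Module):
        """A frozen LLM consumes projected molecule/protein tokens alongside text embeddings."""
        def __init__(self, llm, mol_encoder, prot_encoder, llm_dim: int = 4096):
            super().__init__()
            self.llm, self.mol_encoder, self.prot_encoder = llm, mol_encoder, prot_encoder
            for m in (self.llm, self.mol_encoder, self.prot_encoder):
                m.requires_grad_(False)  # only the lightweight projectors are trained
            self.mol_proj = ModalityProjector(300, llm_dim)    # 300 / 1280: assumed encoder dims
            self.prot_proj = ModalityProjector(1280, llm_dim)

        def forward(self, mol_graph, protein_seq, text_embeds):
            mol_tokens = self.mol_proj(self.mol_encoder(mol_graph))
            prot_tokens = self.prot_proj(self.prot_encoder(protein_seq))
            inputs = torch.cat([mol_tokens, prot_tokens, text_embeds], dim=1)
            return self.llm(inputs_embeds=inputs)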

RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception

Ruiyang Hao*, Siqi Fan*, Yingru Dai, Zhenlin Zhang, Chenxi Li, Yuntian Wang, Haibao Yu, Wenxian Yang, Jirui Yuan, Zaiqing Nie (* equal contribution)

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

Toward a comprehensive understanding of traffic areas, roadside cooperative perception (RCooper) is needed to achieve area-coverage roadside perception. RCooper poses its own domain-specific challenges, but further exploration has been hindered by the lack of datasets. We hence release the first real-world, large-scale RCooper dataset to foster research on practical roadside cooperative perception, including detection and tracking. The manually annotated dataset comprises 50k images and 30k point clouds, covering two representative traffic scenes (i.e., intersection and corridor). The constructed benchmarks prove the effectiveness of roadside cooperative perception and indicate directions for further research.
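
As a purely hypothetical illustration of how a multi-agent roadside sample could be organized when building on such a dataset, the sketch below groups per-agent images and point clouds with shared annotations; every field name is invented for illustration and does not reflect the released RCooper format.

    # Hypothetical data layout for a cooperative roadside frame; not the RCooper schema.
    from dataclasses import dataclass, field
    from typing import List
    import numpy as np

    @dataclass
    class SensorFrame:
        agent_id: str             # which roadside unit captured this frame
        image_paths: List[str]    # synchronized camera images
        lidar_points: np.ndarray  # (N, 4) x, y, z, intensity

    @dataclass
    class CooperativeSample:
        scene: str                # e.g., "intersection" or "corridor"
        timestamp: float
        frames: List[SensorFrame] = field(default_factory=list)  # one per roadside agent
        boxes: np.ndarray = None  # (M, 7) annotated 3D boxes in a shared coordinate frame

    def fuse_point_clouds(sample: CooperativeSample) -> np.ndarray:
        """Naive early fusion: stack all agents' points, assuming a shared coordinate frame."""
        return np.concatenate([f.lidar_points for f in sample.frames], axis=0)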

QUEST: Query Stream for Practical Cooperative Perception

Siqi Fan, Haibao Yu, Wenxian Yang, Jirui Yuan, Zaiqing Nie

IEEE International Conference on Robotics and Automation (ICRA) 2024

Aiming at interpretable and flexible cooperative perception, we propose the concept of query cooperation, which enables instance-level feature interaction among agents via a query stream. To instantiate query cooperation, a representative cooperative perception framework (QUEST) is proposed. It performs cross-agent query interaction by fusion and complementation, designed for co-aware and unaware objects respectively. Taking camera-based vehicle-infrastructure cooperative perception as a typical scenario, we generate camera-centric cooperation labels for DAIR-V2X-Seq and evaluate the proposed framework on them. The experimental results not only demonstrate its effectiveness but also show the advantages of transmission flexibility and robustness to packet dropout. In addition, we discuss the pros and cons of the query cooperation paradigm in terms of possible extensions and foreseeable limitations.
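
The following is a rough sketch of the query cooperation idea, fusion for co-aware objects and complementation for objects only one agent perceives; it is not the authors' implementation, and the cosine-similarity matching rule, threshold, and linear fusion operator are simplified assumptions.

    # Illustrative sketch of cross-agent query fusion and complementation.
    import torch
    import torch.nn as nn

    class QueryCooperation(nn.Module):
        def __init__(self, dim: int = 256, match_thresh: float = 0.7):
            super().__init__()
            self.fuse = nn.Linear(2 * dim, dim)  # fuses matched (co-aware) query pairs
            self.match_thresh = match_thresh

        def forward(self, ego_q: torch.Tensor, other_q: torch.Tensor) -> torch.Tensor:
            # ego_q: (N, dim) ego-agent queries; other_q: (M, dim) transmitted queries.
            sim = torch.cosine_similarity(ego_q[:, None, :], other_q[None, :, :], dim=-1)
            best_sim, best_idx = sim.max(dim=1)          # closest other-agent query per ego query
            matched = best_sim > self.match_thresh

            fused = ego_q.clone()
            if matched.any():                            # fusion: co-aware objects
                pairs = torch.cat([ego_q[matched], other_q[best_idx[matched]]], dim=-1)
                fused[matched] = self.fuse(pairs)

            unaware = torch.ones(other_q.size(0), dtype=torch.bool)
            unaware[best_idx[matched]] = False           # complementation: append queries
            return torch.cat([fused, other_q[unaware]], dim=0)  # the ego agent missed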

Calibration-free BEV Representation for Infrastructure Perception

Siqi Fan, Zhe Wang, Xiaoliang Huo, Yan Wang, Jingjing Liu

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2023

Addressing the practical challenges of varied installation poses and the calibration noise caused by inevitable natural factors (e.g., wind and snow), we point out the significant performance degradation of calibration-based BEV detection approaches under calibration noise, and propose the Calibration-free BEV Representation network (CBR) for infrastructure perception. CBR achieves feature-view standardization via decoupled feature reconstruction: perspective-view features are decoupled into front-view and bird's-eye-view features via MLPs without any calibration parameters, and the orthogonal feature fusion is similarity-based without additional depth supervision. CBR is evaluated on the large-scale real-world dataset DAIR-V2X and achieves a better accuracy-robustness balance.
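
A simplified sketch of the calibration-free idea follows: perspective-view features are mapped to front-view and bird's-eye-view grids by plain MLPs with no camera parameters, and the two orthogonal views are fused by feature similarity rather than depth supervision. The tensor shapes, layer sizes, and the particular similarity operator are assumptions, not the paper's configuration.

    # Illustrative sketch of calibration-free view decoupling and similarity-based fusion.
    import torch
    import torch.nn as nn

    class CBRSketch(nn.Module):
        def __init__(self, h: int = 32, d: int = 32):
            super().__init__()
            # MLPs act along the image height axis to "reconstruct" each target view.
            self.to_front = nn.Sequential(nn.Linear(h, h), nn.ReLU(), nn.Linear(h, h))
            self.to_bev = nn.Sequential(nn.Linear(h, d), nn.ReLU(), nn.Linear(d, d))

        def forward(self, pv: torch.Tensor) -> torch.Tensor:
            # pv: (B, C, H, W) perspective-view features from the image backbone.
            x = pv.permute(0, 1, 3, 2)                   # (B, C, W, H): MLP over H
            front = self.to_front(x)                     # (B, C, W, H) front-view features
            bev = self.to_bev(x)                         # (B, C, W, D) bird's-eye-view features
            # Similarity-based orthogonal fusion: weight front-view columns by their
            # agreement with each BEV depth bin, with no depth labels involved.
            sim = torch.einsum('bcwh,bcwd->bwhd', front, bev).softmax(dim=2)
            fused = torch.einsum('bwhd,bcwh->bcwd', sim, front)
            return bev + fused                           # (B, C, W, D) BEV representation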

Conservative-Progressive Collaborative Learning for Semi-supervised Semantic Segmentation

Siqi Fan, Fenghua Zhu, Zunlei Feng, Yisheng Lv, Mingli Song, Fei-Yue Wang

IEEE Transactions on Image Processing (T-IP) 2022

We propose a novel semi-supervised learning approach for semantic segmentation, termed Conservative-Progressive Collaborative Learning (CPCL), which not only takes advantage of high-quality labels but also makes full use of the large quantity of unlabeled data. CPCL is realized via intersection and union pseudo supervision, which cooperate with each other to combine conservative evolution with progressive exploration. In addition, a confidence-based dynamic loss is proposed to reduce pseudo-supervision noise. CPCL is simple, efficient, and flexible. It is evaluated on Cityscapes and PASCAL VOC 2012 and achieves SOTA performance for semi-supervised semantic segmentation, especially in the low-data regime.
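
A minimal sketch of intersection/union pseudo supervision with a confidence-based dynamic weight is shown below; it illustrates the idea rather than reproducing the authors' implementation, and the specific weighting rule is a simplified assumption.

    # Illustrative unlabeled-data loss for two collaborating segmentation branches.
    import torch
    import torch.nn.functional as F

    def cpcl_unlabeled_loss(logits_a, logits_b):
        """logits_a / logits_b: (B, num_classes, H, W) predictions of the two branches."""
        prob_a, prob_b = logits_a.softmax(1), logits_b.softmax(1)
        conf_a, pred_a = prob_a.max(1)
        conf_b, pred_b = prob_b.max(1)

        agree = pred_a == pred_b
        # Intersection pseudo label (conservative): only pixels where both branches agree.
        inter_label = torch.where(agree, pred_a, torch.full_like(pred_a, -1))  # -1 = ignore
        # Union pseudo label (progressive): the more confident branch decides everywhere.
        union_label = torch.where(conf_a >= conf_b, pred_a, pred_b)

        # Confidence-based dynamic weight: down-weight low-confidence pseudo supervision.
        weight = torch.maximum(conf_a, conf_b).detach()

        loss_a = F.cross_entropy(logits_a, union_label.detach(), reduction='none')
        loss_b = F.cross_entropy(logits_b, inter_label.detach(), ignore_index=-1,
                                 reduction='none')
        return (weight * loss_a).mean() + (weight * loss_b).mean()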

SCF-Net: Learning Spatial Contextual Features for Large-Scale Point Cloud Segmentation

Siqi Fan, Qiulei Dong, Fenghua Zhu, Yisheng Lv, Peijun Ye, Fei-Yue Wang

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021

We propose a systematic approach for learning spatial contextual features, covering local spatial contextual information representation, local spatial contextual feature learning, and global spatial contextual feature learning. On this basis, a corresponding spatial contextual feature (SCF) module is designed. The module can be easily embedded into various network architectures for point cloud segmentation, naturally yielding a new encoder-decoder 3D semantic segmentation network called SCF-Net. It is evaluated on S3DIS and Semantic3D and performs better than several SOTA methods in most cases.
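
Below is an illustrative sketch of such a spatial contextual feature block: a hand-crafted local geometric representation per neighbor, attention-based local aggregation, and a simple global context term derived from the point location and neighborhood extent. The exact feature choices and dimensions are assumptions rather than the SCF-Net design.

    # Illustrative per-point spatial contextual feature block for point cloud segmentation.
    import torch
    import torch.nn as nn

    class SCFBlock(nn.Module):
        def __init__(self, feat_dim: int = 64, out_dim: int = 64):
            super().__init__()
            self.mlp_local = nn.Linear(feat_dim + 10, out_dim)  # neighbor features + geometry
            self.attn = nn.Linear(out_dim, 1)
            self.mlp_global = nn.Linear(4, out_dim)             # xyz + neighborhood radius

        def forward(self, xyz, feats, knn_idx):
            # xyz: (B, N, 3); feats: (B, N, F); knn_idx: (B, N, K) neighbor indices.
            B, N, K = knn_idx.shape
            idx = knn_idx.reshape(B, N * K)
            n_xyz = torch.gather(xyz, 1, idx.unsqueeze(-1).expand(-1, -1, 3)).view(B, N, K, 3)
            n_feat = torch.gather(feats, 1, idx.unsqueeze(-1).expand(-1, -1, feats.size(-1))
                                  ).view(B, N, K, -1)
            rel = n_xyz - xyz.unsqueeze(2)                      # relative position
            dist = rel.norm(dim=-1, keepdim=True)               # Euclidean distance
            # Local spatial contextual representation: center, neighbor, offset, distance.
            geom = torch.cat([xyz.unsqueeze(2).expand(-1, -1, K, -1), n_xyz, rel, dist], -1)
            local = self.mlp_local(torch.cat([n_feat, geom], dim=-1))   # (B, N, K, C)
            w = self.attn(local).softmax(dim=2)                 # attentive local aggregation
            local_feat = (w * local).sum(dim=2)                 # (B, N, C)
            # Global spatial context from point location and neighborhood extent.
            global_feat = self.mlp_global(torch.cat([xyz, dist.max(dim=2).values], dim=-1))
            return local_feat + global_feat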
