About

I’m a Researcher at the Institute for AI Industry Research, Tsinghua University (AIR, THU). I obtained my M.S. degree in Automation from the Institute of Automation, Chinese Academy of Sciences (CASIA) in 2022, and my B.E. degree in Automation from Shanghai Jiao Tong University (SJTU) in 2019.

My research interests lie in AI for Transportation & AI for Life Science.

More information is available in my resume.pdf.
Some of my projects are open-sourced on GitHub.

Research

Google Scholar

QUEST: Query Stream for Practical Cooperative Perception

S. Fan, H. Yu, W. Yang, et al. [pdf]
Aiming at interpretable and flexible cooperative perception, we propose the concept of query cooperation, which enables instance-level feature interaction among agents via the query stream. To describe query cooperation concretely, we propose a representative cooperative perception framework, QUEST. It performs cross-agent query interaction through fusion and complementation, designed for co-aware objects and unaware objects respectively. Taking camera-based vehicle-infrastructure cooperative perception as a typical scenario, we generate camera-centric cooperation labels for DAIR-V2X-Seq and evaluate the proposed framework on it. The experimental results demonstrate the framework's effectiveness and show its advantages in transmission flexibility and robustness to packet dropout. In addition, we discuss the pros and cons of the query cooperation paradigm in terms of possible extensions and foreseeable limitations.

[ ICRA 2024 | Paper ]
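The fusion/complementation split in the QUEST abstract can be illustrated with a minimal Python sketch. It is not the QUEST implementation: it assumes the other agent's queries have already been transformed into the ego frame, and the greedy nearest-centre matching, the `radius` threshold and the feature averaging are hypothetical stand-ins for the learned cross-agent interaction. Matched queries (co-aware objects) are fused; unmatched ones (objects the ego agent is unaware of) are appended as complements.

```python
import math
from typing import List, Optional, Tuple

# A query: (BEV centre (x, y) in the ego frame, feature vector). Hypothetical layout.
Query = Tuple[Tuple[float, float], List[float]]

def cooperate(ego: List[Query], other: List[Query], radius: float = 2.0) -> List[Query]:
    """Hypothetical query cooperation: fuse matched queries, append unmatched ones."""
    result = [(centre, list(feat)) for centre, feat in ego]
    matched = set()
    for c_o, f_o in other:
        # Greedy nearest-centre matching against ego queries not yet matched.
        best: Optional[int] = None
        best_d = radius
        for i, (c_e, _) in enumerate(ego):
            d = math.dist(c_e, c_o)
            if i not in matched and d <= best_d:
                best, best_d = i, d
        if best is not None:
            # Co-aware object: fuse by averaging features (placeholder for learned fusion).
            matched.add(best)
            c_e, f_e = result[best]
            result[best] = (c_e, [(x + y) / 2 for x, y in zip(f_e, f_o)])
        else:
            # Object the ego agent is unaware of: complement with the other agent's query.
            result.append((c_o, list(f_o)))
    return result

ego = [((0.0, 0.0), [1.0, 1.0])]
infra = [((0.5, 0.0), [3.0, 3.0]), ((10.0, 10.0), [5.0, 5.0])]
print(cooperate(ego, infra))  # [((0.0, 0.0), [2.0, 2.0]), ((10.0, 10.0), [5.0, 5.0])]
```

Because only per-object queries cross the link, dropping one of them removes a single instance rather than corrupting a whole feature map, which is one intuition for the transmission flexibility mentioned above.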


Calibration-free BEV Representation for Infrastructure Perception

S. Fan, Z. Wang, X. Huo, et al. [pdf]
Addressing the practical challenges of varied installation postures and calibration noise caused by inevitable natural factors (e.g., wind and snow), we point out the significant performance degradation of calibration-based BEV detection approaches under calibration noise, and propose the Calibration-free BEV Representation network (CBR) for infrastructure perception. CBR achieves feature view standardization via decoupled feature reconstruction. The perspective-view features are decoupled into front-view and bird's-eye-view features via MLPs without any calibration parameters, and the orthogonal feature fusion is similarity-based, requiring no additional depth supervision. Evaluated on the large-scale real-world dataset DAIR-V2X, CBR achieves a better accuracy-robustness balance.

[ IROS 2023 | Paper | Code]


SpiderMesh: Spatial-aware Demand-guided Recursive Meshing for RGB-T Semantic Segmentation

S. Fan, Z. Wang, Y. Wang, et al. [pdf]
We proposed a systematic multimodal learning approach for practical RGB-T (thermal) semantic segmentation, termed Spatial-aware Demand-guided Recursive Meshing (SpiderMesh), which leverages the additional thermal signals in a proactive manner. SpiderMesh (1) proactively compensates for inadequate contextual semantics in optically impaired regions via a demand-guided target masking algorithm and (2) refines multimodal semantic features with recursive meshing to improve pixel-level semantic analysis. We further introduce an asymmetric data augmentation technique, M-CutOut, and enable semi-supervised learning to fully utilize RGB-T labels that are only sparsely available in practice. Evaluated on the MFNet and PST900 datasets, SpiderMesh achieves SOTA performance on standard RGB-T segmentation benchmarks.

[ arXiv | Paper | Code]


Conservative-Progressive Collaborative Learning for Semi-supervised Semantic Segmentation

S. Fan, F. Zhu, Z. Feng, et al. [pdf]
We proposed a novel semi-supervised learning approach for semantic segmentation, termed Conservative-Progressive Collaborative Learning (CPCL), which not only takes advantage of the high-quality labels but also makes full use of the large quantity of unlabeled data. CPCL is realized via intersection and union pseudo supervision, which cooperate with each other to achieve collaboration between conservative evolution and progressive exploration. In addition, a confidence-based dynamic loss is proposed to reduce pseudo-supervision noise. CPCL is simple, efficient and flexible. Evaluated on Cityscapes and PASCAL VOC 2012, it achieves SOTA performance for semi-supervised semantic segmentation, especially in the low-data regime.

[ IEEE Transactions on Image Processing | Paper | Code]
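One plausible reading of intersection and union pseudo supervision from the CPCL abstract, written as a minimal Python sketch over flattened per-pixel predictions from two networks. The exact rules and the confidence-based dynamic loss in CPCL may differ; the tie-breaking rule and the `confidence_weights` helper are illustrative assumptions.

```python
from typing import List, Optional, Tuple

# Per-pixel prediction from one network: (class id, softmax confidence).
Pred = Tuple[int, float]

def intersection_pseudo_labels(a: List[Pred], b: List[Pred]) -> List[Optional[int]]:
    """Conservative branch: keep a pseudo label only where both networks
    agree; None marks pixels ignored by the loss."""
    return [ca if ca == cb else None for (ca, _), (cb, _) in zip(a, b)]

def union_pseudo_labels(a: List[Pred], b: List[Pred]) -> List[int]:
    """Progressive branch: label every pixel; on disagreement take the
    class from the more confident network (ties go to network a)."""
    return [ca if pa >= pb else cb for (ca, pa), (cb, pb) in zip(a, b)]

def confidence_weights(a: List[Pred], b: List[Pred]) -> List[float]:
    """Hypothetical per-pixel loss weight: the higher of the two confidences,
    so low-confidence (likely noisy) pseudo labels contribute less."""
    return [max(pa, pb) for (_, pa), (_, pb) in zip(a, b)]

a = [(0, 0.9), (1, 0.6), (2, 0.4)]
b = [(0, 0.8), (2, 0.7), (1, 0.3)]
print(intersection_pseudo_labels(a, b))  # [0, None, None]
print(union_pseudo_labels(a, b))         # [0, 2, 2]
```

The intersection labels are sparse but reliable, while the union labels cover every pixel at the cost of more noise, which is the conservative/progressive trade-off the two branches are meant to balance.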


SCF-Net: Learning Spatial Contextual Features for Large-Scale Point Cloud Segmentation

S. Fan, Q. Dong, F. Zhu, et al. [pdf]
We proposed a systematic approach for learning spatial contextual features, comprising local spatial contextual information representation, local spatial contextual feature learning, and global spatial contextual feature learning. On this basis, we designed a corresponding module for spatial contextual learning. The module can easily be embedded into various network architectures for point cloud segmentation; embedding it in an encoder-decoder architecture yields a new 3D semantic segmentation network, SCF-Net. Evaluated on S3DIS and Semantic3D, SCF-Net outperforms several SOTA methods in most cases.

[ CVPR 2021 | Paper | Code]
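A generic sketch of a local spatial context representation for point clouds: for every point, gather its k nearest neighbours and encode each one by its relative offset and Euclidean distance. This brute-force kNN encoding is an illustrative assumption, not SCF-Net's actual representation, which the abstract does not spell out.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]
Neighbour = Tuple[float, float, float, float]  # (dx, dy, dz, distance)

def local_context(points: List[Point], k: int) -> List[List[Neighbour]]:
    """For every point, find its k nearest neighbours (brute force, O(N^2))
    and encode each as its offset from the centre point plus the distance."""
    context = []
    for i, centre in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(centre, points[j]))
        context.append([
            (points[j][0] - centre[0],
             points[j][1] - centre[1],
             points[j][2] - centre[2],
             math.dist(centre, points[j]))
            for j in others[:k]
        ])
    return context

cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
print(local_context(cloud, k=2)[0])  # [(1.0, 0.0, 0.0, 1.0), (0.0, 2.0, 0.0, 2.0)]
```

On large-scale clouds such as S3DIS or Semantic3D, the quadratic neighbour search would normally be replaced by a spatial index (e.g. a k-d tree); the brute-force loop only keeps the sketch dependency-free.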


FII-CenterNet: An Anchor-Free Detector With Foreground Attention for Traffic Object Detection

S. Fan, F. Zhu, et al. [pdf]
We proposed a foreground segmentation approach for anchor-free object detection, which alleviates background interference in complex traffic environments at little extra computational cost. On this basis, we developed a novel traffic object detection network with foreground attention, called FII-CenterNet. Evaluated on KITTI and PASCAL VOC, it achieves SOTA performance in both accuracy and efficiency.

[ IEEE Transactions on Vehicular Technology | Paper | Code]


Improving Road Detection Results Based on Ensemble Learning and Key Samples Focusing

S. Fan, F. Zhu, et al. [pdf]
We proposed a road detection network that integrates classification results based on different feature combinations by weighted voting. To focus on key samples, we proposed a novel loss function that emphasizes hard samples and assigns different weights to missed detections and false detections. The method is evaluated on KITTI, and its effectiveness is verified.

[ ITSC 2020 | Paper]
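The two ingredients in the road detection abstract, weighted voting over several classifiers and a loss that emphasises hard samples while weighting missed and false detections differently, can be sketched as follows. The focal-style modulating term and the parameters `gamma`, `w_miss` and `w_false` are assumptions chosen for illustration; the paper's actual loss may take a different form.

```python
import math
from typing import List

def weighted_vote(probs: List[List[float]], weights: List[float]) -> List[float]:
    """Fuse per-pixel road probabilities from several classifiers
    (one list per classifier) by a normalised weighted average."""
    total = sum(weights)
    n_pixels = len(probs[0])
    return [sum(w * p[i] for w, p in zip(weights, probs)) / total
            for i in range(n_pixels)]

def key_sample_loss(p: List[float], y: List[int], gamma: float = 2.0,
                    w_miss: float = 2.0, w_false: float = 1.0,
                    eps: float = 1e-7) -> float:
    """Focal-style binary loss (hypothetical form).
    (1 - p_t) ** gamma up-weights hard samples; road pixels (y = 1) that the
    model misses are scaled by w_miss, background pixels (y = 0) predicted as
    road are scaled by w_false."""
    total = 0.0
    for p_i, y_i in zip(p, y):
        p_i = min(max(p_i, eps), 1.0 - eps)  # avoid log(0)
        if y_i == 1:
            total += -w_miss * (1.0 - p_i) ** gamma * math.log(p_i)
        else:
            total += -w_false * p_i ** gamma * math.log(1.0 - p_i)
    return total / len(p)

fused = weighted_vote([[1.0, 0.0], [0.0, 1.0]], weights=[3.0, 1.0])
print(fused)  # [0.75, 0.25]
```

With `w_miss > w_false`, an equally wrong prediction costs more when it misses road than when it marks background as road, which is one way to "pay different attention" to the two error types.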


Timeline

  • Jul 2022 - Today

    Researcher
    @ Institute for AI Industry Research, Tsinghua University (AIR)

  • Sept 2019 - Jun 2022

    Student Researcher
    @ Institute of Automation, Chinese Academy of Sciences (CASIA)

  • Sept 2019 - Jun 2022

    M.S. in Automation
    @ School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS)

  • Aug 2020 - Dec 2021

    Research Intern
    @ Intel Labs China (ILC)

  • Sept 2015 - Jun 2019

    B.E. in Automation
    @ School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University (SJTU)

Contact

Drop me an email if you are interested in my research or have opportunities to share.