Abstract
Sensor fusion is an important method for achieving robust perception systems in autonomous driving, the Internet of Things, and robotics. Most multi-modal 3D detection models assume that the sensor data is synchronized and do not necessarily have real-time capabilities. We propose RCF-TP, an asynchronous, modular, real-time multi-modal architecture that fuses cameras and radars for 3D object detection, with sensor-fault mitigation and handling of extreme weather conditions. Our dedicated feature extractors can be trained on either a regular or an irregular bird's-eye-view grid, or with different grid resolutions, so that the fusion module is agnostic to both. The extracted features are correlated with features from the other modality or from another sensor of the same modality, and a detection head that exploits the rich multi-modal features can be applied at any time to produce bounding-box predictions. Experimental results show the effectiveness of our fusion module: it improves detection performance at higher radar grid resolutions, operates under sensor faults without performance degradation, and improves pedestrian detection when our dataset combination strategy is used during training.
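The abstract describes an asynchronous, grid-agnostic fusion design in which each sensor stream contributes bird's-eye-view features independently and a detection head can be invoked at any time. The sketch below illustrates that idea only; it is not the paper's implementation, and the class, method names, and tensor shapes (`AsyncBEVFusion`, `update`, `detect`, a 128x128 BEV grid with 64-channel features) are hypothetical assumptions for illustration.

```python
import torch
import torch.nn as nn

class AsyncBEVFusion(nn.Module):
    """Illustrative sketch of grid-agnostic, asynchronous camera-radar BEV fusion.

    Each sensor stream overwrites its own BEV feature buffer whenever a new
    frame arrives; the detection head can be called at any time on the most
    recent features from every stream. Names and shapes are hypothetical.
    """

    def __init__(self, num_streams: int, feat_dim: int = 64, num_classes: int = 3):
        super().__init__()
        # Latest BEV feature map per stream, filled lazily as frames arrive.
        self.latest = [None] * num_streams
        # 1x1 fusion operates per BEV cell, so it is independent of grid resolution.
        self.fuse = nn.Conv2d(num_streams * feat_dim, feat_dim, kernel_size=1)
        # Toy detection head: per-cell class scores + box regression (x, y, w, l, yaw).
        self.head = nn.Conv2d(feat_dim, num_classes + 5, kernel_size=1)

    def update(self, stream_id: int, bev_feat: torch.Tensor) -> None:
        """Store the newest BEV feature map for one sensor stream."""
        self.latest[stream_id] = bev_feat

    def detect(self) -> torch.Tensor:
        """Run detection on whatever features are currently available.

        Missing streams (e.g. a faulty sensor) are replaced with zeros,
        mimicking dropout-style fault mitigation.
        """
        ref = next(f for f in self.latest if f is not None)
        feats = [f if f is not None else torch.zeros_like(ref) for f in self.latest]
        fused = self.fuse(torch.cat(feats, dim=1))
        return self.head(fused)


# Usage: two streams (camera, radar) on an assumed 128x128 BEV grid.
model = AsyncBEVFusion(num_streams=2)
model.update(0, torch.randn(1, 64, 128, 128))  # camera features arrive
model.update(1, torch.randn(1, 64, 128, 128))  # radar features arrive later
preds = model.detect()  # (1, num_classes + 5, 128, 128)
```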
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 127212-127223 |
| Number of pages | 12 |
| Journal | IEEE Access |
| Volume | 12 |
| DOIs | |
| State | Published - 2024 |
Bibliographical note
Publisher Copyright: © 2024 The Authors.
Keywords
- 3D object detection
- Sensor fusion
- self-supervised learning
- sensor dropout
ASJC Scopus subject areas
- General Computer Science
- General Materials Science
- General Engineering