Abstract
As autonomous driving technology matures, end-to-end methodologies have emerged as a leading strategy, promising seamless integration from perception to control via deep learning. However, existing systems grapple with challenges such as unexpected open-set environments and the complexity of black-box models. At the same time, the evolution of deep learning has introduced larger multimodal foundation models that offer joint visual and textual understanding. In this paper, we harness these multimodal foundation models to enhance the robustness and adaptability of autonomous driving systems. We introduce a method to extract nuanced spatial features from transformers and incorporate latent space simulation for improved training and policy debugging. We use pixel/patch-aligned feature descriptors to expand foundation model capabilities and create an end-to-end multimodal driving model, demonstrating unparalleled results in diverse tests. Our solution combines language with visual perception and achieves significantly greater robustness in out-of-distribution situations.
Original language | English |
---|---|
Title of host publication | 2024 IEEE International Conference on Robotics and Automation, ICRA 2024 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 6687-6694 |
Number of pages | 8 |
ISBN (Electronic) | 9798350384574 |
State | Published - 2024 |
Externally published | Yes |
Event | 2024 IEEE International Conference on Robotics and Automation, ICRA 2024 - Yokohama, Japan. Duration: 13 May 2024 → 17 May 2024 |
Publication series
Name | Proceedings - IEEE International Conference on Robotics and Automation |
---|---|
ISSN (Print) | 1050-4729 |
Conference
Conference | 2024 IEEE International Conference on Robotics and Automation, ICRA 2024 |
---|---|
Country/Territory | Japan |
City | Yokohama |
Period | 13/05/24 → 17/05/24 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.
ASJC Scopus subject areas
- Software
- Control and Systems Engineering
- Electrical and Electronic Engineering
- Artificial Intelligence