Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models

Tsun Hsuan Wang, Alaa Maalouf, Wei Xiao, Yutong Ban, Alexander Amini, Guy Rosman, Sertac Karaman, Daniela Rus

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

As autonomous driving technology matures, end-to-end methodologies have emerged as a leading strategy, promising seamless integration from perception to control via deep learning. However, existing systems grapple with challenges such as unexpected open set environments and the complexity of black-box models. At the same time, the evolution of deep learning introduces larger, multimodal foundational models, offering multi-modal visual and textual understanding. In this paper, we harness these multimodal foundation models to enhance the robustness and adaptability of autonomous driving systems. We introduce a method to extract nuanced spatial features from transformers and the incorporation of latent space simulation for improved training and policy debugging. We use pixel/patch-aligned feature descriptors to expand foundational model capabilities to create an end-to-end multimodal driving model, demonstrating unparalleled results in diverse tests. Our solution combines language with visual perception and achieves significantly greater robustness on out-of-distribution situations.

Original languageEnglish
Title of host publication2024 IEEE International Conference on Robotics and Automation, ICRA 2024
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages6687-6694
Number of pages8
ISBN (Electronic)9798350384574
DOIs
StatePublished - 2024
Externally publishedYes
Event2024 IEEE International Conference on Robotics and Automation, ICRA 2024 - Yokohama, Japan
Duration: 13 May 202417 May 2024

Publication series

NameProceedings - IEEE International Conference on Robotics and Automation
ISSN (Print)1050-4729

Conference

Conference2024 IEEE International Conference on Robotics and Automation, ICRA 2024
Country/TerritoryJapan
CityYokohama
Period13/05/2417/05/24

Bibliographical note

Publisher Copyright:
© 2024 IEEE.

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models'. Together they form a unique fingerprint.

Cite this