Follow Anything: Open-Set Detection, Tracking, and Following in Real-Time

Alaa Maalouf, Ninad Jadhav, Krishna Murthy Jatavallabhula, Makram Chahine, Daniel M. Vogt, Robert J. Wood, Antonio Torralba, Daniela Rus

Research output: Contribution to journal › Article › peer-review

Abstract

Tracking and following objects of interest is critical to several robotics use cases, ranging from industrial automation to logistics and warehousing, to healthcare and security. In this letter, we present a robotic system to detect, track, and follow any object in real-time. Our approach, dubbed follow anything (FAn), is an open-vocabulary and multimodal model - it is not restricted to concepts seen at training time and can be applied to novel classes at inference time using text, images, or click queries. Leveraging rich visual descriptors from large-scale pre-trained models (foundation models), FAn can detect and segment objects by matching multimodal queries (text, images, clicks) against an input image sequence. These detected and segmented objects are tracked across image frames, all while accounting for occlusion and object re-emergence. We demonstrate FAn on a real-world robotic system (a micro aerial vehicle), and report its ability to seamlessly follow the objects of interest in a real-time control loop. FAn can be deployed on a laptop with a lightweight (6-8 GB) graphics card, achieving a throughput of 6-20 frames per second. To enable rapid adoption, deployment, and extensibility, we open-source our code on our project webpage. We also encourage the reader to watch our 5-minute explainer video.
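The query-matching step described in the abstract can be illustrated with a minimal sketch. It assumes that each image patch and the query (text, image, or click) have already been embedded as vectors in a shared descriptor space, and that a patch counts as part of the object when its cosine similarity to the query exceeds a cutoff. The function names, the toy 3-D descriptors, and the 0.8 threshold are hypothetical choices for illustration; they are not taken from the paper or its released code.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def match_query(patch_features, query, threshold=0.8):
    """Return a boolean grid marking patches whose descriptor matches the query.

    patch_features: H x W grid (list of rows) of descriptor vectors.
    query: one descriptor vector in the same embedding space.
    threshold: hypothetical similarity cutoff, not a value from the paper.
    """
    return [[cosine(f, query) >= threshold for f in row] for row in patch_features]

# Toy 2x2 grid of 3-D descriptors standing in for foundation-model features.
features = [
    [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
]
query = [1.0, 0.0, 0.0]
mask = match_query(features, query)
```

In this toy case the top row of the grid matches the query and the bottom row does not. A real pipeline would take the descriptors from a pre-trained model and pass the resulting mask to a tracker, which this sketch does not attempt.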

Original language: English
Pages (from-to): 3283-3290
Number of pages: 8
Journal: IEEE Robotics and Automation Letters
Volume: 9
Issue number: 4
State: Published - 1 Apr 2024
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2016 IEEE.

Keywords

  • AI-enabled robotics
  • object detection
  • segmentation and categorization
  • semantic scene understanding

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Biomedical Engineering
  • Human-Computer Interaction
  • Mechanical Engineering
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
  • Control and Optimization
  • Artificial Intelligence
