Learning to detect and retrieve objects from unlabeled videos

Elad Amrani, Rami Ben-Ari, Tal Hakim, Alex Bronstein

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Learning an object detection or retrieval system requires a large data set with manual annotations. Such data sets are expensive and time consuming to create and therefore difficult to obtain on a large scale. In this work, we propose to exploit the natural correlation in narrations and the visual presence of objects in video, to learn an object detector and retrieval without any manual labeling involved. We pose the problem as weakly supervised learning with noisy labels, and propose a novel object detection paradigm under these constraints. We handle the background rejection by using contrastive samples and confront the high level of label noise with a new clustering score. Our evaluation is based on a set of 11 manually annotated objects in over 5000 frames. We show comparison to a weakly-supervised approach as baseline and provide a strongly labeled upper bound.

Original languageEnglish
Title of host publicationProceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages3713-3717
Number of pages5
ISBN (Electronic)9781728150239
DOIs
StatePublished - Oct 2019
Externally publishedYes
Event17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019 - Seoul, Korea, Republic of
Duration: 27 Oct 201928 Oct 2019

Publication series

NameProceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019

Conference

Conference17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019
Country/TerritoryKorea, Republic of
CitySeoul
Period27/10/1928/10/19

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Vision and Pattern Recognition

Fingerprint

Dive into the research topics of 'Learning to detect and retrieve objects from unlabeled videos'. Together they form a unique fingerprint.

Cite this