Abstract
Learning an object detection or retrieval system requires a large data set with manual annotations. Such data sets are expensive and time-consuming to create and are therefore difficult to obtain at large scale. In this work, we propose to exploit the natural correlation between narrations and the visual presence of objects in video to learn an object detector and retrieval system without any manual labeling. We pose the problem as weakly supervised learning with noisy labels and propose a novel object detection paradigm under these constraints. We handle background rejection by using contrastive samples and confront the high level of label noise with a new clustering score. Our evaluation is based on a set of 11 manually annotated objects in over 5000 frames. We compare against a weakly supervised approach as a baseline and provide a strongly labeled upper bound.
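The abstract does not spell out how narration is turned into training labels, so the following is only a minimal sketch of one plausible reading: a frame is marked positive when the object's name is spoken close to it in time. The function `noisy_frame_labels`, the `window` parameter, and the toy transcript are all hypothetical and are not taken from the paper. Labels produced this way are noisy by construction, since a spoken mention does not guarantee that the object is visible.

```python
from typing import List, Tuple


def noisy_frame_labels(
    frame_times: List[float],
    narration: List[Tuple[float, str]],
    object_name: str,
    window: float = 1.0,
) -> List[int]:
    """Return 1 for each frame with a mention of `object_name` within `window` seconds, else 0.

    Assumption (not from the paper): narration is available as
    (timestamp in seconds, spoken word) pairs.
    """
    target = object_name.lower()
    mention_times = [t for t, word in narration if word.lower() == target]
    return [
        int(any(abs(frame_t - mention_t) <= window for mention_t in mention_times))
        for frame_t in frame_times
    ]


# Hypothetical toy transcript and frame timestamps.
narration = [(1.0, "take"), (1.5, "the"), (2.0, "whisk"), (9.0, "bowl")]
frame_times = [0.5, 2.5, 5.0, 8.0]
labels = noisy_frame_labels(frame_times, narration, "whisk", window=1.0)
```

Under the same assumptions, frames from videos whose narration never mentions the object could serve as contrastive (negative) samples for background rejection; how the paper actually selects them is not stated in the abstract.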
Original language | English |
---|---|
Title of host publication | Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 3713-3717 |
Number of pages | 5 |
ISBN (Electronic) | 9781728150239 |
State | Published - Oct 2019 |
Externally published | Yes |
Event | 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019 - Seoul, Korea, Republic of. Duration: 27 Oct 2019 → 28 Oct 2019 |
Publication series
Name | Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019 |
---|---|
Conference
Conference | 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019 |
---|---|
Country/Territory | Korea, Republic of |
City | Seoul |
Period | 27/10/19 → 28/10/19 |
Bibliographical note
Publisher Copyright: © 2019 IEEE.
ASJC Scopus subject areas
- Computer Science Applications
- Computer Vision and Pattern Recognition