Abstract
Standardised benchmarks have been instrumental in driving recent progress in computer vision. However, most benchmarks are designed for general-purpose tasks: they cover many topics and classes but fall short of the needs of specialised tasks. For example, to perform 3D reconstruction of corals, researchers need footage of coral recorded from multiple camera angles. Since such videos are scarce in standard datasets, the ability to reconstruct 3D coral models from public videos would alleviate this problem by allowing researchers to tap into the vast scope of online content. Machine learning could then be used to sift through the immense amount of content and automatically identify videos suitable for 3D reconstruction. In this work, we introduce a new benchmark built from amateur footage queried from the YouTube-8M dataset, where each video has been manually labelled for undersea content, coral, and multiple camera angles. Furthermore, we construct a three-stage pipeline of machine learning models that identifies videos in the public domain suitable for 3D reconstruction of coral. We instantiate the pipeline with state-of-the-art video classification methods and evaluate their performance on the benchmark, identifying their shortcomings and avenues for future research.
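The abstract describes a three-stage cascade of classifiers. The sketch below is only an illustration of that staged-filtering idea, not the authors' implementation: the class and stage names (`CoralPipeline`, `is_underwater`, `shows_coral`, `has_multiple_angles`) are hypothetical placeholders for the video classification models mentioned in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical stage predicate: maps a video identifier/path to a keep/drop
# decision. In the paper each stage is a learned video classifier.
Stage = Callable[[str], bool]


@dataclass
class CoralPipeline:
    """Three-stage cascade: a video must pass every stage to be kept."""
    is_underwater: Stage
    shows_coral: Stage
    has_multiple_angles: Stage

    def suitable(self, video: str) -> bool:
        # Later, more specific stages run only if earlier ones accept,
        # so cheap filters prune most of the public-domain content early.
        return (self.is_underwater(video)
                and self.shows_coral(video)
                and self.has_multiple_angles(video))

    def filter(self, videos: Iterable[str]) -> List[str]:
        return [v for v in videos if self.suitable(v)]


if __name__ == "__main__":
    # Toy stand-ins for the real classifiers, for illustration only.
    pipeline = CoralPipeline(
        is_underwater=lambda v: "sea" in v,
        shows_coral=lambda v: "coral" in v,
        has_multiple_angles=lambda v: "multi" in v,
    )
    print(pipeline.filter(["sea_coral_multi_01", "beach_02", "sea_fish_03"]))
```

A cascade of this form lets inexpensive, high-recall stages discard clearly unsuitable footage before the more specialised (and costlier) checks are applied.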
Original language | English |
---|---|
Journal | International Journal of Image and Data Fusion |
State | Accepted/In press - 2024 |
Bibliographical note
Publisher Copyright: © 2024 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.
Keywords
- benchmark
- computer vision
- coral
- deep learning
- transformers
- underwater
- underwater object detection
- underwater video classification
ASJC Scopus subject areas
- Computer Science Applications
- General Earth and Planetary Sciences