Abstract
For the TRECVID 2011 MED task, the GENIE system incorporated two late-fusion approaches in which multiple discriminative base classifiers are built per feature and then combined through discriminative fusion techniques. All of our fusion and base classifiers are formulated as one-vs-all detectors per event class, with thresholds estimated during cross-validation. A total of five types of features, spanning both audio and visual modalities, were extracted from the data: HOG3D, Object Bank, Gist, MFCC, and acoustic segment models (ASMs). HOG3D and MFCC are low-level features, while Object Bank and ASMs are more semantic. In our work, event-specific feature adaptations and manual annotations were deliberately avoided in order to establish strong baseline results. Overall, the results were competitive in the MED11 evaluation and show that standard machine learning techniques can yield fairly good results even on a challenging dataset.
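The two-stage design the abstract describes (per-feature one-vs-all base classifiers whose scores are combined by a discriminative fusion classifier, with a detection threshold estimated from scores) can be sketched roughly as follows. This is a minimal illustrative sketch: the synthetic two-modality data, the logistic-regression base learners, and the simple accuracy-based threshold search are assumptions for demonstration, not the paper's actual classifiers or cross-validation procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_logreg(X, y, steps=500, lr=0.1):
    """Plain logistic regression by gradient descent (a stand-in for the
    discriminative per-feature base classifiers)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid scores
        g = p - y                               # gradient of log loss
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def scores(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

# Synthetic stand-ins for two feature modalities (e.g. a visual and an
# audio feature); y holds one-vs-all labels for a single event class.
n = 200
y = (rng.random(n) < 0.5).astype(float)
feat_a = rng.normal(0, 1, (n, 5)) + y[:, None]        # stronger cue
feat_b = rng.normal(0, 1, (n, 3)) + 0.5 * y[:, None]  # weaker cue

# Stage 1: one discriminative base classifier per feature type.
base = [train_logreg(F, y) for F in (feat_a, feat_b)]

# Stage 2 (late fusion): stack the base-classifier scores and train a
# fusion classifier on them.
stacked = np.column_stack(
    [scores(F, w, b) for F, (w, b) in zip((feat_a, feat_b), base)]
)
fw, fb = train_logreg(stacked, y)
fused = scores(stacked, fw, fb)

# Threshold estimation: pick the cutoff maximising accuracy over the fused
# scores (the paper does this during cross-validation; simplified here).
cands = np.linspace(0.1, 0.9, 17)
thr = cands[np.argmax([((fused >= t) == y).mean() for t in cands])]
print(f"fused accuracy at threshold {thr:.2f}: "
      f"{((fused >= thr) == y).mean():.2f}")
```

In a real MED pipeline the fusion stage would be trained on held-out (cross-validated) base-classifier scores rather than on training scores, to avoid the fusion learner overfitting to overconfident base outputs.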
Original language | English |
---|---|
State | Published - 2011 |
Externally published | Yes |
Event | TREC Video Retrieval Evaluation, TRECVID 2011 - Gaithersburg, MD, United States (5 Dec 2011 → 7 Dec 2011) |
Conference
Conference | TREC Video Retrieval Evaluation, TRECVID 2011 |
---|---|
Country/Territory | United States |
City | Gaithersburg, MD |
Period | 5/12/11 → 7/12/11 |
ASJC Scopus subject areas
- Computer Graphics and Computer-Aided Design
- Computer Vision and Pattern Recognition
- Human-Computer Interaction
- Software