Semi-automated labeling of video using active learning for object detection
Labeling video sequences is a critical task required for a wide range of supervised learning applications. In general, manually labeling videos is an extremely repetitive and time-consuming task. The process is often sped up by sharing the workload across multiple workers, but this can create other problems, such as inconsistent label quality. Meanwhile, active learning has been proposed to assist in labeling images for classification and object detection tasks. However, little prior work has examined the utility of active learning for video labeling. In this thesis, we address this gap by proposing a Semi-Automated Labeling of Video (SALV) framework that uses active learning to support supervised object detection applications. First, we propose a general architecture for the SALV framework built on intra-video training and testing. The proposed SALV architecture exploits a unique property of video labeling: training and testing can be performed on consecutive frames that contain highly correlated information. Second, we incorporate traditional active learning methods that use the confidence values of detections to select important frames for the next iteration. Third, we propose two strategies for active learning of video labeling: minimal-Distance Iterative Active Learning (min-DIAL) and maximal-Distance Iterative Active Learning (max-DIAL). Lastly, we draw on information theory to select the most diverse frames, using the Jensen-Shannon divergence to measure the difference between frames based on the locations of their detections. We analyze the performance of the proposed SALV architecture in terms of the time taken to complete the labeling of the video sequences and present our results on the popular KITTI Tracking dataset.
We show that the proposed max-DIAL strategy is the most efficient of the methods evaluated and can reduce the time taken to label video by a factor of 10.
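The abstract states only that frame diversity is measured with the Jensen-Shannon divergence "based on the location of detections"; it does not say how detection locations are turned into probability distributions. The sketch below assumes one simple choice, which is not necessarily the thesis's method: each frame's boxes become a smoothed histogram of box centres over a coarse grid, and the candidate frame whose histogram diverges most from a reference frame is picked. The box format `(x_min, y_min, x_max, y_max)`, the grid size, and the helper names `location_histogram`, `js_divergence`, and `most_diverse_frame` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def location_histogram(boxes: Sequence[Box], width: float, height: float,
                       grid: int = 4, eps: float = 1e-9) -> List[float]:
    """Assumed representation: normalized histogram of box centres on a grid x grid partition.

    Every bin starts at eps so the result is a valid distribution even for
    frames with no detections, and no bin is ever exactly zero.
    """
    counts = [eps] * (grid * grid)
    for x0, y0, x1, y1 in boxes:
        cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        # Clamp so centres on (or beyond) the image border fall in an edge bin.
        col = max(0, min(int(cx / width * grid), grid - 1))
        row = max(0, min(int(cy / height * grid), grid - 1))
        counts[row * grid + col] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence KL(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits: symmetric and bounded by [0, 1]."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def most_diverse_frame(reference: Sequence[Box],
                       candidates: Sequence[Sequence[Box]],
                       width: float, height: float) -> int:
    """Index of the candidate frame whose detection layout differs most from the reference."""
    ref = location_histogram(reference, width, height)
    scores = [js_divergence(ref, location_histogram(c, width, height))
              for c in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])


if __name__ == "__main__":
    ref = [(10, 10, 50, 50)]                       # one object near the top-left
    cands = [[(12, 12, 48, 48)], [(300, 300, 380, 380)]]
    print(most_diverse_frame(ref, cands, 400, 400))  # picks the bottom-right layout
```

Using base-2 logarithms keeps the divergence in [0, 1], which makes scores comparable across frames. If SciPy is available, `scipy.spatial.distance.jensenshannon` computes the square root of this quantity (the Jensen-Shannon distance), which ranks frames in the same order.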
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Muntaner Whitley, Roberto
- Thesis Advisors: Radha, Hayder
- Committee Members: Morris, Daniel; Bopardikar, Shaunuk
- Date: 2023
- Subjects: Computer science; Engineering
- Program of Study: Electrical and Computer Engineering - Master of Science
- Degree Level: Masters
- Language: English
- Pages: 43 pages
- ISBN: 9798379593278
- Permalink: https://doi.org/doi:10.25335/6nj2-xt64