Sparse and redundant models for data mining and consumer video summarization

This dissertation develops new data mining and representative selection techniques for consumer video data using sparse and redundant models. Extracting key frames and key excerpts from video plays an important role in many applications, such as facilitating the browsing of large video collections and supporting automatic video retrieval, video search, and video compression. In addition, a set of key frames, or a video summary in general, helps users quickly access the semantically important sections of a video sequence and hence enables rapid viewing.

The current literature on video summarization has focused mainly on certain types of videos that conform to well-defined structures and characteristics that facilitate key frame extraction. Typical examples include sports, news, TV drama, movie dialog, documentary, and medical videos. Prior techniques designed for such well-structured, professionally produced videos cannot be applied to consumer (personally generated) videos acquired with digital cameras. Meanwhile, the volume of consumer video is increasing rapidly due to the popularity of handheld consumer devices, online social networks, and multimedia sharing websites.

Consumer video has no particular structure or well-defined theme. The mixed soundtrack, which comes from multiple sound sources, together with severe noise, makes it difficult to identify semantically meaningful audio segments for key frames. In addition, consumer videos typically consist of one long shot with low-quality visuals due to factors such as camera shake and poor lighting, and they lack fixed features (subtitles, text captions) that could be exploited to evaluate the importance of frames or segments. For these reasons, consumer video summarization remains a very challenging problem area.

In this dissertation, we present new frameworks based on sparse and redundant models of image and video datasets toward solving the consumer video summarization problem.
In particular, we investigate three different models of image and video data for summarization.

1. Sparse representation of video frames
We exploit the self-expressiveness property to create an ℓ1-norm sparse graph, which is applicable to large, high-dimensional datasets. A spectral clustering algorithm is applied to the sparse graph to obtain a set of clusters. Our work treats each cluster as a point on a Grassmann manifold and then selects an optimal set of clusters. The final representatives are chosen using a graph centrality technique on the sub-graph corresponding to each selected cluster.

2. Sparse and low-rank model for video frames
A novel key frame extraction framework based on Robust Principal Component Analysis (RPCA) is proposed to automatically select a set of maximally informative frames from an input video. The framework is developed from a novel perspective of low-rank and sparse components, in which the low-rank component of a video frame reveals the relationship of that frame to the whole video sequence, and the sparse component indicates the distinct information of particular frames. A set of key frames is identified by solving an ℓ1-norm-based nonconvex optimization problem whose solution minimizes the reconstruction error of the whole dataset for a given set of selected key frames and maximizes the sum of distinct information. Moreover, the algorithm provides a mechanism for adapting to new observations and, consequently, updating the set of key frames.

3. Sparse/redundant representation for a single video frame
We propose a new patch-based image/video analysis approach. Using the new model, we create a new feature that we refer to as the heterogeneity image patch (HIP) index of an image or a video frame. The HIP index, which is evaluated using patch-based image/video analysis, provides a measure of the level of heterogeneity (and hence the amount of redundancy) that exists among the patches of an image or video frame. We apply the proposed HIP framework to both video summarization problem areas: key frame extraction and video skimming.

In Collections: Electronic Theses & Dissertations

Copyright Status: In Copyright

Material Type: Theses

Authors: Dang, Chinh Trung

Thesis Advisors: Radha, Hayder

Committee Members: Hall, Jonathan
Aviyente, Selin
Deller, John

Date Published: 2015

Subjects: Automatic abstracting
Data mining
Database management
Digital video
Image processing--Digital techniques
Video recordings

Program of Study: Electrical Engineering - Doctor of Philosophy

Degree Level: Doctoral

Language: English

Pages: xii, 117 pages

ISBN: 9781321712711
1321712715

Permalink: https://doi.org/doi:10.25335/9ehh-yd92
