2014 | Book

Multimedia Database Retrieval

Technology and Applications


About this book

This book explores multimedia applications that emerged from computer vision and machine learning technologies. These state-of-the-art applications include MPEG-7, interactive multimedia retrieval, multimodal fusion, annotation, and database re-ranking. The application-oriented approach maximizes reader understanding of this complex field. Established researchers explain the latest developments in multimedia database technology and offer a glimpse of future technologies. The authors emphasize the crucial role of innovation, inspiring users to develop new applications in multimedia technologies such as mobile media, large scale image and video databases, news video and film, forensic image databases and gesture databases. With a strong focus on industrial applications along with an overview of research topics, Multimedia Database Retrieval: Technology and Applications is an indispensable guide for computer scientists, engineers and practitioners involved in the development and use of multimedia systems. It also serves as a secondary text or reference for advanced-level students interested in multimedia technologies.

Table of Contents

Frontmatter
Chapter 1. Introduction
Abstract
The ever-increasing volume of multimedia data being generated in the world has led to much research interest in multimedia database retrieval since the early years of the twenty-first century. Computer vision and machine learning technologies have been developed, forming a solid research foundation for the creation of state-of-the-art applications, such as MPEG-7, interactive multimedia retrieval, multimodal fusion, annotation, and database re-ranking. The time has come to explore the consequences of these multimedia applications. Multimedia Database Retrieval: Technology and Applications is an application-oriented book written by established researchers in this emerging field. It covers the latest developments and important applications in multimedia database technology, and offers a glimpse of future technologies.
Paisarn Muneesawang, Ning Zhang, Ling Guan
Chapter 2. Kernel-Based Adaptive Image Retrieval Methods
Abstract
This chapter presents machine learning methods for adaptive image retrieval. In a retrieval session, a nonlinear kernel is applied to measure image relevancy. Various new learning procedures are covered and applied specifically to adaptive image retrieval, including the adaptive radial basis function (RBF) network, short-term learning with the gradient-descent method, and the fuzzy RBF network. These methods provide the likelihood estimation of visual content in short-term relevance feedback (STRF). The STRF component can be further incorporated in a fusion module with contextual information in long-term relevance feedback (LTRF) using a Bayesian framework, which substantially increases retrieval accuracy.
Paisarn Muneesawang, Ning Zhang, Ling Guan
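The chapter develops the adaptive RBF network in full; as a rough illustration of the underlying idea only, the sketch below scores database images with a Gaussian RBF model whose centres are the user-marked relevant examples and whose widths adapt to the spread of that feedback. The width heuristic, feature dimensions, and toy data are assumptions for illustration, not the book's formulation.

```python
import numpy as np

def rbf_relevance_scores(database, relevant, eps=1e-8):
    """Score database images with a Gaussian RBF model whose centres are the
    user-marked relevant images; widths adapt to the spread of the feedback.
    (Illustrative sketch, not the chapter's exact adaptive RBF network.)"""
    relevant = np.atleast_2d(relevant)
    # Per-centre width: distance to the farthest other relevant sample.
    dists = np.linalg.norm(relevant[:, None, :] - relevant[None, :, :], axis=-1)
    sigmas = dists.max(axis=1) + eps
    # Gaussian RBF response of every database item to every centre.
    diff = database[:, None, :] - relevant[None, :, :]
    sq = np.sum(diff ** 2, axis=-1)
    responses = np.exp(-sq / (2.0 * sigmas[None, :] ** 2))
    return responses.sum(axis=1)          # higher = more relevant

# Toy usage: rank 100 random 8-D feature vectors against 3 "relevant" examples.
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 8))
fb = db[:3] + 0.05 * rng.normal(size=(3, 8))
ranking = np.argsort(-rbf_relevance_scores(db, fb))
print(ranking[:10])
```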
Chapter 3. Self-adaptation in Image and Video Retrieval
Abstract
This chapter explores automatic methods for implementing pseudo-relevance feedback for image and video retrieval. The automation is based on dynamic self-organization: a self-organizing tree map identifies relevance in place of human users. Compared to traditional relevance feedback, this automation avoids the errors introduced by excessive human involvement and enlarges the training set. The automatic retrieval system is applied to image retrieval in compressed domains (i.e., JPEG and wavelet-based coders). In addition, the system incorporates knowledge-based learning to acquire a suitable weighting scheme for unsupervised relevance identification. In the video domain, pseudo-relevance feedback is implemented by an adaptive cosine network that enhances retrieval accuracy through the network's forward–backward signal propagation, without user input.
Paisarn Muneesawang, Ning Zhang, Ling Guan
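As a simplified stand-in for relevance identification by the self-organizing tree map, the sketch below performs one round of pseudo-relevance feedback with a Rocchio-style query update: the closest half of the initial top-k results is treated as pseudo-relevant, the farthest half as pseudo-irrelevant. The parameters (top_k, alpha, beta, gamma) and toy data are illustrative assumptions.

```python
import numpy as np

def pseudo_relevance_feedback(query, database, top_k=20,
                              alpha=1.0, beta=0.75, gamma=0.15):
    """One round of pseudo-relevance feedback: the closest half of the top-k
    results acts as pseudo-relevant, the farthest half as pseudo-irrelevant,
    and the query is refined with a Rocchio-style update (a simple stand-in
    for relevance identification by the self-organizing tree map)."""
    d = np.linalg.norm(database - query, axis=1)
    top = np.argsort(d)[:top_k]
    half = top_k // 2
    pos, neg = database[top[:half]], database[top[half:]]
    new_query = alpha * query + beta * pos.mean(axis=0) - gamma * neg.mean(axis=0)
    return np.argsort(np.linalg.norm(database - new_query, axis=1)), new_query

# Toy usage: refine a query drawn from a random 16-D feature database.
rng = np.random.default_rng(1)
db = rng.normal(size=(500, 16))
ranking, refined = pseudo_relevance_feedback(db[0], db)
print(ranking[:5])
```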
Chapter 4. Interactive Mobile Visual Search and Recommendation at Internet Scale
Abstract
Mobile visual search and recognition has been an emerging topic for both the research and engineering communities. Among various methods, visual search provides an alternative solution where text or voice search is not applicable. Combining the Bag-of-Words (BoW) model with advanced retrieval algorithms, a mobile visual search and social activity recommendation system is presented at Internet scale. The merit of the BoW model in large-scale image retrieval is integrated with the flexible user interface provided by the mobile platform. Instead of text or voice input, the system takes visual images captured by the built-in camera and attempts to understand users' intents through interaction. These intents are then recognized through a retrieval mechanism using the BoW model. Finally, visual results are mapped onto contextually relevant information and entities (e.g., local businesses) for social task suggestions. Hence, the system offers users the ability to search for information and make decisions on the go.
Paisarn Muneesawang, Ning Zhang, Ling Guan
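A minimal sketch of the Bag-of-Words retrieval step assumed by such a system: local descriptors are quantised against a visual vocabulary, accumulated into normalised histograms, and ranked by cosine similarity. The vocabulary size, descriptor dimension, and random data below are placeholders; the system's actual large-scale indexing (e.g. inverted files, tf-idf weighting) is not shown.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Quantise local descriptors (e.g. SIFT-like vectors) to their nearest
    visual word and return a normalised Bag-of-Words histogram."""
    d = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=-1)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / (hist.sum() + 1e-12)

def rank_by_cosine(query_hist, db_hists):
    """Rank database images by cosine similarity of their BoW histograms."""
    q = query_hist / (np.linalg.norm(query_hist) + 1e-12)
    db = db_hists / (np.linalg.norm(db_hists, axis=1, keepdims=True) + 1e-12)
    return np.argsort(-db @ q)

# Toy run with a random 64-word vocabulary and random 128-D descriptors.
rng = np.random.default_rng(2)
vocab = rng.normal(size=(64, 128))
db_hists = np.stack([bow_histogram(rng.normal(size=(200, 128)), vocab)
                     for _ in range(50)])
query = bow_histogram(rng.normal(size=(150, 128)), vocab)
print(rank_by_cosine(query, db_hists)[:5])
```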
Chapter 5. Mobile Landmark Recognition
Abstract
In recent years, landmark image recognition has become a growing application area. To improve the recognition rate of mobile landmark recognition systems, this chapter presents a re-ranking method. The query feature vector is modified by identifying important and unimportant features; this is performed on the ranked feature vectors according to feature-selection criteria using an unsupervised wrapper approach. Positive and negative weighting schemes are applied in the query modification to recognize the target landmark image. The experimental results show that the re-ranking method improves the recognition rate compared to previously proposed methods that utilize saliency weighting and scalable vocabulary tree encoding.
Paisarn Muneesawang, Ning Zhang, Ling Guan
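The sketch below conveys only the general flavour of re-ranking by query modification: feature dimensions that are consistent across the top-ranked results are boosted as "important" and the rest are damped before re-scoring. The variance-based selection rule and the boost/damp weights are illustrative assumptions, not the chapter's wrapper-based criteria.

```python
import numpy as np

def rerank_with_query_reweighting(query, db, top_k=10, boost=1.5, damp=0.5):
    """Illustrative re-ranking: dimensions with low variance across the top-k
    initial results are treated as 'important' and boosted, high-variance
    dimensions are damped, and the database is re-scored with the modified
    query. (A stand-in for the chapter's feature-selection criteria.)"""
    d0 = np.linalg.norm(db - query, axis=1)
    top = db[np.argsort(d0)[:top_k]]
    var = top.var(axis=0)
    important = var < np.median(var)
    weights = np.where(important, boost, damp)
    new_query = query * weights
    d1 = np.linalg.norm(db * weights - new_query, axis=1)
    return np.argsort(d1)

# Toy usage on a random 32-D feature database.
rng = np.random.default_rng(3)
db = rng.normal(size=(300, 32))
print(rerank_with_query_reweighting(db[7], db)[:5])
```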
Chapter 6. Image Retrieval from a Forensic Cartridge Case Database
Abstract
This chapter presents a content-based image retrieval method for firearm identification. The reference and corresponding cartridge case base images are aligned according to a phase-correlation criterion in the transform domain. The informative segments of the breech face marks are identified by a cross-covariance coefficient within a locally positioned window in the image space. Edge-density measurements of these segments are used to compute effective correlation areas for image matching. This image matching system attains a significant improvement in image-correlation results compared with traditional image-matching methods for firearm identification. The system will enable forensic scientists to compile a large-scale image database and perform correlation of cartridge case bases, identifying firearms through pairwise alignment and comparison.
Paisarn Muneesawang, Ning Zhang, Ling Guan
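Phase correlation itself is a standard technique; the sketch below estimates the translational alignment between a reference and a test image from the peak of the normalised cross-power spectrum. Rotation handling, the cross-covariance segment selection, and the edge-density weighting described in the chapter are not shown; the image sizes and random data are placeholders.

```python
import numpy as np

def phase_correlation_shift(ref, img, eps=1e-12):
    """Estimate the integer (dy, dx) translation such that
    img is approximately np.roll(ref, (dy, dx), axis=(0, 1)):
    the normalised cross-power spectrum peaks at that shift."""
    F1, F2 = np.fft.fft2(ref), np.fft.fft2(img)
    cross = np.conj(F1) * F2
    corr = np.fft.ifft2(cross / (np.abs(cross) + eps)).real
    dy, dx = np.unravel_index(corr.argmax(), corr.shape)
    # Peaks in the upper half of the range correspond to negative shifts.
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return int(dy), int(dx)

# Toy usage: recover a known shift between two noise images.
rng = np.random.default_rng(4)
ref = rng.normal(size=(128, 128))
shifted = np.roll(ref, shift=(5, -9), axis=(0, 1))
print(phase_correlation_shift(ref, shifted))   # expected: (5, -9)
```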
Chapter 7. Indexing, Object Segmentation, and Event Detection in News and Sports Videos
Abstract
A video parsing algorithm in the compressed domain is first introduced in this chapter. The algorithm is based on the conventional solution, where energy histograms of DC coefficients are used to calculate the distance between consecutive I/P-frames, and the DC coefficients of the P-frames are obtained by frame conversion. The detection results are enhanced by using the ratio between two sliding windows to amplify the transitional regions. Secondly, in order to index news video at various levels, a template-frequency model is utilized to characterize the spatio-temporal information of news stories. The system employing this indexing structure is highly applicable to news-on-demand applications. Thirdly, a method for video object segmentation using graph cuts and histograms of oriented gradients is presented. This method enhances the segmentation of objects that are otherwise difficult to segment due to poor luminance distribution, weak edges, or backgrounds with similar color and motion. Fourthly, the chapter presents an automatic and robust method to detect human faces in video sequences, combining feature extraction and face detection based on local normalization, the Gabor wavelet transform, and the AdaBoost algorithm. Finally, an application system is presented for the classification of American football videos according to events of interest. The system consists of two stages: the first is responsible for play-event localization, and the second for feature mapping and classification. The first stage employs MPEG-7 motion activity descriptors to detect the starting point of a play event, whereas the second stage uses MPEG-7 motion and audio descriptors along with Mel-Frequency Cepstral Coefficient (MFCC) features to classify the events using Fisher's linear discriminant analysis (LDA).
Paisarn Muneesawang, Ning Zhang, Ling Guan
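As a rough illustration of the shot-boundary detection step, the sketch below builds a frame-difference signal from intensity histograms (standing in for the DC-coefficient energy histograms of the compressed domain) and amplifies transitional regions with the ratio of two adjacent sliding windows. Window length, bin count, and the synthetic frames are assumptions.

```python
import numpy as np

def shot_boundary_scores(frames, bins=32, win=5):
    """Frame-difference signal from intensity-histogram distances (a stand-in
    for DC-coefficient energy histograms), sharpened by the ratio of two
    adjacent sliding windows so that transitional regions are amplified."""
    hists = np.stack([np.histogram(f, bins=bins, range=(0, 255))[0]
                      for f in frames]).astype(float)
    hists /= hists.sum(axis=1, keepdims=True)
    diffs = np.abs(np.diff(hists, axis=0)).sum(axis=1)   # L1 distance, frame i vs i+1
    scores = np.zeros_like(diffs)
    for i in range(win, len(diffs) - win):
        left = diffs[i - win:i].mean() + 1e-6
        right = diffs[i:i + win].mean() + 1e-6
        scores[i] = max(right / left, left / right)      # window ratio peaks at a cut
    return scores

# Toy usage: two synthetic "shots" with very different intensity statistics.
rng = np.random.default_rng(5)
frames = np.concatenate([rng.integers(0, 80, size=(30, 8, 8)),     # shot 1 (dark)
                         rng.integers(150, 255, size=(30, 8, 8))])  # shot 2 (bright)
print(shot_boundary_scores(frames).argmax())   # peaks near the cut around frame 30
```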
Chapter 8. Adaptive Retrieval in a P2P Cloud Datacenter
Abstract
This chapter presents indexing and retrieval methods for image and video on cloud datacenters. The application is based on a peer-to-peer (P2P) network in both structured and unstructured network organizations. Firstly, a cluster-identification search system is developed on top of the Chord layer to organize nodes into a structured P2P network. The system automatically clusters the nodes of a distributed hash table for effective node searching and retrieval of multimedia objects. Secondly, pseudo-relevance feedback using the self-organizing tree map is implemented for image database retrieval on a P2P network. Query processing is carried out on an unstructured P2P network by discovering a community of neighbors and performing automatic retrieval within the nodes of that community. Thirdly, based on the unstructured P2P network, the adaptive cosine network is also implemented for video database retrieval.
Paisarn Muneesawang, Ning Zhang, Ling Guan
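The sketch below shows only the basic Chord placement rule assumed by the structured P2P layer: node addresses and object keys are hashed onto an identifier ring, and each object is stored on its successor node. The cluster-identification layer described in the chapter is not modelled; the ring size and node names are placeholders.

```python
import hashlib

RING_BITS = 16
RING_SIZE = 2 ** RING_BITS

def chord_id(key: str) -> int:
    """Hash a node address or object key onto the Chord identifier ring."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % RING_SIZE

def successor(object_key: str, node_ids: list) -> int:
    """Chord's placement rule: an object is stored on the first node whose
    identifier follows the object's identifier clockwise on the ring."""
    oid = chord_id(object_key)
    candidates = sorted(node_ids)
    for nid in candidates:
        if nid >= oid:
            return nid
    return candidates[0]          # wrap around the ring

# Toy usage with hypothetical node addresses and object keys.
nodes = [chord_id(f"node-{i}.example.org") for i in range(8)]
for obj in ["image-0001.jpg", "video-0042.mp4"]:
    print(obj, "->", successor(obj, nodes))
```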
Chapter 9. Scalable Video Genre Classification and Event Detection
Abstract
This chapter focuses on a systematic and generic approach that is evaluated on scalable video genre classification and event detection. The system addresses the event detection scenario for an input video through an orderly sequential process. Initially, domain-knowledge-independent local descriptors are extracted homogeneously from the input video sequence. The video representation is then created by adopting a Bag-of-Words (BoW) model. The video's genre is first identified by applying k-nearest neighbor (k-NN) classifiers to this video representation, and various dissimilarity measures are assessed and evaluated analytically. For high-level event detection, a hidden conditional random field (HCRF) structured prediction model is utilized. Its input relies on middle-level view agents that characterize each frame of the video sequence into one of four view groups, namely close-up view, mid view, long view, and outer-field view. An unsupervised approach based on probabilistic latent semantic analysis (PLSA) is applied to the histogram-based video representation to obtain these middle-level view groups. The framework demonstrates efficiency and generality in processing voluminous video collections and achieves various tasks in video analysis. The effectiveness of the framework is justified by extensive experimentation, and results are compared with benchmarks and state-of-the-art algorithms. Limited human expertise and effort are involved in both the domain-knowledge-independent video representation and the annotation-free unsupervised view labeling. As a result, such a systematic and scalable approach can be widely applied to processing massive video collections generically.
Paisarn Muneesawang, Ning Zhang, Ling Guan
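A minimal sketch of the genre-identification step: BoW histograms are compared with a chi-square dissimilarity (one of many possible measures the chapter evaluates) and classified by majority vote over the k nearest training videos. The genre labels, vocabulary size, and Dirichlet toy data are illustrative assumptions.

```python
import numpy as np

def chi_square(a, b, eps=1e-12):
    """Chi-square dissimilarity between two normalised histograms."""
    return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))

def knn_genre(query_hist, train_hists, train_labels, k=5):
    """Classify a video's BoW histogram by majority vote over its k nearest
    training histograms under the chi-square dissimilarity."""
    d = np.array([chi_square(query_hist, h) for h in train_hists])
    nearest = np.argsort(d)[:k]
    values, counts = np.unique(train_labels[nearest], return_counts=True)
    return values[counts.argmax()]

# Toy usage: 60 videos over a 100-word vocabulary, three hypothetical genres.
rng = np.random.default_rng(6)
train = rng.dirichlet(np.ones(100), size=60)
labels = np.array(["soccer"] * 20 + ["basketball"] * 20 + ["news"] * 20)
print(knn_genre(train[3], train, labels, k=5))
```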
Chapter 10. Audio-Visual Fusion for Film Database Retrieval and Classification
Abstract
This chapter presents techniques for the characterization and fusion of audio and visual content in videos, and demonstrates their application to movie database retrieval. In the audio domain, a study is conducted on the peaky nature of the distribution of wavelet coefficients of an audio signal, which cannot be effectively modeled by a single distribution. Thus, a new modeling method based on a Laplacian mixture model is studied for analyzing audio content and extracting audio features. The dimension of the indexed features is low, which is important for the retrieval efficiency of the system in terms of response time. Together with the audio feature, the visual feature is extracted by template-frequency modeling. Both features are referred to as perceptual features. Then, a learning algorithm for audiovisual fusion is presented. Specifically, the two features are fused at the late fusion stage and input into a support vector machine to learn semantic concepts from a given video database. Based on the experimental results, the current system implementing the support vector machine-based fusion technique achieves high classification accuracy when applied to a large database of Hollywood movies.
Paisarn Muneesawang, Ning Zhang, Ling Guan
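In the spirit of the chapter's low-dimensional audio features, the sketch below fits a two-component zero-mean Laplacian mixture to wavelet coefficients with a small EM loop and returns the mixture parameters as a compact descriptor. The two-component choice, the initialisation, and the synthetic coefficients are assumptions; the book's exact estimator and feature layout may differ.

```python
import numpy as np

def laplacian_mixture_features(coeffs, n_iter=50):
    """Fit a two-component, zero-mean Laplacian mixture to wavelet
    coefficients with EM and return (w1, b1, b2) as a compact audio feature
    (an illustrative low-dimensional descriptor, not the book's exact one)."""
    x = np.abs(coeffs).astype(float)
    b = np.array([x.mean() * 0.5 + 1e-6, x.mean() * 2.0 + 1e-6])  # narrow / broad
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities under Laplacian densities (location 0, scale b_k).
        p = w[None, :] / (2 * b[None, :]) * np.exp(-x[:, None] / b[None, :])
        r = p / (p.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: update mixing weights and scale parameters.
        w = r.mean(axis=0)
        b = (r * x[:, None]).sum(axis=0) / (r.sum(axis=0) + 1e-12)
    return np.array([w[0], b[0], b[1]])

# Toy usage: synthetic "wavelet coefficients" drawn from two Laplacians.
rng = np.random.default_rng(7)
coeffs = np.concatenate([rng.laplace(0, 0.2, 800), rng.laplace(0, 2.0, 200)])
print(laplacian_mixture_features(coeffs))
```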
Chapter 11. Motion Database Retrieval with Application to Gesture Recognition in a Virtual Reality Dance Training System
Abstract
This chapter presents gesture recognition methods and their application to a dance training system in an instructional, virtual reality (VR) setting. The proposed system is based on the unsupervised parsing of dance movement into a structured posture space using the spherical self-organizing map (SSOM). A unique feature descriptor is obtained from the gesture trajectories through posture space on the SSOM. For recognition, various methods are explored for trajectory analysis, including sparse coding, posture occurrence, posture transition, and the hidden Markov model. Within the system, the dance sequence of a student can be segmented online and cross-referenced against a library of gestural components performed by the teacher. This facilitates the assessment of the student's dance and provides visual feedback for effective training.
Paisarn Muneesawang, Ning Zhang, Ling Guan
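The sketch below conveys the trajectory-through-posture-space idea with a generic SOM codebook standing in for the spherical self-organizing map: each frame is mapped to its best-matching posture node, and a gesture is summarised by a posture-occurrence histogram that can be compared between student and teacher. Node count, feature dimension, and the random motion data are placeholders.

```python
import numpy as np

def posture_trajectory(frames, som_nodes):
    """Map each motion-capture frame to its best-matching SOM node (posture),
    yielding the gesture's trajectory through posture space."""
    d = np.linalg.norm(frames[:, None, :] - som_nodes[None, :, :], axis=-1)
    return d.argmin(axis=1)

def posture_occurrence(trajectory, n_postures):
    """Posture-occurrence descriptor: a normalised histogram of how often the
    gesture visits each posture node."""
    hist = np.bincount(trajectory, minlength=n_postures).astype(float)
    return hist / (hist.sum() + 1e-12)

# Toy usage: compare a student's and a teacher's move via cosine similarity.
rng = np.random.default_rng(8)
som = rng.normal(size=(64, 30))          # 64 posture nodes, 30-D joint features
student = rng.normal(size=(120, 30))     # 120 frames of a student's dance move
teacher = rng.normal(size=(110, 30))
hs = posture_occurrence(posture_trajectory(student, som), 64)
ht = posture_occurrence(posture_trajectory(teacher, som), 64)
print(np.dot(hs, ht) / (np.linalg.norm(hs) * np.linalg.norm(ht)))
```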
Backmatter
Metadata
Title
Multimedia Database Retrieval
Written by
Paisarn Muneesawang
Ning Zhang
Ling Guan
Copyright Year
2014
Electronic ISBN
978-3-319-11782-9
Print ISBN
978-3-319-11781-2
DOI
https://doi.org/10.1007/978-3-319-11782-9
