
2005 | Book

Intelligent Multimedia Processing with Soft Computing

Edited by: Prof. Yap-Peng Tan, Prof. Kim Hui Yap, Prof. Lipo Wang

Publisher: Springer Berlin Heidelberg

Book series: Studies in Fuzziness and Soft Computing


About this book

Soft computing represents a collection of techniques, such as neural networks, evolutionary computation, fuzzy logic, and probabilistic reasoning. As opposed to conventional "hard" computing, these techniques tolerate imprecision and uncertainty, similar to human beings. In recent years, successful applications of these powerful methods have been published in many disciplines, in numerous journals and conferences, as well as in the excellent books of this book series on Studies in Fuzziness and Soft Computing. This volume is dedicated to recent novel applications of soft computing in multimedia processing. The book is composed of 21 chapters written by experts in their respective fields, addressing various important and timely problems in multimedia computing such as content analysis, indexing and retrieval, recognition and compression, processing and filtering, etc. In the chapter authored by Guan, Muneesawang, Lay, Amin, and Lee, a radial basis function network with a Laplacian mixture model is employed to perform image and video retrieval. D. Androutsos, P. Androutsos, Plataniotis, and Venetsanopoulos investigate color image indexing and retrieval within a small-world framework. Wu and Yap develop a framework of fuzzy relevance feedback to model the uncertainty of users' subjective perception in image retrieval.

Table of Contents

Frontmatter
Human-Centered Computing for Image and Video Retrieval
Abstract
In this chapter, we present retrieval techniques using content-based and concept-based technologies for digital image and video database applications. We first deal with state-of-the-art methods in a content-based framework, including: a Laplacian mixture model for content characterization, nonlinear relevance feedback, combining audio and visual features for video retrieval, and designing automatic relevance feedback in distributed digital libraries. We then take a broader view, reviewing the defining characteristics and usefulness of current content-based approaches and articulating the extensions required to support semantic queries.
L. Guan, P. Muneesawang, J. Lay, T. Amin, I. Lee
Vector Color Image Indexing and Retrieval within a Small-World Framework
Abstract
In this chapter, we present a novel and robust scheme for extracting, indexing and retrieving color image data. We use color segmentation to extract regions of prominent and perceptually relevant color and use representative vectors from these extracted regions in the image indices. Our similarity measure for retrieval is based on the angular distance between query color vectors and the indexed representative vectors. Furthermore, we extend small-world theory and present an alternative approach to centralized image indices using a distributed rationale, where images are not restricted to reside locally but can be located anywhere on a network.
D. Androutsos, P. Androutsos, K. N. Plataniotis, A. N. Venetsanopoulos
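The chapter's angular similarity measure can be sketched in a few lines of Python. This is an illustrative reading only: the helper names and the min-over-representatives ranking rule are assumptions, not the authors' implementation.

```python
import math

def angular_distance(q, t):
    # Angle between a query color vector and a representative vector
    dot = sum(a * b for a, b in zip(q, t))
    nq = math.sqrt(sum(a * a for a in q))
    nt = math.sqrt(sum(b * b for b in t))
    return math.acos(max(-1.0, min(1.0, dot / (nq * nt))))

def rank_by_angle(query, index):
    # index maps image id -> list of representative color vectors extracted
    # from segmented regions; score each image by its best-matching vector
    scores = {img: min(angular_distance(query, v) for v in vecs)
              for img, vecs in index.items()}
    return sorted(scores, key=scores.get)
```

Because the measure depends only on vector direction, two regions with the same hue but different brightness score as similar, which suits color-based retrieval.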
A Perceptual Subjectivity Notion in Interactive Content-Based Image Retrieval Systems
Abstract
This chapter presents a new framework called fuzzy relevance feedback in interactive content-based image retrieval (CBIR) systems. The conventional binary labeling scheme in relevance feedback requires a hard decision to be made on the relevance of each retrieved image. This is inflexible, as user interpretation varies with respect to different information needs and perceptual subjectivity. In addition, users tend to learn from the retrieval results to further refine their information priority. It is, therefore, inadequate to describe the users' fuzzy perception of image similarity with crisp logic. In view of this, a fuzzy framework is introduced to integrate the users' imprecise interpretation of visual contents into relevance feedback. An efficient learning approach is developed using a fuzzy radial basis function network (FRBFN). The network is constructed by a hierarchical clustering algorithm, and the underlying network parameters are optimized with a gradient-descent-based training strategy chosen for its computational efficiency. Experimental results using a database of 10,000 images demonstrate the effectiveness of the proposed method.
Kui Wu, Kim-Hui Yap
A Scalable Bootstrapping Framework for Auto-Annotation of Large Image Collections
Abstract
Image annotation aims to assign semantic concepts to images based on their visual contents. It has received much attention recently as huge dynamic collections of images/videos have become available on the Web. Most recent approaches employ supervised learning techniques, which require a large set of labeled training samples for effective learning; such samples are tedious and time-consuming to obtain. This chapter explores the use of a bootstrapping framework to tackle this problem by employing three complementary strategies. First, we train two "view independent" classifiers based on probabilistic SVM using two orthogonal sets of content features and incorporate the classifiers in the co-training framework to annotate regions. Second, at the image level, we employ two different segmentation methods to segment the image into different sets of possibly overlapping regions and devise a contextual model to disambiguate the concepts learned from different regions. Third, we incorporate active learning to ensure that the framework is scalable to large image collections. Our experiments on a mid-sized image collection demonstrate that our bootstrapping cum active learning framework is effective. Compared to the traditional supervised learning approach, it improves annotation accuracy by over 4% in F1 measure without active learning, and by over 18% when active learning is incorporated. Most importantly, the bootstrapping framework requires only a small set of training samples to kick-start the learning process, making it suitable for practical applications.
Tat-Seng Chua, Huamin Feng
Moderate Vocabulary Visual Concept Detection for the TRECVID 2002
Abstract
The explosion in multimodal content availability underlines the necessity for content management at a semantic level. We cast the problem of detecting semantics in multimedia content as a pattern classification problem, and the problem of building models of multimodal semantics as a learning problem. Recent trends show increasing use of statistical machine learning, which provides a computational framework for mapping low-level media features to high-level semantic concepts. In this chapter we expose the challenges that these techniques face. We show that if a lexicon of visual concepts is identified a priori, a statistical framework can be used to build visual feature models for the concepts in the lexicon. Using support vector machine (SVM) classification, we build models for 34 semantic concepts on the TREC 2002 benchmark corpus. We study how the number of training examples affects detection performance, and also examine low-level feature fusion as well as parameter sensitivity with SVM classifiers.
Milind R. Naphade, John R. Smith
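As a self-contained stand-in for the SVM concept detectors described above, the sketch below trains a linear SVM by Pegasos-style stochastic sub-gradient descent. The chapter builds kernel SVM models over low-level media features on the TREC 2002 corpus; everything here (toy data, the linear kernel, hyperparameters) is an illustrative assumption.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    # Pegasos-style stochastic sub-gradient descent for a linear SVM.
    # Labels must be in {-1, +1}.
    random.seed(seed)
    d = len(X[0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        for i in random.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]   # regularization shrink
            if margin < 1:                             # hinge-loss violation
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

A per-concept detector of this kind is trained one-vs-rest: positive examples of the concept against everything else.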
Automatic Visual Concept Training Using Imperfect Cross-Modality Information
Abstract
In this chapter, we present an autonomous learning scheme that builds visual semantic concept models from video sequences, or from the results of Internet search engines, without any manual labeling. First, system users specify the concept models to be learned automatically. Example videos or images can then be obtained from large video databases by keyword search over automatic speech recognition transcripts, or alternatively gathered using Internet search engines. We propose to model the search results as "Quasi-Positive Bags" in Multiple-Instance Learning (MIL), and call this formulation the generalized MIL (GMIL). In some scenarios, there are also no "Negative Bags" in the GMIL. We propose an algorithm called "Bag K-Means" to find the maximum Diverse Density (DD) without negative bags; its cost function is that of K-Means with a special "Bag Distance". We also present "Uncertain Labeling Density" (ULD), which describes the target density distribution of instances in the case of quasi-positive bags, and a "Bag Fuzzy K-Means" for finding the maximum of ULD. Utilizing this generalized MIL-ULD framework, the model for a particular concept can then be learned through general supervised learning methods. Experiments show that our algorithm obtains correct models for the concepts of interest.
Xiaodan Song, Ching-Yung Lin, Ming-Ting Sun
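The chapter's Bag K-Means maximizes Diverse Density, which is not reproduced here; the generic sketch below only illustrates the idea of clustering bags with a "bag distance", taken here to be the distance from a centroid to its closest instance in the bag. The update rule and initialization are assumptions for illustration, not the authors' algorithm.

```python
def bag_distance(centroid, bag):
    # "Bag distance": distance from a centroid to the closest instance in a bag
    return min(sum((c - x) ** 2 for c, x in zip(centroid, inst)) ** 0.5
               for inst in bag)

def bag_kmeans(bags, k, iters=10):
    # K-means over bags: assign each bag to the centroid with the smallest
    # bag distance, then update each centroid from the closest instances.
    centroids = [list(bags[i][0]) for i in range(k)]  # naive initialization
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for bag in bags:
            j = min(range(k), key=lambda j: bag_distance(centroids[j], bag))
            nearest = min(bag, key=lambda inst: sum(
                (c - x) ** 2 for c, x in zip(centroids[j], inst)))
            groups[j].append(nearest)
        for j, g in enumerate(groups):
            if g:
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids
```

Only the nearest instance of each bag pulls on a centroid, which captures the quasi-positive intuition that a bag is represented by its best instance, not all of them.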
Audio-Visual Event Recognition with Application in Sports Video
Abstract
We summarize our recent work on "highlight" event detection and recognition in sports video. We have developed two different joint audio-visual fusion frameworks for this task, namely "audio-visual coupled hidden Markov model" and "audio classification then visual hidden Markov model verification". Our comparative study shows that the second approach outperforms the first by a large margin. Our study also suggests the importance of modeling so-called middle-level features, such as audience reactions and camera patterns, in sports video.
Ziyou Xiong, Regunathan Radhakrishnan, Ajay Divakaran, Thomas S. Huang
Fuzzy Logic Methods for Video Shot Boundary Detection and Classification
Abstract
A fuzzy logic system for the detection and classification of shot boundaries in uncompressed video sequences is presented. It integrates multiple sources of information and knowledge of editing procedures to detect shot boundaries. Furthermore, the system classifies the editing process employed to create the shot boundary into one of the following categories: abrupt cut, fade-in, fade-out, or dissolve. This system was tested on a database containing a wide variety of video classes. It achieved combined recall and precision rates that significantly exceed those of existing threshold-based techniques, and it correctly classified a high percentage of the detected boundaries.
Ralph M. Ford
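The fuzzy rule base and editing-model knowledge are the substance of the chapter; the toy below only illustrates the mechanism, with triangular membership functions over frame-difference evidence combined by simple rules. All thresholds and rule shapes here are invented for illustration.

```python
def tri(x, a, b, c):
    # Triangular membership function peaking at b, zero outside (a, c)
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def classify_transition(diffs):
    # diffs: normalized per-frame dissimilarities around a candidate boundary.
    # Toy rules: one sharp spike suggests an abrupt cut; a sustained run of
    # moderate differences suggests a gradual transition (fade or dissolve).
    peak = max(diffs)
    spread = sum(1 for d in diffs if d > 0.25 * peak)
    mu_cut = tri(peak, 0.5, 1.0, 1.5) * tri(spread, 0, 1, 3)
    mu_gradual = tri(peak, 0.15, 0.4, 0.7) * tri(spread, 2, 4, 8)
    if max(mu_cut, mu_gradual) < 0.1:
        return "none"
    return "cut" if mu_cut >= mu_gradual else "gradual"
```

The soft memberships let weak evidence from several sources combine, which is how a fuzzy detector avoids the brittle single thresholds of conventional techniques.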
Rate-Distortion Optimal Video Summarization and Coding
Abstract
The demand for video summarization originates from viewing-time constraints as well as bit-budget constraints arising from communication and storage limitations in security, military, and entertainment applications. In this chapter we formulate and solve video summarization as a rate-distortion optimization problem. An effective new summarization distortion metric is developed, and several optimal algorithms are presented along with effective heuristic solutions.
Zhu Li, Aggelos K. Katsaggelos, Guido M. Schuster
Video Compression by Neural Networks
Abstract
In this chapter a general overview of the most common approaches to video compression is first provided. Standardization issues are briefly discussed, and the most recent neural compression techniques are reviewed. In addition, a particularly effective novel neural paradigm is introduced and described. The new approach is based on a proper quad-tree segmentation of video frames and is capable of yielding considerable improvement over existing standards in high-quality video compression. Experimental tests are described to demonstrate the efficacy of the proposed solution.
Daniele Vigliano, Raffaele Parisi, Aurelio Uncini
Knowledge Extraction in Stereo Video Sequences Using Adaptive Neural Networks
Abstract
In this chapter, an adaptive neural network architecture is proposed for efficient knowledge extraction in video sequences. The system is focused on video object segmentation and tracking in stereoscopic video sequences. The proposed scheme includes: (a) a retraining algorithm for adapting the network weights to current conditions, (b) a semantically meaningful object extraction module for creating a retraining set and (c) a decision mechanism, which detects the time instances when a new network retraining is activated. The retraining algorithm optimally adapts network weights by exploiting information of the current conditions with a minimal deviation of the network weights. The algorithm results in the minimization of a convex function subject to linear constraints, and thus, one minimum exists. Description of current conditions is provided by a segmentation fusion scheme, which appropriately combines color and depth information. Experimental results on real-life video sequences are presented to indicate the promising performance of the proposed adaptive neural network-based scheme.
Anastasios Doulamis
An Efficient Genetic Algorithm for Small Search Range Problems and Its Applications
Abstract
Genetic algorithms have been applied to many optimization and search problems and have been shown to be very efficient. However, their efficiency is not guaranteed in applications where the search space is small, such as block motion estimation in video coding, or equivalently where the chromosome length is relatively short (less than 5, for example). Since the characteristics of these small-search-space applications differ considerably from those of the conventional search problems in which common genetic algorithms work well, new treatments of genetic algorithms for small-range search problems are of interest. In this chapter, the efficiency of the genetic operations of common genetic algorithms, such as crossover and mutation, is analyzed for this special situation. As expected, the resulting efficiency and performance of the genetic operations are quite different from those of their traditional counterparts. To fill this gap, a lightweight genetic search algorithm is presented that provides an efficient way to generate near-optimal solutions for these kinds of applications. The control overheads of the lightweight genetic search algorithm are very low compared with those of conventional genetic algorithms. Simulations show that many computations can be saved by applying the newly proposed algorithm while the search results remain well acceptable.
Ja-Ling Wu, Chun-Hung Lin, Chun-Hsiang Huang
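A lightweight, mutation-only genetic search over a small motion-vector window can be sketched as follows. This is not the authors' algorithm: the population size, the +/-1 mutation step, and seeding the population with the zero vector (a common warm start in motion estimation) are all illustrative choices.

```python
import random

def sad(block, frame, x, y):
    # Sum of absolute differences between a block and the frame patch at (x, y)
    return sum(abs(block[i][j] - frame[y + i][x + j])
               for i in range(len(block)) for j in range(len(block[0])))

def ga_motion_search(block, frame, x0, y0, rng=4, pop=8, gens=12, seed=0):
    # Lightweight GA over a small +/-rng window: tiny population, elitism,
    # mutation only (crossover buys little when chromosomes are this short).
    random.seed(seed)
    clamp = lambda v: max(-rng, min(rng, v))
    cand = [(0, 0)] + [(random.randint(-rng, rng), random.randint(-rng, rng))
                       for _ in range(pop - 1)]
    cost = lambda v: sad(block, frame, x0 + v[0], y0 + v[1])
    for _ in range(gens):
        cand.sort(key=cost)
        elite = cand[:pop // 2]
        cand = elite + [(clamp(dx + random.choice((-1, 0, 1))),
                         clamp(dy + random.choice((-1, 0, 1))))
                        for dx, dy in elite]
    return min(cand, key=cost)
```

With a +/-4 window there are only 81 candidates, so the point is not coverage but evaluating far fewer of them than an exhaustive search while keeping the elitist guarantee of never losing the best vector found.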
Manifold Learning and Applications in Recognition
Abstract
Large amounts of data with varying intrinsic features are empirically regarded as lying on high-dimensional nonlinear manifolds in the observation space. For different categories, we present two recognition approaches: the combination of a manifold learning algorithm and linear discriminant analysis (MLA+LDA), and nonlinear auto-associative modeling (NAM). For similar-object recognition, e.g. face recognition, MLA+LDA is used; otherwise, NAM is employed for objects from largely different categories. Experimental results on different benchmark databases show the advantages of the proposed approaches.
Junping Zhang, Stan Z. Li, Jue Wang
Face Recognition Using Discrete Cosine Transform and RBF Neural Networks
Abstract
In this chapter, an efficient method for face recognition based on the Discrete Cosine Transform (DCT), the Fisher's Linear Discriminant (FLD) and Radial Basis Function (RBF) neural networks is presented. First, the dimensionality of the original face image is reduced by using the DCT, and large-area illumination variations are alleviated by discarding the first few low-frequency DCT coefficients. Next, the truncated DCT coefficient vectors are clustered using the proposed clustering algorithm. This process makes the subsequent FLD more efficient. After implementing the FLD, the most discriminating and invariant facial features are maintained and the training samples are clustered well. As a consequence, further parameter estimation for the RBF neural networks is fulfilled easily, which facilitates fast training of the RBF neural networks. Simulation results show that the proposed system achieves excellent performance, with fast training, high recognition speed, a high recognition rate, and very good illumination robustness.
Weilong Chen, Meng Joo Er, Shiqian Wu
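The DCT-and-truncation front end lends itself to a compact sketch: compute the 2-D DCT, order the coefficients by frequency, and drop the first low-frequency terms so that a uniform brightness change leaves the feature vector essentially untouched. The naive transform and the simple frequency ordering below are illustrative, not the chapter's implementation.

```python
import math

def dct2(img):
    # Naive 2-D DCT-II of a square image (fine for small inputs; real
    # systems use fast transforms)
    n = len(img)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(img[i][j]
                    * math.cos(math.pi * (2 * i + 1) * u / (2 * n))
                    * math.cos(math.pi * (2 * j + 1) * v / (2 * n))
                    for i in range(n) for j in range(n))
            cu = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
            cv = math.sqrt(1 / n) if v == 0 else math.sqrt(2 / n)
            out[u][v] = cu * cv * s
    return out

def dct_features(img, keep=10, skip=1):
    # Order coefficients by increasing frequency (u + v), drop the first
    # `skip` low-frequency terms (illumination-sensitive), keep `keep` more.
    n = len(img)
    coeffs = dct2(img)
    order = sorted(((u, v) for u in range(n) for v in range(n)),
                   key=lambda p: (p[0] + p[1], p))
    return [coeffs[u][v] for u, v in order[skip:skip + keep]]
```

Adding a constant brightness offset changes only the DC coefficient, so with `skip >= 1` the feature vector is unchanged, which is the illumination-robustness argument in miniature.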
Probabilistic Reasoning for Closed-Room People Monitoring
Abstract
In this chapter, we present a probabilistic reasoning approach to recognizing people entering and leaving a closed room by exploiting low-level visual features and high-level domain-specific knowledge. Specifically, people in the view of a monitoring camera are first detected and tracked so that their color and facial features can be extracted and analyzed. Then, recognition of people is carried out using a mapped feature similarity measure and exploiting the temporal correlation and constraints among each sequence of observations. The optimality of recognition is achieved in the sense of maximizing the joint posterior probability of the multiple observations. Experimental results on real and synthetic data are reported to show the effectiveness of the proposed approach.
Ji Tao, Yap-Peng Tan
Human-Machine Communication by Audio-Visual Integration
Abstract
The use of audio-visual information is inevitable in human communication: complementary use of audio and visual cues enables more accurate, robust, natural, and friendly communication in real environments. The same types of information are also required for computers to realize natural and friendly interfaces, which at present are neither reliable nor friendly.
In this chapter, we focus on synchronous multi-modalities, specifically audio information of speech and image information of a face, for audio-visual speech recognition, synthesis and translation. Human audio speech and visual speech information both originate from movements of the speech organs triggered by motor commands from the brain. Accordingly, such speech signals represent the information of an utterance in different ways, and these audio and visual speech modalities therefore have strong correlations and complementary relationships. There is a very strong demand to improve current speech recognition performance, which degrades drastically in real environments when speech is exposed to acoustic noise, reverberation and speaking-style differences. The integration of audio and visual information is expected to make systems robust and reliable and to improve performance. On the other hand, there is also a demand to improve speech synthesis intelligibility: multi-modal synthesis of audio speech and lip-synchronized talking-face images can improve both intelligibility and naturalness. Section 1 describes audio-visual speech detection and recognition, which aim to improve the robustness of speech recognition in actual noisy environments. Section 2 introduces a talking-face synthesis system based on a 3-D mesh model and an audio-visual speech translation system, which recognizes input speech in the original language, translates it into a target language, and synthesizes output speech in that language.
Satoshi Nakamura, Tatsuo Yotsukura, Shigeo Morishima
Probabilistic Fusion of Sorted Score Sequences for Robust Speaker Verification
Abstract
Fusion techniques have been widely used in multi-modal biometric authentication systems. While these techniques are mainly applied to combine the outputs of modality-dependent classifiers, they can also be applied to fuse the decisions or scores from a single modality. The idea is to consider the multiple samples extracted from a single modality as independent but coming from the same source. In this chapter, we propose a single-source, multi-sample data-dependent fusion algorithm for speaker verification. The algorithm is data-dependent in that the fusion weights are dependent on the verification scores and the prior score statistics of claimed speakers and background speakers. To obtain the best out of the speaker’s scores, scores from multiple utterances are sorted before they are probabilistically combined. Evaluations based on 150 speakers from a GSM-transcoded corpus are presented. Results show that data-dependent fusion of speaker’s scores is significantly better than the conventional score averaging approach. It was also found that the proposed fusion algorithm can be further enhanced by sorting the score sequences before they are probabilistically combined.
Ming-Cheung Cheung, Man-Wai Mak, Sun-Yuan Kung
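One way to picture sorted-score fusion is as rank-dependent weighting: sort the utterance scores, then apply one weight per rank position. In the chapter the fusion weights are data-dependent, derived from the verification scores and prior score statistics of claimed and background speakers; the fixed weights below are purely illustrative.

```python
def fuse_sorted(scores, rank_weights):
    # Rank-dependent fusion: sort the utterance scores, then apply one
    # weight per rank position. In a real system the weights would be
    # learned from prior score statistics; here they are fixed.
    s = sorted(scores)
    z = sum(rank_weights)
    return sum(w * x for w, x in zip(rank_weights, s)) / z
```

Uniform weights recover the conventional score-averaging baseline, while weights skewed toward particular rank positions emphasize the most informative parts of the sorted score sequence.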
Adaptive Noise Cancellation Using Online Self-Enhanced Fuzzy Filters with Applications to Multimedia Processing
Abstract
Adaptive noise cancellation is a significant research issue in multimedia signal processing and a widely used technique in teleconference systems, hands-free mobile communications, and acoustic echo and feedback cancellation. For real-time applications in nonlinear environments, an online self-enhanced fuzzy filter for adaptive noise cancellation is proposed. The filter is based on radial-basis-function networks and is functionally equivalent to the Takagi-Sugeno-Kang fuzzy system. As a prominent feature, the system is hierarchically constructed and self-enhanced during training using a novel online clustering strategy for structure identification. In constructing the system, instead of selecting the centers and widths of membership functions arbitrarily, online clustering is applied to ensure reasonable representation of the input terms; this not only ensures proper feature representation but also optimizes the structure of the filter by reducing the number of fuzzy rules. Moreover, the filter is adaptively tuned toward the optimum by the proposed hybrid sequential algorithm for parameter determination. Owing to the online self-enhanced system construction and the hybrid learning algorithm, a low computational load and reduced memory requirements are achieved, which is beneficial for real-time multimedia signal processing applications.
Meng Joo Er, Zhengrong Li
Image Denoising Using Stochastic Chaotic Simulated Annealing
Abstract
In this chapter, we present a new approach to image denoising based on a novel optimization algorithm called stochastic chaotic simulated annealing. The original Bayesian framework of image denoising is reformulated into a constrained optimization problem using continuous relaxation labeling. To solve this optimization problem, we then use a noisy chaotic neural network (NCNN), which adds noise and chaos into the Hopfield neural network (HNN) to facilitate efficient searching and to avoid local minima. Experimental results show that this approach can offer good quality solutions to image denoising.
Lipo Wang, Leipo Yan, Kim-Hui Yap
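The noisy chaotic neural network itself is beyond a short sketch, but plain simulated annealing on a toy 1-D Ising-style restoration problem illustrates the underlying idea of escaping local minima by stochastic search. The energy weights and cooling schedule below are invented for illustration and are not the chapter's NCNN formulation.

```python
import math
import random

def sa_denoise(noisy, beta=1.0, lam=1.5, steps=20000, t0=2.0, seed=1):
    # Plain simulated annealing on a 1-D Ising-style MRF: pixels take values
    # in {-1, +1}; the energy trades data fidelity (lam) for smoothness (beta).
    random.seed(seed)
    x = list(noisy)
    n = len(x)

    def local_energy(i, v):
        e = -lam * v * noisy[i]            # data-fidelity term
        if i > 0:
            e -= beta * v * x[i - 1]       # smoothness with left neighbor
        if i < n - 1:
            e -= beta * v * x[i + 1]       # smoothness with right neighbor
        return e

    for t in range(steps):
        temp = t0 / (1.0 + 0.01 * t)       # cooling schedule
        i = random.randrange(n)
        d_e = local_energy(i, -x[i]) - local_energy(i, x[i])
        # accept downhill moves always, uphill moves with Boltzmann probability
        if d_e < 0 or random.random() < math.exp(-d_e / temp):
            x[i] = -x[i]
    return x
```

With `lam` between `beta` and `2 * beta`, an isolated flipped pixel is energetically unstable, so annealing restores it while leaving consistent regions alone.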
Soft Computation of Numerical Solutions to Differential Equations in EEG Analysis
Abstract
Computational localization and modeling of functional activity within the brain, based on multichannel electroencephalographic (EEG) data, are important in basic and clinical neuroscience. One of the key problems in analyzing EEG data is to evaluate the surface potentials of a theoretical volume conductor model in response to an internally located current dipole with known parameters. Traditionally, this evaluation has been performed by means of either boundary element or finite element methods, which are computationally demanding. This chapter presents a soft computing approach using an artificial neural network (ANN). Off-line training is performed for the ANN to map the forward solutions of the spherical head model to those of a class of spheroidal head models. When the ANN is placed on-line and a set of potential values of the spherical model is presented at the input, the ANN generalizes the knowledge learned during the training phase and produces the potentials of the selected spheroidal model with a desired eccentricity. In this work we investigate theoretical aspects of this soft-computing approach and show that the numerical computation can be formulated as a machine learning problem and implemented by a supervised function-approximation ANN. We also show that, for the case of the Poisson equation, the solution is unique and continuous with respect to boundary surfaces. Our experiments demonstrate that this soft-computing approach produces highly accurate results with only a small fraction of the computational cost required by the traditional methods.
Mingui Sun, Xiaopu Yan, Robert J. Sclabassi
Providing Common Time and Space in Distributed AV-Sensor Networks by Self-Calibration
Abstract
Array audio-visual signal processing algorithms require time-synchronized capture of AV data on distributed platforms. In addition, the geometry of the array of cameras, microphones, speakers and displays is often required. In this chapter we present a novel setup involving a network of wireless computing platforms with onboard sensors and actuators, together with algorithms that provide both synchronized I/O and self-localization of the I/O devices in 3D space. The proposed algorithms synchronize input and output for a network of distributed multi-channel audio sensors and actuators connected to general purpose computing platforms (GPCs) such as laptops, PDAs and tablets. An IEEE 802.11 wireless network is used to deliver the global clock to the distributed GPCs, while an interrupt timestamping mechanism is employed to distribute the clock between I/O devices. Experimental results demonstrate A/D and D/A synchronization precision better than 50 μs (a couple of samples at 48 kHz). We also present a novel algorithm to automatically determine the relative 3D positions of the sensors and actuators connected to the GPCs. A closed-form approximate solution is derived using the technique of metric multidimensional scaling, which is further refined by minimizing a nonlinear error function. Our formulation and solution account for localization errors due to the lack of temporal synchronization among different platforms. The performance limit for the sensor positions is analyzed with respect to the number of sensors and actuators as well as their geometry. Simulation results are reported together with a discussion of the practical issues in a real-time system.
R. Lienhart, I. Kozintsev, D. Budnikov, I. Chikalov, V. C. Raykar
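The closed-form initialization via metric multidimensional scaling can be sketched as classical MDS: square and double-center the pairwise distances to obtain a Gram matrix, then read coordinates off its top eigenvectors (extracted here by power iteration with deflation, an implementation choice of this sketch). Recovered positions are unique only up to rotation and reflection.

```python
def classical_mds(D, dim=2, iters=500):
    # Classical MDS: double-center the squared distance matrix to obtain a
    # Gram matrix B, then take coordinates from its top eigenvectors.
    n = len(D)
    D2 = [[D[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in D2]
    grand = sum(row) / n
    B = [[-0.5 * (D2[i][j] - row[i] - row[j] + grand) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * dim for _ in range(n)]
    for d in range(dim):
        v = [1.0 + 0.01 * i for i in range(n)]  # deterministic start vector
        for _ in range(iters):                   # power iteration
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(B[i][j] * v[j] for j in range(n))
                  for i in range(n))
        for i in range(n):
            coords[i][d] = v[i] * max(lam, 0.0) ** 0.5
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)]  # deflation
             for i in range(n)]
    return coords
```

In the chapter this closed-form estimate only initializes a nonlinear refinement that also absorbs timing errors; classical MDS alone assumes exact, synchronized distance measurements.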
Metadata
Title
Intelligent Multimedia Processing with Soft Computing
Edited by
Prof. Yap-Peng Tan
Prof. Kim Hui Yap
Prof. Lipo Wang
Copyright year
2005
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-32367-9
Print ISBN
978-3-540-23053-3
DOI
https://doi.org/10.1007/3-540-32367-8
