
2006 | Book

Multimedia Content Representation, Classification and Security

International Workshop, MRCS 2006, Istanbul, Turkey, September 11-13, 2006. Proceedings

Edited by: Bilge Gunsel, Anil K. Jain, A. Murat Tekalp, Bülent Sankur

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science

About this book

We would like to welcome you to the proceedings of MRCS 2006, Workshop on Multimedia Content Representation, Classification and Security, held September 11–13, 2006, in Istanbul, Turkey. The goal of MRCS 2006 was to provide an erudite but friendly forum where academic and industrial researchers could interact, discuss emerging multimedia techniques and assess the significance of content representation and security techniques within their problem domains. We received more than 190 submissions from 30 countries. All papers were subjected to thorough peer review. The final decisions were based on the criticisms and recommendations of the reviewers and the relevance of papers to the goals of the conference. Only 52% of the papers submitted were accepted for inclusion in the program. In addition to the contributed papers, four distinguished researchers agreed to deliver keynote speeches, namely:

– Ed Delp on multimedia security
– Pierre Moulin on data hiding
– John Smith on multimedia content-based indexing and search
– Mário A. T. Figueiredo on semi-supervised learning.

Table of contents

Frontmatter

Invited Talk

Multimedia Security: The Good, the Bad, and the Ugly

In this talk I will describe issues related to securing multimedia content. In particular I will discuss why traditional security methods, such as cryptography, do not work. I believe that perhaps too much has been promised and not enough has been delivered with respect to multimedia security. I will overview research issues related to data hiding, digital rights management systems, and media forensics, and describe how various application scenarios impact security issues.

Edward J. Delp

Biometric Recognition

Generation and Evaluation of Brute-Force Signature Forgeries

We present a procedure to create brute-force signature forgeries. The procedure is supported by Sign4J, a dynamic signature imitation training software that was specifically built to help people learn to imitate the dynamics of signatures. The main novelty of the procedure lies in a feedback mechanism that lets the user know how good the imitation is and on what part of the signature the user still has to improve. The procedure and the software are used to generate a set of brute-force signatures on the MCYT-100 database. This set of forged signatures is used to evaluate the rejection performance of a baseline dynamic signature verification system. As expected, the brute-force forgeries generate more false acceptances than the random and low-force forgeries available in the MCYT-100 database.

Alain Wahl, Jean Hennebert, Andreas Humm, Rolf Ingold
The Quality of Fingerprint Scanners and Its Impact on the Accuracy of Fingerprint Recognition Algorithms

It is well known that in any biometric system the quality of the input data has a strong impact on the accuracy that the system may provide. The quality of the input depends on several factors, such as the quality of the acquisition device, the intrinsic quality of the biometric trait, the current condition of the biometric trait, the environment, and the correctness of user interaction with the device. Much research is being carried out to quantify and measure the quality of biometric data [1] [2]. This paper focuses on the quality of fingerprint scanners and its aim is twofold: i) measuring the correlation between the different characteristics of a fingerprint scanner and the performance they can assure; ii) providing practical ways to measure such characteristics.

Raffaele Cappelli, Matteo Ferrara, Davide Maltoni
Correlation-Based Similarity Between Signals for Speaker Verification with Limited Amount of Speech Data

In this paper, we present a method for speaker verification with a limited amount (2 to 3 seconds) of speech data. With the constraint of limited data, the use of traditional vocal tract features in conjunction with statistical models becomes difficult. An estimate of the glottal flow derivative signal, which represents the excitation source information, is used for comparing two signals. Speaker verification is performed by computing normalized correlation coefficient values between signal patterns chosen around high-SNR regions (corresponding to the instants of significant excitation), without having to extract any further parameters. The high-SNR regions are detected by locating peaks in the Hilbert envelope of the LP residual signal. Speaker verification studies are conducted on clean microphone speech (TIMIT) as well as noisy telephone speech (NTIMIT) to illustrate the effectiveness of the proposed method.

N. Dhananjaya, B. Yegnanarayana
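
As a rough illustration of the correlation-based matching idea in the abstract above (a sketch, not the authors' implementation), the following numpy/scipy fragment computes an LP residual, locates high-SNR regions via Hilbert-envelope peaks, and averages normalized correlations of residual patterns around them. It assumes the two utterances are already time-aligned; all names and parameter values are hypothetical:

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import hilbert

def lp_residual(x, order=12):
    # autocorrelation-method linear prediction, then inverse filtering
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    a = solve_toeplitz(r[:order], r[1 : order + 1])   # predictor coefficients
    pred = np.convolve(x, np.r_[0.0, a])[: len(x)]    # one-step prediction
    return x - pred

def verification_score(x, y, n_peaks=40, win=40):
    rx, ry = lp_residual(x), lp_residual(y)
    env = np.abs(hilbert(rx))                 # high-SNR regions ~ envelope peaks
    hi = len(min((rx, ry), key=len)) - win
    peaks = [p for p in np.argsort(env)[::-1] if win <= p < hi][:n_peaks]
    scores = []
    for p in peaks:                           # correlate patterns around peaks
        u, v = rx[p - win : p + win], ry[p - win : p + win]
        scores.append(abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))
    return float(np.mean(scores))             # compared against a decision threshold
```
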
Human Face Identification from Video Based on Frequency Domain Asymmetry Representation Using Hidden Markov Models

In this paper we introduce a novel human face identification scheme from video data based on a frequency domain representation of facial asymmetry. A Hidden Markov Model (HMM) is used to learn the temporal dynamics of the training video sequences of each subject and classification of the test video sequences is performed using the likelihood scores obtained from the HMMs. We apply this method to a video database containing 55 subjects showing extreme expression variations and demonstrate that the HMM-based method performs much better than identification based on the still images using an Individual PCA (IPCA) classifier, achieving more than 30% improvement.

Sinjini Mitra, Marios Savvides, B. V. K. Vijaya Kumar
Utilizing Independence of Multimodal Biometric Matchers

The problem of combining biometric matchers for person verification can be viewed as a pattern classification problem, and any trainable pattern classification algorithm can be used for score combination. But biometric matchers of different modalities possess the property of statistical independence of their output scores. In this work we investigate whether utilizing this independence knowledge results in an improvement of the combination algorithm. We show both theoretically and experimentally that utilizing independence provides a better approximation of the score density functions, and results in improved combination.

Sergey Tulyakov, Venu Govindaraju
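
The gain from the independence assumption can be sketched as follows: under independence the joint score densities factorize into 1-D marginals, each of which is estimated far more reliably from limited data than a full joint density. A minimal Python sketch with hypothetical score arrays (not the authors' code):

```python
import numpy as np
from scipy.stats import gaussian_kde

# gen_a / imp_a: genuine / impostor training scores of matcher A (e.g. face),
# gen_b / imp_b: the same for matcher B (e.g. fingerprint) -- hypothetical arrays.
def make_llr(gen_a, imp_a, gen_b, imp_b):
    # independence assumption: joint density = product of 1-D marginal KDEs
    kg_a, ki_a = gaussian_kde(gen_a), gaussian_kde(imp_a)
    kg_b, ki_b = gaussian_kde(gen_b), gaussian_kde(imp_b)
    def llr(sa, sb):
        num = kg_a(sa) * kg_b(sb)    # P(scores | genuine)
        den = ki_a(sa) * ki_b(sb)    # P(scores | impostor)
        return np.log(num + 1e-300) - np.log(den + 1e-300)
    return llr

# usage: accept = make_llr(ga, ia, gb, ib)(score_a, score_b) > threshold
```
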

Invited Talk

Discreet Signaling: From the Chinese Emperors to the Internet

For thousands of years, humans have sought means to secretly communicate. Today, ad hoc signaling methods are used in applications as varied as digital rights management for multimedia, content identification, authentication, steganography, transaction tracking, and networking. This talk will present an information-theoretic framework for analyzing such problems and designing provably good signaling schemes. Key ingredients of the framework include models for the signals being communicated and the degradations, jammers, eavesdroppers and codebreakers that may be encountered during transmission.

Pierre Moulin

Multimedia Content Security: Steganography/Watermarking/Authentication

Real-Time Steganography in Compressed Video

An adaptive and large capacity steganography method applicable to compressed video is proposed. Unlike steganography in still images, video steganography must meet a real-time requirement. In this work, embedding and detection are both done entirely in the variable length code (VLC) domain with no need for full or even partial decompression. Also, embedding is adaptively guided by several so-called A/S trees. All of the A/S trees are generated from the main VLC table given in the ISO/IEC 13818-2:1995 standard. Experimental results verify the excellent performance of the proposed scheme.

Bin Liu, Fenlin Liu, Bin Lu, Xiangyang Luo
A Feature Selection Methodology for Steganalysis

This paper presents a methodology to select features before training a classifier based on Support Vector Machines (SVM). In this study 23 features presented in [1] are analysed. A feature ranking is performed using a fast classifier called K-Nearest-Neighbours combined with a forward selection. The result of the feature selection is afterwards tested on SVM to select the optimal number of features. This method is tested with the Outguess steganographic software: 14 features are selected while the same classification performance is kept. Results confirm that the selected features are efficient for a wide variety of embedding rates. The same methodology is also applied to Steghide and F5 to see whether feature selection is possible for these schemes.

Yoan Miche, Benoit Roue, Amaury Lendasse, Patrick Bas
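
A minimal scikit-learn sketch of the two-stage idea (fast KNN-based forward selection, final check on the SVM that will actually be used); variable names are hypothetical, and the 23 steganalysis features of [1] are assumed to be precomputed columns of X:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def forward_select(X, y, max_feats=14, cv=5):
    """Greedy forward selection ranked by a fast KNN surrogate classifier."""
    selected, remaining = [], list(range(X.shape[1]))
    knn = KNeighborsClassifier(n_neighbors=5)
    while remaining and len(selected) < max_feats:
        best = max(remaining,
                   key=lambda f: cross_val_score(knn, X[:, selected + [f]],
                                                 y, cv=cv).mean())
        selected.append(best)
        remaining.remove(best)
    return selected

# Final validation of the reduced feature set on the SVM:
# feats = forward_select(X, y)
# print(cross_val_score(SVC(), X[:, feats], y, cv=5).mean())
```
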
Multiple Messages Embedding Using DCT-Based Mod4 Steganographic Method

This paper proposes an extension of the DCT-based Mod4 steganographic method to realize multiple messages embedding (MME). To implement MME, we utilize the structural feature of Mod4 that uses vGQC (valid group of 2×2 adjacent quantized DCT coefficients) as message carrier. vGQCs can be partitioned into several disjoint sets by differentiating the parameters, where each set can serve as an individual secret communication channel. A maximum of 14 independent messages can be embedded into a cover image without one message interfering with another. We can generate stego images with image quality no worse than conventional Mod4. Results for blind steganalysis are also shown.

KokSheik Wong, Kiyoshi Tanaka, Xiaojun Qi
SVD Adapted DCT Domain DC Subband Image Watermarking Against Watermark Ambiguity

In some Singular Value Decomposition (SVD) based watermarking techniques, singular values (SV) of the cover image are used to embed the SVs of the watermark image. In detection, singular vectors of the watermark image are used to construct the embedded watermark. A problem with this approach is that the detected watermark is essentially the image whose singular vectors were used to restore it; in other words, whatever is searched for is found. In this paper, we propose a Discrete Cosine Transform (DCT) DC subband watermarking technique in the SVD domain that counters this ambiguity by also embedding the singular vectors of the watermark image as a control parameter. We give experimental results of the proposed technique against some attacks.

Erkan Yavuz, Ziya Telatar
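
The ambiguity this paper targets is easy to reproduce in a few lines. The sketch below (an illustration assuming equally sized square matrices, not the authors' corrected scheme) embeds the watermark's singular values and shows where the flaw lives: extraction reconstructs a watermark from whatever singular vectors the detector is handed.

```python
import numpy as np

def svd_embed(cover, watermark, alpha=0.05):
    """Embed the watermark's singular values into the cover's singular values."""
    Uc, Sc, Vct = np.linalg.svd(cover, full_matrices=False)
    _, Sw, _ = np.linalg.svd(watermark, full_matrices=False)
    marked = Uc @ np.diag(Sc + alpha * Sw[: len(Sc)]) @ Vct
    return marked, Sc                      # Sc kept as side information

def svd_extract(marked, Uw, Vwt, Sc, alpha=0.05):
    Sm = np.linalg.svd(marked, compute_uv=False)
    Sw_hat = (Sm - Sc) / alpha
    # Ambiguity: whichever singular-vector pair (Uw, Vwt) is supplied here,
    # "its" watermark is reconstructed -- what is searched for is found.
    return Uw @ np.diag(Sw_hat) @ Vwt
```
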
3D Animation Watermarking Using PositionInterpolator

For real-time animation, keyframe animation consisting of translation, rotation, and scaling interpolator nodes is widely used in 3D graphics. This paper presents 3D keyframe animation watermarking based on vertex coordinates in the CoordIndex node and key values in the PositionInterpolator node for VRML animation. Experimental results verify that the proposed algorithm is robust against geometrical attacks and timeline attacks, and that the watermark remains invisible.

Suk-Hwan Lee, Ki-Ryong Kwon, Gwang S. Jung, Byungki Cha
Color Images Watermarking Based on Minimization of Color Differences

In this paper we propose a watermarking scheme which embeds into a color image a color watermark from the L*a*b* color space. The scheme resists geometric attacks (e.g., rotation, scaling, etc.) and, within some limits, JPEG compression. The scheme uses a secret binary pattern to modify the chromatic distribution of the image.

Gaël Chareyron, Alain Trémeau
Improved Pixel-Wise Masking for Image Watermarking

Perceptual watermarking in the wavelet domain has been proposed for a blind spread spectrum technique, taking into account the noise sensitivity, texture and luminance content of all the image subbands. In this paper, we propose a modified perceptual mask that models the human visual system behavior in a better way. The texture content is estimated with the aid of the local standard deviation of the original image, which is further compressed in the wavelet domain. Since the approximation image of the last level contains too little information, we choose to estimate the luminance content using a higher resolution level approximation subimage. The effectiveness of the new perceptual mask is assessed by comparison with the old watermarking system.

Corina Nafornita, Alexandru Isar, Monica Borda
Additive vs. Image Dependent DWT-DCT Based Watermarking

We compare our earlier additive and image dependent watermarking schemes for digital images and videos. Both schemes employ DWT followed by DCT. Pseudo-random watermark values are added to mid-frequency DWT-DCT coefficients in the additive scheme. In the image dependent scheme, the watermarking coefficients are modulated with original mid-frequency DWT-DCT coefficients to increase the efficiency of the watermark embedding. Schemes are compared to each other and comparison results including Stirmark 3.1 benchmark tests are presented.

Serkan Emek, Melih Pazarci
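
A toy numpy sketch of the two embedding rules compared above (coefficient selection, the DWT-DCT transforms and detection are omitted; this is an illustration, not the authors' code):

```python
import numpy as np

def embed(coeffs, alpha=0.05, image_dependent=True, seed=0):
    """Watermark mid-frequency DWT-DCT coefficients with a +/-1 PRN sequence."""
    w = np.sign(np.random.default_rng(seed).standard_normal(coeffs.shape))
    if image_dependent:
        # modulation by the host coefficient itself: stronger marks land in
        # stronger coefficients, increasing embedding efficiency
        return coeffs * (1.0 + alpha * w)
    return coeffs + alpha * w  # plain additive rule
```
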
A Robust Blind Audio Watermarking Using Distribution of Sub-band Signals

In this paper, we propose a statistical audio watermarking scheme based on the DWT (Discrete Wavelet Transform). The proposed method selectively classifies high frequency band coefficients into two subsets, referring to the low frequency ones. The coefficients in the subsets are modified such that one subset has a bigger (or smaller) variance than the other, according to the watermark bit to be embedded. As the proposed method modifies the high frequency band coefficients at locations where the low frequency band has higher energy, it achieves good performance in terms of both robustness and transparency of the watermark. Besides, our watermark extraction process is not only quite simple but also blind.

Jae-Won Cho, Hyun-Yeol Chung, Ho-Youl Jung
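
A minimal sketch of the variance-skewing idea, assuming the PyWavelets package, even-length frames, and a per-frame single-level DWT; the paper's selective classification against the low band is simplified here to a keyed random split:

```python
import numpy as np
import pywt

def embed_bit(frame, bit, strength=1.3, key=0):
    """Hide one bit by skewing the variance of two secret subsets of the
    high-frequency DWT band."""
    cA, cD = pywt.dwt(frame, "db4")                 # low / high frequency bands
    mask = np.random.default_rng(key).random(cD.size) < 0.5
    if bit:
        cD[mask] *= strength                        # subset 0 gets bigger variance
    else:
        cD[~mask] *= strength                       # subset 1 gets bigger variance
    return pywt.idwt(cA, cD, "db4")

def detect_bit(frame, key=0):
    """Blind detection: compare the two subset variances."""
    _, cD = pywt.dwt(frame, "db4")
    mask = np.random.default_rng(key).random(cD.size) < 0.5
    return int(np.var(cD[mask]) > np.var(cD[~mask]))
```
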
Dirty-Paper Writing Based on LDPC Codes for Data Hiding

We describe a new binning technique for the informed data hiding problem. From an information-theoretic point of view, the blind watermarking problem can be seen as transmitting a secret message M through a noisy channel on top of an interfering host signal S that is available only at the encoder. We propose an embedding scheme based on Low Density Parity Check (LDPC) codes, in order to quantize the host signal in an intelligent manner so that the decoder can extract the hidden message with high probability. A mixture of an erasure and a symmetric error channel is realized for the analysis of the proposed method.

Çagatay Dikici, Khalid Idrissi, Atilla Baskurt
Key Agreement Protocols Based on the Center Weighted Jacket Matrix as a Symmetric Co-cyclic Matrix

In [1], a key agreement protocol between two users, based on the co-cyclic Jacket matrix, was proposed. We propose an improved version based on the same center weighted Jacket matrix, viewed as a symmetric matrix as well as a co-cyclic matrix. Our new proposal offers the same level of performance as the protocol in [1], and can be used among three users.

Chang-hui Choe, Gi Yean Hwang, Sung Hoon Kim, Hyun Seuk Yoo, Moon Ho Lee
A Hardware-Implemented Truly Random Key Generator for Secure Biometric Authentication Systems

Recent advances in information security require strong keys which are randomly generated. Most keys are generated by software using software-based random number generators. However, implementing a True Random Number Generator (TRNG) without a hardware-supported platform is not reliable. In this paper, a biometric authentication system using an FPGA-based TRNG to produce a private key that encrypts the face template of a person is presented. The designed hardware can easily be mounted on a standard or embedded PC via its PCI interface to produce random number keys. The random numbers forming the private key are guaranteed to be truly random because they pass a two-level randomness test: the randomness test is evaluated first on the hardware, then on the PC, by applying the full NIST test suite. The whole system implements an AES-based encryption scheme to store the person's secret safely. Assigning a private key generated by our TRNG guarantees a unique and truly random password. The system stores Wavelet Fourier-Mellin Transform (WFMT) based face features in a database with an index number that might be stored on a smart or glossary card. The objective of this study is to present a practical application integrating any biometric technology with a hardware-implemented TRNG.

Murat Erat, Kenan Danışman, Salih Ergün, Alper Kanak

Classification for Biometric Recognition

Kernel Fisher LPP for Face Recognition

Subspace analysis is an effective approach for face recognition. Locality Preserving Projections (LPP) finds an embedding subspace that preserves local structure information, and obtains a subspace that best detects the essential manifold structure. Though LPP has been applied in many fields, it has limitations in solving the recognition problem. In this paper, a novel subspace method, called Kernel Fisher Locality Preserving Projections (KFLPP), is proposed for face recognition. In our method, discriminant information with intrinsic geometric relations is preserved in the subspace in terms of the Fisher criterion. Furthermore, complex nonlinear variations of face images, such as illumination, expression, and pose, are represented by a nonlinear kernel mapping. Experimental results on the ORL and Yale databases show that the proposed method can improve face recognition performance.

Yu-jie Zheng, Jing-yu Yang, Jian Yang, Xiao-jun Wu, Wei-dong Wang
Tensor Factorization by Simultaneous Estimation of Mixing Factors for Robust Face Recognition and Synthesis

Facial images change appearance due to multiple factors such as poses, lighting variations, facial expressions, etc. The tensor approach, an extension of the conventional matrix, is appropriate to analyze facial factors since we can construct multilinear models consisting of multiple factors using the tensor framework. However, given a test image, tensor factorization, i.e., decomposition of mixing factors, is a difficult problem especially when the factor parameters are unknown or are not in the training set. In this paper, we propose a novel tensor factorization method to decompose the mixing factors of a test image. We set up tensor factorization as a least squares problem with a quadratic equality constraint, and solve it using numerical optimization techniques. The novelty in our approach compared to previous work is that our tensor factorization method does not require any knowledge or assumption of test images. We have conducted several experiments to show the versatility of the method for both face recognition and face synthesis.

Sung Won Park, Marios Savvides
A Modified Large Margin Classifier in Hidden Space for Face Recognition

Considering some limitations of the existing large margin classifier (LMC) and support vector machines (SVMs), this paper develops a modified linear projection classification algorithm based on the margin, termed the modified large margin classifier in hidden space (MLMC). MLMC can seek a better classification hyperplane than LMC and SVMs by integrating the within-class variance into the objective function of LMC. Also, the kernel functions in MLMC are not required to satisfy Mercer's condition, so MLMC can use more kinds of kernel functions than SVMs. Experiments on the FERET face database confirm the feasibility and effectiveness of the proposed method.

Cai-kou Chen, Qian-qian Peng, Jing-yu Yang
Recognizing Two Handed Gestures with Generative, Discriminative and Ensemble Methods Via Fisher Kernels

Use of gestures extends Human Computer Interaction (HCI) possibilities in multimodal environments. However, the great variability in gestures, in time, size, and position, as well as interpersonal differences, makes the recognition task difficult. Given their power in modeling sequence data and processing variable length sequences, Hidden Markov Models (HMM) are a natural choice for modeling hand gestures. On the other hand, discriminative methods such as Support Vector Machines (SVM), compared to model based approaches such as HMMs, have flexible decision boundaries and better classification performance. By extracting features from gesture sequences via Fisher Kernels based on HMMs, classification can be done by a discriminative classifier. We compared the performance of this combined classifier with generative and discriminative classifiers on a small database of two-handed gestures recorded with two cameras, using Kalman tracking of hands with center-of-mass and blob tracking. The results show that (i) blob tracking incorporates general hand shape with hand motion and performs better than simple center-of-mass tracking, (ii) in a stereo camera setup, even if 3D reconstruction is not possible, combining 2D information from each camera at the feature level decreases the error rates, and (iii) the Fisher score methodology combines the powers of generative and discriminative approaches and increases the classification performance.

Oya Aran, Lale Akarun
3D Head Position Estimation Using a Single Omnidirectional Camera for Non-intrusive Iris Recognition

This paper proposes a new method of estimating 3D head positions using a single omnidirectional camera for non-intrusive biometric systems; in this case, non-intrusive iris recognition. The proposed method has two important advantages over previous research. First, previous researchers used the harsh constraint that the ground plane must be orthogonal to the camera’s optical axis. However, the proposed method can detect 3D head positions even in non-orthogonal cases. Second, we propose a new method of detecting head positions in an omnidirectional camera image based on a circular constraint. Experimental results showed that the error between the ground-truth and the estimated 3D head positions was 14.73 cm with a radial operating range of 2-7.5 m.

Kwanghyuk Bae, Kang Ryoung Park, Jaihie Kim
A Fast and Robust Personal Identification Approach Using Handprint

Recently, handprint-based personal identification has been widely researched. Existing identification systems are mostly based on peg or peg-free stretched gray handprint images, and most of them use only a single feature for identification. In contrast to existing systems, we capture color handprint images with unconstrained gestures in a peg-free setup, and use both hand shape features and palmprint texture features to facilitate coarse-to-fine dynamic identification. The wavelet zero-crossing method is first used to extract hand shape features to guide the fast selection of a small set of similar candidates from the database. Then, a modified LoG filter, which is robust against brightness variations, is proposed to extract the palmprint texture. Finally, both global and local texture features of the ROI are extracted to determine the final output from the selected set of similar candidates. Experimental results show the superiority and effectiveness of the proposed approach.

Jun Kong, Miao Qi, Yinghua Lu, Xiaole Liu, Yanjun Zhou
Active Appearance Model-Based Facial Composite Generation with Interactive Nature-Inspired Heuristics

The aim of this study is to automatically generate facial composites that match a target face, using the active appearance model (AAM). The AAM generates a statistical model of the human face from a training set; the model parameters control both the shape and the texture of the face. We propose a system in which a human user interactively tries to optimize the AAM parameters so that they generate the target face. In this study, the optimization problem is handled using nature-inspired approaches. Experiments with interactive versions of different nature-inspired heuristics are performed. In the interactive versions of these heuristics, users participate in the experiments either by quantifying the solution quality or by selecting the most similar faces. The results of the initial experiments are promising, which motivates further study.

Binnur Kurt, A. Sima Etaner-Uyar, Tugba Akbal, Nildem Demir, Alp Emre Kanlikilicer, Merve Can Kus, Fatma Hulya Ulu
Template Matching Approach for Pose Problem in Face Verification

In this paper we propose a template matching approach to address the pose problem in face verification, which neither synthesizes the face image nor builds a model of it. Template matching is performed using an edginess-based representation of face images, computed using one-dimensional (1-D) processing of images. An approach based on autoassociative neural network (AANN) models is proposed to verify the identity of a person using the score obtained from template matching.

Anil Kumar Sao, B. Yegnanarayana
PCA and LDA Based Face Recognition Using Feedforward Neural Network Classifier

Principal component analysis (PCA) and Linear Discriminant Analysis (LDA) are among the most common feature extraction techniques used for the recognition of faces. In this paper, two face recognition systems, one based on PCA followed by a feedforward neural network (FFNN), called PCA-NN, and the other based on LDA followed by a FFNN, called LDA-NN, are developed. The two systems consist of two phases: the PCA or LDA preprocessing phase, and the neural network classification phase. The proposed systems show improved recognition rates over the conventional PCA and LDA face recognition systems that use a Euclidean distance based classifier. Additionally, the recognition performance of LDA-NN is higher than that of PCA-NN.

Alaa Eleyan, Hasan Demirel
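
Both pipelines are straightforward to prototype in scikit-learn. The sketch below uses hypothetical data arrays and an arbitrary eigenspace size and MLP width (not the authors' exact networks) to show the PCA-NN / LDA-NN structure:

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# X_*: flattened face images (n_samples, n_pixels); y_*: subject labels.
pca_nn = make_pipeline(PCA(n_components=50),
                       MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000))
lda_nn = make_pipeline(LinearDiscriminantAnalysis(),
                       MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000))

for name, clf in [("PCA-NN", pca_nn), ("LDA-NN", lda_nn)]:
    clf.fit(X_train, y_train)                  # preprocessing + FFNN in one step
    print(name, clf.score(X_test, y_test))
```
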
Online Writer Verification Using Kanji Handwriting

This paper investigates writer verification using handwritten kanji characters on a digitizing tablet. Features representing individuality, which are derived from the knowledge of document examiners, are automatically extracted, and then the features effective in writer verification are selected from the extracted features. Two classifiers based on the frequency distribution of deviations of the selected features are proposed and evaluated in verification experiments. The experimental results show that the proposed methods are effective in writer verification.

Yoshikazu Nakamura, Masatsugu Kidode
Image Quality Measures for Fingerprint Image Enhancement

Fingerprint image quality is an important factor in the performance of Automatic Fingerprint Identification Systems (AFIS). It is used to evaluate system performance, assess enrollment acceptability, and evaluate fingerprint sensors. This paper presents a novel methodology for fingerprint image quality measurement. We propose a limited ring-wedge spectral measure to estimate global fingerprint image features, and inhomogeneity with directional contrast to estimate local fingerprint image features. Experimental results demonstrate the effectiveness of our proposal.

Chaohong Wu, Sergey Tulyakov, Venu Govindaraju

Digital Watermarking

A Watermarking Framework for Subdivision Surfaces

This paper presents a robust watermarking scheme for 3D subdivision surfaces. Our proposal is based on a frequency domain decomposition of the subdivision control mesh and on spectral coefficients modulation. The compactness of the cover object (the coarse control mesh) has led us to optimize the trade-off between watermarking redundancy (which insures robustness) and imperceptibility by introducing two contributions: (1) spectral coefficients are perturbed according to a new modulation scheme analyzing the spectrum shape, and (2) the redundancy is optimized by using error correcting codes. Since the watermarked surface can be attacked in a subdivided version, we have introduced a so-called synchronization algorithm to retrieve the control polyhedron, starting from a subdivided, attacked version. Through the experiments, we have demonstrated the high robustness of our scheme against both geometry and connectivity alterations.

Guillaume Lavoué, Florence Denis, Florent Dupont, Atilla Baskurt
Naïve Bayes Classifier Based Watermark Detection in Wavelet Transform

Robustness is one of the essential properties of watermarking schemes: the ability to detect the watermark after attacks. A DWT-based semi-blind image watermarking scheme leaves out the low pass band, and embeds a pseudo random number (PRN) sequence (i.e., the watermark) in the other three bands into the coefficients that are higher than a given threshold T1. During watermark detection, all the high pass coefficients above another threshold T2 (T2 ≥ T1) are used in correlation with the original watermark. In this paper, we embed a PRN sequence using the same procedure. In detection, however, we apply the Naïve Bayes Classifier, which can predict class membership probabilities, such as the probability that a given image belongs to the class "Watermark Present" or "Watermark Absent". Experimental results show that the Naïve Bayes Classifier gives very promising results for gray scale images in wavelet domain watermark detection.

Ersin Elbasi, Ahmet M. Eskicioglu
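
A sketch of the detection side under stated assumptions: a single correlation feature per image, scikit-learn's GaussianNB standing in for the Naïve Bayes classifier, and hypothetical training arrays rather than the authors' data:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def correlation_feature(coeffs, prn, T2):
    """Correlation of high-pass DWT coefficients above T2 with the PRN watermark."""
    idx = np.abs(coeffs) > T2
    return np.array([np.dot(coeffs[idx], prn[idx])])

# feats_marked / feats_clean: feature rows built with correlation_feature from
# watermarked / unmarked training images (hypothetical arrays).
X = np.vstack([feats_marked, feats_clean])
y = np.r_[np.ones(len(feats_marked)), np.zeros(len(feats_clean))]

nb = GaussianNB().fit(X, y)
p_present = nb.predict_proba(feats_test)[:, 1]   # P(class = "Watermark Present")
decision = p_present > 0.5
```
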
A Statistical Framework for Audio Watermark Detection and Decoding

This paper introduces an integrated GMM-based blind audio watermark (WM) detection and decoding scheme that eliminates the decision threshold specification problem, which constitutes a drawback of conventional decoders. The proposed method models the statistics of watermarked and original audio signals by Gaussian mixture models (GMM) with K components. Learning of the WM data is achieved in the wavelet domain and a Maximum Likelihood (ML) classifier is designed for WM decoding. The dimension of the learning space is optimized by PCA transformation. Robustness to compression, additive noise and the Stirmark benchmark attacks has been evaluated. It is shown that both the WM decoding and detection performance of the introduced integrated scheme outperforms conventional correlation-based decoders. Test results demonstrate that learning in the wavelet domain improves robustness to attacks while reducing complexity. Although the performance of the proposed GMM modeling is only slightly better than the SVM-based decoder introduced in [1], the significant decrease in computational complexity makes the new method appealing.

Bilge Gunsel, Yener Ulker, Serap Kirbiz
Resampling Operations as Features for Detecting LSB Replacement and LSB Matching in Color Images

We show that changes to the color distribution statistics induced by resampling operations on color images present useful features for the detection and estimation of embeddings due to LSB steganography. The resampling operations considered in our study are typical operations like zoom-in, zoom-out, rotations and distortions. We show experimental evidence that the features computed from these resampling operations form distinct clusters in pattern space for different levels of embeddings and are amenable to classification using a pattern classifier like SVM. Our method works well not only for LSB Replacement Steganography but also for the LSB Matching approach.

V. Suresh, S. Maria Sophia, C. E. Veni Madhavan
A Blind Watermarking for 3-D Dynamic Mesh Model Using Distribution of Temporal Wavelet Coefficients

In this paper, we present a watermarking method for 3-D mesh sequences with a fixed connectivity. The main idea is to transform each coordinate of the vertices with the identical connectivity index along the temporal axis using the wavelet transform, and to modify the distribution of wavelet coefficients in temporally high (or middle)-frequency frames according to the watermark bit to be embedded. Due to the use of the distribution, our method can retrieve the hidden watermark without any information about the original mesh sequences during watermark detection. To increase the watermark capacity, all vertices are divided into groups, namely bins, using the distribution of scaling coefficients in low-frequency frames. As the vertices with the identical connectivity index over all frames belong to one bin, their wavelet coefficients are also assigned to the same bin. Then, the watermark is embedded into each axis of the wavelet coefficients. Through simulations we show that the proposed method is fairly robust against various attacks that are of concern in copyright protection of 3-D mesh sequences.

Min-Su Kim, Rémy Prost, Hyun-Yeol Chung, Ho-Youl Jung
Secure Data-Hiding in Multimedia Using NMF

This paper presents a novel data-hiding scheme for multimedia data using non-negative matrix factorization (NMF). A nonnegative feature space (basis matrix) is estimated using the NMF framework from a sample set of multimedia objects. Subsequently, using a secret key, a subspace (basis vector) of the estimated basis matrix is used to decompose the host data for information embedding and detection. Binary dither modulation is used to embed/detect the information into the host signal coefficients. To ensure the fidelity of the embedded information for a given robustness, host media coefficients are selected for information embedding according to an estimated masking threshold. The masking threshold is estimated from the human visual/auditory system (HVS/HAS) and the host media. Simulation results show that the proposed NMF-based scheme provides flexible control over robustness and capacity for imperceptible embedding.

Hafiz Malik, Farhan Baqai, Ashfaq Khokhar, Rashid Ansari

Content Analysis and Representation

Unsupervised News Video Segmentation by Combined Audio-Video Analysis

Segmenting news video into stories is among the key issues for achieving efficient treatment of news-based digital libraries. In this paper we present a novel unsupervised algorithm that combines audio and video information for automatically partitioning news videos into stories. The proposed algorithm is based on the detection of anchor shots within the video. In particular, a set of audio/video templates of anchorperson shots is first extracted in an unsupervised way; then shots are classified by comparing them to the templates using both video and audio similarity. Finally, a story is obtained by linking each anchor shot with all successive shots until another anchor shot, or the end of the news video, occurs. Audio similarity is evaluated by means of a new index and helps to achieve better anchor shot detection performance than a pure video approach. The method has been tested on a wide database and compared with other state-of-the-art algorithms, demonstrating its effectiveness with respect to them.

M. De Santo, G. Percannella, C. Sansone, M. Vento
Coarse-to-Fine Textures Retrieval in the JPEG 2000 Compressed Domain for Fast Browsing of Large Image Databases

In many applications, the amount and resolution of digital images have significantly increased over the past few years. For this reason, there is growing interest in techniques for efficiently browsing and seeking information inside such huge data spaces. JPEG 2000, the latest compression standard from the JPEG committee, has several interesting features for handling very large images. In this paper, these features are used in a coarse-to-fine approach to retrieve specific information from a JPEG 2000 code-stream while minimizing the computational load required by such processing. Practically, a cascade of classifiers exploits the bit-depth and resolution scalability features intrinsically present in JPEG 2000 to progressively refine the classification process. Comparison with existing techniques is made in a texture-retrieval task and shows the efficiency of such an approach.

Antonin Descampe, Pierre Vandergheynst, Christophe De Vleeschouwer, Benoit Macq
Labeling Complementary Local Descriptors Behavior for Video Copy Detection

This paper proposes an approach for indexing large collections of videos, dedicated to content-based copy detection. The visual description chosen involves local descriptors based on interest points. First, we propose the joint use of spatial supports of different natures for the local descriptors. We demonstrate that this combination provides a more representative, and thus more informative, description of each frame. As local supports, we use the classical Harris detector, together with a detector of local symmetries which is inspired by pre-attentive human vision and thus expresses strong semantic content. Our second contribution consists in enriching such descriptors by characterizing their dynamic behavior in the video sequence: estimating the trajectories of the points along frames makes it possible to highlight behavior trends, and then to assign a behavior label to each local descriptor. The relevance of our approach is evaluated on several hundred hours of videos, with severe attacks. The results obtained clearly demonstrate the richness and the compactness of the proposed spatio-temporal description.

Julien Law-To, Valérie Gouet-Brunet, Olivier Buisson, Nozha Boujemaa
Motion-Based Segmentation of Transparent Layers in Video Sequences

We present a method for segmenting moving transparent layers in video sequences. We assume that the images can be divided into areas containing at most two moving transparent layers. We call this configuration (which is the most commonly encountered one) bi-distributed transparency. The proposed method involves three steps: initial block-matching for two-layer transparent motion estimation, motion clustering with a 3D Hough transform, and joint transparent layer segmentation and parametric motion estimation. The last step is solved by the iterative minimization of an MRF-based energy function. The segmentation is improved by a mechanism detecting areas containing one single layer. The framework is applied to various image sequences with satisfactory results.

Vincent Auvray, Patrick Bouthemy, Jean Liénard
From Partition Trees to Semantic Trees

This paper proposes a solution to bridge the gap between semantic and visual information, formulated as a structural pattern recognition problem. Instances of semantic classes expressed by Description Graphs are detected on a region-based representation of visual data expressed with a Binary Partition Tree. The detection process builds instances of Semantic Trees on top of the Binary Partition Tree using an encyclopedia of models organised as a hierarchy. At the leaves of the Semantic Tree, classes are defined by perceptual models containing a list of low-level descriptors. The proposed solution is assessed in different environments to show its flexibility.

Xavier Giro, Ferran Marques

3D Object Retrieval and Classification

A Comparison Framework for 3D Object Classification Methods

3D shape classification plays an important role in the process of organizing and retrieving models in large databases. Classifying shapes means assigning a query model to the most appropriate class of objects: knowledge about the membership of models in classes can be very useful to speed up and improve the shape retrieval process, by allowing the reduction of the candidate models to compare with the query.

The main contribution of this paper is the setting of a framework to compare the effectiveness of different query-to-class membership measures, defined independently of specific shape descriptors. The classification performances are evaluated against a set of popular 3D shape descriptors, using a dataset consisting of 14 classes made up of 20 objects each.

S. Biasotti, D. Giorgi, S. Marini, M. Spagnuolo, B. Falcidieno
Density-Based Shape Descriptors for 3D Object Retrieval

We develop a probabilistic framework that computes 3D shape descriptors in a more rigorous and accurate manner than usual histogram-based methods for the purpose of 3D object retrieval. We first use a numerical analytical approach to extract the shape information from each mesh triangle in a better way than the sparse sampling approach. These measurements are then combined to build a probability density descriptor via kernel density estimation techniques, with a rule-based bandwidth assignment. Finally, we explore descriptor fusion schemes. Our analytical approach reveals the true potential of density-based descriptors, one of its representatives reaching the top ranking position among competing methods.

Ceyhun Burak Akgül, Bülent Sankur, Francis Schmitt, Yücel Yemez
ICA Based Normalization of 3D Objects

In this paper, we present a new 3D object normalization technique based on Independent Component Analysis (ICA). Translation and scale are eliminated by first using standard PCA whitening. ICA and the third order moments are then employed for rotation and reflection normalization. The performance of the proposed approach has been tested with range data subjected to noise and other uncertainties. Our method can be used either as a preprocessing for object modelling, or it can directly be used for 3D recognition.

Sait Sener, Mustafa Unel
3D Facial Feature Localization for Registration

Accurate automatic localization of fiducial points in face images is an important step in registration. Although statistical methods of landmark localization reach high accuracies with 2D face images, their performance rapidly deteriorates under illumination changes. 3D information can assist this process by either removing the illumination effects from the 2D image, or by supplying robust features based on depth or curvature. We inspect both approaches to this problem. Our results indicate that using 3D features is more promising than illumination correction with the help of 3D. We complement our statistical feature detection scheme with a structural correction scheme and report our results on the FRGC face dataset.

Albert Ali Salah, Lale Akarun

Representation, Analysis and Retrieval in Cultural Heritage

Paper Retrieval Based on Specific Paper Features: Chain and Laid Lines

This paper presents paper retrieval using the specific paper features of chain and laid lines. These features are detected in digitized paper images and represented such that they can be used for retrieval. Optimal retrieval performance is achieved by means of a trainable similarity measure for a given set of paper features. With these methods a retrieval system is developed that art experts can use in real time to speed up their paper research.

M. van Staalduinen, J. C. A. van der Lubbe, E. Backer, P. Paclík
Feature Selection for Paintings Classification by Optimal Tree Pruning

In assessing the authenticity of artwork, it is of high importance from the art expert's point of view to understand the reasoning behind the assessment. While complex data mining tools accompanied by large feature sets extracted from the images can bring accuracy to painting authentication, it is very difficult or impossible to understand their underlying logic. A small feature set linked to a minor classification error seems to be the key to understanding and interpreting the obtained results. In this study the selection of a small feature set for painting classification is done by means of building an optimally pruned decision tree. The classification accuracy and the possibility of extracting knowledge with this method are analyzed. The results show that a simple, small, interpretable feature set can be selected by building an optimally pruned decision tree.

Ana Ioana Deac, Jan van der Lubbe, Eric Backer
3D Data Retrieval for Pottery Documentation

Motivated by the requirements of present-day archaeology, we are developing an automated system for archaeological classification and reconstruction of ceramics. This paper shows different acquisition techniques for getting 3D data of pottery and computing the profile sections of fragments. With the enhancements shown in this paper, archaeologists get a tool for doing archaeological documentation of pottery in an automated way.

Martin Kampel

Invited Talk

Multimedia Content-Based Indexing and Search: Challenges and Research Directions

New digital multimedia content is being generated at a tremendous rate. At the same time, the growing variety of distribution channels, e.g., Web, wireless/mobile, cable, IPTV, satellite, is increasing users' expectations for accessibility and searchability of digital multimedia content. However, users still find it difficult to find relevant content, and indexing & search are not keeping up with the explosion of content. Recent advances in multimedia content analysis are helping to more effectively tag multimedia content to improve searching, retrieval, repurposing and delivery of relevant content. We are currently developing a system called Marvel that uses statistical machine learning techniques and semantic concept ontologies to model, index and search content using audio, speech and visual content. The benefit is a reduction in manual processing for tagging multimedia content and an enhanced ability to unlock the value of large multimedia repositories.

John R. Smith

Content Representation, Indexing and Retrieval

A Framework for Dialogue Detection in Movies

In this paper, we investigate a novel framework for dialogue detection based on indicator functions. An indicator function indicates whether a particular actor is present at each time instant. Two dialogue detection rules are developed and assessed. The first rule relies on the value of the cross-correlation function at zero time lag, compared to a threshold. The second rule is based on the cross-power in a particular frequency band, also compared to a threshold. Experiments are carried out to validate the feasibility of the aforementioned dialogue detection rules using ground-truth indicator functions determined by human observers on six different movies. A total of 25 dialogue scenes and another 8 non-dialogue scenes are employed. The probabilities of false alarm and detection are estimated by cross-validation, where 70% of the available scenes are used to learn the thresholds employed in the dialogue detection rules and the remaining 30% are used for testing. Almost perfect dialogue detection is reported for every distinct threshold.

Margarita Kotti, Constantine Kotropoulos, Bartosz Ziólko, Ioannis Pitas, Vassiliki Moschou
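
The first detection rule can be sketched directly: compute the zero-lag normalized cross-correlation of the two indicator functions and compare it to a learned threshold. A minimal sketch; the sign convention and thresholding details are assumptions, not taken from the paper:

```python
import numpy as np

def cross_corr_zero_lag(ind_a, ind_b):
    """Normalized cross-correlation at zero lag between two actor
    indicator functions (1 when the actor is present, 0 otherwise)."""
    a = ind_a - ind_a.mean()
    b = ind_b - ind_b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def is_dialogue(ind_a, ind_b, threshold):
    # threshold would be learned on the 70% training split of annotated scenes;
    # in a dialogue the two actors alternate, which shows up as strong
    # zero-lag (anti-)correlation between their indicator functions
    return abs(cross_corr_zero_lag(ind_a, ind_b)) >= threshold
```
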
Music Driven Real-Time 3D Concert Simulation

Music visualization has always attracted interest, and it became more popular in recent years after PCs and MP3 songs emerged as an alternative to existing audio systems. Most PC-based music visualization tools employ visual effects such as bars, waves and particle animations. In this work we define a new music visualization scheme that aims to create a life-like interactive virtual environment simulating a concert arena, by combining different research areas such as crowd animation, facial animation, character modeling and audio analysis.

Erdal Yılmaz, Yasemin Yardımcı Çetin, Çiğdem Eroğlu Erdem, Tanju Erdem, Mehmet Özkan
High-Level Description Tools for Humanoids

This paper presents a proposal for description tools, following the MPEG-7 standard, for the high-level description of humanoids. Given the almost complete lack of high-level description tools for 3D graphics content in the current MPEG-7 specification, we propose descriptions aimed at describing virtual humanoids, both for indexing and query support (no extraction tools are presented here), and also for the generation of personalized humanoids from high-level descriptions via a simple GUI instead of complex authoring tools. This latter application, which is the focus of the work presented here, is related to the Authoring 744 initiative that targets the creation of content from descriptions that are authored in a user-friendly (natural) way. This work is under development within the EU-funded research project OLGA, where the description tools should provide the means for the creation and modification of humanoids inside an on-line 3D gaming environment, but our description tools are generic enough to be used in the future in many different applications: robot portraits, indexing/searching, etc.

Víctor Fernández-Carbajales, José María Martínez, Francisco Morán
Content Adaptation Capabilities Description Tool for Supporting Extensibility in the CAIN Framework

This paper presents the Adaptation Capabilities Description proposed for easing extensibility within the Content Adaptation Integrator (CAIN), an extensible multi-format content adaptation module aimed at providing audiovisual content adaptation based on user, network and platform requirements. This module aims to work transparently and efficiently with several content adaptation approaches such as transcoding or scalable content adaptation.

Víctor Valdés, José M. Martínez
Automatic Cartoon Image Re-authoring Using SOFM

With the growth of the mobile industry, a lot of on/off-line content is being converted into mobile content. Although cartoon content in particular is among the most popular mobile content, it is difficult to provide users with existing on/off-line content as-is due to the small size of mobile screens. In existing methods for overcoming this problem, cartoon content for mobile devices is manually produced with software such as Photoshop. In this paper, we automatically produce cartoon content fitted to the small screen, and introduce a clustering method useful for various types of cartoon images as a prerequisite stage for preserving semantic meaning. Texture information, which is useful for gray-scale image segmentation, gives us a good clue for semantic analysis, and self-organizing feature maps (SOFM) are used to cluster similar texture information. Besides, we automatically segment the clustered SOFM outputs using agglomerative clustering. In our experimental results, the combined approaches show good clustering results on several cartoons.

Eunjung Han, Anjin Park, Keechul Jung
JPEG-2000 Compressed Image Retrieval Using Partial Entropy Decoding

In this paper, we propose an efficient image retrieval method that extracts features through partial entropy decoding of JPEG 2000 compressed images. The main idea of the proposed method is to exploit the context information that is generated during context-based arithmetic encoding/decoding with three bit-plane coding passes. In the framework of JPEG 2000, the context of a current coefficient is determined by the pattern of the significance and/or sign of its neighbors. One of nineteen contexts is assigned to each bit of the wavelet coefficients, from MSB (most significant bit) to LSB (least significant bit). As the context captures the directional variation of the corresponding coefficient's neighbors, it represents the local properties of the image. In the proposed method, the similarity of two images is measured by the difference between their context histograms over the bit-planes. Through simulations, we demonstrate that our method achieves good performance in terms of retrieval accuracy as well as computational complexity.

Ha-Joong Park, Ho-Youl Jung
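
Assuming the 19 context labels emitted during partial entropy decoding are available as integer arrays, one per decoded bit-plane (the decoding itself is not shown here), the histogram-difference similarity can be sketched as:

```python
import numpy as np

N_CONTEXTS = 19   # contexts of the JPEG 2000 MQ arithmetic coder

def context_histograms(contexts_per_bitplane):
    """Per-bit-plane normalized histograms of context labels (0..18) observed
    while partially entropy-decoding a JPEG 2000 code-stream."""
    return np.stack([np.bincount(c, minlength=N_CONTEXTS) / max(c.size, 1)
                     for c in contexts_per_bitplane])

def distance(ctx_a, ctx_b):
    """L1 distance between the context histograms of two images."""
    return float(np.abs(context_histograms(ctx_a)
                        - context_histograms(ctx_b)).sum())
```
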
Galois’ Lattice for Video Navigation in a DBMS

Digital visual media encounter many problems related to storage, representation, querying and visual presentation. In this paper, we propose a technique for the retrieval of video from a database on the basis of video shots classified by a Galois' lattice. The result is a kind of hypermedia that combines both classification and visualization properties in order to navigate between key frames and video segments.

Ibrahima Mbaye, José Martinez, Rachid Oulad Haj Thami
MPEG-7 Based Music Metadata Extensions for Traditional Greek Music Retrieval

The paper presents extensions to the MPEG-7 metadata definitions, mainly related to audio descriptors, introduced to efficiently describe the features of traditional Greek music and thus enable efficient music retrieval. A number of advanced content-based retrieval scenarios have been defined, such as query by music rhythm, query by example and by humming based on the newly introduced music features, query by traditional Greek music genre, and chroma-based query. MPEG-7 DDL extensions and appropriate traditional Greek music genre dictionary entries are deemed necessary to account for the specificities of the music data in question and to attain efficiency in music information retrieval. The reported work has been undertaken within the framework of an R&D project targeting an advanced music portal offering a number of content-based music retrieval services, namely POLYMNIA [12].

Sofia Tsekeridou, Athina Kokonozi, Kostas Stavroglou, Christodoulos Chamzas

Content Analysis

Recognizing Events in an Automated Surveillance System

Event recognition is probably the ultimate purpose of an automated surveillance system. In this paper, hidden Markov models (HMM) are utilized to recognize the nature of an event occurring in a scene. For this purpose, object trajectories obtained through successful tracking are represented as sequences of flow vectors that contain instantaneous velocity and location information. These vectors are clustered by the K-means algorithm to obtain a prototype representation. HMMs are trained with sequences obtained from usual motion patterns, and abnormality is detected by measuring distances to these models. In order to specify the number of models automatically, a novel approach is proposed which utilizes the clues provided by centroid clustering. Preliminary experimental results are promising for detecting abnormal events.

Birant Örten, A. Aydın Alatan, Tolga Çiloğlu
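
A compact sketch of this pipeline, assuming the third-party hmmlearn package and hypothetical trajectory data; the number of clusters and HMM states are placeholders rather than the paper's automatically selected values:

```python
import numpy as np
from sklearn.cluster import KMeans
from hmmlearn import hmm   # assumed third-party dependency

# trajectories: list of (T_i, 4) arrays of flow vectors (x, y, vx, vy) -- hypothetical
kmeans = KMeans(n_clusters=16, n_init=10).fit(np.vstack(trajectories))

def to_symbols(traj):
    # quantize flow vectors against the K-means prototype representation
    return kmeans.predict(traj).reshape(-1, 1)

usual = hmm.CategoricalHMM(n_components=5)     # model of usual motion patterns
usual.fit(np.vstack([to_symbols(t) for t in trajectories]),
          lengths=[len(t) for t in trajectories])

def is_abnormal(traj, threshold):
    # a low per-frame log-likelihood under the "usual" model flags an abnormal event
    return usual.score(to_symbols(traj)) / len(traj) < threshold
```
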
Support Vector Regression for Surveillance Purposes

This paper addresses the problem of applying a powerful statistical pattern classification algorithm based on kernel functions to target tracking in surveillance systems. Rather than directly adapting a recognizer, we develop a localizer using the regression form of Support Vector Machines (SVM). The proposed approach incorporates the dynamic model into the feature vectors and makes the hyperplane and the support vectors follow the changes in these features. The performance of the tracker is demonstrated in a sensor network scenario, for surveillance purposes, with a target moving at constant velocity on a plane.

Sedat Ozer, Hakan A. Cirpan, Nihat Kabaoglu
An Area-Based Decision Rule for People-Counting Systems

In this paper, we propose an area-based decision rule for counting the number of people that pass through a given ROI (Region of Interest). This decision rule divides the obtained images into 72 sectors, and the size of a person is trained to calculate mean and variance values for each sector. These values are then stored in table form and used to count people subsequently. We also analyze various movements that people perform in the real world; for instance, during busy hours, people frequently merge and split. Based on this, we propose a system for counting passing people more accurately, together with a way of discovering the direction of their paths.

Hyun Hee Park, Hyung Gu Lee, Seung-In Noh, Jaihie Kim
Human Action Classification Using SVM_2K Classifier on Motion Features

In this paper, we study the human action classification problem based on motion features directly extracted from video. In order to implement a fast classification system, we select simple features that can be obtained without intensive computation. We also introduce the new SVM_2K classifier, which achieves improved performance over a standard SVM by combining two types of motion feature vectors. After learning, classification can be implemented very quickly because SVM_2K is a linear classifier. Experimental results demonstrate that the method is efficient and may be used in real-time human action classification systems.

Hongying Meng, Nick Pears, Chris Bailey
Robust Feature Extraction of Speech Via Noise Reduction in Autocorrelation Domain

This paper presents a new algorithm for noise reduction in noisy speech recognition in the autocorrelation domain. The autocorrelation domain is an appropriate domain for speech feature extraction due to its pole-preserving and noise-separation properties. Therefore, we have investigated this domain for robust speech recognition.

In our proposed algorithm we try to suppress the effect of noise before using this domain for feature extraction. This suppression is carried out by estimating the noise autocorrelation sequence from the first few frames of each utterance and subtracting it from the autocorrelation sequence of the noisy signal. We tested our method on the Aurora 2 noisy isolated-word task and found its performance superior to that of other autocorrelation-based methods applied to this task.

G. Farahani, S. M. Ahadi, M. M. Homayounpour
Musical Sound Recognition by Active Learning PNN

In this work, an active learning PNN was used to recognize instrumental sounds. LPC and MFCC coefficients with different orders were used as features. The best analysis orders were found using passive PNNs, and these feature sets were then used with active learning PNNs. Experiments showed that overall performance improved when the active learning algorithm was used.

Bülent Bolat, Ünal Küçük
Post-processing for Enhancing Target Signal in Frequency Domain Blind Source Separation

The performance of blind source separation (BSS) using independent component analysis (ICA) declines significantly in a reverberant environment. The degradation is mainly caused by the residual crosstalk components derived from the reverberation of the interference signal. A post-processing method is proposed in this paper which applies an approximated Wiener filter to short-time magnitude spectra in the spectral domain. Speech signals are sparse in the spectral domain, so the approximated Wiener filter can be applied by assigning different weights to the other signal components. Experimental results show that the proposed method improves the noise reduction ratio (NRR) by about 3 dB over conventional FDICA. In addition, the proposed method is compared to another post-processing algorithm that uses NLMS as the post-processor [6], and shows better performance.
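A rough sketch of such a magnitude-spectrum Wiener post-filter, using one separated output as the target and the other as the interference estimate; the window and FFT sizes are assumptions:

```python
# Rough sketch of a magnitude-spectrum Wiener post-filter applied to one
# separated output, using the other output as the interference estimate.
# Window and FFT sizes are assumptions.
import numpy as np
from scipy.signal import stft, istft

def wiener_postfilter(target, interference, fs=16000, nperseg=512):
    _, _, T = stft(target, fs=fs, nperseg=nperseg)
    _, _, I = stft(interference, fs=fs, nperseg=nperseg)
    # Sparseness assumption: attenuate bins where the interference dominates.
    gain = np.abs(T) ** 2 / (np.abs(T) ** 2 + np.abs(I) ** 2 + 1e-12)
    _, enhanced = istft(gain * T, fs=fs, nperseg=nperseg)
    return enhanced
```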

Hyuntae Kim, Jangsik Park, Keunsoo Park

Feature Extraction and Classification

Role of Statistical Dependence Between Classifier Scores in Determining the Best Decision Fusion Rule for Improved Biometric Verification

Statistical dependence between classifier scores has been shown to affect verification accuracy for certain decision fusion rules (e.g., 'majority', 'and', 'or'). In this paper, we investigate the best decision fusion rules for various statistical dependences between classifiers and check whether the best accuracy depends on the statistical dependence. This is done by evaluating the accuracy of decision fusion rules on three jointly Gaussian scores with various covariances. It is found that the best decision fusion rule for any given statistical dependence is one of the three major rules: 'majority', 'and', 'or'. The correlation coefficient between the classifier scores can be used to predict the best decision fusion rule, as well as to evaluate how well designed the classifiers are. This can be applied to biometric verification; it is shown using the NIST 24 fingerprint database and the AR face database that the prediction and evaluation agree.
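The three rules themselves are simple to state in code; this toy sketch (not from the paper) applies them to binary accept/reject votes:

```python
# Toy sketch (not from the paper) of the three fusion rules applied to
# binary accept/reject votes from several classifiers.
import numpy as np

def fuse(votes, rule):
    """votes: (n_classifiers,) array of 0/1 decisions for one attempt."""
    if rule == "and":        # accept only if every classifier accepts
        return int(votes.all())
    if rule == "or":         # accept if any classifier accepts
        return int(votes.any())
    if rule == "majority":   # accept if more than half accept
        return int(votes.sum() > len(votes) / 2)
    raise ValueError(rule)

votes = np.array([1, 0, 1])  # two of three classifiers accept
print([fuse(votes, r) for r in ("and", "or", "majority")])  # [0, 1, 1]
```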

Krithika Venkataramani, B. V. K. Vijaya Kumar
A Novel 2D Gabor Wavelets Window Method for Face Recognition

This paper proposes a novel algorithm named the 2D Gabor Wavelets Window (GWW) method. The GWW scans the image from top left to bottom right to extract local feature vectors (LFVs). A parametric feature vector is derived by downsampling and concatenating these LFVs for face representation and recognition. Compared with the Gabor wavelet representation of the whole image, the total cost is reduced by up to 39%, while the performance is better than that of the conventional PCA method in experiments on both the ORL and XM2VTSDB databases without any preprocessing.

Lin Wang, Yongping Li, Hongzhou Zhang, Chengbo Wang
An Extraction Technique of Optimal Interest Points for Shape-Based Image Classification

In this paper, we propose a method for extracting optimal interest points to support shape-based image classification and indexing for image databases, by applying a dynamic threshold that reflects the characteristics of a shape contour. The threshold is determined dynamically by comparing the contour length ratio of the original shape and the approximated polygon while the algorithm is running. Because our algorithm considers the characteristics of the shape contour, it can minimize the number of interest points. For a shape with n contour points, the algorithm has time complexity O(n log n). Our experiments show an average optimization ratio of up to 0.92. We expect the shape features extracted by the proposed method to be used for shape-based image classification, indexing, and similarity search.
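One plausible reading of the dynamic-threshold idea, sketched with OpenCV's polygon approximation; the stopping ratio and the halving update rule are assumptions, not the paper's algorithm:

```python
# One plausible reading of the dynamic threshold, sketched with OpenCV's
# polygon approximation: start coarse and tighten the threshold until the
# approximated polygon's perimeter is close enough to the contour's.
# The stopping ratio and halving rule are assumptions, not the paper's.
import cv2

def optimal_interest_points(contour, target_ratio=0.95, eps=10.0):
    """contour: (n, 1, 2) integer array as returned by cv2.findContours."""
    full_len = cv2.arcLength(contour, True)
    while eps > 0.5:
        poly = cv2.approxPolyDP(contour, eps, True)
        # Coarse-to-fine: the first polygon meeting the length-ratio test
        # uses the fewest interest points.
        if cv2.arcLength(poly, True) / full_len >= target_ratio:
            return poly
        eps *= 0.5
    return cv2.approxPolyDP(contour, 0.5, True)
```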

Kyhyun Um, Seongtaek Jo, Kyungeun Cho
Affine Invariant Gradient Based Shape Descriptor

This paper presents an affine invariant shape descriptor which can be applied to both binary and gray-level images. The proposed algorithm uses gradient-based features extracted along the object boundaries. We use two-dimensional steerable G-Filters [1] to obtain gradient information at different orientations, and aggregate the gradients into a shape signature. The signatures derived from rotated objects are shifted versions of the signatures derived from the original object. The shape descriptor is defined as the Fourier transform of the signature. We also provide a distance definition for the proposed descriptor that takes the shift property of the signature into account. The performance of the proposed descriptor is evaluated over a database containing license plate characters. The experiments show that the devised method outperforms other well-known Fourier-based shape descriptors such as centroid distance and boundary curvature.
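The shift property can be exploited as in the following sketch: a circular shift of the signature changes only the phase of its Fourier transform, so a shift-tolerant distance can minimize over all shifts with a single inverse FFT. Signature length and normalization details are assumptions.

```python
# Sketch of a shift-tolerant distance between two boundary signatures of
# equal length; normalization choices are assumptions.
import numpy as np

def descriptor(signature):
    """Shape descriptor: the Fourier transform of the boundary signature."""
    return np.fft.fft(np.asarray(signature, dtype=float))

def shift_invariant_distance(F1, F2):
    """Minimum L2 distance over all circular shifts of the signatures,
    computed in closed form from the descriptors (correlation theorem)."""
    n = len(F1)
    e1 = np.sum(np.abs(F1) ** 2) / n            # Parseval: signal energy
    e2 = np.sum(np.abs(F2) ** 2) / n
    cross = np.fft.ifft(F1 * np.conj(F2)).real  # all circular shifts at once
    return np.sqrt(max(e1 + e2 - 2.0 * cross.max(), 0.0))
```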

Abdulkerim Çapar, Binnur Kurt, Muhittin Gökmen
Spatial Morphological Covariance Applied to Texture Classification

Morphological covariance, one of the most frequently employed texture analysis tools offered by mathematical morphology, makes use of the sum of pixel values, i.e., the "volume" of its input. In this paper, we investigate the potential of alternative measures to volume, and extend the work of Wilkinson (ICPR'02) in order to obtain a new covariance operator that is more sensitive to spatial details, namely the spatial covariance. The classification experiments are conducted on the publicly available Outex 14 texture database, where the proposed operator leads not only to higher classification scores than standard covariance, but also to the best results reported so far for this database when combined with an adequate illumination invariance model.

Erchan Aptoula, Sébastien Lefèvre

Multimodal Signal Processing

Emotion Assessment: Arousal Evaluation Using EEG’s and Peripheral Physiological Signals

The arousal dimension of human emotions is assessed from two different physiological sources: peripheral signals and electroencephalographic (EEG) signals from the brain. A complete acquisition protocol is presented to build a physiological emotional database from real participants. Arousal assessment is then formulated as a classification problem, with classes corresponding to two or three degrees of arousal. The performance of two classifiers has been evaluated on peripheral signals, on EEG's, and on both. Results confirm the possibility of using EEG's to assess the arousal component of emotion, and the interest of multimodal fusion between EEG's and peripheral physiological signals.

Guillaume Chanel, Julien Kronegg, Didier Grandjean, Thierry Pun
Learning Multi-modal Dictionaries: Application to Audiovisual Data

This paper presents a methodology for extracting meaningful synchronous structures from multi-modal signals. Simultaneous processing of multi-modal data can reveal information that is unavailable when handling the sources separately. However, in natural high-dimensional data, the statistical dependencies between modalities are most of the time not obvious. Learning fundamental multi-modal patterns is an alternative to classical statistical methods. Typically, recurrent patterns are shift-invariant, so the learning should try to find the best matching filters. We present a new algorithm for iteratively learning multi-modal generating functions that can be shifted to all positions in the signal. The proposed algorithm is applied to audiovisual sequences and is shown to discover underlying structures in the data.

Gianluca Monaci, Philippe Jost, Pierre Vandergheynst, Boris Mailhe, Sylvain Lesage, Rémi Gribonval
Semantic Fusion for Biometric User Authentication as Multimodal Signal Processing

Today, the application of multimodal biometric systems is a common way to overcome the problems that come with unimodal systems, such as noisy data, attacks, overlapping similarities, and non-universality of biometric characteristics. In order to fuse multiple identification sources simultaneously, fusion strategies can be applied at different levels. This paper presents a theoretical concept and methodology to improve such fusion strategies independently of their application levels. By extracting and merging certain semantic information and integrating it as additional knowledge (e.g., metadata) into the process, the fusion can potentially be improved. Thus, discrepancies and irregularities in one biometric trait can be verified by another, and signal errors can be identified and corrected.

Andrea Oermann, Tobias Scheidat, Claus Vielhauer, Jana Dittmann
Study of Applicability of Virtual Users in Evaluating Multimodal Biometrics

A new approach to enlarging fused biometric databases is presented. Fusion strategies based on matching scores are applied in active biometric verification scenarios. Consistent biometric data for two traits are used in test scenarios of handwriting and speaker verification. The fusion strategies are applied to multimodal biometrics of two different user types: real users represent two biometric traits captured from one person, while virtual users are the combination of two traits captured from two distinct users. These virtual users are introduced for database enlargement. In order to investigate the impact of these virtual users, test scenarios using three different semantics of handwriting and speech are carried out. The results of fused handwriting and speech for exclusively real users and for additional virtual users are compared and discussed.

Franziska Wolf, Tobias Scheidat, Claus Vielhauer

3D Video and Free Viewpoint Video

Accelerating Depth Image-Based Rendering Using GPU

In this paper, we propose a practical method for hardware-accelerated rendering of the depth image-based representation (DIBR) object, which is defined in the MPEG-4 Animation Framework eXtension (AFX). The proposed method overcomes the drawbacks of conventional rendering, i.e., that it is slow, since it is hardly assisted by graphics hardware, and that surface lighting is static. Utilizing the new features of the modern graphics processing unit (GPU) and programmable shader support, we develop an efficient hardware-accelerated rendering algorithm for depth image-based 3D objects. Surface rendering in response to varying illumination is performed inside the vertex shader, while adaptive point splatting is performed inside the fragment shader. Experimental results show that the rendering speed increases considerably compared with software-based rendering and the conventional OpenGL-based rendering method.

Man Hee Lee, In Kyu Park
A Surface Deformation Framework for 3D Shape Recovery

We present a surface deformation framework for the problem of 3D shape recovery. A spatially smooth and topologically plausible surface mesh representation is constructed via a surface-evolution-based technique, starting from an initial model. The initial mesh, representing the bounding surface, is refined or simplified where necessary during surface evolution using a set of local mesh transform operations, so as to adapt to local properties of the object surface. The final mesh obtained at convergence can adequately represent complex surface details such as bifurcations, protrusions and large visible concavities. The performance of the proposed framework, which is in fact very general and applicable to any kind of raw surface data, is demonstrated on the problem of shape reconstruction from silhouettes. Moreover, since the approach we take to surface deformation is Lagrangian, and can thus track changes in the connectivity and geometry of the deformable mesh during surface evolution, the proposed framework can be used to build efficient time-varying representations of dynamic scenes.

Yusuf Sahillioğlu, Yücel Yemez
Fast Outlier Rejection by Using Parallax-Based Rigidity Constraint for Epipolar Geometry Estimation

A novel approach is presented for rejecting correspondence outliers between frames using the parallax-based rigidity constraint for epipolar geometry estimation. In this approach, the invariance of the 3-D relative projective structure of a stationary scene over different views is exploited to eliminate outliers, which are mostly due to independently moving objects in a typical scene. The proposed approach is compared against a well-known RANSAC-based algorithm with the help of a test bed. The results show that the speed-up gained by using the proposed technique as a preprocessing step before the RANSAC-based approach significantly decreases the overall execution time of outlier rejection.

Engin Tola, A. Aydın Alatan
Interactive Multi-view Video Delivery with View-Point Tracking and Fast Stream Switching

We present a 3-D multi-view video delivery system where each user receives only the streams required for rendering their viewpoint. This paper proposes a novel method to alleviate the adverse effects of the unavoidable delay between the time a client requests a new stream and the time it becomes available. To this end, lower bit-rate versions of a set of adjacent views are streamed to the viewer in addition to the currently required views. This ensures that, when an unpredicted viewpoint change occurs, the viewer has a low-quality version of the view ready and decodable until the high-quality stream arrives. Bandwidth implications and PSNR improvements are reported for low-quality streams encoded at various bit-rates. Performance comparisons are presented of the proposed system with respect to transmitting all views using MVC and transmitting only two views with no low-quality neighbors.

Engin Kurutepe, M. Reha Civanlar, A. Murat Tekalp
A Multi-imager Camera for Variable-Definition Video (XDTV)

The enabling technologies of increasing PC bus bandwidth, multicore processors, and advanced graphics processors, combined with a high-performance multi-image camera system, are leading to new ways of considering video. We describe scalable varied-resolution video capture, presenting a novel method of generating multi-resolution dialable-shape panoramas, a line-based calibration method that achieves optimal multi-imager global registration across possibly disjoint views, and a technique for recasting mosaicking homographies for arbitrary planes. Results show synthesis of a 7.5-megapixel (MP) video stream from 22 synchronized uncompressed imagers operating at 30 Hz on a single PC.

H. Harlyn Baker, Donald Tanguay

Invited Talk

On Semi-supervised Learning

In recent years, there has been considerable interest in non-standard learning problems, namely the so-called semi-supervised learning scenarios. Most formulations of semi-supervised learning see the problem from one of two (dual) perspectives: supervised learning (namely, classification) with missing labels, or unsupervised learning (namely, clustering) with additional information. In this talk, I will review recent work in these two areas, with special emphasis on our own work. For semi-supervised learning of classifiers, I will describe an approach which is able to incorporate unlabelled data as a regularizer for a (possibly kernel) classifier. Unlike previous approaches, the method is non-transductive, and thus computationally inexpensive to use on future data. For semi-supervised clustering, I will present a new method which is able to incorporate pairwise prior information in a computationally efficient way. Finally, I will review recent, as well as potential, applications of semi-supervised learning techniques in multimedia problems.

Mário A. T. Figueiredo

Multimedia Content Transmission and Classification

Secure Transmission of Video on an End System Multicast Using Public Key Cryptography

An approach for securing video transmission in an end system multicast session is described. Existing solutions use encryption techniques that require a shared key. Although they can achieve efficient encryption/decryption and meet the demands of real-time video, they do not consider a publicly available service that needs only integrity and non-repudiation of the message. In this study, we offer such a method using public key cryptography. This method can be used in an end system multicast infrastructure where video originates from one source but spreads with the help of receiving peers. Two different methods are described and compared: 1) encryption of the entire packet; 2) encryption of the unique digest value of the transmitted packet (i.e., digital signing). The receivers then check the integrity of the received packets using the public key provided by the sender. In this way, non-repudiation of the transmitted video is also provided.
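A minimal sketch of the second method, signing each packet so that receiving peers can verify integrity and non-repudiation with the sender's public key; the choice of RSA-PSS over SHA-256 is an assumption, not necessarily the primitives used in the paper.

```python
# Minimal sketch of method 2: digitally signing each packet so receiving
# peers can verify integrity and non-repudiation before forwarding.
# RSA-PSS over SHA-256 is an assumed choice of primitives.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()
PSS = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)

def sign_packet(payload: bytes) -> bytes:
    """Sender: sign the packet digest (hashing is done inside sign())."""
    return private_key.sign(payload, PSS, hashes.SHA256())

def verify_packet(payload: bytes, signature: bytes) -> bool:
    """Receiving peer: check the packet against the sender's public key."""
    try:
        public_key.verify(signature, payload, PSS, hashes.SHA256())
        return True
    except InvalidSignature:
        return False
```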

Istemi Ekin Akkus, Oznur Ozkasap, M. Reha Civanlar
DRM Architecture for Mobile VOD Services

This study proposes a DRM architecture for VOD streaming services in a mobile environment. The proposed architecture consists of a DRM Client Manager, in which the core components for client services are constructed independently for use in a mobile environment, and a DRM server, which provides DRM services. The DRM Client Manager resides independently in the client to maximize efficiency and processing capacity in a mobile environment, and consists of the user interface, license management and player. The DRM server consists of a streaming server for VOD streaming, a contents server, a license server, and a packager. The proposed system has an architecture suitable for mobile environments, which are difficult to handle with existing DRM architectures, and supports the process of super-distribution through an independent manager in the client.

Keywords: DRM (Digital Rights Management), Client Manager, Mobile Environments.

Yong-Hak Ahn, Myung-Mook Han, Byung-Wook Lee
An Information Filtering Approach for the Page Zero Problem

In this paper we present a new approach for interacting with visual document collections. We propose to model user preferences related to visual documents in order to recommend relevant content according to the user's profile. We formulate the problem as a prediction problem and propose VC-Aspect, a flexible mixture model which handles implicit associations between users and the visual features of images. We implemented the model within a CBIR system, and results showed that this approach greatly reduces the page zero problem, especially for small devices such as smart-phones and PDAs.

Djemel Ziou, Sabri Boutemedjet
A Novel Model for the Print-and-Capture Channel in 2D Bar Codes

Several models for the print-and-scan channel are available in the literature. We describe a new channel model specifically tuned to the transmission of two-dimensional bar codes, which is suitable not only for scanners but also for time/space-variant scenarios including web cameras or those embedded in mobile phones. Our model provides an analytical expression that accurately represents the output of the print-and-capture channel, with the additional advantage of estimating its parameters directly from the available captured image, thus eliminating the need for painstaking training. A full communication system with a two-dimensional bar code has been implemented to experimentally validate the accuracy of the proposed model and the feasibility of reliable transmissions. These experiments confirm that the results obtained with our method outperform those obtained with existing models.

Alberto Malvido, Fernando Pérez-González, Armando Cousiño
On Feature Extraction for Spam E-Mail Detection

Electronic mail is an important communication method for most computer users. Spam e-mails, however, consume bandwidth, fill up server storage and are a waste of time to tackle. The general way to label an e-mail as spam or non-spam is to set up a finite set of discriminative features and use a classifier for the detection. In most cases, the selection of such features is verified empirically. In this paper, two different methods are proposed to select the most discriminative features among a set of reasonably arbitrary features for spam e-mail detection. The selection methods are developed using the Common Vector Approach (CVA), which is a subspace-based pattern classifier. Experimental results indicate that the proposed feature selection methods considerably reduce the number of features without affecting recognition rates.

Serkan Günal, Semih Ergin, M. Bilginer Gülmezoğlu, Ö. Nezih Gerek
Symmetric Interpolatory Framelets and Their Erasure Recovery Properties

A new class of wavelet-type frames in signal space that uses (anti)symmetric waveforms is utilized for the development of robust error recovery algorithms for the transmission of rich multimedia content over lossy networks. These algorithms use the redundancy inherent in frame expansions. The construction employs interpolatory filters with rational transfer functions that have linear phase. Experimental results recover images even when as much as 60% of the expansion coefficients are lost or corrupted. Finally, the frame-based error recovery algorithm is compared with a classical coding approach.

O. Amrani, A. Z. Averbuch, V. A. Zheludev
A Scalable Presentation Format for Multichannel Publishing Based on MPEG-21 Digital Items

In order to experience true Universal Multimedia Access, people want to access their multimedia content anytime, anywhere, and on any device. Several solutions exist which allow content providers to offer video, audio, and graphics to as many devices as possible by using scalable coding techniques. In addition, content providers need a scalable presentation format so that they can create a presentation once and distribute it to all possible target devices. This paper introduces such a scalable presentation format, combining MPEG-21 technology with the User Interface Markup Language. The introduced presentation format is based on assigning types to MPEG-21 Digital Items and can be used to create a presentation once, after which several device-specific versions can be extracted. The reuse of resource and presentation information, together with the use of a device-independent presentation language, are the key elements in the development of the scalable presentation format.

Davy Van Deursen, Frederik De Keukelaere, Lode Nachtergaele, Johan Feyaerts, Rik Van de Walle
X3D Web Service Using 3D Image Mosaicing and Location-Based Image Indexing

We present a method of 3D image mosaicing for effective 3D representation of roadside buildings and implement an X3D-based Web service for the 3D image mosaics generated by the proposed method. A more realistic 3D facade model is developed by employing multiple projection planes based on sparsely distributed feature points, and sharp corner detection using perpendicular distances between a vertical plane and its feature points. In addition, location-based image indexing enables stable delivery of the 3D image mosaics in X3D format over the Web, using tile segmentation and direct reference to memory addresses for selective retrieval of the image slits around the user's location.

Jaechoon Chon, Yang-Won Lee, Takashi Fuse
Adaptive Hybrid Data Broadcast for Wireless Converged Networks

This paper proposes an adaptive hybrid data broadcast scheme for wireless converged networks. Balanced allocation of broadcast resources between push and pull, and adaptation to changes in user requests, are key to the successful operation of a hybrid data broadcast system. The proposed scheme is built on two key features: BEI (Broadcast Efficiency Index) based adaptation and RHPB (Request History Piggy Back) based user request estimation. BEI is an index defined for each data item and is used to determine whether an item should be serviced through push or pull. RHPB is an efficient user request sampling mechanism which utilizes a small number of explicit user requests to assess overall changes in user requests. A simulation study shows that the proposed scheme improves responsiveness and resource efficiency by adapting to these changes.

Jongdeok Kim, Byungjun Bae
Multimedia Annotation of Geo-Referenced Information Sources

We present a solution to the problem of allowing collaborative construction and use of annotations on georeferenced information, by combining three Web-enabled applications: a plugin annotating multimedia content, an environment for multimodal interaction, and a WebGIS system. The resulting system is unique in offering a wealth of possibilities for interacting with geographically based material.

Paolo Bottoni, Alessandro Cinnirella, Stefano Faralli, Patrick Maurelli, Emanuele Panizzi, Rosa Trinchese

Video and Image Processing

Video Synthesis with High Spatio-temporal Resolution Using Spectral Fusion

We propose a novel strategy for obtaining high spatio-temporal resolution video. To this end, we introduce a dual-sensor camera that can capture two video sequences with the same field of view simultaneously: one with high resolution at a low frame rate, and one with low resolution at a high frame rate. This paper presents an algorithm to synthesize a high spatio-temporal resolution video from these two sequences using motion compensation and spectral fusion. We confirm that the proposed method improves the resolution and frame rate of the synthesized video.

Kiyotaka Watanabe, Yoshio Iwai, Hajime Nagahara, Masahiko Yachida, Toshiya Suzuki
Content-Aware Bit Allocation in Scalable Multi-view Video Coding

We propose a new scalable multi-view video coding (SMVC) method with content-aware bit allocation among multiple views. The video is encoded off-line with a predetermined number of temporal and SNR scalability layers. Content-aware bit allocation among the views is performed during bitstream extraction by adaptive selection of the number of temporal and SNR scalability layers for each group of pictures (GOP) according to motion and spatial activity of that GOP. The effect of bit allocation among the multiple views on the overall video quality has been studied on a number of training sequences by means of both quantitative quality measures as well as qualitative visual tests. The number of temporal and SNR scalability layers selected as a function of motion and spatial activity measures for the actual test sequences are “learned” from these bit allocation vs. video quality studies on the training sequences. SMVC with content-aware bit allocation among views can be used for multi-view video transport over the Internet for interactive 3DTV. Experimental results are provided on stereo video sequences.

Nükhet Özbek, A. Murat Tekalp
Disparity-Compensated Picture Prediction for Multi-view Video Coding

Multi-view video coding (MVC) is currently being standardized by the International Organization for Standardization (ISO) and the International Telecommunication Union (ITU). Although translation-based motion compensation can be applied to picture prediction between different cameras, a better prediction exists if the camera parameters are known. This paper analyses the relations between pictures taken with parallel or arc camera arrangements where the object faces an arbitrary direction. Based on the derived rules, block-width, block-slant and block-height compensations are proposed for accurate picture prediction. A fast disparity vector detection algorithm and an efficient disparity vector compression algorithm are also discussed.

Takanori Senoh, Terumasa Aoki, Hiroshi Yasuda, Takuyo Kogure
Reconstruction of Computer Generated Holograms by Spatial Light Modulators

Computer generated holograms produced by three different numerical techniques are reconstructed optically by spatial light modulators. Liquid crystal spatial light modulators (SLMs) operating in transmission and reflection modes, with different resolutions, were investigated. A good match between numerical simulations and optically reconstructed holograms was observed on both SLMs. The resolution of the optically reconstructed images was comparable to the resolution of the SLMs.

M. Kovachev, R. Ilieva, L. Onural, G. B. Esmer, T. Reyhan, P. Benzie, J. Watson, E. Mitev
Iterative Super-Resolution Reconstruction Using Modified Subgradient Method

A modified subgradient method has been employed to solve the super-resolution restoration problem. The technique uses augmented Lagrangians for nonconvex minimization problems with equality constraints. The subgradient of the constructed dual function is used as a measure. Initial comparative studies have shown that the technique is very promising.

Kemal Özkan, Erol Seke, Nihat Adar, Selçuk Canbek
A Comparison on Textured Motion Classification

Textured motion – generally known as dynamic or temporal texture – analysis, classification, synthesis, segmentation and recognition are popular research areas in several fields such as computer vision, robotics, animation, and multimedia databases. In the literature, several stochastic and deterministic algorithms have been proposed to characterize these textured motions. However, no study has compared the performance of these algorithms. In this paper, we carry out a complete comparison study. Improvements to the deterministic methods are also given.

Kaan Öztekin, Gözde Bozdağı Akar
Schemes for Multiple Description Coding of Stereoscopic Video

This paper presents and compares two multiple description schemes for coding of stereoscopic video, both based on H.264. The SS-MDC scheme exploits spatial scaling of one view: in the case of a one-channel failure, SS-MDC can reconstruct the stereoscopic video with one view low-pass filtered. SS-MDC can achieve low redundancy (less than 10%) for video sequences with lower inter-view correlation. The MS-MDC method is based on multi-state coding and is beneficial for video sequences with higher inter-view correlation. The encoder can switch between the two methods depending on the characteristics of the video.

Andrey Norkin, Anil Aksay, Cagdas Bilen, Gozde Bozdagi Akar, Atanas Gotchev, Jaakko Astola
Fast Hole-Filling in Images Via Fast Comparison of Incomplete Patches

We present an algorithm for fast filling of missing regions (holes) in images. Holes may be the result of various causes: manual manipulation, e.g. removal of an object from an image, errors in the transmission of an image or video, etc. The hole is filled one pixel at a time by comparing the neighborhood of each pixel to other areas in the image. Similar areas are used as clues for choosing the color of the pixel. The neighborhood and the areas that are compared are square shaped; this symmetric shape allows the hole to be filled evenly. However, since square areas inside the hole include some uncolored pixels, we introduce a fast and efficient data structure which allows fast comparison of areas even with partially missing data. The speed is achieved by using a two-phase algorithm: a learning phase, which can be done offline, and a fast synthesis phase. The data structure uses the fact that colors in an image can be represented by a bounded natural number. The algorithm fills the hole from the boundaries inward, in a spiral, to produce a smooth and coherent result.
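The core comparison can be sketched as a masked patch distance that counts only the already-colored pixels; the paper's fast lookup structure and spiral fill order are not reproduced here.

```python
# Sketch of the core comparison only: a masked SSD between a partially
# colored target patch and a fully known candidate. The paper's fast
# lookup structure and spiral fill order are not reproduced here.
import numpy as np

def masked_ssd(target, known_mask, candidate):
    """target, candidate: (k, k) patches; known_mask: True where colored."""
    diff = (target.astype(float) - candidate.astype(float)) ** 2
    return diff[known_mask].sum() / max(int(known_mask.sum()), 1)

def best_match(target, known_mask, candidates):
    """Choose the candidate whose known pixels agree best with the target."""
    scores = [masked_ssd(target, known_mask, c) for c in candidates]
    return candidates[int(np.argmin(scores))]
```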

A. Averbuch, G. Gelles, A. Schclar
Range Image Registration with Edge Detection in Spherical Coordinates

In this study, we focus on model reconstruction for 3D objects using range images. We propose a crude range image alignment method that overcomes the initial estimation problem of the iterative closest point (ICP) algorithm by using the edge points of range images. Unlike previous edge detection methods, we first obtain a function representation of the range image in spherical coordinates. This representation allows smooth edges on the object surface to be detected easily with a zero-crossing edge detector. We use ICP on these edges to align the patches in a crude manner, then apply ICP to the whole point set to obtain the final alignment. This dual operation is extremely fast compared to directly aligning the point sets. We also obtain the edges of the 3D object model while registering it; these edge points may be of use in 3D object recognition and classification.

Olcay Sertel, Cem Ünsalan
Confidence Based Active Learning for Whole Object Image Segmentation

In selective object segmentation, the goal is to extract the entire object of interest without regard to homogeneous regions or object shape. In this paper, we present the selective image segmentation problem as a classification problem and use active learning to train an image feature classifier to identify the object of interest. Since our formulation of the segmentation problem involves human interaction, active learning is used to minimize the training effort needed to segment the object. Results on several images with known ground truth show the efficacy of our approach for segmenting the object of interest in still images. The approach has potential applications in medical image segmentation and content-based image retrieval, among others.

Aiyesha Ma, Nilesh Patel, Mingkun Li, Ishwar K. Sethi

Video Analysis and Representation

Segment-Based Stereo Matching Using Energy-Based Regularization

We propose a new stereo matching algorithm through energy-based regularization using color segmentation and visibility constraint. Plane parameters in the entire segments are modeled by robust least square algorithm, which is

LMedS

method. Then, plane parameter assignment is performed by the cost function penalized for occlusion, iteratively. Finally, disparity regularization which considers the smoothness between the segments and penalizes the occlusion through visibility constraint is performed. For occlusion and disparity estimation, we include the iterative optimization scheme in the energy-based regularization. Experimental results show that the proposed algorithm produces comparable performance to the state-of-the-arts especially in the object boundaries, un-textured regions.
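A hedged sketch of an LMedS plane fit for one segment, with disparity modeled as d = a*x + b*y + c; the parameterization and trial count are assumptions.

```python
# Hedged sketch of an LMedS fit of a disparity plane d = a*x + b*y + c
# over one color segment's pixels; trial count and sampling are assumptions.
import numpy as np

def lmeds_plane(x, y, d, n_trials=200, seed=0):
    rng = np.random.default_rng(seed)
    best, best_med = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(len(x), size=3, replace=False)
        A = np.column_stack([x[idx], y[idx], np.ones(3)])
        try:
            abc = np.linalg.solve(A, d[idx])  # exact plane through 3 samples
        except np.linalg.LinAlgError:
            continue                          # degenerate (collinear) sample
        residuals = d - (abc[0] * x + abc[1] * y + abc[2])
        med = np.median(residuals ** 2)       # median, not mean: robust
        if med < best_med:
            best, best_med = abc, med
    return best
```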

Dongbo Min, Sangun Yoon, Kwanghoon Sohn
Head Tracked 3D Displays

It is anticipated that head tracked 3D displays will provide the next generation of display suitable for widespread use. Although there is an extensive range of 3D display types currently available, head tracked displays have the advantage that they present the minimum amount of image information necessary for the perception of 3D. The advantages and disadvantages of the various 3D approaches are considered and a single and a multi-user head tracked display are described. Future work based on the findings of a prototype multi-user display that has been constructed is considered.

Phil Surman, Ian Sexton, Klaus Hopf, Richard Bates, Wing Kai Lee
Low Level Analysis of Video Using Spatiotemporal Pixel Blocks

Low-level video analysis is an important step toward further semantic interpretation of video. It provides information about the camera work, the video editing process, and the shape, texture, color and topology of the objects and scenes captured by the camera. Here we introduce a framework capable of extracting information about shot boundaries and camera and object motion, based on the analysis of spatiotemporal pixel blocks in a series of video frames. Extracting motion information and detecting shot boundaries with the same underlying principle is the main contribution of this paper. Moreover, this principle is likely to improve the robustness of low-level video analysis, as it avoids typical problems of standard frame-based approaches, and the camera motion information provides critical help in improving shot boundary detection performance. The system is evaluated using TRECVID data [1] with promising results.

Umut Naci, Alan Hanjalic
Content-Based Retrieval of Video Surveillance Scenes

A novel method for content-based retrieval of surveillance video data is presented. The study starts from the realistic assumption that the automatic feature extraction is kept simple, i.e. only segmentation and low-cost filtering operations have been applied.

The solution is based on a new, generic dissimilarity measure for discriminating video surveillance scenes. This weighted compound measure can be adapted interactively during a session in order to capture the user's subjectivity. Building on this, a key-frame selection and content-based retrieval system has been developed and tested on several actual surveillance sequences. Experiments have shown that the proposed method is efficient and robust to segmentation errors.

Jérôme Meessen, Matthieu Coulanges, Xavier Desurmont, Jean-François Delaigle
Stream-Based Classification and Segmentation of Speech Events in Meeting Recordings

In this paper, we present a stream-based speech event classification and segmentation method for meeting recordings. Four speech events are considered: normal speech, laughter, coughing and pauses between talks. Hidden Markov models (HMMs) are used to model these speech events, and model topology optimization using the Bayesian Information Criterion (BIC) is applied. Experimental results show that our system obtains satisfactory results. Based on the detected speech events, the meeting recording is structured using an XML-based description language and visualized in a browser.
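BIC-based topology selection can be sketched as follows, assuming each candidate model reports its log-likelihood and free parameter count; the candidate structure is an assumption made for illustration.

```python
# Sketch of BIC-based topology selection among candidate HMM state counts;
# lower BIC wins. The candidate structure is an assumption for illustration.
import numpy as np

def bic(log_likelihood, n_free_params, n_obs):
    """Bayesian Information Criterion; lower values are better."""
    return -2.0 * log_likelihood + n_free_params * np.log(n_obs)

def select_topology(candidates, n_obs):
    """candidates: list of (model, log_likelihood, n_free_params)."""
    return min(candidates, key=lambda c: bic(c[1], c[2], n_obs))[0]
```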

Jun Ogata, Futoshi Asano
Backmatter
Metadata
Title: Multimedia Content Representation, Classification and Security
Edited by: Bilge Gunsel, Anil K. Jain, A. Murat Tekalp, Bülent Sankur
Copyright year: 2006
Publisher: Springer Berlin Heidelberg
Electronic ISBN: 978-3-540-39393-1
Print ISBN: 978-3-540-39392-4
DOI: https://doi.org/10.1007/11848035
