
2006 | Book

Advances in Multimedia Information Processing - PCM 2006

7th Pacific Rim Conference on Multimedia, Hangzhou, China, November 2-4, 2006. Proceedings

Edited by: Yueting Zhuang, Shi-Qiang Yang, Yong Rui, Qinming He

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this book

Welcome to the proceedings of the 7th Pacific-Rim Conference on Multimedia (PCM 2006), held at Zhejiang University, Hangzhou, China, November 2-4, 2006. Following the success of the previous conferences, PCM 2000 in Sydney, PCM 2001 in Beijing, PCM 2002 in Hsinchu, PCM 2003 in Singapore, PCM 2004 in Tokyo, and PCM 2005 in Jeju, PCM 2006 again brought together researchers, developers, practitioners, and educators in the field of multimedia from around the world. Both theoretical breakthroughs and practical systems were presented at the conference. There were sessions ranging from multimedia retrieval to multimedia coding to multimedia security, covering a wide spectrum of multimedia research. PCM 2006 featured a comprehensive program including keynote talks, regular paper presentations, and special sessions. We received 755 submissions, the largest number among all the PCMs. From such a large number of submissions, we accepted only 116 oral presentations. We kindly acknowledge the great support provided by the Program Committee members in the reviewing of submissions, as well as the additional reviewers who generously spent many hours. The many comments provided by the reviewing process are very useful to the authors’ current and future research.

Table of contents

Frontmatter
Expressive Speech Recognition and Synthesis as Enabling Technologies for Affective Robot-Child Communication

This paper presents our recent and current work on expressive speech synthesis and recognition as enabling technologies for affective robot-child interaction. We show that current expression recognition systems could be used to discriminate between several archetypical emotions, but also that the old adage “there’s no data like more data” is more valid than ever in this field. A new speech synthesizer was developed that is capable of high-quality concatenative synthesis. This system will be used in the robot to synthesize expressive nonsense speech by using prosody transplantation and a recorded database of expressive speech examples. With these enabling components lining up, we are getting ready to start experiments towards hopefully effective child-machine communication of affect and emotion.

Selma Yilmazyildiz, Wesley Mattheyses, Yorgos Patsis, Werner Verhelst
Embodied Conversational Agents: Computing and Rendering Realistic Gaze Patterns

We describe here our efforts to model the multimodal signals exchanged by interlocutors when interacting face-to-face. These data are then used to control embodied conversational agents able to engage in a realistic face-to-face interaction with human partners. This paper focuses on the generation and rendering of realistic gaze patterns. The problems encountered and the solutions proposed call for a stronger coupling between research fields such as audiovisual signal processing, linguistics and psychosocial sciences for the sake of efficient and realistic human-computer interaction.

Gérard Bailly, Frédéric Elisei, Stephan Raidt, Alix Casari, Antoine Picot
DBN Based Models for Audio-Visual Speech Analysis and Recognition

We present an audio-visual automatic speech recognition system, which significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio conditions. The system consists of three components: (i) a visual module, (ii) an acoustic module, and (iii) a Dynamic Bayesian Network-based recognition module. The visual module locates and tracks the speaker’s head and mouth movements and extracts relevant speech features represented by contour information and 3D deformations of lip movements. The acoustic module extracts noise-robust features, i.e. the Mel Filterbank Cepstrum Coefficients (MFCCs). Finally, we propose two models based on Dynamic Bayesian Networks (DBN) to either consider the single audio and video streams or to integrate the features from the audio and visual streams. We also compare the proposed DBN-based system with a classical Hidden Markov Model. The novelty of the developed framework is the persistence of the audiovisual speech signal characteristics from the extraction step through the learning step. Experiments on continuous audiovisual speech show that the segmentation boundaries of phones in the audio stream and visemes in the video stream are close to manual segmentation boundaries.

Ilse Ravyse, Dongmei Jiang, Xiaoyue Jiang, Guoyun Lv, Yunshu Hou, Hichem Sahli, Rongchun Zhao
An Extensive Method to Detect the Image Digital Watermarking Based on the Known Template

There are many types of digital watermarking algorithms, but each type requires its own detection method to detect the watermark. However, the embedding method is usually unknown, so it is not possible to know whether hidden information exists or not. An extensive digital watermarking detection method based on a known template is proposed in this paper. This method extracts feature parameters from the spatial, DCT and DWT domains of the image and the template, and then applies detection strategies to those parameters to detect the watermark. The experimental results show that the correct detection rate is more than 97%. This indicates that the extensive digital watermarking detection method can be realized, and that the method is valuable in theory and practice.

Yang Feng, Senlin Luo, Limin Pan
Fast Mode Decision Algorithm in H.263+/H.264 Intra Transcoder

In this paper, we propose a fast mode decision algorithm in the transform domain for an H.263+ to H.264 intra transcoder. In the transcoder, the residual signals carried by H.263+ bitstreams are compared against a threshold to decide whether to reuse the prediction direction provided by H.263+ or to re-estimate the prediction direction. Then the DCT coefficients in H.263+ bitstreams are converted to H.264 transform coefficients entirely in the transform domain. Finally, using the new prediction mode and direction, the H.264 transform residual coefficients are coded to generate the H.264 bitstream. The simulation results show that the performance of the proposed algorithm is close to that of a cascaded pixel-domain transcoder (CPDT), while the transcoding computation complexity is significantly lower.

Min Li, Guiming He
Binary Erasure Codes for Packet Transmission Subject to Correlated Erasures

We design some simple binary codes that are very well suited to reconstruct erased packets over a transmission medium that is characterized by correlation between subsequent erasures. We demonstrate the effectiveness of these codes for the transmission of video packets for HDTV over a DSL connection.

Frederik Vanhaverbeke, Frederik Simoens, Marc Moeneclaey, Danny De Vleeschauwer
Image Desynchronization for Secure Collusion-Resilient Fingerprint in Compression Domain

Collusion is a major threat to image fingerprinting. Recently, an idea was introduced for collusion-resilient fingerprinting by desynchronizing images in raw data. In this paper, we consider a compression-domain image desynchronization method and its system security. First, appropriate desynchronization forms for the compression domain are presented; secondly, the system security is discussed and a secure scheme is proposed; thirdly, for evaluating the visual degradation of space desynchronization, we propose a metric called the Synchronized Degradation Metric (SDM). Performance analysis, including the experiments, indicates the effectiveness of the proposed scheme and the metric.

Zhongxuan Liu, Shiguo Lian, Zhen Ren
A Format-Compliant Encryption Framework for JPEG2000 Image Code-Streams in Broadcasting Applications

The increased popularity of multimedia applications such as JPEG2000 places a great demand on efficient data storage and transmission techniques. Unfortunately, traditional encryption techniques have some limits for JPEG2000 images, which they treat merely as ordinary data. In this paper, an efficient and secure encryption scheme for JPEG2000 code-streams in broadcasting applications is proposed. The scheme does not introduce superfluous JPEG2000 markers in the protected code-stream and achieves full information protection for data confidentiality. It also processes the stream data sequentially and is computationally efficient and memory saving.

Jinyong Fang, Jun Sun
Euclidean Distance Transform of Digital Images in Arbitrary Dimensions

A new algorithm for the Euclidean distance transform is proposed in this paper. It propagates from the boundary to the interior of the object layer by layer, like a water wave propagating inward. It can be applied in a space of any dimension and has linear time complexity. Euclidean distance transforms of digital images in 2-D and 3-D are computed in the experiments. Voronoi diagrams and Delaunay triangulations can also be produced by this method.
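
To make the quantity being computed concrete, the following is a minimal brute-force reference for the Euclidean distance transform in any number of dimensions. It is not the layer-by-layer, linear-time algorithm described in the abstract; the function name `brute_force_edt` and the convention that background pixels get distance 0 are assumptions made for this sketch.

```python
import numpy as np

def brute_force_edt(mask):
    """Reference Euclidean distance transform (hypothetical helper).

    mask: boolean array of any dimension, True marks object pixels.
    Returns, for every object pixel, the Euclidean distance to the nearest
    background pixel; background pixels are assigned 0.
    Cost is O(object pixels * background pixels), so use small inputs only.
    """
    mask = np.asarray(mask, dtype=bool)
    obj = np.argwhere(mask)      # coordinates of object pixels, shape (n, ndim)
    bg = np.argwhere(~mask)      # coordinates of background pixels, shape (m, ndim)
    out = np.zeros(mask.shape, dtype=float)
    if len(obj) == 0 or len(bg) == 0:
        # Assumption: with no object or no background, return all zeros.
        return out
    # Squared distance between every object pixel and every background pixel.
    d2 = ((obj[:, None, :] - bg[None, :, :]) ** 2).sum(axis=2)
    out[tuple(obj.T)] = np.sqrt(d2.min(axis=1))
    return out
```

Because the helper works on coordinate lists rather than on a fixed grid shape, the same call handles 2-D images and 3-D volumes, which makes it usable as a correctness check for a faster implementation.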

Dong Xu, Hua Li
JPEG2000 Steganography Possibly Secure Against Histogram-Based Attack

This paper presents two steganographic methods for JPEG2000 still images which preserve histograms of discrete wavelet transform (DWT) coefficients. The first one is a histogram quasi-preserving method using quantization index modulation (QIM) with a dead zone in the DWT domain. The second one is a histogram-preserving method based on histogram matching using two quantizers with a dead zone. Compared with conventional JPEG2000 steganography, the two methods show better histogram preservation. The proposed methods are promising candidates for secure JPEG2000 steganography against histogram-based attacks.
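
For readers unfamiliar with QIM, the sketch below shows plain scalar QIM with two interleaved uniform quantizers, one per bit value. It deliberately omits the dead zone and the histogram-preservation steps that the abstract describes, and it operates on an arbitrary array of coefficients rather than on real JPEG2000 DWT data; the names `qim_embed`, `qim_extract` and the `step` parameter are assumptions for illustration.

```python
import numpy as np

def qim_embed(coeffs, bits, step=8.0):
    """Embed one bit per coefficient with scalar QIM (illustrative only).

    Bit 0 quantizes onto the lattice {k * step};
    bit 1 quantizes onto the shifted lattice {k * step + step / 2}.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    bits = np.asarray(bits, dtype=int)
    offset = bits * (step / 2.0)
    return np.round((coeffs - offset) / step) * step + offset

def qim_extract(coeffs, step=8.0):
    """Recover bits by picking the nearer of the two lattices."""
    coeffs = np.asarray(coeffs, dtype=float)
    # Distance to the nearest point of the bit-0 lattice.
    d0 = np.abs(coeffs - np.round(coeffs / step) * step)
    # Distance to the nearest point of the bit-1 lattice.
    half = step / 2.0
    d1 = np.abs(coeffs - (np.round((coeffs - half) / step) * step + half))
    return (d1 < d0).astype(int)
```

In this basic form the embedding distortion per coefficient is at most `step / 2`, and extraction tolerates perturbations smaller than `step / 4`. Quantizing every coefficient onto a lattice visibly changes the coefficient histogram, which is the weakness the histogram-preserving variants in the abstract are designed to address.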

Hideki Noda, Yohsuke Tsukamizu, Michiharu Niimi
Perceptual Depth Estimation from a Single 2D Image Based on Visual Perception Theory

The depth of an image is conventionally defined as the distance between the scene point corresponding to an image point and the pinhole of the camera, which is not in harmony with the depth perception of human vision. In this paper we define a new perceptual depth of an image, as perceived by human vision. The traditional computation models of image depth are all based on the physical imaging model, which ignores human depth perception. This paper presents a novel computation model based on visual perception theory. With this approach, we can obtain the relative perceptual depth from a single 2-D image. Experimental results show that our model is effective and corresponds to human perception.

Li Bing, Xu De, Feng Songhe, Wu Aimin, Yang Xu
A System for Generating Personalized Virtual News

To improve the degree of immersion for strategic situation representation in strategic war gaming, the concept of virtual news and an automatic generation model are presented in this paper. By analyzing the characteristics of news video, the design and generation algorithm for a virtual news narrative template are given, which borrow ideas from Natural Language Processing and combine them with the particularities of news video. A narrative template revision algorithm based on time constraints is also proposed. Virtual news is automatically generated, driven by the virtual news narrative template, by retrieving relevant news segments from a multimedia database and selecting an appropriate representation method based on the EEDU (Extended Entity-Description-Utility) model. This approach can generate virtual news according to a text description of the strategic situation provided by users, and furthermore provides a personalized service for decision-makers. Finally, experimental results are used to indicate the validity of our system.

Jian-Jun Xu, Jun Wen, Dan-Wen Chen, Yu-Xiang Xie, Ling-Da Wu
Image Fingerprinting Scheme for Print-and-Capture Model

This paper addresses an image fingerprinting scheme for the print-and-capture model performed by a photo printer and a digital camera. When an image is captured by a digital camera, various kinds of distortions such as noise, geometrical distortions, and lens distortions are applied slightly and simultaneously. In this paper, we consider several steps to extract fingerprints from the distorted image in the print-and-capture scenario. To embed an ID into an image as a fingerprint, multi-bit embedding is applied. We embed 64-bit ID information as a fingerprint into the spatial domain of color images. In order to restore a captured image from distortions, a noise reduction filter is applied and a rectilinear tiling pattern is used as a template. To make the template, a multi-bit fingerprint is embedded repeatedly, like a tiling pattern, into the spatial domain of the image. We show through experiments that the fingerprint is successfully extracted from the image captured by a digital camera.

Won-gyum Kim, Seon Hwa Lee, Yong-seok Seo
16×16 Integer Cosine Transform for HD Video Coding

High-Definition (HD) videos often contain rich details as well as large homogeneous regions. To exploit such a property, Variable Block-size Transforms (VBT) should be in place so that transform block size can adapt to local activities. In this paper, we propose a 16×16 Integer Cosine Transform (ICT) for HD video coding, which is simple and efficient. This 16×16 ICT is integrated into the AVS Zengqiang Profile and used adaptively as an alternative to the 8×8 ICT. Experimental results show that the 16×16 transform can be a very efficient coding tool, especially for HD video coding.

Jie Dong, King N. Ngan
Heegard-Berger Video Coding Using LMMSE Estimator

In this paper, a novel distributed video coding scheme is proposed based on the Heegard-Berger (HB) coding theorem rather than the Wyner-Ziv theorem. The main advantage of HB coding is that the decoder can still decode and output a coarse reconstruction even if the side information is degraded or absent. If side information is present or improves at the decoder, a better reconstruction can be achieved. This robustness addresses a problem in Wyner-Ziv video coding: the encoder can hardly decide the bit rate, because the rate-distortion behavior is affected by side information known only at the decoder. This feature also leads to our HB video coding scheme with two decoding levels: we first reconstruct a coarse frame without side information, then perform a motion search in the previously reconstructed frame to find side information, and finally reconstruct a fine frame through HB decoding again, with the side information available.

Xiaopeng Fan, Oscar Au, Yan Chen, Jiantao Zhou
Real-Time BSD-Driven Adaptation Along the Temporal Axis of H.264/AVC Bitstreams

MPEG-21 BSDL offers a solution for exposing the structure of a binary media resource as an XML description, and for the generation of a tailored media resource using a transformed XML description. The main contribution of this paper is the introduction of a real-time work flow for the XML-driven adaptation of H.264/AVC bitstreams in the temporal domain. This real-time approach, which is in line with the vision of MPEG-21 BSDL, is made possible by two key technologies: BFlavor (BSDL + XFlavor) for the efficient generation of XML descriptions and Streaming Transformations for XML (STX) for the efficient transformation of these descriptions. Our work flow is validated in several applications, all using H.264/AVC bitstreams: the exploitation and emulation of temporal scalability, as well as the creation of video skims using key frame selection. Special attention is paid to the deployment of hierarchical B pictures and to the use of placeholder slices for synchronization purposes. Extensive performance data are also provided.

Wesley De Neve, Davy De Schrijver, Davy Van Deursen, Peter Lambert, Rik Van de Walle
Optimal Image Watermark Decoding

Not much has been done in utilizing the available information at the decoder to optimize the decoding performance of watermarking systems. This paper focuses on analyzing different decoding methods, namely, Minimum Distance, Maximum Likelihood and Maximum a-posteriori decoding given varying information at the decoder in the blind detection context. Specifically, we propose to employ Markov random fields to model the prior information given the embedded message is a structured logo. The application of these decoding methods in Quantization Index Modulation systems shows that the decoding performance can be improved by Maximum Likelihood decoding that exploits the property of the attack and Maximum a-posteriori decoding that utilizes the modeled prior information in addition to the property of the attack.

Wenming Lu, Wanqing Li, Rei Safavi-Naini, Philip Ogunbona
Diagonal Discrete Cosine Transforms for Image Coding

A new block-based DCT framework has been developed recently in [1] in which the first transform may choose to follow a direction other than the vertical or horizontal one, the default direction in the conventional DCT. In this paper, we focus on two diagonal directions because they are visually more important than other directions in an image block (except the vertical and horizontal ones). Specifically, we re-formulate the framework of two diagonal DCTs and use them in combination with the conventional DCT. We will discuss issues such as the directional mode selection and the cross-check of directional modes. Some experimental results are provided to demonstrate the effectiveness of our proposed diagonal DCTs in image coding applications.

Jingjing Fu, Bing Zeng
Synthesizing Variational Direction and Scale Texture on Planar Region

Traditional 2D texture synthesis methods mainly focus on seamlessly generating a large texture, with coherent texture direction and homogeneous texture scale, from an input sample. This paper presents a method of synthesizing texture with varying direction and scale on an arbitrary planar region. The user first decomposes the region of interest into a set of triangles, on which a vector field is subsequently specified for controlling the direction and scale of the synthesized texture. The varying texture direction and scale are achieved by mapping a suitable texture patch found in the sample via matching a check-mask, which is rotated and zoomed according to the vector field in advance. To account for the texture discontinuity induced by poor matches or by different texture directions/scales between adjacent triangles, a feature-based boundary optimization technique is further developed. Experimental results show satisfactory synthesis quality.

Yan-Wen Guo, Xiao-Dong Xu, Xi Chen, Jin Wang, Qun-Sheng Peng
Fast Content-Based Image Retrieval Based on Equal-Average K-Nearest-Neighbor Search Schemes

The four most important issues in content-based image retrieval (CBIR) are how to extract features from an image, how to represent these features, how to search for the images similar to the query image based on these features as fast as possible, and how to perform relevance feedback. This paper mainly concerns the third problem. Traditional features such as color, shape and texture are extracted offline from all images in the database to compose a feature database, each element being a feature vector. The “linear scaling to unit variance” normalization method is used to equalize each dimension of the feature vector. A fast search method named equal-average K nearest neighbor search (EKNNS) is then used to find the first K nearest neighbors of the query feature vector as quickly as possible based on the squared Euclidean distortion measure. Experimental results show that the proposed retrieval method can greatly speed up the retrieval process, especially for large databases and high-dimensional feature vectors.
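
The abstract does not spell out the pruning rule, so the following is only a sketch of how an equal-average search is commonly set up, assuming the standard mean inequality: for two vectors of dimension n with component means m_x and m_y, the squared Euclidean distance is at least n (m_x - m_y)^2. Candidates are visited in order of increasing mean difference, and the search stops once this lower bound reaches the current K-th best distance. The function names `normalize_unit_variance` and `equal_average_knn` are hypothetical.

```python
import heapq
import numpy as np

def normalize_unit_variance(features):
    """Scale each feature dimension to zero mean and unit variance."""
    features = np.asarray(features, dtype=float)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # leave constant dimensions unscaled
    return (features - features.mean(axis=0)) / std

def equal_average_knn(database, query, k):
    """Return the k nearest rows of `database` as sorted (distance, index) pairs.

    Uses the bound ||x - y||^2 >= n * (mean(x) - mean(y))^2 to stop early.
    This is an assumed formulation, not necessarily the paper's exact EKNNS.
    """
    database = np.asarray(database, dtype=float)
    query = np.asarray(query, dtype=float)
    dim = database.shape[1]
    means = database.mean(axis=1)
    q_mean = query.mean()
    # Visit candidates whose mean is closest to the query mean first.
    order = np.argsort(np.abs(means - q_mean))
    best = []  # max-heap on distance, stored as (-distance, index)
    for idx in order:
        bound = dim * (means[idx] - q_mean) ** 2
        if len(best) == k and bound >= -best[0][0]:
            break  # every remaining candidate has an even larger bound
        dist = float(np.sum((database[idx] - query) ** 2))
        if len(best) < k:
            heapq.heappush(best, (-dist, int(idx)))
        elif dist < -best[0][0]:
            heapq.heapreplace(best, (-dist, int(idx)))
    return sorted((-d, i) for d, i in best)
```

The early exit is exact rather than approximate: because the bound never decreases along the visiting order, no skipped vector can be closer than the K results already held. In practice the per-vector means would be precomputed and stored sorted alongside the feature database instead of being recomputed per query as in this sketch.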

Zhe-Ming Lu, Hans Burkhardt, Sebastian Boehmer
Characterizing User Behavior to Improve Quality of Streaming Service over P2P Networks

It is widely recognized that understanding practical service experience is critical to improving the performance of existing systems and protocols, which motivates us to discuss this issue in the context of peer-to-peer (P2P) streaming. Using both practical traces from traditional client-server (C/S) service systems and logs from a P2P live broadcasting system, we first characterize end-user behavior in terms of online duration and reveal a statistically positive correlation between elapsed online duration and expected remaining online time. We then explore the feasibility of improving the quality of streaming service over P2P networks by proposing the Low Disruption Tree Construction (LDTC) algorithm, which takes online duration information into account when peers self-organize into the service overlay. The experimental results show that LDTC can achieve higher stability of the video data delivery tree and in turn improve the quality of the streaming service.

Yun Tang, Lifeng Sun, Jianguang Luo, Yuzhuo Zhong
Interacting Activity Recognition Using Hierarchical Durational-State Dynamic Bayesian Network

Activity recognition is one of the most challenging problems in the high-level computer vision field. In this paper, we present a novel approach to interacting activity recognition based on dynamic Bayesian network (DBN). In this approach the features representing the human activities are divided into two classes: global features and local features, which are on two different spatial scales. To model and recognize human interacting activities, we propose a hierarchical durational-state DBN model (HDS-DBN). HDS-DBN combines the global features with local ones organically and reveals structure of interacting activities well. The effectiveness of this approach is demonstrated by experiments.

Youtian Du, Feng Chen, Wenli Xu, Weidong Zhang
Improving the Image Retrieval Results Via Topic Coverage Graph

In the area of image retrieval, search engines tend to retrieve the images that are most relevant to the users’ queries. Nevertheless, in most cases, queries cannot be represented by just a few query words. Therefore, it is necessary to provide relevant retrieval results with broad topic coverage to meet the users’ ambiguous needs. In this paper, a re-ranking method based on topic coverage analysis is proposed to refine the retrieval results. A graph called the Topic Coverage Graph (TCG) is constructed to model the degree of mutual topic coverage among images. Then, the Topic Richness Score (TRS), which is calculated based on the TCG, is used to measure the importance of each image in improving the topic coverage of image retrieval results. Experimental results on over 20,000 images demonstrate that our proposed approach is effective in improving the topic coverage of retrieval results without loss of relevance.

Kai Song, Yonghong Tian, Tiejun Huang
Relevance Feedback for Sketch Retrieval Based on Linear Programming Classification

Relevance feedback plays as important a role in sketch retrieval as it does in existing content-based retrieval. This paper presents a method of relevance feedback for sketch retrieval by means of Linear Programming (LP) classification. An LP classifier is designed to do online training and feature selection simultaneously. Combined with feature selection, it can select a set of user-sensitive features and classify well even with a small number of training samples. Experiments show the proposed method to be both effective and efficient for relevance feedback in sketch retrieval.

Bin Li, Zhengxing Sun, Shuang Liang, Yaoye Zhang, Bo Yuan
Hierarchical Motion-Compensated Frame Interpolation Based on the Pyramid Structure

This paper presents a hierarchical motion-compensated frame interpolation (HMCFI) algorithm based on the pyramid structure for high-quality video reconstruction. Conversion between images having different frame rates produces motion jitter and blurring near moving object boundaries. To reduce degradation in video quality, the proposed algorithm performs motion estimation (ME) and motion-compensated frame interpolation (MCFI) at each level of the Gaussian/Laplacian image pyramids. In experiments, the frame rate of the progressive video sequence is up-converted by a factor of two and the performance of the proposed HMCFI algorithm is compared with that of conventional frame interpolation methods.

Gun-Ill Lee, Rae-Hong Park
Varying Microphone Patterns for Meeting Speech Segmentation Using Spatial Audio Cues

Meetings, common to many business environments, generally involve stationary participants. Thus, participant location information can be used to segment meeting speech recordings into each speaker’s ‘turn’. The authors’ previous work proposed the use of spatial audio cues to represent the speaker locations. This paper studies the validity of using spatial audio cues for meeting speech segmentation by investigating the effect of varying microphone pattern on the spatial cues. Experiments conducted on recordings of a real acoustic environment indicate that the relationship between speaker location and spatial audio cues strongly depends on the microphone pattern.

Eva Cheng, Ian Burnett, Christian Ritz
Region-Based Sub-pixel Motion Estimation from Noisy, Blurred, and Down-Sampled Sequences

Motion estimation is one of the most important steps in super-resolution algorithms for a video sequence, which require estimating motion from a noisy, blurred, and down-sampled sequence; therefore the motion estimation has to be robust. In this paper, we propose a robust sub-pixel motion estimation algorithm based on region matching. Non-rectangular regions are first extracted by using the watershed transform. For each region, the best matching region in a previous frame is found to obtain the integer-pixel motion vector. Then, in order to refine the accuracy of the estimated motion vector, we search the eight sub-pixel positions around the estimated motion vector for a sub-pixel motion vector. The performance of our proposed algorithm is compared with the well-known full search with both integer-pixel and sub-pixel accuracy. It is also compared with the integer-pixel region matching algorithm for several noisy video sequences with various noise variances. The results show that, among the algorithms compared, our proposed algorithm is the most suitable for noisy, blurred, and down-sampled sequences.

Osama A. Omer, Toshihisa Tanaka
Differential Operation Based Palmprint Authentication for Multimedia Security

This paper presents a novel approach to palmprint authentication for multimedia security using a differential operation. In this approach, a differential operation is first applied to the palmprint image in the horizontal direction. The palmprint is then encoded according to the sign of the value of each pixel of the differential image. This code is called the DiffCode of the palmprint. The size of a DiffCode is 128 bytes, which is the smallest among the existing palmprint features and suitable for multimedia security. The similarity of two DiffCodes is measured using their Hamming distance. This approach is tested on the public PolyU Palmprint Database and the EER is 0.6%, which is comparable with the existing palmprint recognition methods.
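
The encoding steps in this abstract can be illustrated with a short sketch. The abstract does not say how the palmprint is preprocessed or resized, so the example simply assumes a 32×33 grayscale input, whose horizontal difference has 32×32 = 1024 sign bits, i.e. 128 bytes. The names `diff_code` and `hamming_distance` and the choice to map non-negative differences to bit 1 are assumptions.

```python
import numpy as np

def diff_code(image):
    """Encode an image by the sign of its horizontal differences.

    Assumes a 2-D grayscale array; a 32x33 input yields 1024 bits (128 bytes).
    Non-negative differences map to bit 1, negative differences to bit 0.
    """
    image = np.asarray(image, dtype=float)
    diff = np.diff(image, axis=1)          # difference between horizontal neighbors
    bits = (diff >= 0).astype(np.uint8)
    return np.packbits(bits.ravel())       # 8 sign bits per byte

def hamming_distance(code_a, code_b):
    """Number of differing bits between two equally sized packed codes."""
    return int(np.unpackbits(np.bitwise_xor(code_a, code_b)).sum())
```

Since only the sign of each difference is kept, adding a constant brightness offset to the image leaves the code unchanged. A matcher would typically normalize the Hamming distance by the code length and compare it against a threshold chosen on validation data.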

Xiangqian Wu, Kuanquan Wang, David Zhang
A Broadcast Model for Web Image Annotation

Automatic annotation of Web images has great potential for improving the performance of Web image retrieval. This paper presents a Broadcast Model (BM) for Web image annotation. In this model, pages are divided into blocks and the annotation of an image is realized through the interaction of information from blocks and relevant Web pages. Broadcast means that each block receives information (just like signals) from relevant Web pages and modifies its feature vector according to this information. Compared with most existing image annotation systems, the proposed algorithm utilizes the associated information not only from the page where the images are located, but also from other related pages. Based on the generated annotations, a retrieval application is implemented to evaluate the proposed annotation algorithm. The preliminary experimental results show that this model is effective for the annotation of Web images and reduces the number of result images and the time cost of retrieval.

Jia Li, Ting Liu, Weiqiang Wang, Wen Gao
An Approach to the Compression of Residual Data with GPCA in Video Coding

Generalized Principal Component Analysis (GPCA) is a global solution for identifying a mixture of linear models for signals. This method has been shown to be efficient in compressing natural images. In this paper we introduce GPCA into video coding. We focus on encoding residual frames with GPCA in place of the classical DCT, and also propose to use it in MCTF-based scalable video coding. Experiments show that GPCA achieves a better PSNR than DCT with the same number of data components, and that this method is promising in our scalable video coding scheme.

Lei Yao, Jian Liu, Jiangqin Wu
A Robust Approach for Object Recognition

In this paper, we present a robust and unsupervised approach for recognition of object categories, RTSI-pLSA, which overcomes the weakness of TSI-pLSA in recognizing rotated objects in images. Our approach uses a radial template to describe the spatial information (position, scale and orientation) of an object. A bottom-up, heuristic and unsupervised scheme is also proposed to estimate the spatial parameters of an object. Experimental results show that RTSI-pLSA can effectively recognize object categories, especially rotated, translated, or scaled objects in images. It lowers the error rate by about 10% compared with TSI-pLSA. Thus, it is a more robust approach for unsupervised object recognition.

Yuanning Li, Weiqiang Wang, Wen Gao
A Novel Method for Spoken Text Feature Extraction in Semantic Video Retrieval

We propose a novel method for extracting text features from automatic speech recognition (ASR) results in semantic video retrieval. We combine HowNet-rule-based knowledge with statistical information to build special concept lexicons, which can rapidly narrow the vocabulary and improve the retrieval precision. Furthermore, we use the term precision (TP) weighting method to analyze ASR texts. This weighting method is sensitive to sparse but important terms in the relevant documents. Experiments show that the proposed method is effective for semantic video retrieval.

Juan Cao, Jintao Li, Yongdong Zhang, Sheng Tang
A Semantic Image Category for Structuring TV Broadcast Video Streams

A TV broadcast video stream consists of various kinds of programs such as sitcoms, news, sports, commercials, weather, etc. In this paper, we propose a semantic image category, named Program Oriented Informative Images (POIM), to facilitate the segmentation, indexing and retrieval of different programs. The assumption is that most stations tend to insert lead-in/-out video shots for explicitly introducing the current program and indicating the transitions between consecutive programs within TV streams. Such shots often utilize the overlapping of text, graphics, and storytelling images to create an image sequence of POIM as a visual representation of the current program. With the advance of post-editing effects, POIM is becoming an effective indicator for structuring TV streams, and is also a fairly common “prop” in program content production. We have attempted to develop a POIM recognizer involving a set of global/local visual features and supervised/unsupervised learning. Comparison experiments have been carried out. A promising result, F1 = 90.2%, has been achieved on a part of the TRECVID 2005 video corpus. The recognition of POIM, together with other audiovisual features, can be used to further determine program boundaries.

Jinqiao Wang, Lingyu Duan, Hanqing Lu, Jesse S. Jin
Markov Chain Monte Carlo Super-Resolution Image Reconstruction with Simultaneous Adaptation of the Prior Image Model

In our recent work, the Markov chain Monte Carlo (MCMC) technique has been successfully exploited and shown to be an effective approach to super-resolution image reconstruction. However, one major challenge lies in the selection of the hyperparameter of the prior image model, which affects the degree of regularity imposed by the prior image model and, consequently, the quality of the estimated high-resolution image. To tackle this challenge, in this paper we propose a novel approach that automatically adapts the model’s hyperparameter during the MCMC process, rather than relying on an exhaustive, off-line search. Experimental results show that the proposed hyperparameter adaptation method yields performance extremely close to that of the optimal prior image model case.

Jing Tian, Kai-Kuang Ma
Text Detection in Images Using Texture Feature from Strokes

Text embedded in images or videos is indispensable to understand multimedia information. In this paper we propose a new text detection method using the texture feature derived from text strokes. The method consists of four steps: wavelet multiresolution decomposition, thresholding and pixel labeling, text detection using texture features from strokes, and refinement of mask image. Experiment results show that our method is effective.

Caifeng Zhu, Weiqiang Wang, Qianhui Ning
Robust Mandarin Speech Recognition for Car Navigation Interface

This paper presents a robust automatic speech recognition (ASR) system as multimedia interface for car navigation. In front-end, we use the minimum-mean square error (MMSE) enhancement to suppress the background in-car noise and then compensate the spectrum components distorted by noise over-reduction by smoothing technologies. In acoustic model training, an immunity learning scheme is adopted, in which pre-recorded car noises are artificially added to clean training utterances to imitate the in-car environment. The immunity scheme makes the system robust to both residual noise and speech enhancement distortion. In the context of Mandarin speech recognition, a special issue is the diversification of Chinese dialects, i.e. the pronunciation difference among accents decreases the recognition performance if the acoustic models are trained with an unmatched accented database. We propose to train the models with multiple accented Mandarin databases to solve this problem. The efficiency of the proposed ASR system is confirmed in evaluations.
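The noise-addition step of the immunity learning scheme can be illustrated with a minimal sketch. The function name `mix_at_snr`, the target-SNR parameter, and the list-of-floats sample representation are assumptions made for illustration; the abstract does not state which SNR levels or mixing procedure the authors used.

```python
import math

def mix_at_snr(clean, noise, snr_db):
    """Add noise to a clean signal so that the mix has the requested SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    p_clean = sum(x * x for x in clean) / len(clean)   # mean power of speech
    p_noise = sum(n * n for n in noise) / len(noise)   # mean power of noise
    if p_noise == 0:
        raise ValueError("noise is silent")
    # Choose gain so that p_clean / (gain^2 * p_noise) == 10^(snr_db / 10).
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(clean, noise)]
```

Training data for several hypothetical SNR conditions could be generated by calling `mix_at_snr` repeatedly with different `snr_db` values on the same clean utterance.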

Pei Ding, Lei He, Xiang Yan, Rui Zhao, Jie Hao
GKDA: A Group-Based Key Distribution Algorithm for WiMAX MBS Security

Multicast and Broadcast Service (MBS) is a novel application supported by the currently released IEEE 802.16e. This service can increase the efficiency of WiMAX networks. The key management and distribution is crucial to evolve MBS efficiently and safely. In this paper, we present a group-based key distribution algorithm GKDA to provide a more scalable solution to reduce the key updating overhead, therefore a base station (BS) can support and maintain more MBS users. Theoretical analyses and simulation results have proven the performance and advantage of our algorithm. In addition, GKDA can strengthen the network security.

Huijie Li, Guangbin Fan, Jigang Qiu, Xiaokang Lin
A Watermarking Algorithm for JPEG File

In this paper, we propose a watermarking algorithm that works directly on the JPEG bit-stream. The algorithm embeds watermark bits by modifying de-quantized DC coefficients. By improving an existing embedding method for watermark bits, the quality of the watermarked image can be improved greatly while keeping the same robustness as the original method. Furthermore, we analyze the performance of the watermarking algorithm against re-quantization and recompression. We give the relationship among the watermarking strength, the quality factor of JPEG compression, and the BER (Bit Error Rate) of the watermark. Experimental results support the analysis. Compared with several JPEG-based algorithms in the literature, the proposed algorithm is more robust to JPEG recompression than most of them when the recompression quality factor is above 30.

Hongmei Liu, Huiying Fu, Jiwu Huang
SNR Scalability in H.264/AVC Using Data Partitioning

Although no scalability is explicitly defined in the H.264/AVC specification, some forms of scalability can be achieved by using the available coding tools in a creative way. In this paper we will explain how to use the data partitioning tool to perform a coarse form of SNR scalability. The impact of various parameters, including the presence of IDR frames and the number of intra-coded macroblocks per frame, on bit rate and bit rate savings and on quality and quality loss will be discussed. Furthermore we will introduce and elaborate a possible use case for the technique proposed in this paper.

Stefaan Mys, Peter Lambert, Wesley De Neve, Piet Verhoeve, Rik Van de Walle
A Real-Time XML-Based Adaptation System for Scalable Video Formats

Scalable bitstreams are used today to contribute to the Universal Multimedia Access (UMA) philosophy, i.e., accessing multimedia anywhere, at anytime, and on any device. Bitstream structure description languages provide means to adapt scalable bitstreams in order to extract a lower quality version. This paper introduces a real-time XML-based framework for content adaptation by relying on BFlavor, a combination of two existing bitstream structure description languages (i.e., the MPEG-21 Bitstream Syntax Description Language (BSDL) and the Formal Language for Audio-Visual Representation extended with XML features (XFlavor)). In order to use BFlavor with state-of-the-art media formats, we have added support for transparent retrieval of context information and support for emulation prevention bytes. These extensions are validated by building a BFlavor code for bitstreams compliant with the scalable extension of the H.264/AVC specification. Performance measurements show that such a bitstream (containing a bitrate of 17 MBit/s) can be adapted in real-time by a BFlavor-based adaptation framework (with a speed of 27 MBit/s).
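As background for the emulation prevention bytes mentioned above, the following sketch shows the H.264/AVC byte-level mechanism: inside a NAL unit payload, an `0x03` byte is inserted after two consecutive zero bytes whenever the next byte is `0x00` to `0x03`, so that start-code patterns cannot appear in the payload. The function names are hypothetical, this is not BFlavor code, and the special case of a payload that ends in a zero byte is deliberately left out.

```python
def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert 0x03 after any two zero bytes that precede a byte <= 0x03."""
    out = bytearray()
    zeros = 0  # consecutive zero bytes written so far
    for b in rbsp:
        if zeros == 2 and b <= 3:
            out.append(3)  # emulation prevention byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)

def remove_emulation_prevention(payload: bytes) -> bytes:
    """Drop every 0x03 that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0  # consecutive zero bytes read so far
    for b in payload:
        if zeros >= 2 and b == 3:
            zeros = 0  # skip the emulation prevention byte
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)
```

A parser that describes the bitstream structure has to account for these extra bytes, because they shift the positions of the syntax elements that follow them.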

Davy Van Deursen, Davy De Schrijver, Wesley De Neve, Rik Van de Walle
Generic, Scalable Multimedia Streaming and Delivery with Example Application for H.264/AVC

The ever-increasing diversity of multimedia technology presents a growing challenge to interoperability as new content formats are developed. The Bitstream Binding Language (BBL) addresses this problem by providing a format-independent language to describe how multimedia content is to be delivered. This paper proposes extensions to BBL that enable a generic, scalable streaming server architecture. In this architecture, new content formats are supported by providing a simple file with instructions as to how the software may be streamed. This approach removes any need to modify existing software to provide such support.

Joseph Thomas-Kerr, Ian Burnett, Christian Ritz
Shape-Based Image Retrieval in Botanical Collections

Apart from the computer vision community, an ever-increasing number of scientific domains show great interest in image analysis techniques. This interest is often guided by practical needs. Examples include medical imaging systems, satellite image processing, and botanical databases. A common point of these applications is that they generate large image collections and therefore require automatic tools to help the scientists. These tools should allow a clear structuring of the visual information and provide a fast and accurate retrieval process. In the framework of the study of plant gene expression, we designed a content-based image retrieval (CBIR) system to assist botanists in their work. We propose a new contour-based shape descriptor that satisfies the constraints of this application (accuracy and real-time search). It is called the Directional Fragment Histogram (DFH). This new descriptor has been evaluated and compared to several shape descriptors.

Itheri Yahiaoui, Nicolas Hervé, Nozha Boujemaa
Macroblock Mode Decision Scheme for Fast Encoding in H.264/AVC

To improve coding efficiency, the H.264/AVC video coding standard uses new coding tools, such as variable block size, quarter-pixel-accuracy motion estimation, multiple reference frames, intra prediction and a loop filter. Using these coding tools, H.264/AVC achieves significant improvement in coding efficiency compared with existing standards. However, the encoder complexity also increases tremendously. Among the tools, macroblock mode decision and motion estimation contribute most to total encoder complexity. This paper focuses on complexity reduction in macroblock mode decision. Of the macroblock modes which can be selected, inter8×8 and intra4×4 have the highest complexity. We propose three methods for complexity reduction, one for intra4×4 in intra-frames, one for inter8×8 in inter-frames, and one for intra4×4 in inter-frames. Simulation results show that the proposed methods save about 56.5% of total encoding time compared with the H.264/AVC reference implementation.

Donghyung Kim, Joohyun Lee, Kicheol Jeon, Jechang Jeong
A Mathematical Model for Interaction Analysis Between Multiview Video System and User

Multiview video coding (MVC) plays an important role in three-dimensional audio-video (3DAV) systems. Multiview video display systems are built to provide interactive video services and quality of services (QoS) provided by the system is currently under consideration. MVC encoder uses advanced coding schemes and group of GOP (GoGOP) structure to pursue high compressibility. There is a conflict between compressibility and access ability, i.e., QoS of interaction. In this paper, several evaluation functions are proposed to measure the load and access ability of multiview video system. A nonlinear multipurpose mathematical model based on these functions is provided for interaction analysis. On considering the model, the access ability is a factor to be taken into account for encoder when high compressibility is the primary, and so is the compressibility when trying to achieve high access ability.

You Yang, Gangyi Jiang, Mei Yu, Zhu Peng
Motion Composition of 3D Video

3D video, which is composed of a sequence of mesh models and can provide the user with interactivity, is attracting increasing attention in many research groups. However, it is time-consuming and expensive to generate 3D video sequences. In this paper, a motion composition method is proposed to edit 3D video based on the user’s requirements so that 3D video can be re-used. By analyzing the feature vectors, the hierarchical motion structure is parsed and then a motion database is set up by selecting the representative motions. A motion graph is constructed to organize the motion database by finding the possible motion transitions. Then, the best path is searched based on a proposed cost function by a modified Dijkstra algorithm after the user selects the desired motions in the motion database, which are called key motions in this paper. Our experimental results show the edited 3D video sequence looks natural and realistic.
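The path search over the motion graph can be sketched with a standard Dijkstra search. This is the textbook algorithm, not the authors' modified version, and the graph layout (a dict mapping each motion to a list of `(next_motion, transition_cost)` pairs) is an assumption made for illustration; the proposed cost function is not given in the abstract.

```python
import heapq

def shortest_path(graph, start, goal):
    """Return (total_cost, path) for the cheapest path, or (inf, []) if none exists."""
    best = {start: 0.0}      # cheapest known cost to reach each node
    prev = {}                # predecessor on the cheapest known path
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry, a cheaper route was found already
        for nxt, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    return float("inf"), []
```

A sequence through several user-selected key motions could be assembled by running the search between each consecutive pair of key motions and concatenating the resulting paths.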

Jianfeng Xu, Toshihiko Yamasaki, Kiyoharu Aizawa
EKM: An Efficient Key Management Scheme for Large-Scale Peer-to-Peer Media Streaming

Recently, media streaming applications over Peer-to-Peer (P2P) overlay networks have become increasingly significant. However, before these applications can be successfully deployed, it is very important to develop efficient access control mechanisms to ensure that only legitimate members can access the media content. Existing key management and distribution schemes often fail when faced with large-scale group access. In this paper, we propose an efficient key management scheme (EKM) for large-scale P2P media streaming applications. It employs the Distributed Hash Table (DHT) technique to build a key distribution overlay network and incorporates a periodic global rekeying mechanism, which is highly scalable and efficient, and is robust against frequent joining and leaving of members. EKM can cut down the storage and communication overhead on the server side, which can eliminate a potential bottleneck at the server. We demonstrate its scalability, efficiency and robustness through simulation. Its performance can be examined under real environments by combining EKM with existing P2P media streaming protocols.

Feng Qiu, Chuang Lin, Hao Yin
Using Earth Mover’s Distance for Audio Clip Retrieval

This paper presents a new approach for audio clip retrieval based on Earth Mover’s Distance (EMD). Instead of using frame-based or salient-based features as in most existing methods, our approach proposes a segment-based representation and allows many-to-many matching among audio segments for the clip similarity measure, which is capable of tolerating errors due to audio segmentation and various audio effects. We formulate audio clip retrieval as a graph matching problem in two stages. In the first stage, a segment-based feature is employed to represent the audio clips, which can not only capture the change property of an audio clip, but also keep and present the change relation and temporal order of audio features. In the second stage, based on the result of the segment similarity measure, a weighted graph is constructed to model the similarity between two clips. EMD is proposed to compute the minimum cost of the weighted graph as the similarity value between two audio clips. Experimental results show that the proposed approach is better than some existing methods in terms of retrieval and ranking capabilities.

Yuxin Peng, Cuihua Fang, Xiaoou Chen
Streaming-Mode MB-Based Integral Image Techniques for Fast Multi-view Video Illumination Compensation

Multi-view video systems are often faced with brightness variations across the multi-perspective captured images. To tackle this problem and maintain the coding performance, block-based illumination compensation (BBIC) methods are recently proposed, where a first order affine BBIC model, consisting of a multiplicative factor and an additive offset, is often adopted. However, so far little attention has been paid to the fast algorithms that can reduce the computational overhead of BBIC. Therefore, we propose a fast image local statistics computation scheme using the technique of integral images, which can largely ease the process of computing the BBIC parameters for any blocks of interest. Moreover, a fast progressive integral image generation scheme, seamlessly integrated in a streaming-mode macroblock-based video coding system, is proposed. The experimental results show that the proposed technique achieves an average six-fold speedup, in comparison to the traditional computation methods under typical conditions.
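The integral image idea can be sketched as follows: once a summed-area table is built, the sum over any rectangular block costs four table lookups, so block means and variances follow in constant time. The function names and the list-of-lists image layout are assumptions for illustration, and the mapping from these statistics to the BBIC factor and offset is not shown because the abstract does not specify it.

```python
def integral_image(img):
    """Return a (H+1) x (W+1) summed-area table with a zero first row and column."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Each row depends only on the row above, so rows can be added as they arrive.
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat

def block_sum(sat, top, left, height, width):
    """Sum of the block with the given top-left corner and size, in four lookups."""
    bottom, right = top + height, left + width
    return sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]

def block_mean_var(img, top, left, height, width):
    """Mean and variance of a block via integral images of pixels and squared pixels."""
    sat = integral_image(img)
    sat_sq = integral_image([[p * p for p in row] for row in img])
    n = height * width
    mean = block_sum(sat, top, left, height, width) / n
    mean_sq = block_sum(sat_sq, top, left, height, width) / n
    return mean, mean_sq - mean * mean
```

In practice the two tables would be built once per frame and reused for every block; `block_mean_var` rebuilds them here only to keep the example short.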

Jiangbo Lu, Gauthier Lafruit, Francky Catthoor
A Motion Vector Predictor Architecture for AVS and MPEG-2 HDTV Decoder

In the advanced Audio Video coding Standard (AVS), many efficient coding tools are adopted in motion compensation, such as new motion vector prediction, direct mode matching, variable block-sizes etc. However, these features enormously increase the computational complexity and the memory bandwidth requirement and make the traditional MV predictor more complicated. This paper proposes an efficient MV predictor architecture for both AVS and MPEG-2 decoder. The proposed architecture exploits the parallelism to accelerate the speed of operations and uses the dedicated design to optimize the memory access. In addition, it can reuse the on-chip buffer to support the MV error-resilience for MPEG-2 decoding. The design has been described in Verilog HDL and synthesized using 0.18 μm CMOS cells library by Design Compiler. The circuit costs about 62k logic gates when the working frequency is set to 148.5MHz. This design can support the real-time MV predictor of HDTV 1080i video decoding for both AVS and MPEG-2.

Junhao Zheng, Di Wu, Lei Deng, Don Xie, Wen Gao
Inter-camera Coding of Multi-view Video Using Layered Depth Image Representation

The multi-view video is a collection of multiple videos capturing the same scene from different viewpoints. If we acquire multi-view videos from multiple cameras, it is possible to generate scenes at arbitrary view positions. This means that users can change their viewpoints freely and can perceive depth through view interaction. Therefore, the multi-view video can be used in a variety of applications including three-dimensional TV (3DTV), free viewpoint TV, and immersive broadcasting. However, since the data size of the multi-view video increases linearly with the number of cameras, it is necessary to develop an effective framework to represent, process, and display multi-view video data. In this paper, we propose inter-camera coding methods for multi-view video using the layered depth image (LDI) representation. The proposed methods represent the various information included in multi-view video hierarchically based on LDI. In addition, we reduce a large amount of multi-view video data to a manageable size by exploiting spatial redundancies among multiple videos, and we successfully reconstruct the original multiple viewpoints from the constructed LDI.

Seung-Uk Yoon, Eun-Kyung Lee, Sung-Yeol Kim, Yo-Sung Ho, Kugjin Yun, Sukhee Cho, Namho Hur
Optimal Priority Packetization with Multi-layer UEP for Video Streaming over Wireless Network

Most current packetization schemes consider only bit error or packet erasure, both of which are common in wireless networks. This paper addresses these two problems together and proposes an optimal packetization scheme for video streaming over wireless networks that is independent of the video coding method. To combat packet erasure, priority packetization combined with multi-layer unequal error protection (UEP) is applied to video frames. Multi-layer UEP consists of low-complexity duplication of high-priority packets in the application layer and different retransmission limits in the media access control layer. Content-aware rate-distortion optimization is also introduced in order to counteract the distortion caused by bit errors. Simulations show that our scheme gains 2.17 dB or more compared with the conventional scheme.

Huanying Zou, Chuang Lin, Hao Yin, Zhen Chen, Feng Qiu, Xuening Liu
A Multi-channel MAC Protocol with Dynamic Channel Allocation in CDMA Ad Hoc Networks

It is a challenging task to design an efficient MAC protocol in Ad Hoc networks due to the lack of central control equipment. In this paper, a new multi-channel MAC protocol with dynamic channel allocation in CDMA Ad Hoc networks, MMAC-DCA, is presented. In MMAC-DCA, the wireless channel is divided into one common sub-channel and L service sub-channels by a CDMA mechanism. All the nodes exchange RTS and CTS on the common sub-channel to reserve the service sub-channels for transmission. Unlike in MACA/C-T and C-T, the service sub-channels are allocated dynamically in distributed mode only when a node has a packet to transmit. The protocol can reduce the number of spreading codes required and increase the throughput normalized by available bandwidth. In addition, a Markov model is presented to analyze the theoretical performance of this protocol, including the normalized throughput and the transfer delay of data packets.

Jigang Qiu, Guangbin Fan, Huijie Li, Xiaokang Lin
Fuzzy Particle Swarm Optimization Clustering and Its Application to Image Clustering

Image classification and clustering is a challenging problem in computer vision. This paper proposes a particle swarm optimization clustering approach, FPSOC, for the image clustering problem. This approach considers each particle as a candidate cluster center. The particles fly in the solution space to search for suitable cluster centers. This method differs from previous work in that it employs fuzzy concepts in particle swarm optimization clustering and adopts an attribute selection mechanism to avoid the ‘curse of dimensionality’ problem. The experimental results show that the presented approach can properly handle the image clustering problem.

Wensheng Yi, Min Yao, Zhiwei Jiang
A New Fast Motion Estimation for H.264 Based on Motion Continuity Hypothesis

The H.264 video standard, in spite of its high quality, is too time-consuming for widespread acceptance in video applications, mainly due to its computationally complex motion estimation (ME). To reduce this complexity, we propose the motion continuity hypothesis, which states that all motion vectors (MVs) of a block are usually located in a small area. This area is formalized as the modified valid region (MVR), an improved version of the valid region proposed by the present authors in a previous paper. This paper then develops a new fast ME algorithm for H.264, called MVR-based fast ME (MVRF), which searches a much smaller area in the reference frames (RFs) than full search H.264 does, reducing the number of search pixels by up to 43%. MVRF is designed so that, on average, up to 98% of the MVs it determines coincide with those of full search H.264, keeping the recovery quality and bit-rate almost the same as those of full search H.264.

Juhua Pu, Zhang Xiong, Lionel M. Ni
Statistical Robustness in Multiplicative Watermark Detection

The requirement of robustness is of fundamental importance for all watermarking schemes in various application scenarios. When talking about watermark robustness, we usually mean that the receiver performance degrades smoothly with the attack power. Here we look from another angle, i.e., robustness in statistics. A new detector structure which is robust to small uncertainties in host signal modeling for multiplicative watermarking in the discrete Fourier transform (DFT) domain is presented. By relying on robust statistics theory, an ε-contamination model is applied to describe the magnitudes of the DFT spectrum, based on which we are able to derive a minimax detector that is most robust in a well-defined sense. Experiments on real images demonstrate that the new watermark detector performs more stably than classical ones.
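For reference, the ε-contamination class used in robust statistics is usually written as below. The symbols $f_0$ (nominal density), $h$ (contaminating density) and $\mathcal{H}$ (the set of admissible contaminating densities) are notation assumed here for illustration, not taken from the paper, and the specific nominal model chosen for the DFT magnitudes is not stated in the abstract.

```latex
\mathcal{F}_{\varepsilon}
  = \left\{\, f : f(x) = (1-\varepsilon)\, f_0(x) + \varepsilon\, h(x),\; h \in \mathcal{H} \,\right\},
  \qquad 0 \le \varepsilon < 1
```

A minimax detector in this setting is one designed for the least favorable density in $\mathcal{F}_{\varepsilon}$, so that its worst-case performance over the whole class is as good as possible.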

Xingliang Huang, Bo Zhang
Adaptive Visual Regions Categorization with Sets of Points of Interest

The Query By Visual Thesaurus (QBVT) paradigm has strongly contributed to the visual information retrieval objective when no starting example is available. The Visual Thesaurus is a representative summary of all the visual patches in the database. Its reliable construction helps the user express a “mental image” by composing the visual patches according to the details he has in mind. In this paper, we introduce a relational clustering algorithm (CARD) to build the Visual Thesaurus from regions finely described by variable signature dimensions. The resulting visual categories depict the variability of regions based on local color points of interest. Therefore, we first extend the notion of image matching to regions using non-traditional metrics suitable for the multi-dimensional variables. We also introduce an appropriate relational clustering for region categorization using the similarity matrix induced by the latter metrics. Moreover, we propose an efficient method to speed up distance computation and reduce the feature representatives based on adaptive clustering. Our approach was tested on generic images and gives perceptually relevant visual categories.

Hichem Houissa, Nozha Boujemaa, Hichem Frigui
A Publishing Framework for Digitally Augmented Paper Documents: Towards Cross-Media Information Integration

Paper remains a key information medium, and this has motivated the development of new technologies for digitally augmented paper (DAP) that enable printed content to be linked with multimedia information. Among those technologies, one of the simplest approaches is to print visible patterns on paper (e.g., barcodes in the margin) as cross-media links. Thanks to recent progress in the printing industry, more sophisticated methods have been developed in which patterns printed at high resolution on the background of a page are almost invisible and therefore interfere little with reading. For all these pattern-embedding approaches to integrating printed and multimedia information, we aim to present a unified publishing framework independent of the particular patterns and readers (e.g., cameras to capture patterns) used to realize DAP. The presented framework manages semantic information about printed documents, multimedia resources, and patterns as links between them, and users are provided with a platform for publishing DAP documents.

Xiaoqing Lu, Zhiwu Lu
Web-Based Semantic Analysis of Chinese News Video

Semantic analysis of Chinese news video with the help of the World Wide Web is proposed. First, we segment the news video into a series of story units. Second, we extract the key phrases from the corresponding ASR transcript of each news story and optimize the key phrases by computing both the correlation among key phrases and the correlation between key phrases and the event. Finally, we retrieve the news Web page corresponding to the event from the World Wide Web via a search engine and obtain information about the news video by analyzing the news Web page. In order to effectively extract search key phrases from ASR transcripts containing mistakes, this paper also presents a novel method of optimizing key phrases for searching. Experimental results on the set of Chinese news video (CCTV4_NEWS) from TRECVID2005 show that our approach is effective.

Huamin Feng, Zongqiang Pang, Kun Qiu, Guosen Song
A Quality-Controllable Encryption for H.264/AVC Video Coding

With the rapid growth of networked multimedia applications in recent years, secure transmission of video streams is in high demand for many popular applications, such as confidential video conferencing and pay-TV. In this paper, we present a quality-controllable encryption method for H.264 coded video streams. Our goal has been to provide an efficient way to scramble the video streams to prevent illegal users from plagiarizing. By making use of the property of the H.264 specification that Intra coded blocks are divided into three different types with different sizes, our algorithm provides the flexibility of scrambling the video up to a certain level, which may be manually specified by the user or automatically determined by the system according to the network traffic condition. Our design ensures that even the deepest scrambling level adds trivial performance overhead to the standard H.264 encoding/decoding process.

Guang-Ming Hong, Chun Yuan, Yi Wang, Yu-Zhuo Zhong
Texture Synthesis Based on Minimum Energy Cut and Its Applications

In this paper, a simple but efficient texture synthesis algorithm is presented. A new image is synthesized by a patch-based approach. Motivated by an energy equation, the method can manipulate the overlap region perfectly. After the most reasonable cut path through the overlap regions is found, satisfying result images of a user-specified size can be produced. As a general method, our algorithm is also applied to image composition and texture transfer, that is, rendering a target image with a given source texture image. Experiments show that our algorithm is very efficient and easy to implement....
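One common way to find a low-energy cut through an overlap region is a dynamic-programming seam search over a per-pixel error matrix. The sketch below illustrates that general idea only; the abstract does not give the authors' energy equation, and the function name, the vertical-seam orientation, and the rule that the seam moves at most one column per row are assumptions.

```python
def min_cost_seam(err):
    """Return (total_cost, seam) where seam[i] is the cut column in row i of err."""
    rows, cols = len(err), len(err[0])
    cost = [list(err[0])]  # cost[i][j]: cheapest seam from the top row ending at (i, j)
    for i in range(1, rows):
        prev = cost[-1]
        row = []
        for j in range(cols):
            lo, hi = max(0, j - 1), min(cols, j + 2)
            row.append(err[i][j] + min(prev[lo:hi]))
        cost.append(row)
    # Start from the cheapest cell in the last row and walk back up.
    j = min(range(cols), key=lambda c: cost[-1][c])
    total = cost[-1][j]
    seam = [j]
    for i in range(rows - 1, 0, -1):
        lo, hi = max(0, j - 1), min(cols, j + 2)
        j = min(range(lo, hi), key=lambda c: cost[i - 1][c])
        seam.append(j)
    return total, seam[::-1]
```

Here `err` would typically hold the squared difference between the two overlapping patches at each pixel, so pixels left of the seam come from one patch and pixels right of it from the other.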

Shuchang Xu, Xiuzi Ye, Yin Zhang, Sanyuan Zhang
Unifying Keywords and Visual Features Within One-Step Search for Web Image Retrieval

The multi-modal characteristics of Web images make it possible to unify keywords and visual features for image retrieval in the Web context. Most of the existing methods for integrating these two kinds of features focus on the interactive relevance feedback technique, which needs the user’s interaction (i.e. a two-step interactive search). In this paper, an approach based on association rule and clustering techniques is proposed to unify keywords and visual features in a different manner, which seamlessly implements the integration within a one-step search. The proposed approach considers both the Query By Keyword (QBK) mode and the Query By Example (QBE) mode and does not need the user’s interaction. The experimental results show that the proposed approach remarkably improves the retrieval performance compared with a pure search based only on keywords or visual features, and achieves a retrieval performance close to that of the two-step interactive search without requiring the user’s additional interaction.

Ruhan He, Hai Jin, Wenbing Tao, Aobing Sun
Dynamic Bandwidth Allocation for Stored Video Under Renegotiation Frequency Constraint

In this paper, a dynamic bandwidth allocation algorithm is proposed for stored video transmission under a renegotiation interval constraint. It addresses the problem of short renegotiation intervals in optimal smoothing algorithms [4,5], which may increase the renegotiation cost or cause renegotiation failures. Based on the transmission rate bounds derived from buffer constraints, a transmission segment is calculated using the optimal smoothing algorithm [5]. If the length of the segment is less than the minimum renegotiation interval, it is merged with the neighboring segment, considering the relation between the transmission rates of neighboring segments and allowing encoder buffer underflows. Simulation results show that the proposed algorithm keeps the renegotiation intervals larger than the minimum and greatly reduces the renegotiation cost, with a slight decrease in channel utilization.

Myeong-jin Lee, Kook-yeol Yoo, Dong-jun Lee
Online Selection of Discriminative Features Using Bayes Error Rate for Visual Tracking

Online feature selection using Bayes error rate is proposed to address visual tracking problem, where the appearances of the tracked object and scene background change during tracking. Given likelihood functions of the object and background with respect to a feature, Bayes error rate is a natural way to evaluate discriminating power of the feature. From previous frame, object and background pixels are sampled to estimate likelihood functions of each color feature in the form of histogram. Then, all features are ranked according to their Bayes error rate. And the top N features with the smallest Bayes error rate are selected to generate a weight map for current frame, where mean shift is employed to find the local mode, and hence, the location of the object. Experimental results on real image sequences demonstrate the effectiveness of the proposed approach.
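The ranking step can be sketched as follows. With histogram likelihoods for object and background, the Bayes error of a two-class decision is the prior-weighted overlap of the two distributions. The sketch assumes equal priors for object and background, which the abstract does not state, and the function names and the dict layout of the feature histograms are hypothetical.

```python
def bayes_error(obj_hist, bg_hist):
    """Bayes error for two classes with equal priors, from histograms over the same bins."""
    obj_total, bg_total = sum(obj_hist), sum(bg_hist)
    # Normalize each histogram to a likelihood, then sum the bin-wise overlap.
    overlap = sum(min(o / obj_total, b / bg_total) for o, b in zip(obj_hist, bg_hist))
    return 0.5 * overlap

def select_features(histograms, n):
    """histograms maps a feature name to (obj_hist, bg_hist); return the n most discriminative names."""
    ranked = sorted(histograms, key=lambda name: bayes_error(*histograms[name]))
    return ranked[:n]
```

A feature whose object and background histograms do not overlap gets error 0, while identical histograms give the chance-level error 0.5, so sorting in ascending order puts the most discriminative features first.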

Dawei Liang, Qingming Huang, Wen Gao, Hongxun Yao
Interactive Knowledge Integration in 3D Cloth Animation with Intelligent Learning System

In this paper, we focus on the parameter identification problem, one of the most essential problems in 3D cloth animation created by multimedia software. We present a novel interactive parameter identification framework which integrates industry knowledge. The essence of this paper is the design of a hybrid intelligent learning system using statistical analysis of Kawabata evaluation system (KES) data from a fabric industry database, a fuzzy system, and radial basis function (RBF) neural networks. By adopting our method, the 3D cloth animator can interactively identify the parameters of cloth simulation with subjective linguistic variables, whereas in past decades it has been very difficult for cloth animators to tune the parameters. We solve the 3D cloth parameter problem using the intelligent knowledge integration method for the first time in the multimedia and graphics research area, and our method is applied to the most popular 3D tool, Maya. The experimental results illustrate the practicability and expansibility of this method.

Chen Yujun, Wang Jiaxin, Yang Zehong, Song Yixu
Multi-view Video Coding with Flexible View-Temporal Prediction Structure for Fast Random Access

Multi-view video is becoming increasingly popular, as it provides users with a greatly enhanced viewing experience. Multi-view video coding (MVC) focuses on exploiting not only the temporal correlation among the adjacent pictures of each view, but also the inter-view correlation. Though coding efficiency is a key target for MVC, the view-temporal prediction structure used to improve compression efficiency usually introduces decoding delay and limits the random access ability. Random access ability is an important feature in MVC because it provides view switching, temporal frame sweeping and browsing, and other interactive abilities for client users in multi-view video streaming. In this paper, we propose an algorithm to flexibly regulate the view-temporal prediction structure. It is able to achieve a good trade-off between compression performance and random access ability.

Yanwei Liu, Qingming Huang, Xiangyang Ji, Debin Zhao, Wen Gao
Squeezing the Auditory Space: A New Approach to Multi-channel Audio Coding

This paper presents a novel solution for efficient representation of multi-channel spatial audio signals. Unlike other spatial audio coding techniques, the solution inherently requires no additional side information to recover the surround sound panorama from a two-channel downmix. For a typical five-channel case, only a stereo downmix signal is required for the decoder to reconstruct the full five-channel audio signal. In addition to the bandwidth saved by transmitting no side information, the technique has significant advantages in terms of computational complexity.

Bin Cheng, Christian Ritz, Ian Burnett
Video Coding by Texture Analysis and Synthesis Using Graph Cut

A new approach to analyzing and synthesizing texture regions in video coding is presented, in which texture blocks in a video sequence are synthesized using the graph cut technique. It first identifies the texture regions with a video segmentation technique and then calculates their motion vectors with a motion vector (MV) scaling technique similar to temporal direct mode. After these MVs are corrected, texture regions are predicted from forward and/or backward reference frames using the corrected MVs. Furthermore, Overlapped Block Motion Compensation (OBMC) is applied to these texture regions to reduce block artifacts. Finally, the texture blocks are stitched together along optimal seams to reconstruct the current texture block using graph cuts. Experimental results show that the proposed method achieves visual quality for texture regions comparable to H.264/AVC while spending fewer bits.

Yongbing Zhang, Xiangyang Ji, Debin Zhao, Wen Gao
Multiple Description Coding Using Adaptive Error Recovery for Real-Time Video Transmission

Real-time video transmission over packet networks faces several challenges, such as limited bandwidth and packet loss. Multiple description coding (MDC) is an efficient error-resilient tool for combating packet loss. The main problem of MDC is the mismatch of reference frames between encoder and decoder when some descriptions are lost during transmission. This paper presents an adaptive error recovery (AER) scheme for multiple description video coding. The proposed AER scheme, which is based on statistical analysis, adaptively determines the nearly optimal error recovery (ER) method among our predefined ER methods, such as interpolation, block replacement, and motion vector (MV) reuse. The AER scheme has three advantages. First, it efficiently reduces the mismatch error. Second, it is based entirely on pre- and post-processing, which requires no modification of the source coder. Third, it has low computational complexity, which makes it suitable for real-time video applications. Simulation results demonstrate that the proposed AER scheme outperforms MDC with a fixed error recovery (FER) scheme over lossy networks.

Zhi Yang, Jiajun Bu, Chun Chen, Linjian Mo, Kangmiao Liu
An Improved Motion Vector Prediction Scheme for Video Coding

Motion vector prediction (MVP) is an important part of video coding. In the original median predictor, if the neighboring blocks of the current block are intra-mode coded, their motion vectors (MVs) are set to zero for the MVP of the current block. This is not very precise for sequences with strong motion. This paper proposes an improved MVP scheme for H.264. In the proposed scheme, when there are intra-mode macroblocks beside the current block, additional MVs of neighboring inter-mode blocks are used instead of the zero MVs of the intra-mode macroblocks. The experimental results show that the improved scheme achieves better coding efficiency than the original median predictor. Moreover, because the point obtained by the proposed MVP scheme is closer to the global minimum point, the computational complexity of the subsequent fast motion estimation (FME) is reduced.
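
The difference between the two predictors can be illustrated with a minimal Python sketch. The abstract does not say which additional inter-mode MVs are selected, so the `inter_only_mvp` rule below (median over the inter-coded neighbors only) is an assumption made for illustration, as are the function names and the use of `None` to mark an intra-coded neighbor.

```python
from statistics import median_low
from typing import List, Optional, Tuple

MV = Tuple[int, int]  # (dx, dy); the unit is left unspecified in this sketch


def median_mvp(neighbors: List[Optional[MV]]) -> MV:
    """Baseline: an intra-coded neighbor (None) contributes a zero vector."""
    mvs = [mv if mv is not None else (0, 0) for mv in neighbors]
    return (median_low([m[0] for m in mvs]), median_low([m[1] for m in mvs]))


def inter_only_mvp(neighbors: List[Optional[MV]]) -> MV:
    """Assumed variant: take the median over inter-coded neighbors only."""
    inter = [mv for mv in neighbors if mv is not None]
    if not inter:
        return (0, 0)  # nothing to predict from
    return (median_low([m[0] for m in inter]), median_low([m[1] for m in inter]))


# One inter-coded neighbor with strong motion, two intra-coded neighbors.
neighbors = [(8, 4), None, None]
baseline = median_mvp(neighbors)       # zero vectors dominate the median
improved = inter_only_mvp(neighbors)   # follows the inter-coded neighbor
```

With two intra-coded neighbors the baseline collapses to the zero vector, which is the imprecision the abstract describes for sequences with strong motion.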

Da Liu, Debin Zhao, Qiang Wang, Wen Gao
Classifying Motion Time Series Using Neural Networks

This paper proposes an effective time-series classification model based on neural networks. Classification under this model consists of three phases, namely data preprocessing, training, and testing. The main contributions of the paper are as follows: we propose a feature extraction algorithm for preprocessing, which involves computing finite differences of sequences, and we employ two different types of neural networks for training and testing. The results of experiments on real univariate motion capture data and synthetic data show that our approach achieves good accuracy. It is therefore a promising method for classifying time series, in particular univariate human motion capture data.
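
The finite-difference step mentioned in the abstract can be sketched in a few lines of Python. This shows only the differencing itself. The rest of the authors' feature extraction is not described here, and the function name and the `order` parameter are assumptions.

```python
from typing import List, Sequence


def finite_difference(seq: Sequence[float], order: int = 1) -> List[float]:
    """Return the forward finite difference of a sequence, applied `order` times."""
    out = list(seq)
    for _ in range(order):
        # Each pass replaces the sequence by differences of adjacent samples.
        out = [b - a for a, b in zip(out, out[1:])]
    return out


samples = [1, 4, 9, 16]
first = finite_difference(samples)            # [3, 5, 7]
second = finite_difference(samples, order=2)  # [2, 2]
```

Each pass shortens the sequence by one sample, so a classifier fed with these features sees changes between samples rather than absolute values.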

Lidan Shou, Ge Gao, Gang Chen, Jinxiang Dong
Estimating Intervals of Interest During TV Viewing for Automatic Personal Preference Acquisition

The demand for information services considering personal preferences is increasing. In this paper, aiming at the development of a system for automatically acquiring personal preferences from TV viewers’ behaviors, we propose a method for automatically estimating TV viewers’ intervals of interest based on temporal patterns in facial changes with Hidden Markov Models. Experimental results have shown that the proposed method was able to correctly estimate intervals of interest with a precision rate of 86.6% and a recall rate of 80.6%.

Makoto Yamamoto, Naoko Nitta, Noboru Babaguchi
Image Annotations Based on Semi-supervised Clustering with Semantic Soft Constraints

An efficient image annotation and retrieval system is highly desirable as the amount of image information grows. Clustering algorithms make it possible to represent images with a finite set of symbols. On this basis, many statistical models that analyze the correspondence between visual features and words have been published for image annotation. However, most of these models cluster using only visual features and ignore the semantics of images. In this paper, we propose a novel model based on semi-supervised clustering with semantic soft constraints, which can exploit both visual features and semantic meaning. Our method first uses generic knowledge (e.g. WordNet) to measure the semantic distance between regions of the manually annotated training images. A semi-supervised clustering algorithm then clusters the regions under semantic soft constraints formed from this semantic distance. The experimental results show that our model improves the performance of an image annotation and retrieval system.

Rui Xiaoguang, Yuan Pingbo, Yu Nenghai
Photo Retrieval from Personal Memories Using Generic Concepts

This paper presents techniques for retrieving photos from personal memory collections using generic concepts that users specify. It is part of a larger project for capturing, storing, and retrieving personal memories in different contexts of use. Semantic concepts are obtained by training binary classifiers using the Regularized Least Squares Classifier (RLSC) and can be combined to express more complex concepts. The results obtained so far are quite good, and better results are possible by adding more low-level features. The paper describes the proposed approach, the classifier and features, and the results that were obtained.

Rui M. Jesus, Arnaldo J. Abrantes, Nuno Correia
PanoWalk: A Remote Image-Based Rendering System for Mobile Devices

Real-time rendering of complex 3D scenes on mobile devices is a challenging task, mainly because mobile devices have limited computational capabilities and lack powerful 3D graphics hardware support. In this paper, we propose a remote Image-Based Rendering system that lets mobile devices interactively visualize real-world and synthetic scenes over a wireless network. Our system uses panoramic video as the building block for representing scene data. The scene data is compressed with an MPEG-like encoding scheme tailored for mobile devices. The compressed data is stored on a remote server. Our system carefully partitions the rendering task between client and server. The server is responsible for determining the data required for rendering novel views and streams it to the client in a server-push manner. After receiving the data, the mobile client carries out rendering locally using image warping and displays the resulting images on its small screen. Experimental results show that our system achieves real-time rendering speed on mainstream mobile devices. It allows multiple mobile clients to explore the same or different scenes simultaneously.

Zhongding Jiang, Yandong Mao, Qi Jia, Nan Jiang, Junyi Tao, Xiaochun Fang, Hujun Bao
A High Quality Robust Watermarking Scheme

In recent years, digital watermarking has become a popular technique for hiding information in digital images to help protect against copyright infringement. In this paper we develop a high quality and robust watermarking algorithm that combines the advantages of block-based permutation with those of neighboring coefficient embedding. The proposed approach uses the relationship between the coefficients of neighboring blocks to hide more information in high frequency blocks without causing serious distortion to the watermarked image. In addition, an extraction method for improving robustness to mid-frequency filter attacks is proposed. Our experimental results show that the proposed approach is very effective in achieving perceptual invisibility with an increase in the peak signal to noise ratio (PSNR). Moreover, the proposed approach is robust to a variety of signal processing operations, such as compression (JPEG), image cropping, sharpening, blurring, and brightness adjustments. In these experiments, the robustness is especially evident under blurring attacks.

Yu-Ting Pai, Shanq-Jang Ruan, Jürgen Götze
An Association Rule Mining Approach for Satellite Cloud Images and Rainfall

This paper aims to discover useful knowledge from a large collection of satellite cloud images and rainfall data using image mining. It illustrates how important data conversion is in building an accurate data mining architecture. Most image feature data and rainfall data are numerical values or vectors, which are not suitable for mining directly. We present two approaches to data conversion: a clustering algorithm and a fuzzy clustering method (FCM). The clustering algorithm maps numerical values to categorical values, and the FCM converts feature vectors. Finally, the association rules are determined using the Apriori algorithm. The experimental results show that the acquired association rules are consistent with the facts and that the results are satisfactory.
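
Once the numerical data has been mapped to categorical values, the Apriori step can be sketched as follows. This is a compact textbook version of Apriori in Python, not the authors' implementation. The item labels such as `cloud=thick`, the thresholds, and the function names are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        return sum(itemset <= t for t in sets) / n

    level = {frozenset([i]) for t in sets for i in t}
    frequent = {}
    k = 1
    while level:
        current = {c: s for c in level if (s := support(c)) >= min_support}
        frequent.update(current)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune every
        # candidate that has an infrequent (k-1)-subset.
        level = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        level = {c for c in level
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) triples."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


# Hypothetical categorical records derived from image features and rainfall.
records = [
    {"cloud=thick", "temp=low", "rain=heavy"},
    {"cloud=thick", "rain=heavy"},
    {"cloud=thin", "rain=none"},
    {"cloud=thick", "temp=low", "rain=heavy"},
    {"cloud=thin", "temp=low", "rain=none"},
]
frequent = apriori(records, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.9)
```

The pruning step relies on the downward-closure property: every subset of a frequent itemset is itself frequent, which is also why `frequent[lhs]` is always defined when rules are generated.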

Xu Lai, Guo-hui Li, Ya-li Gan, Ze-gang Ye
AVAS: An Audio-Visual Attendance System

Biometric identification technology is being applied to physical and information access control in some workplaces as the accuracy of biometric devices improves and their price declines. This paper describes a multimodal biometric identification system for time and attendance applications called AVAS (Audio-Visual Attendance System). The system takes users’ voice and face characteristics as their badge. The motivation behind using multimodal biometrics is to improve the availability and accuracy of the system. The score differences between the genuine speaker class and the misidentified speaker class labeled by each classifier are taken into account, and the Score Difference Weighted Sum (SDWS) rule is introduced to fuse the individual experts. We describe the functions of AVAS in detail from three aspects: the interaction with users, the authentication implementation, and the data management. Practical tests conducted in a staff working environment show a distinct improvement of about 9.8% with the proposed system.

Dongdong Li, Yingchun Yang, Zhenyu Shan, Gang Pan, Zhaohui Wu
Improved POCS-Based Deblocking Technique Using Wavelet Transform in Block Coded Image

This paper presents an improved deblocking technique based on the theory of projection onto convex sets (POCS) to reduce the blocking artifacts in decoded images. We propose a new smoothness constraint set (SCS) and its projection operator in the wavelet transform (WT) domain to remove unnecessary high-frequency components caused by blocking artifacts. To eliminate the blocking artifact components while preserving the original edge components, we also propose a significant coefficient decision method (SCDM) for fast and efficient performance. Experimental results show that the proposed method not only achieves significantly enhanced subjective quality but also improves the PSNR of the reconstructed image.

Goo-Rak Kwon, Hyo-Kak Kim, Chun-Soo Park, Yoon Kim, Sung-Jea Ko
Sketch Case Based Spatial Topological Data Retrieval

Much information can be regarded as spatial data, that is, data related to spatial position. Different query specification techniques have been proposed for accessing spatial databases, but traditional query methods are tedious and cannot support fuzzy queries. A content-based spatial data retrieval system is presented that offers users a sketch interface capable of accepting fuzzy retrieval. First, the retrieval algorithm builds the spatial topological vector by metrically refining the 9-intersection model. Then the independent topological relations are extracted by training ICA-assisted fuzzy SVMs, which removes redundancy among the binary relations and reduces the dimension of complex spatial scenes. In query processing, the tf × idf model is used, and the similarity is calculated with the cosine distance function on the weight vectors of the query scene and each spatial scene in the database. The experimental results show that recall and precision are improved compared with the query method without ICA and SVM.
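
The tf × idf weighting and cosine comparison used in query processing can be sketched in Python. The sketch assumes each scene is reduced to a bag of topological relation labels. The labels, the plain logarithmic idf, and the function names are illustrative assumptions rather than details from the paper.

```python
import math
from collections import Counter
from typing import Dict, List


def idf_table(scenes: List[List[str]]) -> Dict[str, float]:
    """Inverse document frequency of each relation label over the database."""
    n = len(scenes)
    df = Counter(term for scene in scenes for term in set(scene))
    return {term: math.log(n / count) for term, count in df.items()}


def weight_vector(scene: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """tf x idf weights; labels unseen in the database get weight 0."""
    tf = Counter(scene)
    return {t: (c / len(scene)) * idf.get(t, 0.0) for t, c in tf.items()}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical database scenes and a sketched query scene.
database = [
    ["meet", "meet", "overlap"],
    ["disjoint", "contain"],
    ["meet", "overlap", "overlap"],
]
idf = idf_table(database)
query = weight_vector(["meet", "overlap"], idf)
similarities = [cosine(query, weight_vector(scene, idf)) for scene in database]
```

The query is weighted with the idf table of the database, so a relation that occurs in every stored scene contributes nothing to the ranking.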

Yuan Zhen-ming, Zhang Liang, Pan Hong
Providing Consistent Service for Structured P2P Streaming System

In a decentralized but structured peer-to-peer (P2P) streaming system, when a node is overloaded, new incoming requests are replicated to its neighboring nodes in the same session, and the requesting nodes then receive the streams from these neighboring nodes. However, replicating the requests may cause service inconsistency because the replication time is non-zero. In general, there is a tradeoff between system performance and service consistency. In this paper, we focus on how to provide service consistency for decentralized but structured P2P streaming systems without noticeably degrading system performance. We propose a service update algorithm (SUA) that iteratively adjusts the actual read delay at these neighboring nodes and thus converges to the desired misread probability. The analytic and simulated results show that the algorithm achieves a good tradeoff between service consistency and system performance.

Zhen Yang, Huadong Ma
Adaptive Search Range Scaling for B Pictures Coding

This paper presents a frame-level adaptive search range scaling strategy for B picture coding in H.264/AVC from a hardware-oriented viewpoint. After studying the relationship between the search ranges of P and B pictures, we first propose a simple search range scaling strategy that is efficient for normal or low-motion video. This strategy is then extended to high-motion video by using the intra prediction and motion vector information of each P picture to restrict the search range of adjacent B pictures. The adaptive search range scaling strategy not only reduces the search area of B pictures by approximately 60% but also keeps almost the same coding performance as the reference software.

Zhigang Yang, Wen Gao, Yan Liu, Debin Zhao
Video QoS Monitoring and Control Framework over Mobile and IP Networks

With the development of network technology, multimedia applications in various video forms are widely used in network services. In order to leverage video QoS, monitoring and controlling video QoS during network transmission has become a pressing problem. In this paper, we propose a monitoring and control framework for video QoS over IP and mobile networks. We also develop a more effective video quality assessment (VQA) method with low computational complexity based on the human visual system (HVS), the Improved Human Visual Model (I-HVM), and propose an Adaptive and Dynamic Sampling Strategy (ADSS) for video features to monitor video quality at both ends of our framework. The experimental results show that our framework can monitor video QoS well over IP and mobile networks. Consequently, to leverage video QoS, dynamic control can be applied to the transmission decisions of a video service according to the QoS monitoring results of our framework.

Bingjun Zhang, Lifeng Sun, Xiaoyu Cheng
Extracting Moving / Static Objects of Interest in Video

Extracting objects of interest in video is a challenging task that can improve the performance of video compression and retrieval. Moving objects in video have usually been considered objects of interest, so much research has addressed extracting them. However, some non-moving (static) objects can also be objects of interest. This paper proposes a segmentation method that extracts static as well as moving objects that are likely to attract human interest. An object of interest is defined as a relatively large region that appears frequently over several frames and is not located near the boundaries of the frames. A static object of interest should also have significant color and texture characteristics against its surroundings. We found that the objects of interest extracted by the proposed method matched well with the objects of interest selected manually.

Sojung Park, Minhwan Kim
Building a Personalized Music Emotion Prediction System

With the development of multimedia technology, research on music is becoming more and more popular. Researchers currently focus on the relationship between music and listeners’ emotions, but they have not considered differences between users. Therefore, we propose a Personalized Music Emotion Prediction (P-MEP) System to help predict listeners’ music emotion while taking user differences into account. To analyze listeners’ emotional response to music, P-MEP rules are generated in an analysis procedure consisting of five phases. During the application procedure, the P-MEP System predicts a new listener’s emotional response to music. The experimental results show that the generated P-MEP rules can be used to predict emotional response to music while taking listener differences into account.

Chan-Chang Yeh, Shian-Shyong Tseng, Pei-Chin Tsai, Jui-Feng Weng
Video Segmentation Using Joint Space-Time-Range Adaptive Mean Shift

Video segmentation has drawn increasing interest in multimedia applications. This paper proposes a novel joint space-time-range domain adaptive mean shift filter for video segmentation. In the proposed method, segmentation of moving/static objects/background is obtained through inter-frame mode-matching in consecutive frames and motion vector mode estimation. Newly appearing objects/regions in the current frame due to new foreground objects or uncovered background regions are segmented by intra-frame mode estimation. Simulations have been conducted on several image sequences, and the results show the effectiveness and robustness of the proposed method. Further study to evaluate the results is ongoing.

Irene Y. H. Gu, Vasile Gui, Zhifei Xu
EagleRank: A Novel Ranking Model for Web Image Search Engine

The explosive growth of the World Wide Web has already made it the biggest image repository. Although some image search engines provide convenient access to web images, they frequently yield unwanted results. Locating needed and relevant images remains a challenging task. This paper proposes a novel ranking model named EagleRank for web image search engines. In EagleRank, multiple sources of evidence related to the images are considered, including the text passages surrounding an image, terms in special HTML tags, the website types of the images, the hypertextual structure of the web pages, and even user feedback. Meanwhile, the flexibility of EagleRank allows it to incorporate other potential factors as well. Based on the inference network model, EagleRank also gives sufficient support to Boolean AND and OR operators. Our experimental results indicate that EagleRank performs better than traditional approaches that consider only the text from web pages.

Kangmiao Liu, Wei Chen, Chun Chen, Jiajun Bu, Can Wang, Peng Huang
Color Image Enhancement Using the Laplacian Pyramid

We present a color image enhancement method. The proposed method enhances the brightness and contrast of an input image using the low-pass and band-pass images in the Laplacian pyramid, respectively. For color images, our method enhances the color tone by increasing the saturation adaptively according to the intensity of the input image. The major parameters required by our method are set automatically from human preference data, so the proposed method runs fully automatically without user interaction. Moreover, owing to the simplicity and efficiency of the proposed method, a real-time implementation is possible, and the improvement in image quality was validated through experiments on various images and video sequences.

Yeul-Min Baek, Hyoung-Joon Kim, Jin-Aeon Lee, Sang-Guen Oh, Whoi-Yul Kim
3D Mesh Construction from Depth Images with Occlusion

Realistic broadcasting is a broadcasting service system that uses multi-modal immersive media to provide clients with realism, including photorealistic and 3D display, 3D sound, multi-view interaction, and haptic interaction. In such a system, a client is able to see stereoscopic views, to hear stereo sound, and even to touch both the real actor and virtual objects using haptic devices. This paper presents a 3D mesh modeling method that accounts for self-occlusion in 2.5D depth video, in order to provide broadcasting applications with multi-modal interactions. Depth video of a real object is generally captured with a depth video camera from a single point of view, so it often includes self-occluded images. This paper presents a series of techniques that can construct a smooth and compact mesh model of an actor that contains self-occluded regions. Although our methods work only for an actor with a simple posture, they can be successfully applied to a studio environment where the body movement of the actor is relatively limited.

Jeung-Chul Park, Seung-Man Kim, Kwan-Heng Lee
An Eigenbackground Subtraction Method Using Recursive Error Compensation

Eigenbackground subtraction is a commonly used method for moving object detection. The method uses the difference between an input image and the reconstructed background image for detecting foreground objects based on eigenvalue decomposition. In the method, foreground regions are represented in the reconstructed image using the eigenbackground in the sense of least mean squared error minimisation. This results in errors that are spread over the entire reconstructed reference image, which degrades the quality of the reconstructed background and leads to inaccurate moving object detection. To compensate for these regions, an efficient method is proposed that uses recursive error compensation and an adaptively computed threshold. Experiments were conducted on a range of image sequences of varying complexity. Performance was evaluated both qualitatively and quantitatively. Comparisons with two existing methods show that the proposed method achieves better approximations of the background images and more accurate detection of foreground objects.

Zhifei Xu, Pengfei Shi, Irene Yu-Hua Gu
Attention Information Based Spatial Adaptation Framework for Browsing Videos Via Mobile Devices

The limited display size of mobile devices imposes significant barriers for mobile device users who want to browse high-resolution videos. In this paper, we present a novel video adaptation scheme based on attention area detection to enrich users' browsing experience on mobile devices. During video compression, the attention information, which refers to the attention objects in frames, is detected and embedded into bitstreams using the supplemental enhancement information (SEI) tool. In this research, we design a special SEI structure for embedding the attention information. Furthermore, we also develop a scheme to adjust adaptive quantization parameters in order to improve the quality of encoding the attention areas. When the high-resolution bitstream is transmitted to mobile users, a fast transcoding algorithm we developed earlier is applied to generate a new bitstream for the attention areas in frames. The new low-resolution bitstream containing mostly attention information, instead of the high-resolution one, is sent to users for display on the mobile devices. Experimental results show that the proposed spatial adaptation scheme is able to improve both subjective and objective video quality.

Yi Wang, Houqiang Li, Zhengkai Liu, Chang Wen Chen
Style Strokes Extraction Based on Color and Shape Information

Taking the Dunhuang MoGao Frescoes as the research background, a new algorithm to extract style strokes from fresco images is proposed. All the pixels in a fresco image are classified as belonging to either stroke objects or non-stroke objects, and the stroke extraction problem is defined as the process of selecting the pixels that form stroke objects from a given image. The algorithm first detects the most likely ROIs (Regions of Interest) in the image using stroke color and shape information, and produces a stroke color similarity map and a stroke shape constraint map. These two maps are then fused to extract style strokes. Experimental results have demonstrated its validity in extracting style strokes under certain variations. This research has the potential to provide a computer-aided tool for artists and restorers to imitate and restore time-honored paintings.

Jianming Liu, Dongming Lu, Xiqun Lu, Xifan Shi
Requantization Transcoding of H.264/AVC Bitstreams for Intra 4×4 Prediction Modes

Efficient bitrate reduction of video content is necessary in order to satisfy the different constraints imposed by decoding devices and transmission networks. Requantization is a fast technique for bitrate reduction, and has been successfully applied for MPEG-2 bitstreams. Because of the newly introduced intra prediction in H.264/AVC, the existing techniques are rendered useless. In this paper we examine requantization transcoding of H.264/AVC bitstreams, focusing on the intra 4×4 prediction modes. Two architectures are proposed, one in the pixel domain and the other in the frequency domain, that compensate the drift introduced by the requantization of intra 4×4 predicted blocks. Experimental results show that these architectures perform approximately equally well as the full decode and recode architecture for low to medium bitrates. Because of the reduced computational complexity of these architectures, in particular the frequency-domain compensation architecture, they are highly suitable for real-time adaptation of video content.

Stijn Notebaert, Jan De Cock, Koen De Wolf, Rik Van de Walle
Prediction Algorithms in Large Scale VOD Services on Grid Infrastructure

VOD (Video on Demand) is one of the significant services for next generation networks. Large scale VOD services commonly means local networks that provide VOD services to communities of about 500 to 1000 users accessing simultaneously. VOD services on grid infrastructure make resource sharing and management easy, which leads to substantial cooperation among systems distributed in many places. This paper presents prediction algorithms that try to reduce the cost of external communications among large VOD nodes in a grid community. Basic algorithms can reduce overall costs by about 30%, and a trained ANN can provide an extra 10% in performance.

Bo Li, Depei Qian
A Hierarchical Framework for Fast Macroblock Prediction Mode Decision in H.264

Many intra and inter prediction modes for macroblock are supported in the latest video compression standard H.264. Using the powerful Lagrangian minimization tool such as rate-distortion optimization, the mode with the optimal rate-distortion performance is determined. This achieves highest possible coding efficiency, but total calculation of cost for all candidate modes results in much higher computational complexity. In this paper, we propose a hierarchical framework for fast macroblock prediction mode decision in H.264 encoders. It is based on hierarchical mode classification method which assists fast mode decision by pre-selecting the class for macroblock using the extracted spatial and temporal features of macroblock. Since tests for many modes of non-selected classes will be skipped, much computation of rate-distortion optimization can be saved. Experimental results show that the proposed method can reduce the execution time of mode decision by 85% on the average without perceivable loss in coding rate and quality.
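
The saving that comes from pre-selecting a mode class can be sketched in Python. The Lagrangian cost J = D + λR is the standard rate-distortion formulation. The mode class table, the mode labels, and the function names are hypothetical, and the classifier that picks the class from spatial and temporal features is taken as given because the abstract does not specify it.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical grouping of prediction modes into classes; the names are
# illustrative labels, not identifiers from the H.264 reference software.
MODE_CLASSES: Dict[str, List[str]] = {
    "skip": ["SKIP"],
    "inter": ["INTER_16x16", "INTER_8x8"],
    "intra": ["INTRA_16x16", "INTRA_4x4"],
}


def rd_cost(distortion: float, rate: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate


def exhaustive_decision(table: Dict[str, Tuple[float, float]], lam: float) -> str:
    """Evaluate every candidate mode and keep the one with the lowest cost."""
    return min(table, key=lambda m: rd_cost(table[m][0], table[m][1], lam))


def hierarchical_decision(
    mode_class: str,
    evaluate: Callable[[str], Tuple[float, float]],
    lam: float,
) -> str:
    """Evaluate only the modes of the pre-selected class; skip all others."""
    costs = {m: rd_cost(*evaluate(m), lam) for m in MODE_CLASSES[mode_class]}
    return min(costs, key=costs.get)


# Made-up (distortion, rate) pairs for one macroblock.
measurements = {
    "SKIP": (900.0, 1.0),
    "INTER_16x16": (400.0, 20.0),
    "INTER_8x8": (350.0, 45.0),
    "INTRA_16x16": (600.0, 60.0),
    "INTRA_4x4": (500.0, 90.0),
}
full_search = exhaustive_decision(measurements, lam=10.0)
fast_search = hierarchical_decision("inter", measurements.__getitem__, lam=10.0)
```

In this toy case both searches agree, but the hierarchical one evaluates two modes instead of five. A wrong class choice would lose the best mode, which is why the quality of the classifier matters.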

Cheng-dong Shen, Si-kun Li
Compact Representation for Large-Scale Clustering and Similarity Search

Although content-based image retrieval has been researched for many years, few content-based methods are implemented in present image search engines. This is partly because of the great difficulty of indexing and searching in a high-dimensional feature space for large-scale image datasets. In this paper, we propose a novel method to represent the content of each image as one or multiple hash codes, which can be considered special keywords. Based on this compact representation, images can be accessed very quickly by their visual content. Furthermore, two advanced functionalities are implemented. One is content-based image clustering, which is simplified to grouping images with identical or near-identical hash codes. The other is content-based similarity search, which is approximated by finding images with similar hash codes. The hash code extraction process is very simple, and both image clustering and similarity search can be performed in real time. Experiments on over 11 million images collected from the web demonstrate the efficiency and effectiveness of the proposed method.
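
How hash codes turn clustering and similarity search into cheap operations can be sketched in Python. The abstract does not describe the hash extraction, so the `hash_code` function below (one bit per feature, set when the feature exceeds the vector mean) is an assumed stand-in, and the image names and feature values are made up.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def hash_code(features: Sequence[float]) -> int:
    """Assumed binarization: one bit per feature, 1 if above the vector mean."""
    mean = sum(features) / len(features)
    code = 0
    for value in features:
        code = (code << 1) | int(value > mean)
    return code


def cluster_by_code(codes: Dict[str, int]) -> Dict[int, List[str]]:
    """Group images whose hash codes are identical."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for name, code in codes.items():
        groups[code].append(name)
    return dict(groups)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")


def similar_images(query: int, codes: Dict[str, int], max_distance: int = 1) -> List[str]:
    """Approximate similarity search: codes within a small Hamming distance."""
    return [name for name, code in codes.items() if hamming(query, code) <= max_distance]


# Made-up four-dimensional feature vectors.
features = {
    "a.jpg": [0.9, 0.1, 0.8, 0.2],
    "b.jpg": [0.8, 0.2, 0.9, 0.1],
    "c.jpg": [0.9, 0.1, 0.2, 0.1],
    "d.jpg": [0.1, 0.9, 0.1, 0.9],
}
codes = {name: hash_code(vec) for name, vec in features.items()}
clusters = cluster_by_code(codes)
matches = similar_images(codes["a.jpg"], codes, max_distance=1)
```

Clustering becomes a dictionary lookup on the code. The linear scan in `similar_images` is only for brevity, since a system at the scale the abstract reports would need an index over the codes.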

Bin Wang, Yuanhao Chen, Zhiwei Li, Mingjing Li
Robust Recognition of Noisy and Partially Occluded Faces Using Iteratively Reweighted Fitting of Eigenfaces

Robust recognition of noisy and partially occluded faces is essential for an automated face recognition system, but most appearance-based methods (e.g., Eigenfaces) are sensitive to these factors. In this paper, we propose to address this problem using an iteratively reweighted fitting of the Eigenfaces method (IRF-Eigenfaces). Unlike Eigenfaces fitting, in which a simple linear projection operation is used to extract the feature vector, the IRF-Eigenfaces method first defines a generalized objective function and then uses the iteratively reweighted least-squares (IRLS) fitting algorithm to extract the feature vector by minimizing the generalized objective function. Our simulated and experimental results on the AR database show that IRF-Eigenfaces is far superior to both Eigenfaces and to the local probabilistic method in recognizing noisy and partially occluded faces.
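
The contrast between a linear projection and an iteratively reweighted fit can be sketched in Python. For brevity the sketch fits a single basis vector and uses weights 1/|residual|, which corresponds to an absolute-error objective. The paper's generalized objective function is not given in the abstract, so this choice, the function names, and the toy data are all assumptions.

```python
from typing import List


def project(x: List[float], b: List[float]) -> float:
    """Ordinary least-squares coefficient of x on a single basis vector b."""
    return sum(xi * bi for xi, bi in zip(x, b)) / sum(bi * bi for bi in b)


def irls_fit(x: List[float], b: List[float], iterations: int = 20, eps: float = 1e-6) -> float:
    """Iteratively reweighted least squares with weights 1 / |residual|."""
    a = project(x, b)  # start from the plain projection
    for _ in range(iterations):
        # Pixels with large residuals (e.g. occluded ones) get small weights.
        w = [1.0 / max(abs(xi - a * bi), eps) for xi, bi in zip(x, b)]
        a = (sum(wi * xi * bi for wi, xi, bi in zip(w, x, b))
             / sum(wi * bi * bi for wi, bi in zip(w, b)))
    return a


# Toy "image": eight pixels equal to 2 * basis, with the last pixel corrupted.
basis = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
image = [2.0 * bi for bi in basis]
image[-1] = 100.0  # simulated occlusion

plain = project(image, basis)    # pulled far away from 2 by the outlier
robust = irls_fit(image, basis)  # stays close to the true coefficient 2
```

The single corrupted pixel shifts the plain projection to about 5.3, while the reweighted fit returns to roughly 2 because that pixel's weight shrinks at every iteration.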

Wangmeng Zuo, Kuanquan Wang, David Zhang
Pitching Shot Detection Based on Multiple Feature Analysis and Fuzzy Classification

The pitching shot is known to be a root shot for subsequent baseball video content analysis, e.g., event or highlight detection and video structure parsing. In this paper, we integrate multiple feature analysis and fuzzy classification techniques to achieve pitching-shot detection in commercial baseball video. The adopted features include color (e.g., field color percentage and dominant color), temporal motion, and spatial activity distribution. In addition, domain knowledge of the baseball game forms the basis for the fuzzy inference rules. Experimental results show that our method achieves a detection rate of 95.76%.

Wen-Nung Lie, Guo-Shiang Lin, Sheng-Lung Cheng
Color Changing and Fading Simulation for Frescoes Based on Empirical Knowledge from Artists

Visualizing the color changing and fading process of ancient Chinese wall paintings for tourists and researchers is of great value in both education and preservation. Previously, however, because empirical knowledge from artists had not been introduced, it was infeasible to simulate color changing and fading in the absence of physical and chemical knowledge of how frescoes change color and fade. In this paper, empirical knowledge from artists is formalized. Since the improved system can reflect knowledge from artists in addition to the previous physical and chemical modeling, the simulation results are faithful to the color aging process of frescoes. First, related work is reviewed. Then, the formalization of empirical knowledge from artists is described. After that, the simulation results are shown and discussed. Finally, future research is suggested.

Xifan Shi, Dongming Lu, Jianming Liu
A Novel Spatial-Temporal Position Prediction Motion-Compensated Interpolation for Frame Rate Up-Conversion

In this paper, a novel spatial-temporal position prediction motion-compensated interpolation (MCI) method for frame rate up-conversion is proposed using the transmitted motion vectors (MVs). Based on our previously proposed GMPP algorithm, the new method first applies motion vector correction (MVC). Then a joint spatial-temporal position prediction algorithm is applied to the transmitted MVs to predict more accurately the positions the interpolated blocks really move to, which brings the MVs used for interpolation closer to the true motion. Finally, the weighted-adaptive spatial-temporal MCI algorithm is used to complete the interpolation. Applied to the H.264 decoder, the proposed method achieves a significant increase in PSNR and an obvious decrease in block artifacts, and can be widely used in video streaming and distributed video coding applications.

Jianning Zhang, Lifeng Sun, Yuzhuo Zhong
Web Image Clustering with Reduced Keywords and Weighted Bipartite Spectral Graph Partitioning

Recent work has addressed search result organization for image retrieval. The main aim is to cluster the search results into semantically meaningful groups. A number of works benefited from the use of the bipartite spectral graph partitioning method [3][4]. However, these works use a set of keywords for each corresponding image. This gives the bipartite spectral graph a large number of vertices and thus high complexity. There is also a lack of understanding of the weights used in this method. In this paper we propose a two-level reduced keywords approach to reduce the complexity of the bipartite spectral graph. We also propose weights for the bipartite spectral graph using hierarchical term frequency-inverse document frequency (tf-idf). Experimental data show that this weighted bipartite spectral graph performs better than the bipartite spectral graph with a unity weight. We further exploit the tf-idf weights in merging the clusters.
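
As a rough illustration of weighting the image-keyword edges of a bipartite graph, the sketch below computes classic (flat) tf-idf weights. The hierarchical variant used in the paper is not described in the abstract, so this is an assumption-laden simplification, and the function name and sample keywords are hypothetical.

```python
import math
from collections import Counter

def tfidf_edge_weights(image_keywords):
    """Return {(image, term): weight} using plain tf-idf.

    tf  = occurrences of the term in the image's keyword list / list length
    idf = log(number of images / number of images containing the term)
    """
    n = len(image_keywords)
    df = Counter(term for kws in image_keywords.values() for term in set(kws))
    weights = {}
    for image, kws in image_keywords.items():
        counts = Counter(kws)
        for term, count in counts.items():
            weights[(image, term)] = (count / len(kws)) * math.log(n / df[term])
    return weights
```

A keyword attached to every image receives weight zero, which is one way such weights can shrink the effective number of keyword vertices compared with a unity-weight graph.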

Su Ming Koh, Liang-Tien Chia
An Architecture to Connect Disjoint Multimedia Networks Based on Node’s Capacity

The TCP/IP protocol suite allows multimedia networks to be built from nodes according to the content they share. Some of these networks use different types of protocols (examples among unstructured P2P file-sharing networks include Gnutella 2, FastTrack, OpenNap and eDonkey). This paper proposes a new protocol to connect disjoint multimedia networks that share the same resource or content, in order to allow multimedia content distribution. We show how nodes connect with nodes from other multimedia networks based on the nodes' capacity. The system is scalable and fault-tolerant. The designed protocol, its mathematical model, the messages developed and their bandwidth cost are described. The architecture has been developed to be applied in multiple types of multimedia networks (P2P file-sharing, CDNs and so on). We have developed a general-purpose application tool with all the designed features. Results show the number of octets, the number of messages and the number of broadcasts sent through the network when the protocol is running.

Jaime Lloret, Juan R. Diaz, Jose M. Jimenez, Fernando Boronat
Quantitative Measure of Inlier Distributions and Contour Matching for Omnidirectional Camera Calibration

This paper presents a novel approach to both the calibration of the omnidirectional camera and contour matching in architectural scenes. The proposed algorithm divides an entire image into several sub-regions, and then examines the number of inliers in each sub-region and the area of each region. In our method, the standard deviations are used as a quantitative measure to select a proper inlier set. Since the line segments of man-made objects are projected to contours in omnidirectional images, the contour matching problem is important for more precise camera recovery. We propose a novel contour matching method using geometrical information of the omnidirectional camera.
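
One plausible reading of the inlier-distribution measure can be sketched as follows, assuming a rectangular grid of equal-area sub-regions (the paper's actual partition of the omnidirectional image is not given in the abstract). The function name and grid parameters are hypothetical. A lower standard deviation of per-region inlier density indicates a more evenly spread inlier set.

```python
import statistics

def inlier_density_std(points, width, height, rows=2, cols=2):
    """Standard deviation of inlier density (count / area) over a rows x cols grid."""
    cell_w, cell_h = width / cols, height / rows
    counts = [[0] * cols for _ in range(rows)]
    for x, y in points:
        # Clamp so points on the right/bottom border fall into the last cell.
        c = min(int(x // cell_w), cols - 1)
        r = min(int(y // cell_h), rows - 1)
        counts[r][c] += 1
    area = cell_w * cell_h
    densities = [counts[r][c] / area for r in range(rows) for c in range(cols)]
    return statistics.pstdev(densities)
```

Under this reading, candidate inlier sets could be ranked by the returned value, preferring sets whose inliers cover the image rather than cluster in one corner.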

Yongho Hwang, Hyunki Hong
High-Speed All-in-Focus Image Reconstruction by Merging Multiple Differently Focused Images

This paper deals with high-speed all-in-focus image reconstruction by merging multiple differently focused images. Previously, we proposed a method of generating an all-in-focus image from multi-focus imaging sequences based on spatial frequency analysis using a three-dimensional FFT. In this paper, we first combine the sequence into a two-dimensional image with a fine quantization step size. Then, simply by applying a certain convolution to the image using a two-dimensional FFT, we achieve robust, high-speed reconstruction of all-in-focus images. Simulations using synthetic images are shown, and the conditions for good quality of the reconstructed images are discussed. We also show experimental results of high-speed all-in-focus image reconstruction on real images, compared with those of the previous method.

Kazuya Kodama, Hiroshi Mo, Akira Kubota
A Real-Time Video Deinterlacing Scheme for MPEG-2 to AVS Transcoding

Real-time motion-compensated (MC) deinterlacing is defined as deinterlacing at the decoder in real time at low cost using the transmitted motion vectors. Although the possibility of this was shown ten years ago, few such studies have been reported so far. The major difficulty is that motion vectors derived from video decoders, which generally refer to average motion over several field periods instead of motion between adjacent fields, are far from perfect. In this paper, a real-time MC deinterlacing scheme is proposed for transcoding from MPEG-2 to AVS, the Audio Video coding Standard of China, which targets higher coding efficiency and lower complexity than existing standards for high-definition video coding. Experimental results show that the presented scheme is less sensitive to incorrect motion vectors than conventional algorithms.

Qian Huang, Wen Gao, Debin Zhao, Cliff Reader
Persian Text Watermarking

Digital watermarking applies to a variety of media including image, video, audio and text. Because of the nature of digital text, its watermarking methods are special. Moreover, these methods basically depend on the script used in the text. This paper reviews the application of digital watermarking to Farsi (Persian) and similar scripts (like Arabic, Urdu and Pashto), which are substantially different from English and other western counterparts, especially in using connected alphabets. Focusing on the special characteristics of these scripts, the application of common text watermarking methods is studied. By comparing the results, suitable methods which result in the highest payload are presented.

Ali Asghar Khodami, Khashayar Yaghmaie
Three Dimensional Reconstruction of Structured Scenes Based on Vanishing Points

The paper focuses on the problem of 3D reconstruction of structured scenes from uncalibrated images based on vanishing points. Under the assumption of a three-parameter camera model, we prove that with a certain preselected world coordinate system, the camera projection matrix can be uniquely determined from three mutually orthogonal vanishing points that can be obtained from images. We also prove that globally consistent projection matrices can be recovered if an additional set of correspondences across multiple images is present. Compared with previous stereovision techniques, the proposed method avoids the bottleneck problem of image matching and is easy to implement, so more accurate and robust results are expected. Extensive experiments on synthetic and real images validate the effectiveness of the proposed method.
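
To make the role of the three vanishing points concrete, here is a small sketch that recovers only the camera intrinsics, not the full projection matrix the paper derives. It assumes the three parameters are the focal length and the principal point, with square pixels and zero skew, which is an interpretation of "three-parameter camera model" rather than something the abstract states. Under that assumption the principal point is the orthocenter of the triangle formed by the three mutually orthogonal vanishing points, and the focal length follows from the orthogonality constraint. The function name is hypothetical.

```python
import math

def intrinsics_from_vanishing_points(v1, v2, v3):
    """Return (u0, v0, f) from three mutually orthogonal vanishing points.

    Assumes square pixels and zero skew (a labeled assumption).
    """
    # The orthocenter p satisfies (p - v1).(v2 - v3) = 0 and (p - v2).(v1 - v3) = 0,
    # a 2x2 linear system solved here with Cramer's rule.
    a1, b1 = v2[0] - v3[0], v2[1] - v3[1]
    a2, b2 = v1[0] - v3[0], v1[1] - v3[1]
    c1 = a1 * v1[0] + b1 * v1[1]
    c2 = a2 * v2[0] + b2 * v2[1]
    det = a1 * b2 - a2 * b1
    u0 = (c1 * b2 - c2 * b1) / det
    v0 = (a1 * c2 - a2 * c1) / det
    # Orthogonality of the scene directions gives (v1 - p).(v2 - p) = -f^2.
    f_squared = -((v1[0] - u0) * (v2[0] - u0) + (v1[1] - v0) * (v2[1] - v0))
    return u0, v0, math.sqrt(f_squared)
```

The sketch breaks down when a vanishing point lies at infinity (a scene direction parallel to the image plane), since the triangle then degenerates.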

Guanghui Wang, Shewei Wang, Xiang Gao, Yubing Li
Parallel Processing for Reducing the Bottleneck in Realtime Graphics Rendering

The rendering process of the graphics pipeline is usually completed by both the CPU and the GPU, and a bottleneck can be located in either the CPU or the GPU. This paper focuses on reducing the bottleneck between the CPU and the GPU. We propose a method for improving the performance of parallel processing for realtime graphics rendering by separating the CPU operations into two parts, pure CPU operations and operations related to the GPU, and letting them run in parallel. This maximizes the parallelism in processing the communication between the CPU and the GPU. Experiments confirm that the proposed method allows faster graphics rendering. In addition to using a dedicated thread for GPU-related operations, we also propose an algorithm for balancing the graphics pipeline using the idle time caused by the bottleneck. We have implemented the two proposed methods in our networked 3D game engine and verified that they are effective in real systems.

Mee Young Sung, Suk-Min Whang, Yonghee Yoo, Nam-Joong Kim, Jong-Seung Park, Wonik Choi
Distributed Data Visualization Tools for Multidisciplinary Design Optimization of Aero-crafts

A user-oriented grid platform for multidisciplinary design optimization (MDO) of aero-crafts based on web services requires a tool for visualizing the computed data and visually steering the whole MDO process. In this paper, a distributed data visualization tool for MDO of aero-crafts is described. The visual steering scheme for MDO, which is built in a web service environment and presented as a series of web pages, is described in detail. Visualization Toolkit (VTK) and Java are adopted in the visualization service to process the results of MDO of geometry and the distributed computational data.

Chunsheng Liu, Tianxu Zhang
An Efficient Clustering and Indexing Approach over Large Video Sequences

In a video database, the similarity between video sequences is usually measured by the percentage of similar frames shared by both video sequences, where each frame is represented as a high-dimensional feature vector. The direct computation of the similarity measure involves time-consuming sequential scans over the whole dataset. On the other hand, applying existing indexing techniques to high-dimensional datasets suffers from the "Dimensionality Curse". Thus, an efficient and effective indexing method is needed to reduce the computation cost of the similarity search. In this paper, we propose a Multi-level Hierarchical Divisive Dimensionality Reduction technique to discover correlated clusters, and develop a corresponding indexing structure to efficiently index the clusters in order to support efficient similarity search over video data. By using dimensionality reduction techniques such as Principal Component Analysis, we can retain the critical information between the data points in the dataset using a reduced dimension space. Experiments show the efficiency and usefulness of this approach.
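
The frame-sharing similarity measure mentioned at the start of the abstract can be sketched as below. This is the direct, sequential-scan computation that the paper aims to avoid, and it makes assumptions the abstract does not fix: frames count as similar when their Euclidean distance is at most a threshold `eps`, and the shared percentage is taken over the frames of both sequences. All names are hypothetical.

```python
import math

def frame_distance(a, b):
    """Euclidean distance between two frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def matched_frames(seq_a, seq_b, eps):
    """Number of frames in seq_a that have at least one similar frame in seq_b."""
    return sum(1 for f in seq_a if any(frame_distance(f, g) <= eps for g in seq_b))

def sequence_similarity(seq_a, seq_b, eps):
    """Fraction of frames, over both sequences, that have a similar frame in the other."""
    hits = matched_frames(seq_a, seq_b, eps) + matched_frames(seq_b, seq_a, eps)
    return hits / (len(seq_a) + len(seq_b))
```

Each call compares every frame pair, so the cost grows with the product of the sequence lengths and the feature dimension, which is the motivation for the indexing structure the paper proposes.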

Yu Yang, Qing Li
An Initial Study on Progressive Filtering Based on Dynamic Programming for Query-by-Singing/Humming

This paper presents the concept of progressive filtering (PF) and its efficient design based on dynamic programming. The proposed PF is scalable for large music retrieval systems and is data-driven for performance optimization. Moreover, its concept and design are general in nature and can be applied to any multimedia retrieval system. The application of the proposed PF to a 5-stage query-by-singing/humming (QBSH) system is reported, and the experimental results demonstrate the feasibility of the proposed approach.

Jyh-Shing Roger Jang, Hong-Ru Lee
Measuring Multi-modality Similarities Via Subspace Learning for Cross-Media Retrieval

Cross-media retrieval is an interesting research problem, which seeks to break through the limitation of modality so that users can query multimedia objects by examples of different modalities. To enable cross-media retrieval, the problem of measuring similarity between media objects with heterogeneous low-level features needs to be solved. This paper proposes a novel approach that learns both intra- and inter-media correlations among multi-modality feature spaces and constructs an MLE semantic subspace containing multimedia objects of different modalities. In addition, relevance feedback strategies are developed to enhance the efficiency of cross-media retrieval from both short- and long-term perspectives. Experiments show that the results of our approach are encouraging and the performance is effective.

Hong Zhang, Jianguang Weng
SNR-Based Bit Allocation in Video Quality Smoothing

Quality fluctuation has a major negative effect on perceptual video quality. Many recent video quality smoothing works target constant distortion (i.e., constant PSNR) throughout the whole coded video sequence. In [1], a target distortion was set up for each frame based on the hypothesis that maintaining constant distortion over frames would boost video quality smoothing, and extensive experiments showed that the constant-distortion bit allocation (CDBA) scheme significantly outperforms the popular constant bit allocation (CBA) scheme and Xie et al.'s recent work [2, 3] in terms of delivered video quality. During scene changes, however, it has been observed that the picture energy often changes dramatically. Maintaining constant PSNR would then result in dramatically different SNR performance and translate into dramatically different perceptual effects. Although computationally more complex, SNR represents a more objective measure than PSNR in assessing video quality. In this paper, a single-pass frame-level constant-SNR bit allocation scheme (CSNRBA) is developed for video quality smoothing throughout the video sequence. To achieve constant SNR, a power-series-weighted average of the actual SNR of previously coded frames is adopted as the target SNR for the current frame. From the target SNR, the target distortion for the current frame is calculated. Then, according to the analytic closed-form D-Q model and the linear rate control algorithm, the bit budget for the current frame can be estimated. Experimental results show that the proposed CSNRBA scheme provides much smoother video quality and achieves significantly better subjective video quality, in terms of natural color, sharp objects and silhouettes, than both the CBA and CDBA schemes on all test video sequences.
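
The difference between holding PSNR constant and holding SNR constant can be illustrated with a short sketch. It rests on assumptions not spelled out in the abstract: SNR is taken as mean signal energy over mean squared error, the power-series weights are `decay ** k` with the most recent frame weighted highest, and all function names and the `decay` parameter are hypothetical. The D-Q model and rate control steps are not reproduced.

```python
import math

def mse(original, coded):
    """Mean squared error between original and coded samples."""
    return sum((o - c) ** 2 for o, c in zip(original, coded)) / len(original)

def psnr(original, coded, peak=255):
    """Peak signal-to-noise ratio in dB; depends only on the distortion."""
    return 10 * math.log10(peak ** 2 / mse(original, coded))

def snr(original, coded):
    """Signal-to-noise ratio in dB; depends on the picture energy as well."""
    energy = sum(o ** 2 for o in original) / len(original)
    return 10 * math.log10(energy / mse(original, coded))

def target_snr(previous_snrs, decay=0.5):
    """Power-series-weighted average of past SNRs (oldest first in the input list)."""
    weights = [decay ** k for k in range(len(previous_snrs))]
    weighted = sum(w * s for w, s in zip(weights, reversed(previous_snrs)))
    return weighted / sum(weights)

def target_distortion(original, snr_db):
    """Mean squared error that yields the requested SNR for this frame."""
    energy = sum(o ** 2 for o in original) / len(original)
    return energy / 10 ** (snr_db / 10)
```

For a bright frame and a dark frame coded with the same mean squared error, `psnr` returns the same value while `snr` differs, which is the effect at scene changes that motivates the constant-SNR target.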

Xiangui Kang, Junqiang Lan, Li Liu, Xinhua Zhuang
Shadow Removal in Sole Outdoor Image

A method of shadow removal from a single uncalibrated outdoor image is proposed. Existing approaches usually decompose the image into albedo and illumination images. In this paper, based on the mechanism of shadow generation, the occlusion factor is introduced, and the illumination image is further decomposed as a linear combination of solar irradiance and ambient irradiance images. The irradiances involved are obtained from user-supplied hints. The shadow matte is estimated by anisotropic diffusion of the posterior probability. Experiments show that our method can simultaneously extract a detailed shadow matte and recover the texture beneath the shadow.

Zhenlong Du, Xueying Qin, Wei Hua, Hujun Bao
3D Head Model Classification Using KCDA

In this paper, the 3D head model classification problem is addressed using a newly developed subspace analysis method: kernel clustering-based discriminant analysis (KCDA). This method works by first mapping the original data into another high-dimensional space, and then performing clustering-based discriminant analysis in the feature space. The main idea of clustering-based discriminant analysis is to overcome the Gaussian assumption limitation of traditional linear discriminant analysis by using a new criterion that takes into account the multiple cluster structure possibly embedded within some classes. As a result, kernel CDA seeks to overcome simultaneously both the Gaussian assumption and the linearity limitations of traditional linear discriminant analysis. A novel application of this method to 3D head model classification is presented in this paper. A group of tests of our method on a 3D head model dataset has been carried out, with very promising experimental results.

Bo Ma, Hui-yang Qu, Hau-san Wong, Yao Lu
Framework for Pervasive Web Content Delivery

It is generally agreed that traditional transcoding involves complex computations, which may introduce substantial additional delay to content delivery. Inspired by new multimedia data formats, like JPEG 2000, a new adaptation called modulation is devised. Unlike transcoding, modulation is fast since it basically generates an object's representation by selecting fragments of the object without decoding or encoding it. In this paper, a framework for pervasive Web content delivery is proposed to exploit the benefits of modulation.

Henry N. Palit, Chi-Hung Chi, Lin Liu
Region-Based Semantic Similarity Propagation for Image Retrieval

In order to reduce the gap between low-level image features and high-level image semantics, various long-term learning strategies have been integrated into content-based image retrieval systems. These strategies use the semantic relationships among images to improve the effectiveness of the retrieval system. This paper proposes a semantic similarity propagation method to mine the hidden semantic relationships among images. The semantic relationships are propagated between similar images and regions. Experimental results verify the improvement in similarity propagation and image retrieval.

Weiming Lu, Hong Pan, Jiangqin Wu
Backmatter
Metadata
Title
Advances in Multimedia Information Processing - PCM 2006
Edited by
Yueting Zhuang
Shi-Qiang Yang
Yong Rui
Qinming He
Copyright Year
2006
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-48769-2
Print ISBN
978-3-540-48766-1
DOI
https://doi.org/10.1007/11922162