
2012 | Book

Advances in Image and Video Technology

5th Pacific Rim Symposium, PSIVT 2011, Gwangju, South Korea, November 20-23, 2011, Proceedings, Part II


About this book

The two-volume set LNCS 7087 + 7088 constitutes the proceedings of the 5th Pacific Rim Symposium on Image and Video Technology, PSIVT 2011, held in Gwangju, South Korea, in November 2011. A total of 71 revised papers were carefully reviewed and selected from 168 submissions. The topics covered are: image/video coding and transmission; image/video processing and analysis; imaging and graphics hardware and visualization; image/video retrieval and scene understanding; biomedical image processing and analysis; biometrics and image forensics; and computer vision applications.

Table of Contents

Frontmatter
Lossless Image Coding Based on Inter-color Prediction for Ultra High Definition Image

This paper addresses lossless image coding for an ultra-high definition television system that supports 4K (4096×2160) resolution images with 22.2ch audio. Ultra High Definition Television systems are being developed to satisfy end-users who long for higher-resolution, higher-quality, and higher-fidelity picture and sound. Their major characteristic, however, is that a considerably larger amount of input information must be processed in real time compared to conventional systems. Therefore, high-speed data handling for editing and playback, high-speed signal interfaces between devices, and real-time source codecs without delay for saving memory space are unavoidable requirements of such a system. This paper focuses on a lossless image codec for the sake of real-time processing with reasonable coding gain; the proposed algorithm is a pixel-based method called spatio-color prediction. It uses inter-color correlation together with spatial correlation appropriately and shows a 5.5% coding efficiency improvement compared to JPEG-LS. The simulation was performed using 4K-resolution RGB 10-bit images, and the proposed lossless coding was verified on the developed Ultra High Definition Television system.

Jiho Park, Je Woo Kim, Jechang Jeong, Byeongho Choi
Multithreading Architecture for Real-Time MPEG-4 AVC/H.264 SVC Decoder

The inter-layer prediction (ILP) in Scalable Video Coding (SVC), including intra, residual, and motion up-sampling operations, significantly increases the compression ratio compared to simulcast. An SVC codec capable of processing inter-layer prediction among multiple layers, however, requires much more memory and computational power than a single-layer MPEG-4 AVC/H.264 codec. This paper presents a fast and memory-efficient multithreading architecture for a real-time MPEG-4 AVC/H.264 Scalable High profile decoder. Unlike existing approaches, where multi-threaded video encoding and decoding have been performed within a frame or among frames, the designed algorithm utilizes inter-layer parallelism based on groups of macroblocks (GOM). Improved buffer management is also achieved by the proposed access unit (AU) based decoding architecture, which enables the GOM-based inter-layer multithreading architecture. The proposed multithreading architecture has three properties: (1) it is scalable to the number of SVC layers, (2) it adds no coding delay, and (3) it requires no additional memory. Experimental results show that the proposed multithreading architecture speeds up the decoding of 3-layer extended spatial scalability sequences by about 36% on average, 3-layer coarse grain scalability sequences by about 50%, and 5-layer medium grain scalability sequences by about 102%, respectively, compared to a single-threaded SVC decoder.

Yong-Hwan Kim, Jiho Park, Je-Woo Kim
Fast Mode Decision Algorithm for Depth Coding in 3D Video Systems Using H.264/AVC

The complexity of multiview coding is proportional to the number of cameras, which makes it difficult to deploy multiview sequences in real applications. Thus, we propose a fast mode decision algorithm for both intra and inter prediction to reduce the computational complexity of H.264/AVC for depth video coding. By analyzing the depth variation, we classify the depth video into depth-continuity and depth-discontinuity regions, with a threshold value for the classification determined by experiments. Since the depth-continuity region has an imbalanced mode distribution, we limit the mode candidates there. Experimental results show that our proposed algorithm reduces the encoding time by up to 78% and 84% for intra and inter frames, respectively, with negligible PSNR loss and a slight bit-rate increase, compared to JMVC 8.3.

Da-Hyun Yoon, Yo-Sung Ho
Improved Diffusion Basis Functions Fitting and Metric Distance for Brain Axon Fiber Estimation

We present a new regularization approach for Diffusion Basis Functions fitting to estimate in vivo the axonal orientations in the brain from Diffusion Weighted Magnetic Resonance Images. The method assumes that the observed Magnetic Resonance signal at each voxel is a linear combination of given diffusion basis functions; the aim of the approach is to estimate the coefficients of the linear combination. An issue with the Diffusion Basis Functions method is the overestimation of the number of tensors (associated with different axon fibers) within a voxel due to noise, namely, over-fitting of the noisy signal. Our proposal overcomes this overestimation problem. Additionally, we propose a metric to compare the performance of multi-fiber estimation algorithms. The metric is based on the Earth Mover's Distance and allows us to compare in a single metric the orientation, compartment size, and number of axon bundles between two different estimations. The improvements of our two proposals are demonstrated on synthetic and real experiments.

Ramón Aranda, Mariano Rivera, Alonso Ramírez-Manzanares
An Adaptive Motion Data Storage Reduction Method for Temporal Predictor

In the state-of-the-art video coding standard HEVC, a temporal motion vector (MV) predictor is adopted in order to improve coding efficiency. However, the motion vector information of reference frames, which is used by the temporal MV predictor, takes a significant amount of memory storage, so motion data needs to be compressed before being stored in the buffer. In this paper we propose an adaptive motion data storage reduction method. First, it divides the current 16x16 block in the reference frame into four partitions. One MV is sampled from each partition, and all sampled MVs form an MV candidate set. It then decides whether one or two MVs should be stored in the MV buffer by checking the maximum distance between any two MVs in the candidate set. If the maximum distance is greater than a certain threshold, the motion data of the two MVs with maximum distance are put into memory; otherwise the motion data of the upper-left block is stored. The goal of the proposed method is to improve the accuracy of the temporal MV predictor while reducing motion data memory size. Simulation results show that, compared to the original HEVC MV memory compression method adopted at the 4th JCT-VC meeting, the proposed scheme achieves a coding gain of 0.5%~0.6%, while the memory size is reduced by more than 87.5% compared to not using motion data compression.

Ruobing Zou, Oscar C. Au, Lin Sun, Sijin Li, Wei Dai
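The storage decision rule described in the abstract above can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' implementation; the L1 distance metric and the candidate ordering are assumptions.

```python
# Illustrative sketch of the MV storage decision: sample one MV per
# partition of a 16x16 block, then keep either the two most distant MVs
# or only the upper-left one, depending on a threshold.
from itertools import combinations

def mv_distance(a, b):
    # L1 distance between two motion vectors (mvx, mvy) -- an assumption here
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def select_stored_mvs(candidates, threshold):
    """candidates: four MVs sampled from the four partitions, ordered
    [upper-left, upper-right, lower-left, lower-right]."""
    pair = max(combinations(candidates, 2), key=lambda p: mv_distance(*p))
    if mv_distance(*pair) > threshold:
        return list(pair)      # heterogeneous motion: store the two most distant MVs
    return [candidates[0]]     # homogeneous motion: store the upper-left MV only
```

For a block whose partitions move together, only one MV is kept; a block straddling a motion boundary keeps the two extremes, which is what preserves predictor accuracy while halving (or quartering) storage.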
A Local Variance-Based Bilateral Filtering for Artifact-Free Detail- and Edge-Preserving Smoothing

Edge-preserving smoothing has recently emerged as a crucial technique for a variety of computer vision and image processing applications. The idea is to smooth small-scale variations while preserving edges and fine details in the image, without causing halo artifacts in detail enhancement. In this paper, we propose a modified bilateral filter model that behaves better near edges and details than the standard model does. The edge-stopping function takes into account both the intensity difference and the local variance of the filter window, which provides insightful information about the local pixel distribution. We demonstrate that existing detail-related bilateral-based applications can achieve better results by simply switching from the standard model to our proposed model. In particular, we apply our method to detail-preserving image denoising and detail enhancement.

Cuong Cao Pham, Synh Viet Uyen Ha, Jae Wook Jeon
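As a rough sketch of the idea in the abstract above (not the paper's exact edge-stopping function), a bilateral filter whose range-kernel width adapts to the local variance of the window might look like this; the specific way the variance enters the kernel is an assumption for illustration.

```python
# Bilateral filter sketch: the range ("edge-stopping") weight depends on
# both the intensity difference and the local variance of the window.
import numpy as np

def variance_bilateral(img, radius=2, sigma_s=2.0, sigma_r=20.0):
    img = img.astype(np.float64)
    h, w = img.shape
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))   # fixed spatial kernel
    pad = np.pad(img, radius, mode='reflect')
    out = np.empty_like(img)
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # local variance widens the range kernel in busy regions
            # (illustrative assumption, not the paper's formula)
            sigma = sigma_r + np.sqrt(win.var())
            rng = np.exp(-(win - img[i, j])**2 / (2 * sigma**2))
            wgt = spatial * rng
            out[i, j] = (wgt * win).sum() / wgt.sum()
    return out
```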
Iterative Gradient-Driven Patch-Based Inpainting

A novel exemplar-based image inpainting method is proposed in this paper. It is based on an iterative approach, which provides better results than a greedy one. This paper focuses on the problem of inconsistent results caused by raster-scan ordering in target patch selection within the iterative approach. The proposed gradient-driven ordering is used to select the target patch instead of the traditionally predefined ordering. Owing to its information-driven nature, the new approach is invariant to image rotation: the same result is produced for different rotations of the same damaged image. Moreover, a random search approach is redesigned to be more suitable for the gradient-driven ordering. The proposed method provides the best inpainting results among several well-known exemplar-based inpainting techniques, including both greedy and iterative approaches.

Sarawut Tae-o-sot, Akinori Nishihara
Feature Extraction Based on Co-occurrence of Adjacent Local Binary Patterns

In this paper, we propose a new image feature based on spatial co-occurrence among micropatterns, where each micropattern is represented by a Local Binary Pattern (LBP). In conventional LBP-based features such as LBP histograms, all the LBPs of micropatterns in the image are packed into a single histogram. Doing so discards important information concerning spatial relations among the LBPs, even though they may carry information about the image's global structure. To capture such spatial relations, we measure the co-occurrence among multiple LBPs. The proposed feature is robust against variations in illumination, a property inherited from the original LBP, while simultaneously retaining more image detail. The significant advantage of the proposed method over conventional LBP-based features is demonstrated through experimental results on face and texture recognition using public databases.

Ryusuke Nosaka, Yasuhiro Ohkawa, Kazuhiro Fukui
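The general idea in the abstract above can be sketched minimally as follows. The details here are assumptions for illustration (8-neighbour LBP, horizontal adjacency only, a joint 256×256 histogram); the paper's exact configuration may differ.

```python
# Sketch: compute a standard 8-neighbour LBP code per pixel, then histogram
# co-occurring codes of horizontally adjacent pixels instead of pooling all
# codes into one histogram.
import numpy as np

def lbp_codes(img):
    c = img[1:-1, 1:-1]
    # 8 neighbours, clockwise from top-left; a bit is set where neighbour >= centre
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros(c.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offs):
        n = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (n >= c).astype(np.int64) << bit
    return code

def adjacent_lbp_cooccurrence(img):
    code = lbp_codes(np.asarray(img, dtype=np.float64))
    pairs = code[:, :-1] * 256 + code[:, 1:]   # joint code of each adjacent pair
    return np.bincount(pairs.ravel(), minlength=256 * 256)
```

The joint histogram is 65536 bins rather than 256, which is exactly the extra spatial information a pooled LBP histogram throws away.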
Natural Image Composition with Inhomogeneous Boundaries

Image composition usually fills a region of a target image with a source image patch of the same shape. To achieve a seamless transition, the tone of the boundary in the target image is then transferred to the interior region of the source patch. Traditional approaches usually fail when the corresponding boundaries of the target and source images do not match well, because the tone transformation of all pixels on the boundary is equally propagated to the inner region. This paper presents a new image composition technique based on discrete mean value coordinates (DMVC), which propagates the tone transformation of only selected boundary pixels, rather than all of them, to the inner region. The approach works as follows. It first selects boundary pixels with good matches. The new colors of inner pixels are then calculated using DMVC according to the selected pixel pairs from the source and target boundaries. A matting technique is finally used to compose the new pixels into the target image. Experiments show that the proposed approach can obtain reasonable results for examples with inconsistent boundaries between source and target images.

Dong Wang, Weijia Jia, Guiqing Li, Yunhui Xiong
Directional Eigentemplate Learning for Sparse Template Tracker

Automatic eigentemplate learning is discussed for a sparse template tracker. Using an eigentemplate learned from multiple sequences, a sparse template tracker can efficiently track a target that changes appearance. The present paper provides a feasible solution for eigentemplate learning when multiple image sequences are available. Two types of eigentemplates are compared, namely, a single eigentemplate and a set of directional eigentemplates. The single eigentemplate simply consists of all images learned from multiple sequences. On the other hand, directional eigentemplates are obtained by decomposing the single eigentemplate into three directions of face poses. The sparse template tracker is also extended to directional eigentemplates. Finally, the effectiveness of the provided solution is demonstrated in learning and tracking experiments. The experimental results indicate that directional learning works well with small seed data, and that the directional eigentracker works better than the single eigentracker.

Hiroyuki Seto, Tomoyuki Taguchi, Takeshi Shakunaga
Gender Identification Using Feature Patch-Based Bayesian Classifier

In this paper, we propose a Bayesian classifier which exploits a non-parametric model to identify gender from facial images. Our major contribution is a feature patch-based non-parametric method that generates the posteriors of male and female based on the characteristics of labeled training image patches. Our system consists of four modules. First, we use an AAM model to locate facial feature points; facial images are then represented by the overlapping feature patches around these points. Second, from the labeled training patches, we select a smaller subset as the patch library based on K-means clustering. Third, in training, we embed the gender characteristics of the training feature patches as the posteriors of the library patches. Fourth, in testing, we integrate the posteriors of the test patches to determine the gender. The experimental results demonstrate that our proposed method outperforms conventional non-feature-patch-based methods.

Shen-Ju Lin, Chung-Lin Huang, Shih-Chung Hsu
Multiple Objects Tracking across Multiple Non-Overlapped Views

This paper introduces an algorithm to track multiple objects across multiple non-overlapped views. First, we track every single object in each view and record its activity as object-based video fragments (OVFs). By linking related OVFs across different cameras, we can connect two OVFs across two non-overlapped views. Because of scene illumination changes, lingering in blind regions, and similar appearances among objects, path misconnection and fragmentation may occur. This paper develops the Error Path Detection Function (EPDF) and uses an augmented feature (AF) to solve these two problems.

Ke-Yin Chen, Chung-Lin Huang, Shih-Chung Hsu, I-Cheng Chang
Fast Hypercomplex Polar Fourier Analysis for Image Processing

Hypercomplex polar Fourier analysis treats a signal as a vector field and generalizes conventional polar Fourier analysis. It can handle signals represented by hypercomplex numbers, such as color images. It is reversible, so the original image can be reconstructed, and its coefficients have a rotation-invariance property that can be used for feature extraction. With these properties, it can be used for image processing applications such as image representation and image understanding. However, a fast algorithm is needed to increase computation speed, especially for applications such as real-time systems and resource-limited platforms. This paper presents a fast hypercomplex polar Fourier analysis based on symmetric properties and mathematical properties of trigonometric functions. The proposed method computes eight symmetric points simultaneously, which significantly reduces computation time.

Zhuo Yang, Sei-ichiro Kamata
Colorization by Landmark Pixels Extraction

Colorization replaces the one-dimensional luminance scalar at every pixel of a monochrome image with a multi-dimensional color vector. The problem is obviously under-constrained, so some prior knowledge must be supplied for the monochrome image. Colorization using an optimization algorithm is effective for this problem, with scribbles serving as the prior knowledge. However, it cannot handle complex images effectively without repeated experiments to confirm scribble placement. In this paper, therefore, landmark pixels are used as the prior knowledge, and we propose a colorization algorithm based on landmark pixel extraction. It requires no repeated experiments and automatically generates landmark pixels that play the role of scribbles. Finally, the monochrome image is colorized according to the user's requirements.

Weiwei Du, Shiya Mori, Nobuyuki Nakamori
Filtering-Based Noise Estimation for Denoising the Image Degraded by Gaussian Noise

In this paper, a denoising algorithm for images corrupted by Gaussian noise, using filtering-based estimation, is presented. To adaptively deal with varying amounts of noise corruption, the algorithm initially estimates the noise density from the degraded image: the standard deviation of the noise is computed from the difference image between the noisy input and its pre-filtered version. In addition, a modified Gaussian noise removal filter based on local statistics, such as the local weighted mean, local weighted activity, and local maximum, is used to flexibly control the degree of noise suppression. Experimental results show the superior performance of the proposed filter compared to other standard algorithms in terms of both subjective and objective evaluations.

Tuan-Anh Nguyen, Min-Cheol Hong
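The estimation step described in the abstract above can be illustrated with a simplified sketch. A 3×3 mean pre-filter is assumed here; the paper's actual pre-filter and its robustness measures against edges are not reproduced.

```python
# Filtering-based noise estimation sketch: pre-filter the noisy image with a
# 3x3 mean filter and read the noise standard deviation off the difference
# image.  The sqrt(9/8) factor compensates for the centre pixel being part
# of its own 3x3 mean, since Var(x - mean3(x)) = (8/9) * sigma^2 for iid noise.
import numpy as np

def estimate_noise_sigma(img):
    img = np.asarray(img, dtype=np.float64)
    pad = np.pad(img, 1, mode='reflect')
    mean3 = sum(pad[1 + dy:pad.shape[0] - 1 + dy, 1 + dx:pad.shape[1] - 1 + dx]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
    diff = img - mean3                       # mostly noise in flat regions
    return diff.std() * np.sqrt(9.0 / 8.0)   # undo the filter's variance shrinkage
```

On images with strong edges this naive version over-estimates sigma, which is exactly why practical schemes add robust statistics on top.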
Combining Mendonça-Cipolla Self-calibration and Scene Constraints

In this paper, we propose a method that combines plane parallelism with the Mendonça-Cipolla self-calibration constraints. In our method each pair of images is treated independently and can therefore use a different pair of parallel planes, not necessarily visible in the other views. For each pair of images, the constraints on the singular values of the essential matrix provide two algebraic constraints on the intrinsic parameters; those we derive from plane parallelism have the advantage of providing two additional ones, making the calibration of a no-skew camera possible from two images only.

Adlane Habed, Tarik Elamsy, Boubakeur Boufama
A Key Derivation Scheme for Hierarchical Access Control to JPEG 2000 Coded Images

This paper proposes a key derivation scheme to control access to JPEG 2000 (JP2) coded images, which offer hierarchical scalability in dimensions such as SNR and resolution level. The proposed scheme simultaneously controls access to each level of scalability. It derives keys through hash chains, and each JP2 packet is enciphered with its own individual key. By combining a cyclic shift with a hash function, the proposed scheme manages only a single key per JP2 image, whereas conventional access control schemes with the above-mentioned features must manage multiple keys. The single managed key is not delivered to any user, and the scheme is also resilient to collusion attacks. Performance analysis shows the effectiveness of the proposed scheme.

Shoko Imaizumi, Masaaki Fujiyoshi, Hitoshi Kiya, Naokazu Aoki, Hiroyuki Kobayashi
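The hash-chain idea underlying schemes like the one above can be sketched in a few lines: the key for each lower scalability level is one more hash application away, so holding a level's key lets you derive every lower level but no higher one. This is purely illustrative; the paper's scheme additionally combines cyclic shifts to handle several scalability dimensions from a single managed key.

```python
# One-dimensional hash-chain key derivation sketch.  SHA-256 is a stand-in
# for whichever hash the scheme actually uses.
import hashlib

def level_key(master_key: bytes, level: int) -> bytes:
    """Derive the key for a given level by iterated hashing of the master key.
    Larger `level` = lower quality layer = further down the chain."""
    k = master_key
    for _ in range(level):
        k = hashlib.sha256(k).digest()
    return k
```

A user given `level_key(mk, 2)` can compute levels 3, 4, ... by hashing further, but the one-wayness of the hash blocks recovery of levels 1 or 0.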
Bifocal Matching Using Multiple Geometrical Solutions

Determining point-to-point correspondence across multiple images is a complex problem because of the multiple geometric and photometric transformations and/or occlusions that the same point can undergo in corresponding images. This paper presents a method for point-to-point correspondence analysis based on the combination of two techniques: (1) correspondence analysis through similarity of invariant features, and (2) combination of multiple partial solutions through bifocal geometry. The method is novel in that it determines point-to-point geometric correspondence by intersecting multiple partial solutions that are weighted through the MLESAC algorithm. Its main advantage is that it extends algorithms based on the correspondence of invariant descriptors, generalizing the correspondence problem to a geometric model over multiple views. On the sequences used, we obtained an F-score of 97% at a distance of less than 1 pixel. These results show the effectiveness of the method, which can potentially be used in a wide range of applications.

Miguel Carrasco, Domingo Mery
Digital Hologram Compression Using Correlation of Reconstructed Object Images

An efficient digital hologram compression algorithm is proposed that uses the correlation in the complex-valued object image. While the phase values are almost uncorrelated, the magnitude values exhibit a strong correlation between the real-part and imaginary-part object images. Therefore, we adaptively employ the encoding result of one image to encode the other. Both images are first wavelet transformed, and the wavelet coefficients are encoded using the SPIHT method. The significance encoding result of the real-part image is used as the context of the arithmetic coder for encoding the imaginary-part image. Experimental results demonstrate that the proposed algorithm yields better compression performance than the conventional method.

Jae-Young Sim
Pedestrian Image Segmentation via Shape-Prior Constrained Random Walks

In this paper, we present an automatic and accurate pedestrian segmentation algorithm that incorporates a pedestrian shape prior into the random walks segmentation algorithm. The random walks algorithm [1] requires user-specified labels to produce a segmentation with each pixel assigned to a label, and it can provide satisfactory segmentation results given properly labeled input seeds. To take advantage of this interactive segmentation algorithm, we improve it by incorporating prior shape information into the same optimization formulation. By using the human shape prior, we develop a fully automatic pedestrian image segmentation algorithm. Our experimental results demonstrate that the proposed algorithm significantly outperforms previous segmentation methods in terms of pedestrian segmentation accuracy on a number of real images.

Ke-Chun Li, Hong-Ren Su, Shang-Hong Lai
A Novel Rate Control Algorithm for H.264/AVC Based on Human Visual System

To improve the performance of rate control for H.264/AVC while maintaining accurate control of the output bit rate of the compressed video stream, a novel rate control algorithm based on the human visual system (HVS) is proposed in this paper. The algorithm operates on two layers: the frame level and the basic unit (BU) level. At the frame level, scene changes are first detected and the frame difference ratio is used to represent the motion complexity of the frame; target bits are then allocated at the frame level by considering these two factors. At the BU level, the visual sensitivity of a macroblock is first measured by analyzing motion information, texture characteristics, and the location of the frame, and bits are allocated to the macroblock based on this sensitivity factor. Experimental results show that the proposed method provides improved visual quality and higher PSNR with almost the same control accuracy, compared with the traditional rate control method.

Jiangying Zhu, Mei Yu, Qiaoyan Zheng, Zongju Peng, Feng Shao, Fucui Li, Gangyi Jiang
Blind Image Deblurring with Modified Richardson-Lucy Deconvolution for Ringing Artifact Suppression

In this paper, we develop a unified image deblurring framework that consists of both blur kernel estimation and non-blind image deconvolution. For blind kernel estimation, we propose a patch selection procedure and integrate it with a coarse-to-fine kernel estimation algorithm to obtain a robust blur kernel estimate. For non-blind image deconvolution, we modify the traditional Richardson-Lucy (RL) image restoration algorithm to suppress the notorious ringing artifacts in regions around strong edges. Experimental results on real blurred images demonstrate the improved efficiency and restoration quality of the proposed algorithm.

Hao-Liang Yang, Yen-Hao Chiao, Po-Hao Huang, Shang-Hong Lai
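For reference, the classical RL iteration that the paper above modifies can be sketched as follows. The ringing-suppression modification itself is not reproduced; FFT-based circular convolution is an implementation convenience assumed here, not part of the method.

```python
# Classical Richardson-Lucy deconvolution sketch.  Each iteration multiplies
# the estimate by the back-projected ratio of observed to re-blurred image:
#   est <- est * H^T( blurred / H(est) )
# where H is convolution with the PSF and H^T is correlation with it.
import numpy as np

def richardson_lucy(blurred, psf, iters=30, eps=1e-12):
    B = np.fft.rfft2(psf, blurred.shape)     # PSF spectrum, zero-padded
    Bc = np.conj(B)                          # conjugate spectrum = correlation
    est = np.full_like(blurred, blurred.mean(), dtype=np.float64)
    for _ in range(iters):
        conv = np.fft.irfft2(np.fft.rfft2(est) * B, blurred.shape)
        ratio = blurred / np.maximum(conv, eps)   # eps guards divide-by-zero
        est *= np.fft.irfft2(np.fft.rfft2(ratio) * Bc, blurred.shape)
    return est
```

Near strong edges the ratio term oscillates from iteration to iteration, which is the source of the ringing artifacts the paper's modified update is designed to damp.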
Quality Estimation for H.264/SVC Inter-layer Residual Prediction in Spatial Scalability

Scalable Video Coding (SVC) provides efficient compression for video bitstreams with various scalable configurations. The H.264 scalable extension (H.264/SVC) is the most recent scalable coding standard and involves state-of-the-art inter-layer prediction to provide higher coding efficiency than previous standards. Moreover, the video quality required in distinct situations, such as different link conditions or video contents, usually differs. It is therefore very desirable to construct a model with which the target quality can be estimated in advance. This work proposes a Quantization-Distortion (Q-D) model for H.264/SVC spatial scalability, so that video quality can be estimated before the actual encoding is performed. In particular, we decompose the residual from the inter-layer residual prediction into the previous distortion and the Prior-Residual so that the residual can be estimated. In simulations based on the proposed model, we estimate the actual Q-D curves with an average accuracy of 88.79%.

Ren-Jie Wang, Yan-Ting Jiang, Jiunn-Tsair Fang, Pao-Chi Chang
Extracting Interval Distribution of Human Interactions

Recently, activity support systems that engage in dialogue with humans have been intensively studied, owing to the development of various sensors and recognition technologies. To enable a smooth dialogue between a system and a human user, we need to clarify the rules of dialogue, including how utterances and motions are interpreted among human users. Conventional studies on dialogue analysis have examined the duration between the time when someone finishes an utterance and the time when another person starts the next one. In a real dialogue between humans, however, there are also meaningful intervals between an utterance and a visually observable motion, such as bowing or establishing eye contact; the facilitation of communication and cooperation seems to depend on these intervals. In our study, we analyze interactions that involve utterances and motions in a reception scenario by resolving motions into motion primitives (basic units of motion). We also analyze the timing of utterances and motions in order to structure dialogue behaviors. Our results suggest that a structural representation of interaction can be useful for improving the ability of activity support systems to interact with and support human dialogue.

Ryohei Kimura, Noriko Takemura, Yoshio Iwai, Kosuke Sato
A Flexible Method for Localisation and Classification of Footprints of Small Species

In environmental surveillance, ecology experts use a standard tracking tunnel system to acquire tracks or footprints of small animals, so that they can measure the presence of selected animals or detect threatened species through manual analysis of the gathered tracks. Unfortunately, distinguishing morphologically similar species by analysing their footprints is extremely difficult, and even very experienced experts find it hard to provide reliable footprint identifications; the task is also expensive, requiring a great amount of observation effort. In recent years, image processing technology has been applied to many other study areas and industries in order to improve accuracy, productivity, and reliability. In this paper, we propose a method based on image processing that first detects significant interest points in input tracking card images. Second, it filters irrelevant interest points in order to extract regions of interest. Third, it gathers useful footprint geometric features, such as angles, areas, and distances, which can generally be found in footprints of small species. Analysing the detected features statistically can provide strong support for footprint localisation and classification. We also present experimental results on footprints extracted by the proposed method. With appropriate development or modification, this method has great potential for automated identification of many species.

Haokun Geng, James Russell, Bok-Suk Shin, Radu Nicolescu, Reinhard Klette
Learning and Regularizing Motion Models for Enhancing Particle Filter-Based Target Tracking

This paper describes an original strategy for incorporating a data-driven probabilistic motion model into particle filter-based target tracking on video streams. The model is based on the local motion observed by the camera during a learning phase. Given that the initial, empirical distribution may be incomplete and noisy, we regularize it in a second phase. The hybrid discrete-continuous probabilistic motion model learned in this way is then used as a sampling distribution in a particle filter framework for target tracking. We present promising results for this approach on common datasets used as benchmarks for visual surveillance tracking algorithms.

Francisco Madrigal, Mariano Rivera, Jean-Bernard Hayet
CT-MR Image Registration in 3D K-Space Based on Fourier Moment Matching

CT-MRI registration is a common processing procedure in clinical diagnosis and therapy. We propose a novel K-space affine image registration algorithm via Fourier moment matching, based on estimating the affine matrix from the moment relationship between the corresponding Fourier spectra. This estimation strategy is very robust because the energy of the Fourier spectrum is mostly concentrated in the low-frequency band, so the moments of the Fourier spectrum are robust against noise and outliers. Our experiments on real CT and MRI datasets show that the proposed Fourier-based registration algorithm provides higher registration accuracy than the existing mutual information registration technique.

Hong-Ren Su, Shang-Hong Lai
Sparse Temporal Representations for Facial Expression Recognition

In automatic facial expression recognition, an increasing number of techniques have been proposed in the literature that exploit the temporal nature of facial expressions. As all facial expressions are known to evolve over time, it is crucially important for a classifier to be capable of modelling their dynamics. We establish that the method of sparse representation (SR) classifiers is a suitable candidate for this purpose, and subsequently propose a framework for expression dynamics to be efficiently incorporated into its current formulation. We additionally show that for the SR method to be applied effectively, a certain threshold on image dimensionality must be enforced (unlike in face recognition problems). Third, we determine that recognition rates may be significantly influenced by the size of the projection matrix Φ. To demonstrate these points, a battery of experiments was conducted on the CK+ dataset for the recognition of the seven prototypic expressions (anger, contempt, disgust, fear, happiness, sadness, and surprise), and comparisons were made between the proposed temporal-SR framework, the static-SR framework, and a state-of-the-art support vector machine.

S. W. Chew, R. Rana, P. Lucey, S. Lucey, S. Sridharan
Dynamic Compression of Curve-Based Point Cloud

With the increasing demand for highly detailed 3D data, dynamic scanning systems have recently become capable of producing 3D+t (a.k.a. 4D) spatio-temporal models with millions of points. As a consequence, effective 4D geometry compression schemes are required to store and transmit this huge amount of data, in addition to classical static 3D data. In this paper, we propose a 4D spatio-temporal point cloud encoder via a curve-based representation of the point cloud, particularly well suited for dynamic structured-light-based scanning systems, wherein a grid pattern is projected onto the object surface. The surface is then naturally sampled as a series of curves, due to the grid pattern. This motivates our choice to leverage a curve-based representation that removes the spatial and temporal correlation of the sampled points along the scanning directions through a competition-based predictive encoder that includes different spatio-temporal prediction modes. Experimental results show the significant gain obtained with the proposed method.

Ismael Daribo, Ryo Furukawa, Ryusuke Sagawa, Hiroshi Kawasaki, Shinsaku Hiura, Naoki Asada
Recovering Depth Map from Video with Moving Objects

In this paper, we propose a novel approach to reconstructing the depth map from a video sequence that considers not only geometric coherence but also temporal coherence. Most previous methods for reconstructing depth from video assume rigid motion, so they cannot provide satisfactory depth estimates for regions containing moving objects. In this work, we develop a depth estimation algorithm that detects regions of moving objects and recovers the depth map in a Markov Random Field (MRF) framework. We first apply SIFT matching across frames of the video sequence and compute the camera parameters for all frames, along with the 3D positions of the SIFT feature points, via structure from motion. The 3D depths at these SIFT points are then propagated to the whole image based on image over-segmentation to construct an initial depth map, and the depth values of segments with large re-projection errors are refined by minimizing those errors. In addition, we detect areas of moving objects from the remaining pixels with large re-projection errors. In the final step, we optimize the depth map estimate in the MRF framework. Experimental results demonstrate the improved depth estimation of the proposed algorithm.
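The re-projection error used to flag segments for refinement can be computed as below. This is a generic sketch, not the authors' code: the camera is represented by a hypothetical 3×4 projection matrix `P` obtained from structure from motion.

```python
import numpy as np

def reprojection_error(P, X, x_obs):
    """Pixel distance between the projection of 3D point X and observation x_obs.

    P: 3x4 camera projection matrix, X: 3-vector (world point),
    x_obs: 2-vector (observed image point). Segments whose points
    have large errors would be candidates for depth refinement or
    moving-object detection in a pipeline like the one above.
    """
    Xh = np.append(X, 1.0)      # homogeneous coordinates
    p = P @ Xh
    proj = p[:2] / p[2]         # perspective divide
    return np.linalg.norm(proj - x_obs)
```

Summing this error over the pixels of an over-segmented region gives a per-segment score that can be thresholded.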

Hsiao-Wei Chen, Shang-Hong Lai
An Iterative Algorithm for Efficient Adaptive GOP Size in Transform Domain Wyner-Ziv Video Coding

Transform Domain Wyner-Ziv Video Coding (TDWZ) is one of the most popular paradigms of Distributed Video Coding (DVC), offering low encoding complexity. However, there is still a gap in its coding performance compared to conventional video coding standards such as MPEG-x or H.264/AVC. For TDWZ to reach comparable performance, a good method for deciding a proper Group of Pictures (GOP) size is greatly needed. From this point of view, we propose an iterative algorithm that efficiently determines the GOP size based on a frame-level intra mode decision method. The approach first constructs a coarse GOP size and then refines it by iterative checking to obtain the final GOP size. Experimental results show the superiority of the proposed algorithm, with improvements of up to 2 dB in terms of coding efficiency.

Khanh DinhQuoc, Xiem HoangVan, Byeungwoo Jeon
A Robust Zero-Watermark Copyright Protection Scheme Based on DWT and Image Normalization

Recently, protecting the copyright of digital media has become an imperative issue due to the growing illegal reproduction and modification of digital media. A large number of digital watermarking algorithms have been proposed to protect the integrity and copyright of images. Traditional watermarking schemes protect image copyright by embedding a watermark in the spatial or frequency domain of an image. However, these methods degrade the quality of the original image to some extent. In recent years, a new approach called zero-watermarking has been introduced. In these methods, the watermark does not need to be embedded into the protected image; instead, it is used to generate a verification map that is registered with a trusted authority for further protection. In this paper, a robust copyright-proving scheme based on the discrete wavelet transform is proposed. It uses a normalization procedure to provide robustness against geometric distortions and a cellular automaton for noise robustness. Experimental results on images of varying complexity demonstrate that the proposed scheme is robust against common geometric and non-geometric attacks, including blurring, JPEG compression, noise addition, sharpening, scaling, rotation, and cropping, and that it outperforms related methods in most cases.
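The zero-watermarking principle — no bits are ever embedded in the image — can be illustrated with a toy sketch. This is not this paper's DWT/cellular-automaton scheme: here, coarse block averages stand in for DWT low-band features, and all names are illustrative.

```python
import numpy as np

def make_verification_map(image, watermark_bits):
    """Zero-watermarking sketch: derive feature bits from coarse image
    statistics (8x8 block means thresholded at their overall mean, a
    stand-in for DWT low-band features) and XOR them with the
    watermark. The image is never modified; the resulting map would
    be registered with a trusted authority.
    """
    h, w = image.shape
    blocks = image.reshape(h // 8, 8, w // 8, 8).mean(axis=(1, 3)).ravel()
    feature_bits = (blocks > blocks.mean()).astype(np.uint8)
    n = min(len(feature_bits), len(watermark_bits))
    return feature_bits[:n] ^ watermark_bits[:n]

def extract_watermark(image, verification_map):
    """Recover the watermark bits from a (possibly attacked) image by
    recomputing the feature bits and XOR-ing with the registered map."""
    h, w = image.shape
    blocks = image.reshape(h // 8, 8, w // 8, 8).mean(axis=(1, 3)).ravel()
    feature_bits = (blocks > blocks.mean()).astype(np.uint8)
    n = len(verification_map)
    return feature_bits[:n] ^ verification_map
```

Robustness then hinges on how stable the chosen features are under attacks — which is exactly what the paper's normalization step and DWT features are designed to provide.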

Mahsa Shakeri, Mansour Jamzad
Multi-view Video Coding Based on High Efficiency Video Coding

Multiview video coding is one of the key techniques to realize 3D video systems. MPEG started a standardization activity on 3DVC (3D video coding), which is based on multiview video coding, in 2007, and finalized the standard for multiview video coding (MVC) based on H.264/AVC in 2008. However, High Efficiency Video Coding (HEVC), a 2D video coding standard under development, outperforms MVC even though it does not employ inter-view prediction. Thus, we designed a new multiview video coding method based on HEVC. Inter-view prediction was added to HEVC, and some coding tools were refined to suit MVC. The encoded multiple bitstreams are assembled into one bitstream, which is decoded into multiview video at the decoder. Experimental results confirm that the proposed MVC based on HEVC is much better than H.264/AVC, MVC, and HEVC: it achieves about 59.95% bit saving compared to JMVC simulcast at the same quality.

Kwan-Jung Oh, Jaejoon Lee, Du-Sik Park
2D to 3D Image Conversion Based on Classification of Background Depth Profiles

In this paper, a 2D-to-3D stereo image conversion scheme is proposed for 3D content creation. The difficulty of this problem lies in depth estimation/assignment from a mono image, which does not actually carry sufficient information. To estimate the depth map, we adopt a strategy of first performing foreground/background separation, then classifying the background depth profile with a neural network, estimating foreground depth from image cues, and finally combining them. To enhance the stereoscopic perception of the synthesized images viewed on a 3D display, depth refinement based on a bilateral filter and HVS-based contrast modification between foreground and background are adopted. Subjective experiments show that the stereo images generated by the proposed scheme provide good 3D perception.
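The bilateral-filter depth refinement step can be sketched as below. This is the standard bilateral filter applied to a depth map — a minimal illustration under assumed parameter names, not the paper's implementation.

```python
import numpy as np

def bilateral_refine(depth, sigma_s=2.0, sigma_r=0.1, radius=3):
    """Edge-preserving smoothing of a depth map with a bilateral filter.

    Each output pixel is a weighted mean of its neighbourhood, where the
    weights combine spatial distance (sigma_s) and depth-value similarity
    (sigma_r). Depth discontinuities between foreground and background
    thus survive while noise within regions is smoothed.
    """
    h, w = depth.shape
    out = np.zeros_like(depth)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            patch = depth[y0:y1, x0:x1]
            yy, xx = np.mgrid[y0:y1, x0:x1]
            ws = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2))
            wr = np.exp(-((patch - depth[y, x]) ** 2) / (2 * sigma_r ** 2))
            wgt = ws * wr
            out[y, x] = (wgt * patch).sum() / wgt.sum()
    return out
```

The brute-force double loop is for clarity only; practical implementations vectorize or approximate the filter.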

Guo-Shiang Lin, Han-Wen Liu, Wei-Chih Chen, Wen-Nung Lie, Sheng-Yen Huang
Shape Matching and Recognition Using Group-Wised Points

Shape matching/recognition is a critical problem in the field of computer vision, and many descriptors and methods have been studied in the literature. However, building on predefined descriptors, most current matching stages are accomplished by finding the optimal correspondence between every two contour points, i.e., in a pair-wised manner. In this paper, we present a novel matching method that finds the correspondence between groups of contour points. The points in the same group are adjacent to each other, resulting in a strong relationship among them. Two groups are considered matched when the two point sequences formed by the groups lead to a perfect one-to-one mapping. The proposed group-wised matching method obtains a more robust matching result, since the co-occurrence (order) information of the grouped points is used in the matching stage. We test our method on three well-known benchmarks: the MPEG-7, Kimia, and Tari1000 data sets. The retrieval results show that the new group-wised matching method achieves encouraging improvements over traditional pair-wised matching approaches.

Junwei Wang, Yu Zhou, Xiang Bai, Wenyu Liu
Backmatter
Metadata
Title
Advances in Image and Video Technology
Editor
Yo-Sung Ho
Copyright Year
2012
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-25346-1
Print ISBN
978-3-642-25345-4
DOI
https://doi.org/10.1007/978-3-642-25346-1