Fast reference frame selection based on content similarity for low complexity HEVC encoder

doi:10.1016/j.jvcir.2016.07.018

Journal of Visual Communication and Image Representation

Volume 40, Part B, October 2016, Pages 516-524

https://doi.org/10.1016/j.jvcir.2016.07.018

Highlights

• A fast reference frame selection algorithm for the HEVC encoder is proposed.
• A relationship between content similarity and reference frame selection is derived.
• The content similarity is studied without any extra computational complexity.
• Experimental results show that the proposed algorithm efficiently reduces the encoding complexity of the best reference frame decision process.

Abstract

High efficiency video coding (HEVC) is the state-of-the-art video coding standard, achieving about 50% bit rate saving compared with H.264/AVC while maintaining the same visual quality. This coding efficiency comes from a set of advanced coding tools, such as multiple reference frames (MRF) based interframe prediction, which improves the coding efficiency of the HEVC encoder but also adds heavy computation to it. The high encoding complexity is a bottleneck that prevents high definition videos and the HEVC encoder from being widely used in real-time and low power multimedia applications. In this paper, we propose a content similarity based fast reference frame selection algorithm to reduce the computational complexity of MRF based interframe prediction. Based on the high content similarity between the parent prediction unit (Inter_2N × 2N) and the children prediction units (Inter_2N × N, Inter_N × 2N, Inter_N × N, Inter_2N × nU, Inter_2N × nD, Inter_nL × 2N, and Inter_nR × 2N), the reference frame selection information of the children prediction units is obtained by learning from the results of their parent prediction unit. Experimental results show that the proposed algorithm saves about 54.29% and 43.46% of the MRF encoding time for the low-delay-main and random-access-main coding structures, respectively, while the rate distortion performance degradation is negligible.
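
The parent-to-children idea described above can be illustrated with a minimal sketch. It assumes a hypothetical `rd_cost` callback that stands in for motion estimation of one partition mode against one reference frame, and hypothetical helper names (`full_search`, `inherit_from_parent`, `select_references`); none of these come from the HM software, and the paper's actual decision rule may include conditions that are not shown here. The reference frame direction is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# Hypothetical cost callback: (partition mode, reference index) -> RD cost.
RdCost = Callable[[str, int], float]

# Children partition modes listed in the abstract.
CHILD_MODES = ("Inter_2NxN", "Inter_Nx2N", "Inter_NxN",
               "Inter_2NxnU", "Inter_2NxnD", "Inter_nLx2N", "Inter_nRx2N")


@dataclass
class RefChoice:
    ref_idx: int          # selected reference frame index
    cost: float           # RD cost for that reference frame
    frames_searched: int  # how many reference frames were evaluated


def full_search(mode: str, ref_frames: Sequence[int], rd_cost: RdCost) -> RefChoice:
    """Exhaustive MRF search: evaluate every reference frame, keep the cheapest."""
    best = min(ref_frames, key=lambda r: rd_cost(mode, r))
    return RefChoice(best, rd_cost(mode, best), len(ref_frames))


def inherit_from_parent(mode: str, parent: RefChoice, rd_cost: RdCost) -> RefChoice:
    """Assumed fast path: evaluate only the frame chosen by the parent PU."""
    return RefChoice(parent.ref_idx, rd_cost(mode, parent.ref_idx), 1)


def select_references(ref_frames: Sequence[int], rd_cost: RdCost) -> Dict[str, RefChoice]:
    """Parent (Inter_2Nx2N) searches all frames; children reuse its choice."""
    parent = full_search("Inter_2Nx2N", ref_frames, rd_cost)
    choices = {"Inter_2Nx2N": parent}
    for mode in CHILD_MODES:
        choices[mode] = inherit_from_parent(mode, parent, rd_cost)
    return choices
```

In this toy setting with four reference frames, the sketch evaluates 4 + 7 = 11 mode/frame pairs instead of the 8 × 4 = 32 pairs an exhaustive search over all eight modes would need.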

Introduction

With the development of capture and display technologies, full high definition (HD) and ultra HD videos are attracting more and more attention because they provide higher perceptual quality. However, as video resolution and frame rate increase, the data volume of raw HD video grows dramatically. High compression techniques are highly desirable because memory and channel bandwidth are still limited. To meet this demand, the joint collaborative team on video coding (JCT-VC) of the ITU-T video coding experts group (VCEG) and the ISO/IEC moving picture experts group (MPEG) has developed a state-of-the-art video coding standard named high efficiency video coding (HEVC) [1], [2], [3]. HEVC can achieve the same subjective visual quality as the H.264/AVC [4] high profile while requiring only about 50% of the bit rate. This coding efficiency comes from a set of advanced coding tools, such as flexible size unit representation, intraframe prediction with 35 modes, multiple reference frames (MRF) interframe prediction, new in-loop filtering methods, and so on. Meanwhile, these coding tools dramatically increase the computational complexity of the HEVC encoder. The high computational complexity is a bottleneck that prevents HD videos and the HEVC encoder from being widely used in real-time and low power multimedia applications, such as live video broadcasting, mobile video communication, and video surveillance. Thus, there is a pressing need to reduce the computational complexity of the HEVC encoder.

Recently, many researchers have devoted their efforts to reducing the computational complexity of the HEVC encoder [5], [6], [7], [8], [9], [10], [11]. Based on Bayesian decision theory and rate distortion (RD) characteristics, Lee et al. proposed a fast coding unit (CU) size decision method for HEVC [5]. In [6], Shen et al. proposed a CU depth decision method based on the depth selection correlation between the spatial-temporal neighboring CUs and the current CU. They also proposed an early termination method for motion estimation based on motion homogeneity, RD cost, and Skip mode. By using the CU depth selection information of spatially neighboring CUs, Kim et al. proposed a CU depth range decision method for HEVC [7]. In [8], the RD-complexity characteristics of inter prediction were analyzed and an efficient inter mode decision method for HEVC was derived. Based on the CU motion activity and the mode selection correlation among hierarchical depth CUs, Pan et al. proposed an early Merge mode decision method for fast HEVC interframe prediction [9]. By utilizing the estimated optical flow of downsampled frames, Xiong et al. proposed a fast inter CU selection method for HEVC [10]. Based on the prediction mode and RD cost correlations among different quadtree depth levels and spatially neighboring CUs, Shen et al. proposed a fast CU size and intra mode decision method for HEVC [11]. These methods mainly focus on reducing the computational complexity of the flexible size unit representation technique; the HEVC encoding complexity could be further reduced by optimizing the MRF based interframe prediction.

In the last decade, a number of methods have been proposed to reduce the encoding complexity of MRF based interframe prediction for H.264/AVC and its extensions [12], [13], [14], [15], [16], [17], [18]. By taking into account the correlation and continuity of motion vectors among different reference frames, Su et al. proposed a fast MRF based motion estimation method for H.264/AVC [12]. Based on the spatial and temporal correlation of the reference frame index and motion vectors, Jun et al. proposed an efficient priority-based MRF selection method for fast H.264/AVC motion estimation [13]. In [14], Chen et al. proposed a fast MRF based motion estimation method for H.264/AVC that uses the stored motion vectors to compose the motion vector without searching all active reference frames. In [15], Liu et al. proposed a fast MRF selection method for H.264/AVC motion estimation using the motion activity and Hadamard coefficients. Based on the reference frame selection of the 16 × 16 mode partition, Zhang et al. proposed an efficient MRF selection method for H.264 based multiview video coding [16]. In [17], Yeh et al. proposed a fast mode decision based MRF selection method for an H.264 based multiview video coding system using an inter-view rate distortion prediction method. In [18], by using fast mode decision based on inter-view and inter-component correlations, Lei et al. proposed a low complexity MRF decision method for H.264 based multiview depth video coding. These methods can efficiently reduce the computational complexity; however, they were proposed for H.264 and are not suitable for direct application to the HEVC encoder because of the different statistical characteristics and coding tools of the HEVC encoding system. In [19], a fast reference frame selection method was proposed based on the motion complexity, which is computed from the distribution of the best reference frame, the motion vector difference, and its associated average distortion. However, the MRF encoding time saving of that method is still limited and unstable for HEVC with the random-access-main coding structure because of the IBP prediction structure it uses.

In this paper, we propose a fast MRF selection algorithm for fast HEVC interframe prediction, which is based on the relationship between the content similarity and the reference frame selection. The rest of this paper is organized as follows. The HEVC MRF encoding process is reviewed in Section 2. The details of the proposed fast MRF selection algorithm are presented in Section 3. Section 4 shows the experimental results, and Section 5 discusses the algorithm. Finally, Section 6 concludes the paper.

Section snippets

Review on the HEVC MRF encoding process

Like previous video coding standards such as H.264/AVC, HEVC is a hybrid video coding standard. In the HEVC encoding process, each frame is partitioned into a sequence of coding tree units (CTUs). The CTU is the basic unit of coding and, for 4:2:0 color sampling, consists of a luma coding tree block (CTB), two chroma CTBs, and associated syntax elements. According to the quadtree syntax, the CTU is further split into one or multiple CUs. Then, based on the prediction type, the CU can be

Encoding complexity analysis on the MRF encoding process

To analyze the encoding complexity of the MRF selection process, eight HEVC test sequences (BQSquare, BasketballPass, BQMall, BasketballDrill, FourPeople, Johnny, Cactus, and ParkScene) with various resolutions and motion activities were encoded with HM12.0 [21] under the HEVC common test conditions [22]. Four quantization parameters (QPs) (22, 27, 32, and 37) were used. The motion estimation search range and method were 64 and TZSearch, respectively. The low-delay-main and

Experimental results

To evaluate the coding performance of the proposed fast MRF selection algorithm, the HEVC reference software HM12.0 is used as the software platform. The hardware platform is an Intel Xeon CPU E5-1620 v2 @ 3.70 GHz with 16.0 GB RAM, running the Microsoft Windows 7 64-bit operating system. To compare the coding performance in terms of BDPSNR, BDBR, total encoding time saving (TS for short), and total reference frame encoding time saving (RTS for short), three 416 × 240 sequences of Class D (BQSquare,
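
The snippet above is truncated before TS and RTS are defined. A convention that is common in the fast-encoding literature, and only assumed here, is the relative reduction in encoding time against the unmodified reference encoder. The function name `time_saving` and the timings in the usage note are illustrative and are not taken from the paper.

```python
def time_saving(t_reference: float, t_proposed: float) -> float:
    """Relative encoding time reduction in percent (assumed definition).

    t_reference: time measured with the unmodified encoder.
    t_proposed:  time measured with the fast algorithm enabled.
    For TS both values would be total encoding times; for RTS they would
    be the times spent in the reference frame selection process only.
    """
    if t_reference <= 0:
        raise ValueError("reference time must be positive")
    return (t_reference - t_proposed) / t_reference * 100.0
```

For example, with made-up timings of 200 s for the reference encoder and 150 s for the fast encoder, `time_saving(200.0, 150.0)` gives 25.0, that is, a 25% saving.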

Algorithm discussion

It is well known that, when designing fast algorithms for video coding, the decision accuracy of the algorithm is highly correlated with the coding efficiency. In other words, if the decision accuracy is high and close to 100%, there is no RD performance degradation; conversely, if the decision accuracy is low, the RD performance degrades. From Table 2, we can see that the average decision accuracy of the proposed algorithm is 63.26% and 67.74% for the
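
The snippet does not show how decision accuracy is measured. One plausible reading, assumed here, is the share of children PUs whose inherited reference frame matches the one an exhaustive search would have chosen. The function name `decision_accuracy` and the sample indices are hypothetical.

```python
from typing import Sequence


def decision_accuracy(inherited: Sequence[int], exhaustive_best: Sequence[int]) -> float:
    """Percentage of PUs whose fast decision equals the exhaustive-search result.

    inherited:       reference indices chosen by the fast algorithm.
    exhaustive_best: reference indices chosen by a full MRF search.
    """
    if len(inherited) != len(exhaustive_best):
        raise ValueError("both sequences must have the same length")
    if not inherited:
        raise ValueError("at least one decision is required")
    hits = sum(a == b for a, b in zip(inherited, exhaustive_best))
    return 100.0 * hits / len(inherited)
```

With the illustrative inputs `[0, 0, 1, 2]` and `[0, 1, 1, 2]`, three of four decisions agree, so the function returns 75.0.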

Conclusion

The MRF encoding process consumes about 70% of the total encoding time of an HEVC encoder. To reduce the computational complexity of the MRF encoding process, an early reference frame decision algorithm is proposed in this paper. Since there is high video content similarity between the parent PU and the children PUs, the reference frame information of the children PUs, including the reference frame index and the reference frame direction, is set according to that of the parent PU. Experimental results show that the

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 61501246, Grant 61271324, Grant 61471348, Grant 61232016, in part by the Natural Science Foundation of Jiangsu Province of China under Grant BK20150930, in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant 15KJB510019, in part by the Natural Science Foundation of Hebei Province of China under Grant F2015202311, in part by the Project through the

References (28)

H. Wang et al., Early detection of all-zero 4 × 4 blocks in high efficiency video coding, J. Visual Commun. Image Represent. (2014)
J. Chen et al., Parallel fast inter mode decision for H.264/AVC encoding, J. Visual Commun. Image Represent. (2013)
P. Hanhart et al., Calculation of average coding efficiency based on subjective quality scores, J. Visual Commun. Image Represent. (2014)
Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 2: High...
G.J. Sullivan et al., Overview of the high efficiency video coding (HEVC) standard, IEEE Trans. Circ. Syst. Video Technol. (2012)
J. Lee et al., A fast CU size decision algorithm for HEVC, IEEE Trans. Circ. Syst. Video Technol. (2014)
L. Shen et al., An effective CU size decision method for HEVC encoders, IEEE Trans. Multimedia (2013)
D.-H. Kim et al., Selective CU depth range decision algorithm for HEVC encoder
J. Vanne et al., Efficient mode decision schemes for HEVC inter prediction, IEEE Trans. Circ. Syst. Video Technol. (2014)
Z. Pan et al., Early MERGE mode decision based on motion estimation and hierarchical depth correlation for HEVC, IEEE Trans. Broadcasting (2014)
J. Xiong et al., A fast HEVC inter CU selection method based on pyramid motion divergence, IEEE Trans. Multimedia (2014)
L. Shen et al., Fast CU size decision and mode decision algorithm for HEVC intra coding, IEEE Trans. Consumer Electron. (2013)
Y. Su et al., Fast multiple reference frame motion estimation for H.264/AVC, IEEE Trans. Circ. Syst. Video Technol. (2006)
D. Jun et al., An efficient priority-based reference frame selection method for fast motion estimation in H.264/AVC, IEEE Trans. Circ. Syst. Video Technol. (2010)


☆ This paper has been recommended for acceptance by Zicheng Liu.
