
2020 | Book

Image and Video Technology

PSIVT 2019 International Workshops, Sydney, NSW, Australia, November 18–22, 2019, Revised Selected Papers


About this Book

This book constitutes the thoroughly refereed post-conference proceedings of four international workshops held in the framework of the 9th Pacific-Rim Symposium on Image and Video Technology, PSIVT 2019, in Sydney, NSW, Australia, in November 2019: Vision-Tech: Workshop on Challenges, Technology, and Solutions in the Areas of Computer Vision; Workshop on Passive and Active Electro‐Optical Sensors for Aerial and Space Imaging; Workshop on Deep Learning and Image Processing Techniques for Medical Images; and Workshop on Deep Learning for Video and Image Analysis.
The 16 revised full papers presented were carefully selected from 26 submissions. The papers cover the full range of state-of-the-art research in image and video technology with topics ranging from well-established areas to novel current trends.

Table of Contents

Frontmatter

Vision-Tech: A Workshop on Challenges, Technology, and Solutions in the Areas of Computer Vision

Frontmatter
Rain Streak Removal with Well-Recovered Moving Objects from Video Sequences Using Photometric Correlation
Abstract
The main challenge in a rain removal algorithm is to differentiate rain streaks from moving objects. This paper addresses the problem using the spatiotemporal appearance (STA) technique. Although the STA-based technique can remove rain from video effectively, in some cases it cannot properly retain all moving object regions. To solve this issue, the photometric features of rain streaks are exploited. In this paper, a new algorithm combining STA with the photometric correlation between rain streaks and the background is proposed. By combining both techniques, rain streaks and moving objects are successfully detected and separated, then fused to obtain well-recovered moving objects in rain-free video. The experimental results reveal that the proposed algorithm significantly outperforms state-of-the-art methods for both real and synthetic rain streaks.
Muhammad Rafiqul Islam, Manoranjan Paul, Michael Antolovich
Face Analysis: State of the Art and Ethical Challenges
Abstract
In face analysis, the task is to identify a subject appearing in an image as a unique individual and to extract facial attributes such as age, gender, and expression from the face image. Over the last years, we have witnessed tremendous improvements in face analysis algorithms developed by both industry and academia. Some applications that might have been considered science fiction in the past have become reality. Although today's tools are far from perfect, they can deal with very challenging images, such as pictures taken in unconstrained environments. In this paper, we show how easy it is to build very effective applications with open source tools. For instance, it is possible to analyze the facial expressions of a public figure and his or her interactions over the last 24 hours by processing images from Twitter given a hashtag. Obviously, the same analysis can be performed using images from a surveillance camera or from a family photo album. The recognition rate is now comparable to human vision, but computer vision can process thousands of images in a couple of hours. For these applications, it is not necessary to train complex deep learning networks, because pretrained models are already available in public repositories. In our work, we show that anyone with certain computer skills can use (or misuse) this technology. The increased performance of facial analysis and its easy implementation have enormous potential for good and, unfortunately, for ill too. For these reasons, we believe that our community should discuss the scope and limitations of this technology in terms of ethical issues such as the definition of good practices, standards, and restrictions when using and teaching facial analysis.
Domingo Mery
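
As a rough illustration of the abstract's point about how little code such applications require, the following sketch matches faces with the open-source face_recognition library. The library choice and the file names are assumptions for illustration only, not the tooling used in the paper.

import face_recognition  # open-source face analysis library built on dlib

# Load a reference photo of a known person and a query photo (placeholder paths).
known = face_recognition.load_image_file("reference.jpg")
query = face_recognition.load_image_file("query.jpg")

# Encode each detected face as a 128-dimensional descriptor.
known_encoding = face_recognition.face_encodings(known)[0]
query_encodings = face_recognition.face_encodings(query)

# Compare the first detected face in the query against the reference identity.
match = face_recognition.compare_faces([known_encoding], query_encodings[0])
print("Same person:", match[0])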
Location Analysis Based Waiting Time Optimisation
Abstract
Customer satisfaction is very important for customer retention in food and retail stores. Waiting time has been found to be one of the most important factors influencing customers' shopping experience, purchase termination rate, and perception of retailer service offerings. Customer retention can be increased by avoiding long queues at the checkouts. This paper investigates the different types of sensor-based location detection technologies currently used to capture customer behaviour and then provides a fundamental optimization mechanism to avoid long waiting times economically. Various approaches to identifying a person's location are compared in terms of principle and operation. Each contributes to better control of resources depending on the expected number of customers at checkout. Through an experiment in a supermarket, this paper contributes to improved operational resource planning, avoiding overcapacity without increasing customers' queues and waiting times. Finally, the recommendation is given to business managers that customers perceive waiting time as an indicator of lower service quality.
Hami Aksu, Wolfgang Dorner, Lihong Zheng

Passive and Active Electro-Optical Sensors for Aerial and Space Imaging

Frontmatter
In-Orbit Geometric Calibration of Firebird’s Infrared Line Cameras
Abstract
The German Aerospace Center (DLR) has developed and launched two small satellites (TET-1 and BIROS) as part of the FireBIRD mission. Both are capable of detecting and observing fire-related high temperature events (HTE) from space with infrared cameras. To enable quick localization of the fires, direct georeferencing of the images is required. Therefore, the camera geometry measured with a laboratory set-up on the ground has to be verified and validated using real data takes. This is achieved using ground control points (GCPs), identifiable in all spectral bands, which allow investigation of the whole processing chain used for georeferencing. It is shown how the accuracy of direct georeferencing was significantly improved by means of in-orbit calibration using GCPs, and how the workflow for processing and reprocessing was developed.
Jürgen Wohlfeil, Tilman Bucher, Anko Börner, Christian Fischer, Olaf Frauenberger, Björn Piltz
Evaluation of Structures and Methods for Resolution Determination of Remote Sensing Sensors
Abstract
Effective image resolution is an important image quality factor for remote sensing sensors and significantly affects photogrammetric processing tool chains. Tie points, mandatory for forming the block geometry, fully rely on feature points (e.g., SIFT, SURF), and the quality of these points is significantly correlated with image resolution. Spatial resolution can be determined in different ways: utilizing bar test charts (e.g., USAF51), slanted edges (ISO 12233), and Siemens stars are widely accepted techniques. The paper describes these approaches and compares them all in one joint experiment. Moreover, the slanted-edge and Siemens star methods are evaluated using (close to) ideal images convolved with known parameters. It is shown that both techniques deliver conclusive and expected results.
Henry Meißner, Michael Cramer, Ralf Reulke
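
For context, a Siemens star encodes a continuum of spatial frequencies: for a star with $N_p$ line pairs around its circumference, the frequency probed at radius $r$ from the center is

\[ f(r) = \frac{N_p}{2\pi r}, \]

so the resolution limit is read off at the innermost radius where the measured contrast still exceeds a chosen threshold. This is the standard relation for the chart, not a formula specific to the paper.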

International Workshop on Deep Learning and Image Processing Techniques for Medical Images

Frontmatter
3D Image Reconstruction from Multi-focus Microscopic Images
Abstract
This paper presents a method for reconstructing a 3D image from multi-focus microscopic images captured at different focus settings. We model multi-focus imaging by a microscope and produce the 3D image of a target object based on the model. The 3D image reconstruction is done by minimizing the difference between the observed images and the simulated images generated by the imaging model. Simulation and experimental results show that the proposed method can generate the 3D image of a transparent object efficiently and reliably.
Takahiro Yamaguchi, Hajime Nagahara, Ken’ichi Morooka, Yuta Nakashima, Yuki Uranishi, Shoko Miyauchi, Ryo Kurazume
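
In generic notation (our own, not the paper's), the reconstruction described above amounts to a model-fitting problem of the form

\[ \hat{V} = \arg\min_{V} \sum_{k} \left\| I_k - \mathcal{A}_k(V) \right\|_2^2, \]

where $V$ is the unknown 3D volume, $I_k$ is the image observed at the $k$-th focus setting, and $\mathcal{A}_k$ is the imaging model that simulates the image that would be observed at that focus.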
Block-Wise Authentication and Recovery Scheme for Medical Images Focusing on Content Complexity
Abstract
Digital images are used to transfer critical data in areas such as medicine, research, business, and the military. This image transfer takes place over an unsecured Internet network; therefore, there is a need for reliable security and protection for these sensitive images. Medical images play an important role in the fields of telemedicine and telesurgery. Thus, before making any diagnostic decisions and treatments, the authenticity and integrity of the received medical images need to be verified to avoid misdiagnosis. This paper proposes a block-wise, blind, fragile watermarking mechanism for medical image authentication and recovery. By eliminating embedded insignificant data and considering the different content complexity of each block during feature extraction and recovery, the capacity of data embedding without loss of quality is increased. The new embedding method can embed a copy of the compressed image inside itself as a watermark to increase the quality of the recovered image. In our proposed hybrid scheme, the block features are utilized to improve the efficiency of data concealing for authentication and to reduce tampering. Therefore, the scheme can achieve better results in terms of recovered image quality and greater tampering protection compared with current schemes.
Faranak Tohidi, Manoranjan Paul, Mohammad Reza Hooshmandasl, Subrata Chakraborty, Biswajeet Pradhan
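
As a generic illustration of the idea of hiding recovery data inside the image itself (and not of the paper's block-wise scheme), the following sketch embeds a bit stream into the least significant bits of a grayscale image; any modification of those pixels then corrupts the extracted stream, which is what makes such watermarks fragile.

import numpy as np

def embed_lsb(cover, bits):
    # bits: array of 0/1 values; hide one bit per pixel in the LSB plane.
    flat = cover.flatten().astype(np.uint8)
    flat[:len(bits)] = (flat[:len(bits)] & 0xFE) | bits
    return flat.reshape(cover.shape)

def extract_lsb(stego, n_bits):
    # Read the hidden bit stream back out of the first n_bits pixels.
    return stego.flatten()[:n_bits] & 1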
GAN-Based Method for Synthesizing Multi-focus Cell Images
Abstract
This paper presents a method for synthesizing multi-focus cell images by using generative adversarial networks (GANs). The proposed method, called multi-focus image GAN (MI-GAN), consists of two generators. A base image generator synthesizes a 2D base cell image from random noise. Using the generated base image, a multi-focus cell image generator produces 11 realistic multi-focus images of the cell while considering the relationships between images acquired at successive focus points. Experimental results show that MI-GAN achieves good performance in generating realistic multi-focus cell images.
Ken’ichi Morooka, Xueru Zhang, Shoko Miyauchi, Ryo Kurazume, Eiji Ohno
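
A minimal sketch of the two-generator layout described above, written in PyTorch; all layer sizes, the noise dimension, and the image resolution are illustrative assumptions, not values from the paper.

import torch
import torch.nn as nn

class BaseGenerator(nn.Module):
    """Synthesizes a single 2D base cell image from a noise vector."""
    def __init__(self, z_dim=100, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, img_size * img_size), nn.Tanh())

    def forward(self, z):
        return self.net(z).view(-1, 1, self.img_size, self.img_size)

class MultiFocusGenerator(nn.Module):
    """Maps one base image to 11 focal slices, one output channel per focus
    point, so neighbouring slices are predicted jointly from shared features."""
    def __init__(self, n_focus=11):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, n_focus, 3, padding=1), nn.Tanh())

    def forward(self, base_image):
        return self.net(base_image)

# Usage: noise -> base image -> stack of 11 multi-focus images.
z = torch.randn(4, 100)
stack = MultiFocusGenerator()(BaseGenerator()(z))   # shape (4, 11, 64, 64)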

International Workshop on Deep Learning for Video and Image Analysis

Frontmatter
Improving Image-Based Localization with Deep Learning: The Impact of the Loss Function
Abstract
This work investigates the impact of the loss function on the performance of neural networks in the context of a monocular, RGB-only image localization task. A common technique used when regressing a camera's pose from an image is to formulate the loss as a linear combination of positional and rotational mean squared error (using tuned hyperparameters as coefficients). In this work we observe that changes to rotation and position mutually affect the captured image, and that in order to improve performance, a pose regression network's loss function should include a term which combines the errors of these two coupled quantities. Based on task-specific observations and experimental tuning, we present such a loss term and create a new model by appending it to the loss function of the pre-existing pose regression network 'PoseNet'. We achieve improvements in the localization accuracy of the network for indoor scenes, with reductions of up to 26.7% and 24.0% in the median positional and rotational error respectively, compared to the default PoseNet.
Isaac Ronald Ward, M. A. Asim K. Jalwana, Mohammed Bennamoun
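
A minimal sketch of the baseline linear-combination loss the abstract refers to, in PyTorch; the weighting beta is an illustrative tuned hyperparameter, and the coupled loss term proposed in the paper is deliberately not reproduced here.

import torch

def baseline_pose_loss(p_pred, p_true, q_pred, q_true, beta=500.0):
    # Positional error: Euclidean distance between predicted and true positions.
    pos_err = torch.norm(p_pred - p_true, dim=-1)
    # Rotational error: distance between the true unit quaternion and the
    # normalized predicted quaternion.
    q_hat = q_pred / torch.norm(q_pred, dim=-1, keepdim=True)
    rot_err = torch.norm(q_hat - q_true, dim=-1)
    # Linear combination with a tuned coefficient, as in PoseNet-style training.
    return (pos_err + beta * rot_err).mean()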
Face-Based Age and Gender Classification Using Deep Learning Model
Abstract
Age and gender classification of human faces is an important research focus with many application areas. Recently, the Convolutional Neural Network (CNN) model has proven to be the most suitable method for this classification task, especially for unconstrained real-world faces. This can be attributed to its strength in feature extraction and classification of face images; the availability of both high-end computers and large training datasets has also contributed to its adoption. In this paper, we therefore propose a novel CNN-based model to extract discriminative features from unconstrained real-life face images and classify those images by age and gender. We handle the large variations exhibited by unconstrained real-life faces with a robust image preprocessing algorithm and with pretraining on the large IMDb-WIKI dataset, which contains noisy and unfiltered age and gender labels. We also adopt dropout and data augmentation as regularization methods to overcome the risk of overfitting and to allow our model to generalize to the test images. We show that a well-designed network architecture and properly tuned training hyperparameters give better results. The experimental results on the OIU-Adience dataset confirm that our model outperforms other studies on the same dataset, showing significant performance in terms of classification accuracy. The proposed method achieves a classification accuracy of 84.8% for age group and 89.7% for gender.
Olatunbosun Agbo-Ajala, Serestina Viriri
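
The dropout and data augmentation regularization mentioned above is commonly set up along the following lines in PyTorch/torchvision; the specific transforms, crop size, layer widths, and dropout rate here are generic illustrations, not the paper's settings.

import torch.nn as nn
from torchvision import transforms

# Randomized crops and flips expand the effective training set.
train_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(227),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Dropout randomly zeroes activations during training to reduce overfitting.
classifier_head = nn.Sequential(
    nn.Linear(4096, 512), nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(512, 8),   # e.g., 8 age-group classes on OIU-Adience
)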
SO-Net: Joint Semantic Segmentation and Obstacle Detection Using Deep Fusion of Monocular Camera and Radar
Abstract
Vision-based semantic segmentation and obstacle detection are important perception tasks for autonomous driving. Typically, they are performed using separate frameworks, resulting in increased computational complexity. Vision-based perception using deep learning reports state-of-the-art accuracy, but its performance is susceptible to variations in the environment. In this paper, we propose a radar- and vision-based deep learning perception framework, termed SO-Net, to address the limitations of vision-only perception. SO-Net also integrates semantic segmentation and object detection within a single framework. The proposed SO-Net contains two input branches, corresponding to vision and radar feature extraction, and two output branches, corresponding to object detection and semantic segmentation. The performance of the proposed framework is validated on the public nuScenes dataset. The results show that SO-Net improves the accuracy of the vision-only perception tasks, while reporting reduced computational complexity compared to separate semantic segmentation and object detection frameworks.
V. John, M. K. Nithilan, S. Mita, H. Tehrani, R. S. Sudheesh, P. P. Lalu
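
A minimal sketch of such a two-in, two-out fusion layout in PyTorch; the channel counts, fusion by concatenation, and head parameterizations are our own illustrative assumptions, not the SO-Net architecture itself.

import torch
import torch.nn as nn

class FusionPerceptionSketch(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        # Two input branches: RGB camera image and a radar map in the image plane.
        self.cam_branch = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.radar_branch = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        # Shared trunk over the fused features.
        self.trunk = nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        # Two output branches: per-pixel class scores and per-cell box + objectness.
        self.seg_head = nn.Conv2d(32, n_classes, 1)
        self.det_head = nn.Conv2d(32, 5, 1)

    def forward(self, cam, radar):
        fused = torch.cat([self.cam_branch(cam), self.radar_branch(radar)], dim=1)
        features = self.trunk(fused)
        return self.det_head(features), self.seg_head(features)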
Deep Forest Approach for Facial Expression Recognition
Abstract
Facial expression recognition is a promising area in Computer Vision (CV) and Human-Computer Interaction (HCI), with vast areas of application. The major task in facial expression recognition is the categorization of facial expression images into six basic emotion states, and this is accompanied by many challenges. Several methods have been explored in search of an optimal solution for the development of a facial expression recognition system. Presently, the Deep Neural Network is the state-of-the-art method in the field, with promising results, but it is constrained by the volume of data available for the facial expression recognition task. Therefore, there is a need for a method with the layered features of deep learning and the flexibility to handle both the large and small volumes of data available in the field. This work proposes a Deep Forest method that implements the layer-by-layer processing characteristic of deep learning while minimizing overfitting regardless of data size. The experiments conducted on both the Cohn-Kanade (CK+) and Binghamton University 3D Facial Expression (BU-3DFE) datasets show that Deep Forest provides promising results with an impressive reduction in computational time.
Olufisayo Ekundayo, Serestina Viriri
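
A minimal sketch of a gcForest-style cascade, the generic deep forest idea rather than the paper's exact configuration: each layer's forests output class probabilities that are concatenated to the raw features and passed to the next layer. A faithful implementation would use cross-validated probabilities at each layer; fitting and predicting on the training data directly, as here, is a simplification.

import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

def cascade_forest_proba(X_train, y_train, X_test, n_layers=3):
    aug_train, aug_test = X_train, X_test
    for _ in range(n_layers):
        layer_probs_train, layer_probs_test = [], []
        for Forest in (RandomForestClassifier, ExtraTreesClassifier):
            clf = Forest(n_estimators=100).fit(aug_train, y_train)
            layer_probs_train.append(clf.predict_proba(aug_train))
            layer_probs_test.append(clf.predict_proba(aug_test))
        # Augment the original features with this layer's class-probability vectors.
        aug_train = np.hstack([X_train] + layer_probs_train)
        aug_test = np.hstack([X_test] + layer_probs_test)
    # Final prediction: average the last layer's probability estimates.
    return np.mean(layer_probs_test, axis=0)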
Weed Density Estimation Using Semantic Segmentation
Abstract
The use of herbicides is rising globally to enhance crop yield and meet the ever-increasing food demand, but it adversely impacts the environment and biosphere. To rationalize herbicide use, variable-rate application based on weed density mapping is a promising technique. Estimation of weed densities depends upon precise detection and mapping of weeds in the field. Recently, semantic segmentation has been studied in precision agriculture due to its power to detect and segment objects in images. However, its application is limited by the extremely difficult and time-consuming job of labelling the pixels in agricultural images. To accelerate the labelling process for semantic segmentation, a two-step manual labelling procedure is proposed in this paper. The proposed method is tested on oat field imagery. It shows improved intersection over union (IoU) values as the semantic models are trained on a comparatively bigger labelled real dataset. The method demonstrates an IoU value of 81.28% for weeds and a mean IoU value of 90.445%.
Muhammad Hamza Asad, Abdul Bais
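
For reference, the intersection over union scores quoted above follow the standard definition: for a class $c$ with predicted pixel set $P_c$ and ground-truth pixel set $G_c$,

\[ \mathrm{IoU}_c = \frac{|P_c \cap G_c|}{|P_c \cup G_c|}, \qquad \mathrm{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{IoU}_c , \]

with the mean taken over all $C$ classes.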
Detecting Global Exam Events in Invigilation Videos Using 3D Convolutional Neural Network
Abstract
This paper designs a 3D convolutional neural network structure to detect global exam events in invigilation videos. Exam events in invigilation videos are defined according to the human activity performed at a certain phase of the entire exam process. Unlike general event detection, which involves different scenes, global event detection focuses on differentiating the different collective activities within the exam room ambiance. The challenges lie in the great intra-class variations within the same type of event, due to various camera angles and different exam room ambiances, as well as in the inter-class similarities. This paper adopts the 3D convolutional neural network for its ability to extract spatio-temporal features and its effectiveness in detecting video events. Experimental results show that the designed 3D convolutional neural network achieves an accuracy of 93.94% in detecting the global exam events, which demonstrates the effectiveness of our model.
Zichun Dai, Chao Sun, Xinguo Yu, Ying Xiang
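
For illustration, a minimal 3D CNN classifier of the kind described above, in PyTorch; the layer sizes and the number of event classes are placeholders, not the paper's design. The defining property is that the convolution kernels span time as well as space, so motion patterns are learned jointly with appearance.

import torch
import torch.nn as nn

class EventNet3D(nn.Module):
    def __init__(self, n_events=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),  # kernels span (time, H, W)
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1))                                 # global spatio-temporal pooling
        self.classifier = nn.Linear(32, n_events)

    def forward(self, clip):                  # clip: (batch, 3, frames, height, width)
        return self.classifier(self.features(clip).flatten(1))

scores = EventNet3D()(torch.randn(2, 3, 16, 112, 112))   # one score per event class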
Spatial Hierarchical Analysis Deep Neural Network for RGB-D Object Recognition
Abstract
Deep learning based object recognition methods have achieved unprecedented success in recent years. However, this level of success is yet to be achieved on multimodal RGB-D images, which can play an important role in several computer vision and robotics applications. In this paper, we present a spatial hierarchical analysis deep neural network, called ShaNet, for RGB-D object recognition. Our network consists of a convolutional neural network (CNN) and recurrent neural networks (RNNs) to analyse and learn distinctive and translationally invariant features in a hierarchical fashion. Unlike existing methods, which employ pre-trained models or rely on transfer learning, our proposed network is trained from scratch on RGB-D data. The proposed model has been tested on two different publicly available RGB-D datasets, namely the Washington RGB-D and 2D3D object datasets. Our experimental results show that the proposed deep neural network achieves superior performance compared to existing RGB-D object recognition methods.
Syed Afaq Ali Shah
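
A minimal sketch of a CNN-plus-RNN stack for a 4-channel RGB-D input, in PyTorch; the layer sizes, the choice of a GRU, and the flattening of convolutional feature maps into a region sequence are our own illustrative choices, not ShaNet's actual hierarchy.

import torch
import torch.nn as nn

class RGBDNetSketch(nn.Module):
    def __init__(self, n_classes=51):   # e.g., 51 categories in Washington RGB-D
        super().__init__()
        # CNN stage: translationally invariant features from the 4-channel input.
        self.cnn = nn.Sequential(
            nn.Conv2d(4, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU())
        # RNN stage: summarize the spatial grid of features as a sequence of regions.
        self.rnn = nn.GRU(input_size=64, hidden_size=128, batch_first=True)
        self.fc = nn.Linear(128, n_classes)

    def forward(self, rgbd):                       # rgbd: (batch, 4, H, W)
        feats = self.cnn(rgbd)                     # (batch, 64, h, w)
        seq = feats.flatten(2).transpose(1, 2)     # (batch, h*w, 64)
        _, hidden = self.rnn(seq)
        return self.fc(hidden[-1])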
Reading Digital Video Clocks by Two Phases of Connected Deep Networks
Abstract
This paper presents an algorithm for reading digital video clocks using two phases of connected deep networks, avoiding the demerits of existing heuristic algorithms. The problem of reading digital video clocks can be divided into two phases: locating the clock area and reading the clock digits. The first phase of connected deep networks is a chain of neural networks that localizes the clock area, each network exploiting the properties of working digital video clocks to perform one task. Its key step is to localize the seconds digit by using the constancy and periodicity of the pixels belonging to that digit. The second phase is a batch of custom digit recognizers designed based on deep networks and the properties of working digital video clocks. The proposed method gets rid of the tedious heuristic procedure of finding the accurate locations of all digits. Thus, this paper presents the first algorithm in which these key tasks are carried out by different neural networks. The experimental results show that the proposed algorithm can achieve high accuracy in localizing and reading all the digits of clocks.
Xinguo Yu, Zhiping Chen, Hao Meng
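
To make the periodicity cue concrete: the seconds digit is the one region whose pixels change at an almost exactly 1 Hz rhythm. The sketch below scores that rhythm for a single pixel's intensity series with a Fourier transform; this is a hand-rolled illustration of the cue itself, not the neural network the paper builds around it.

import numpy as np

def seconds_rhythm_score(intensity_series, fps):
    # Fraction of the signal's spectral energy near 1 Hz, the update
    # rate of a ticking seconds digit.
    x = intensity_series - intensity_series.mean()
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    near_one_hz = (freqs > 0.8) & (freqs < 1.2)
    return spectrum[near_one_hz].sum() / (spectrum.sum() + 1e-9)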
Backmatter
Metadata
Title
Image and Video Technology
Edited by
Joel Janek Dabrowski
Ashfaqur Rahman
Manoranjan Paul
Copyright Year
2020
Electronic ISBN
978-3-030-39770-8
Print ISBN
978-3-030-39769-2
DOI
https://doi.org/10.1007/978-3-030-39770-8