
2019 | Book

Image Analysis and Recognition

16th International Conference, ICIAR 2019, Waterloo, ON, Canada, August 27–29, 2019, Proceedings, Part II


About this book

This two-volume set LNCS 11662 and 11663 constitutes the refereed proceedings of the 16th International Conference on Image Analysis and Recognition, ICIAR 2019, held in Waterloo, ON, Canada, in August 2019.

The 58 full papers presented together with 24 short and 2 poster papers were carefully reviewed and selected from 142 submissions. The papers are organized in the following topical sections: Image Processing; Image Analysis; Signal Processing Techniques for Ultrasound Tissue Characterization and Imaging in Complex Biological Media; Advances in Deep Learning; Deep Learning on the Edge; Recognition; Applications; Medical Imaging and Analysis Using Deep Learning and Machine Intelligence; Image Analysis and Recognition for Automotive Industry; Adaptive Methods for Ultrasound Beamforming and Motion Estimation.

Table of Contents

Frontmatter

Deep Learning on the Edge

Frontmatter
Foothill: A Quasiconvex Regularization for Edge Computing of Deep Neural Networks

Deep neural networks (DNNs) have demonstrated success in many supervised learning tasks, ranging from voice recognition and object detection to image classification. However, their increasing complexity may yield poor generalization error and make them hard to deploy on edge devices. Quantization is an effective approach to compressing DNNs so that they meet these constraints. Using a quasiconvex base function to construct a binary quantizer helps in training binary neural networks (BNNs), and adding noise to the input data or using a concrete regularization function helps to improve generalization error. Here we introduce the foothill function, an infinitely differentiable quasiconvex function. This regularizer is flexible enough to deform towards $L_1$ and $L_2$ penalties. Foothill can be used as a binary quantizer, as a regularizer, or as a loss. In particular, we show that this regularizer reduces the accuracy gap between BNNs and their full-precision counterparts for image classification on ImageNet.

Mouloud Belbahri, Eyyüb Sari, Sajad Darabi, Vahid Partovi Nia
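
The abstract states foothill's key property, smooth interpolation between an L2-like penalty near zero and an L1-like penalty in the tails, but not its closed form. The sketch below assumes the form h(w) = α·w·tanh(βw), which has exactly that limiting behaviour; treat it as an illustrative stand-in rather than the paper's definitive definition.

```python
import torch

def foothill(w, alpha=1.0, beta=1.0):
    # Smooth, quasiconvex penalty: for small w, tanh(beta*w) ~ beta*w,
    # so h(w) ~ alpha*beta*w^2 (L2-like); for large |w|, h(w) ~ alpha*|w| (L1-like).
    return alpha * w * torch.tanh(beta * w)

# Usage as a weight regularizer added to the task loss:
# loss = task_loss + lam * sum(foothill(p).sum() for p in model.parameters())
```
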
NetScore: Towards Universal Metrics for Large-Scale Performance Analysis of Deep Neural Networks for Practical On-Device Edge Usage

Much of the focus in the design of deep neural networks has been on improving accuracy, leading to more powerful yet highly complex network architectures that are difficult to deploy in practical scenarios, particularly on edge devices such as mobile and other consumer devices, given their high computational and memory requirements. As a result, there has been recent interest in the design of quantitative metrics for evaluating deep neural networks that account for more than just model accuracy as the sole indicator of network performance. In this study, we continue the conversation towards universal metrics for evaluating the performance of deep neural networks for practical on-device edge usage. In particular, we propose a new balanced metric called NetScore, designed specifically to provide a quantitative assessment of the balance between accuracy, computational complexity, and network architecture complexity of a deep neural network, which is important for on-device edge operation. In one of the largest comparative analyses of deep neural networks in the literature, the NetScore metric, the top-1 accuracy metric, and the popular information density metric were compared across a diverse set of 60 different deep convolutional neural networks for image classification on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2012) dataset. The evaluation results across these three metrics are presented in this study to act as a reference guide for practitioners in the field. The proposed NetScore metric, along with the other tested metrics, is by no means perfect, but the hope is to push the conversation towards better universal metrics for evaluating deep neural networks in practical on-device edge scenarios, to help guide practitioners in model design for such scenarios.

Alexander Wong
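
For reference, NetScore combines accuracy a, parameter count p, and multiply-accumulate count m as Ω(N) = 20·log10(a^κ / (p^β · m^γ)). The exponent defaults and units below (κ = 2, β = γ = 0.5, p in millions, m in billions) follow our reading of the paper and should be checked against it.

```python
import math

def netscore(top1_acc, params_millions, macs_billions,
             kappa=2.0, beta=0.5, gamma=0.5):
    # Omega(N) = 20 * log10( a^kappa / (p^beta * m^gamma) ):
    # rewards accuracy, penalizes parameter and compute footprints.
    return 20.0 * math.log10(
        top1_acc ** kappa / (params_millions ** beta * macs_billions ** gamma))

# Hypothetical example: 71.8% top-1 accuracy, 6.9M parameters, 1.2G MACs.
print(round(netscore(71.8, 6.9, 1.2), 1))
```
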
Real-Time Person Re-identification at the Edge: A Mixed Precision Approach

A critical part of multi-person multi-camera tracking is the person re-identification (re-ID) algorithm, which recognizes and retains the identities of all detected unknown people throughout the video stream. Many re-ID algorithms today achieve state-of-the-art results, but little work has been done to explore the deployment of such algorithms in computation- and power-constrained real-time scenarios. In this paper, we study the effect of using a lightweight model, MobileNet-v2, for re-ID and investigate the impact of single (FP32) precision versus half (FP16) precision for training on the server and inference on the edge nodes. We further compare the results with a baseline model that uses ResNet-50 on state-of-the-art benchmarks including CUHK03, Market-1501, and Duke-MTMC. The MobileNet-v2 mixed precision training method improves inference throughput on the edge node by 3.25×, reaching 27.77 fps, and training time on the server by 1.75×, and decreases power consumption on the edge node by 1.45×, while degrading accuracy by only 5.6% on average across the three datasets with respect to single-precision ResNet-50. The code and pre-trained networks are publicly available ( https://github.com/TeCSAR-UNCC/person-reid ).

Mohammadreza Baharani, Shrey Mohan, Hamed Tabkhi
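
The paper's exact FP16 toolchain is not given in the abstract; the fragment below is a minimal mixed-precision training step using PyTorch's torch.cuda.amp, shown only to illustrate the FP16-forward/FP32-master-gradient pattern the abstract refers to.

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # rescales FP16 losses to avoid underflow

def train_step(model, images, labels, criterion, optimizer):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():    # run the forward pass in FP16 where safe
        loss = criterion(model(images), labels)
    scaler.scale(loss).backward()      # gradients flow through the scaled loss
    scaler.step(optimizer)             # unscale, then apply the FP32 update
    scaler.update()
    return loss.item()
```
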
Product Recommendation Through Real-Time Object Recognition on Image Classifiers

With the development of e-commerce in recent years and its growing overlap with traditional ways of doing business, many computational and statistical methods have been researched and developed to recommend products from a store catalog. The data used in recommendation methods often involves user interactions, while images and video remain somewhat unexplored types of information. This work, which we call Xanathar, proposes to extend this paradigm with real-time in-video recommendations for 25 classes of products, using image classifiers and feeding video streams to a modified ResNet-50 network processed on a GPU, achieving a top-5 error of 5.17% and running at approximately 60 frames per second. The system thus describes objects in the scene and proposes related products on screen, directing the user's buying experience and creating an immersive and intensive purchase environment.

Nelson Forte de Souza Junior, Leandro Augusto da Silva, Mauricio Marengoni
Visual Inspection with Federated Learning

In industrial applications of AI, challenges for visual inspection include data shortage and security. In this paper, we propose a Federated Learning (FL) framework to address these issues. This method incorporates our novel Dataonomy℠ approach, which can overcome the limited size of the industrial dataset in each inspection task. The models pre-trained on the server can be continuously and regularly updated, helping each client upgrade its inspection model over time. The FL approach only requires clients to send the server certain information derived from raw images, and thus does not sacrifice data security. Preliminary tests were done to examine the workability of the proposed framework. This study is expected to bring the field of automated inspection to a new level of security, reliability, and efficiency, and to unlock the significant potential of deep learning applications.

Xu Han, Haoran Yu, Haisong Gu
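
The Dataonomy-specific details are in the paper; as a generic illustration of the server-side step in such a framework, the sketch below performs FedAvg-style weighted averaging of client model parameters, so that only parameters (never raw inspection images) reach the server.

```python
import copy
import torch

def federated_average(client_states, client_sizes):
    # client_states: list of model state_dicts sent by the clients.
    # client_sizes: number of local training samples per client,
    # used to weight each client's contribution.
    total = float(sum(client_sizes))
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(state[key].float() * (n / total)
                       for state, n in zip(client_states, client_sizes))
    return avg  # load into the server model with model.load_state_dict(avg)
```
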

Recognition

Frontmatter
Looking Under the Hood: Visualizing What LSTMs Learn

Recurrent Neural Networks (RNNs) are a state-of-the-art method for modeling sequential data. Unfortunately, the practice of RNNs is ahead of the theory: we lack methods for summarizing or analyzing what a network has learned once it is trained. This paper presents two methods for visualizing concepts learned by RNNs in the domain of action recognition. The first method shows the sensitivity of joints over time. The second generates synthetic videos that maximize the responses of a class label or hidden unit given a set of anatomical constraints. These techniques are combined in a visualization tool called SkeletonVis to help developers and users gain insights into models embedded in RNNs for action recognition.

Dhruva Patil, Bruce A. Draper, J. Ross Beveridge
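
The second visualization method amounts to activation maximization over input sequences. A minimal, unconstrained version is sketched below; the anatomical constraints used in the paper are omitted, and the model interface is an assumption.

```python
import torch

def maximize_response(model, seq_len, feat_dim, unit, steps=200, lr=0.1):
    # Gradient ascent on a synthetic input sequence so that the chosen
    # class label or hidden unit responds as strongly as possible.
    x = torch.zeros(1, seq_len, feat_dim, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        score = model(x)[0, unit]   # response of the target unit
        (-score).backward()         # negate: optimizers minimize
        opt.step()
    return x.detach()               # render this sequence as a synthetic video
```
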
Information Fusion via Multimodal Hashing with Discriminant Canonical Correlation Maximization

In this paper, we introduce an effective information fusion method using multimodal hashing with discriminant canonical correlation maximization. As an effective way to compute similarity between different inputs, multimodal hashing has attracted increasing attention in fast similarity search. The proposed approach not only finds the minimum of the semantic similarity across different modalities by multimodal hashing, but is also capable of extracting discriminant representations, which minimize the between-class correlation and maximize the within-class correlation simultaneously for information fusion. Benefiting from the combination of semantic similarity across different modalities and the discriminant representation strategy, the proposed algorithm achieves improved performance. A prototype of the proposed method is implemented to demonstrate its performance in audio emotion recognition and cross-modal (text-image) fusion. Experimental results show that the proposed approach outperforms related methods in terms of accuracy.

Lei Gao, Ling Guan
Unsupervised Variational Learning of Finite Generalized Inverted Dirichlet Mixture Models with Feature Selection and Component Splitting

Variational learning of mixture models has proved effective in recent research. In this paper, we propose a generalized inverted Dirichlet based mixture model with an incremental variational algorithm. We incorporate feature selection and a component splitting approach for model selection within the variational framework. This helps us estimate the complexity of the data efficiently while concomitantly eliminating irrelevant features. We validate our model with two challenging applications: image categorization and dynamic texture categorization.

Kamal Maanicshah, Samr Ali, Wentao Fan, Nizar Bouguila
TPUAR-Net: Two Parallel U-Net with Asymmetric Residual-Based Deep Convolutional Neural Network for Brain Tumor Segmentation

The utilization of different types of brain images has been expanding, which makes manually examining each image a labor-intensive task. This study introduces a brain tumor segmentation method that uses two parallel U-Nets with an asymmetric residual-based deep convolutional neural network (TPUAR-Net). The proposed method is customized to segment high- and low-grade glioblastomas identified from magnetic resonance imaging (MRI) data. These tumors can appear anywhere in the brain and may have practically any shape, contrast, or size. Thus, this study used deep learning techniques based on adaptive, high-efficiency neural networks in the proposed model structure. Several high-performance models based on convolutional neural networks (CNNs) were examined. The proposed TPUAR-Net capitalizes on different levels of global and local features in the upper and lower paths of the proposed model structure. In addition, the proposed method uses skip connections between layers and residual units to accelerate the training and testing processes. The TPUAR-Net model provides promising segmentation accuracy using MRI images from the BRATS 2017 database, while its parallelized architecture considerably improves the execution speed. The results obtained in terms of Dice, sensitivity, and specificity metrics demonstrate that TPUAR-Net outperforms other methods and achieves state-of-the-art performance for brain tumor segmentation.

Mahmoud Khaled Abd-Ellah, Ashraf A. M. Khalaf, Ali Ismail Awad, Hesham F. A. Hamed
Data Clustering Using Variational Learning of Finite Scaled Dirichlet Mixture Models with Component Splitting

We have developed a variational learning approach for the finite scaled Dirichlet mixture model with a local model selection framework. By gradually splitting the components, our model is able to reach convergence as well as obtain the optimal number of clusters. The proposed model's flexibility and performance are validated on challenging real-life problems, including spam detection and object clustering.

Hieu Nguyen, Kamal Maanicshah, Muhammad Azam, Nizar Bouguila
Sequential Image Synthesis for Human Activity Video Generation

In computer graphics and multimedia, automatically synthesizing a new set of image sequences from a different set of image sequences, to create a realistic video or animation of some human activity, is a research challenge. Traditionally, creating such animation or similar visual media content is done manually, which is a tedious task. Recent advances in deep learning have made promising progress towards automating this type of media creation. This work is motivated by the idea of synthesizing a temporally coherent sequence of images (e.g., a video) of a person performing some activity by using a video or set of images of a different person performing a similar activity. To achieve this, our approach utilizes the cycle-consistent adversarial network (CycleGAN). We present a new approach for learning to transfer a human activity from a source domain to a target domain without using any complicated pose detection or extraction method. Our objective is to learn a mapping between two consecutive sequences of images from two domains representing two different activities, and to use that mapping to transfer the activity from one domain to the other, synthesizing an entirely new consecutive sequence of images that can be combined to make a video of a new human activity. We also present and analyze some qualitative results generated by our method.

Fahim Hasan Khan, Akila de Silva, Jayanth Yetukuri, Narges Norouzi
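
The core of the CycleGAN objective used here is the cycle-consistency term, which lets two unpaired activity sequences supervise each other. A minimal sketch follows; the generators G_AB and G_BA and the weight lam are generic placeholders.

```python
import torch.nn.functional as F

def cycle_consistency_loss(G_AB, G_BA, real_A, real_B, lam=10.0):
    # Mapping A -> B -> A (and B -> A -> B) should reconstruct the input,
    # which is what makes training possible without paired frames.
    rec_A = G_BA(G_AB(real_A))
    rec_B = G_AB(G_BA(real_B))
    return lam * (F.l1_loss(rec_A, real_A) + F.l1_loss(rec_B, real_B))
```
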
A Deep Learning-Based Noise-Resilient Keyword Spotting Engine for Embedded Platforms

Keyword spotting (KWS) is important in numerous trigger, trigger-command, and command-and-control applications on embedded platforms. However, the embedded platforms currently used in the fast-growing Internet of Things (IoT) market and in standalone systems still have considerable processing power, memory, and battery constraints. In IoT and smart device applications, speakers are usually far from the microphone, resulting in severe distortions, considerable amounts of noise, and noticeable reverberation. Speech enhancement can be used as a front-end or pre-processing module to improve the performance of KWS. However, denoisers and dereverberators as front-end processing modules add to the complexity of the keyword spotting system and to the computing, memory, and battery requirements of the embedded platform. In this paper, a noise-robust keyword spotting engine with a small memory footprint is presented. Multi-condition training of a deep neural network model on utterances is developed to increase the keyword spotting noise robustness. A comparative study is conducted to compare the deep learning approach with a Gaussian mixture model. Experimental results show that deep learning outperforms the Gaussian approach in both clean and noisy conditions. Moreover, a deep learning model trained on partially noisy data removes the need for a speech enhancement module or denoiser for front-end processing.

Ramzi Abdelmoula, Alaa Khamis, Fakhri Karray
A Compact Representation of Histopathology Images Using Digital Stain Separation and Frequency-Based Encoded Local Projections

In recent years, histopathology images have been increasingly used as a diagnostic tool in the medical field. The process of accurately diagnosing a biopsy sample requires significant expertise in the field, and as such can be time-consuming and is prone to uncertainty and error. With the advent of digital pathology, using image recognition systems to highlight problem areas or locate similar images can aid pathologists in making quick and accurate diagnoses. In this paper, we specifically consider the encoded local projections (ELP) algorithm, which has previously shown some success as a tool for classification and recognition of histopathology images. We build on the success of the ELP algorithm as a means for image classification and recognition by proposing a modified algorithm which captures the local frequency information of the image. The proposed algorithm estimates local frequencies by quantifying the changes in multiple projections in local windows of greyscale images. By doing so we remove the need to store the full projections, thus significantly reducing the histogram size, and decreasing computation time for image retrieval and classification tasks. Furthermore, we investigate the effectiveness of applying our method to histopathology images which have been digitally separated into their hematoxylin and eosin stain components. The proposed algorithm is tested on the publicly available invasive ductal carcinoma (IDC) data set. The histograms are used to train an SVM to classify the data. The experiments showed that the proposed method outperforms the original ELP algorithm in image retrieval tasks. On classification tasks, the results are found to be comparable to state-of-the-art deep learning methods and better than many handcrafted features from the literature.

Alison K. Cheeseman, Hamid Tizhoosh, Edward R. Vrscay
Computer-Aided Tumor Segmentation from T2-Weighted MR Images of Patient-Derived Tumor Xenografts

Magnetic resonance imaging (MRI) is typically used to detect and assess therapeutic response in preclinical imaging of patient-derived tumor xenografts (PDX). The overarching objective of this work is to develop an automated methodology to detect and segment tumors in PDX for subsequent analyses. Automated segmentation also has the benefit of minimizing user bias. A hybrid method combining fast k-means, morphology, and level sets is used to localize and segment tumor volume from volumetric MR images. The initial centroids of k-means are selected by a local density peak estimation method. A new variational model is implemented to exploit region information by minimizing the energy functional in the level set. A mask-specific initialization approach is used to create a genuine boundary for the level set. Tumor segmentation performance is compared with manually segmented images and with established algorithms. Segmentation results on six metrics are: Jaccard score (>80%), Dice score (>85%), F score (>85%), G-mean (>90%), volume similarity (>95%), and relative volume error (<8%). The proposed method reliably localizes and segments PDX tumors and has the potential to facilitate high-throughput analysis of MR imaging in co-clinical trials involving PDX.

Sudipta Roy, Kooresh Isaac Shoghi
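
As an illustration of the initialization idea, the sketch below selects k-means centroids by a Rodriguez–Laio style density-peak estimate (high local density rho plus large distance delta to any denser point). The cutoff distance dc and the feature space are assumptions, not values from the paper.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def density_peak_centroids(X, k, dc):
    # rho: number of neighbours within the cutoff dc.
    # delta: distance to the nearest point with higher density.
    D = cdist(X, X)
    rho = (D < dc).sum(axis=1)
    delta = np.empty(len(X))
    for i in range(len(X)):
        denser = rho > rho[i]
        delta[i] = D[i, denser].min() if denser.any() else D[i].max()
    return X[np.argsort(rho * delta)[-k:]]  # points with the highest rho*delta

# Seed k-means with the estimated peaks instead of random centroids:
# km = KMeans(n_clusters=3, init=density_peak_centroids(X, 3, dc=0.5), n_init=1).fit(X)
```
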

Applications

Frontmatter
Sit-to-Stand Analysis in the Wild Using Silhouettes for Longitudinal Health Monitoring

We present the first fully automated Sit-to-Stand or Stand-to-Sit (StS) analysis framework for long-term monitoring of patients in free-living environments using video silhouettes. Our method adopts a coarse-to-fine time localisation approach, where a deep learning classifier identifies possible StS sequences from silhouettes, and a smart peak detection stage provides fine localisation based on 3D bounding boxes. We tested our method on data from real homes of participants and monitored patients undergoing total hip or knee replacement. Our results show 94.4% overall accuracy in the coarse localisation and an error of 0.026 m/s in the speed of ascent measurement, highlighting important trends in the recuperation of patients who underwent surgery.

Alessandro Masullo, Tilo Burghardt, Toby Perrett, Dima Damen, Majid Mirmehdi
Target Aware Visual Object Tracking

We propose a visual object tracker that improves accuracy while significantly decreasing the false alarm rate. This is achieved by a late fusion scheme that integrates the motion model of particle sampling with the region proposal network of Mask R-CNN during inference. The qualified bounding boxes selected by the late fusion are fed into Mask R-CNN's head layer for detection of the tracked object. We refer to the introduced scheme as TAVOT, a target-aware visual object tracker, since it is capable of minimizing false detections with the guidance of variable-rate particle sampling initialized by the target region of interest. We show that TAVOT is capable of modeling temporal video content with a simple motion model and thus constitutes a promising video object tracker. Performance evaluation on VOT2016 video sequences demonstrates that TAVOT increases the success rate by 22% while decreasing the false alarm rate by 73% compared to the baseline Mask R-CNN. Compared to the top tracker of VOT2016, an increase of around 5% in success rate is reported where the intersection over union is greater than 0.5.

Caner Ozer, Filiz Gurkan, Bilge Gunsel
Design of an End-to-End Dual Mode Driver Distraction Detection System

This paper provides initial results on developing a deep neural network-based system for driver distraction detection that is operational at daytime as well as nighttime. Unlike other existing methods that rely only on RGB images for daytime detection, the proposed system consists of two operating modes. The daytime mode uses a convolutional neural network to classify drivers' states based on their body poses in RGB images. The nighttime mode classifies near-infrared images using a different neural network-based model trained under different circumstances. To the best of our knowledge, this is the first work that explicitly addresses driver behavior detection at night using end-to-end convolutional neural networks. With initial experimental results, we empirically demonstrate that, with relatively modest model complexity, the proposed system achieves high performance on driver distraction detection for both modes. Furthermore, we discuss the feasibility of developing a system with a small footprint and design structure that is accurate enough to be deployed on a memory-restricted computing platform.

Chaojie Ou, Qiang Zhao, Fakhri Karray, Alaa El Khatib
Key-Track: A Lightweight Scalable LSTM-based Pedestrian Tracker for Surveillance Systems

There has been growing interest in recent years in leveraging state-of-the-art deep learning techniques for tracking objects. Most of this work focuses on using redundant appearance models for predicting object tracklets for the next frame. Moreover, not much work has been done to explore the sequence learning properties of Long Short-Term Memory (LSTM) neural networks for object tracking in video sequences. In this work we propose a novel LSTM tracker, Key-Track, which effectively learns the spatial and temporal behavior of pedestrians by analyzing movement patterns of human key-points provided by OpenPose [3]. We train Key-Track on single-person sequences curated from the Duke Multi-Target Multi-Camera (Duke-MTMC) [26] dataset and scale it to track multiple people at run-time, further testing its scalability. We report results on the Duke-MTMC dataset for different time-series sequence lengths fed to Key-Track and find a length of three to be the optimum, producing the highest Average Overlap Score (AOS). We further present a qualitative analysis of how these different time-series sequence lengths produce different results depending on the type of video sequence. The total observed size of Key-Track is under 1 megabyte, which paves its way onto mobile devices for the purpose of tracking in real time.

Pratik Kulkarni, Shrey Mohan, Samuel Rogers, Hamed Tabkhi
KPTransfer: Improved Performance and Faster Convergence from Keypoint Subset-Wise Domain Transfer in Human Pose Estimation

In this paper, we present a novel approach called KPTransfer for improving the modeling performance of keypoint detection deep neural networks via domain transfer between different keypoint subsets. This approach is motivated by the notion that rich contextual knowledge can be transferred between different keypoint subsets representing separate domains. In particular, the proposed method takes into account various keypoint subsets/domains by sequentially adding and removing keypoints. Contextual knowledge is transferred between two separate domains via domain transfer. Experiments to demonstrate the efficacy of the proposed KPTransfer approach were performed for the task of human pose estimation on the MPII dataset, with comparisons against random initialization and frozen weight extraction configurations. Experimental results demonstrate the efficacy of performing domain transfer between two different joint subsets, resulting in a PCKh improvement of up to 1.1 over random initialization on joints such as the wrist and knee in certain joint splits, with an overall PCKh improvement of 0.5. Domain transfer from a different set of joints not only results in improved accuracy but also in faster convergence, because of mutual co-adaptations of weights resulting from the contextual knowledge of the pose from a different set of joints.

Kanav Vats, Helmut Neher, Alexander Wong, David A. Clausi, John Zelek
Deep Learning Model for Skin Lesion Segmentation: Fully Convolutional Network

Segmentation of skin lesions is a crucial task in detecting and diagnosing melanoma. The incidence of melanoma, the deadliest form of skin cancer, has been steadily increasing, and early detection is necessary to improve the survival rate of patients. Segmentation is an important task in analysing skin lesion images, but it comes with challenges such as low contrast and the fine-grained nature of skin lesions. This has necessitated automated analysis and segmentation of skin lesions using state-of-the-art techniques. In this paper, a deep learning model is adapted for the segmentation of skin lesions. This work demonstrates the segmentation of skin lesions using fully convolutional networks (FCNs) trained end-to-end using only the image pixels and disease ground-truth labels as inputs. The adapted fully convolutional network is based on the U-Net architecture. The model is enhanced by employing a multi-stage segmentation approach with batch normalisation and data augmentation. Performance metrics such as Dice coefficient, accuracy, sensitivity, and specificity were used to evaluate the model. Experimental results show that the proposed model achieved better performance than other state-of-the-art methods for skin lesion image segmentation, with a Dice coefficient of 90% and a sensitivity of 96%.

Adekanmi Adegun, Serestina Viriri
Deep Learning Using Bayesian Optimization for Facial Age Estimation

Age estimation plays a significant role in many real-world applications. It is the process of determining the exact age or age group of a person from their biometric features. Recent research demonstrates that deeply learned features for age estimation from large-scale data significantly improve age estimation performance for facial images. This paper proposes a Convolutional Neural Network (CNN) approach using Bayesian Optimization for facial age estimation. Bayesian Optimization is applied to minimize the classification error of the CNN model on the validation set. Extensive experiments evaluate Deep Learning using Bayesian Optimization (DLOB) on three datasets: MORPH, FG-NET, and FERET. The results show that using Bayesian Optimization for the CNN outperforms the state of the art on the FG-NET and FERET datasets, with Mean Absolute Errors (MAE) of 2.88 and 1.3 respectively, and achieves results comparable to most state-of-the-art methods on the MORPH dataset, with an MAE of 3.01.

Marwa Ahmed, Serestina Viriri
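
A compact way to reproduce the optimization loop described above is Gaussian-process-based Bayesian optimization over a small hyperparameter space, sketched here with scikit-optimize. The search space is illustrative, and train_and_validate is a hypothetical stand-in for building the CNN and returning its validation error.

```python
from skopt import gp_minimize
from skopt.space import Real, Integer

space = [Real(1e-4, 1e-1, prior="log-uniform", name="lr"),
         Integer(16, 128, name="batch_size")]

def objective(params):
    lr, batch_size = params
    # train_and_validate is a placeholder: train the CNN with these
    # hyperparameters and return the validation classification error.
    return train_and_validate(lr=lr, batch_size=int(batch_size))

result = gp_minimize(objective, space, n_calls=30, random_state=0)
print("best validation error:", result.fun, "with params:", result.x)
```
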
Female Facial Beauty Analysis Using Transfer Learning and Stacking Ensemble Model

Automatic analysis of facial beauty has become an emerging research topic in recent years and has fascinated many researchers. One of the key challenges of facial attractiveness prediction is obtaining an accurate and discriminative face representation. This study provides a new framework to analyze the attractiveness of female faces using transfer learning as well as a stacking ensemble model. Specifically, a pre-trained Convolutional Neural Network (CNN), originally trained on relatively similar datasets for face recognition (namely MS-Celeb-1M and VGGFace2), is utilized to acquire high-level and robust features of female face images. This is followed by a stacking ensemble model, which combines the predictions of several base models to predict the attractiveness of a face. Extensive experiments conducted on the SCUT-FBP and SCUT-FBP5500 benchmark datasets confirm the strong robustness of the proposed approach. Prediction correlations of 0.89 and 0.91 are achieved by our method on the SCUT-FBP and SCUT-FBP5500 datasets, respectively, indicating significant advantages over other state-of-the-art work. Moreover, our results support the efficacy of transfer learning when applying deep learning techniques to compute facial attractiveness.

Elham Vahdati, Ching Y. Suen
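
Since attractiveness is scored on a continuous scale, the stacking stage can be sketched as a regression ensemble over deep features. The base learners and meta-learner below are illustrative choices, not necessarily the paper's, and X_train/y_train are assumed to hold CNN features and beauty scores.

```python
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.svm import SVR

# X_train: CNN features extracted from face images; y_train: beauty scores.
stack = StackingRegressor(
    estimators=[("svr", SVR()),
                ("rf", RandomForestRegressor(n_estimators=200)),
                ("gbr", GradientBoostingRegressor())],
    final_estimator=Ridge())          # meta-learner combines base predictions
stack.fit(X_train, y_train)
scores = stack.predict(X_test)
```
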
Investigating the Automatic Classification of Algae Using the Spectral and Morphological Characteristics via Deep Residual Learning

Under the impact of global climate change and human activities, harmful algae blooms (HABs) have become a growing concern due to their negative impacts on water-related industries, such as tourism, fishing, and safe water supply. Many jurisdictions have introduced specific water quality regulations to protect public health and safety. Therefore, reliable and cost-effective methods of quantifying the type and concentration of algae cells have become critical for successful water management. In this work we present an innovative system to automatically classify multiple types of algae by combining standard morphological features with their multi-wavelength signals. To accomplish this, we use a custom-designed microscopy imaging system configured to image water samples at two fluorescent wavelengths and seven absorption wavelengths using discrete-wavelength high-powered light-emitting diodes (LEDs). We investigate the effectiveness of automatic classification using a deep residual convolutional neural network, which learns the optimal combination of spectral and morphological features, and achieve a classification accuracy of 96% in an experiment conducted with six different algae types. These findings illustrate the possibility of leveraging the unique fingerprint of an algae cell (i.e., spectral wavelengths and morphological features) to automatically distinguish different algae types. Our work demonstrates that, when coupled with multi-band fluorescence microscopy, machine learning algorithms can potentially be used as a robust and cost-effective tool for identifying and enumerating algae cells.

Jason L. Deglint, Chao Jin, Alexander Wong

Medical Imaging and Analysis Using Deep Learning and Machine Intelligence

Frontmatter
A Random Field Computational Adaptive Optics Framework for Optical Coherence Microscopy

A novel random field computational adaptive optics (R-CAO) framework is proposed to jointly correct for optical aberrations and speckle noise issues in optical coherence microscopy (OCM) and thus overcome the depth-of-field limitation in OCM imaging. The performance of the R-CAO approach is validated using OCM tomograms acquired from a standard USAF target and a phantom comprised of 1 µm diameter microspheres embedded in agar gel. The R-CAO reconstructed OCM tomograms show reduced optical aberrations and speckle noise over the entire depth of imaging compared to the existing state-of-the-art computational adaptive optics algorithms such as the regularized maximum likelihood computational adaptive optics (RML-CAO) method. The reconstructed images using the proposed R-CAO framework show the usefulness of this method for the quality enhancement of OCM imaging over different imaging depths.

Ameneh Boroomand, Bingyao Tan, Mohammad Javad Shafiee, Kostadinka Bizheva, Alexander Wong
Deep Learning Approaches for Gynaecological Ultrasound Image Segmentation: A Radio-Frequency vs B-mode Comparison

Ovarian cancer is one of the pathologies with the worst prognosis in adult women, and its early diagnosis is very difficult. Clinical evaluation of gynaecological ultrasound images is performed visually and depends on the experience of the medical doctor. Besides the dependency on specialists, the malignancy of specific types of ovarian tumors cannot be asserted until their surgical removal. This work explores the use of ultrasound data for segmentation of the ovary and the ovarian follicles, using two different convolutional neural networks, a fully connected residual network and a U-Net, with binary and multi-class approaches. Five different types of ultrasound data (from beamformed radio-frequency to brightness mode) were used as input. The best performance was obtained using B-mode, for both ovary and follicle segmentation. No significant differences were found between the two convolutional neural networks. The multi-class approach was beneficial, as it provided the model with information on the spatial relation between follicles and the ovary. This study demonstrates the suitability of combining convolutional neural networks with beamformed radio-frequency data and with brightness mode data for segmentation of ovarian structures. Future steps involve processing pathological data and investigating biomarkers of pathological ovaries.

Catarina Carvalho, Sónia Marques, Carla Peixoto, Duarte Pignatelli, Jorge Beires, Jorge Silva, Aurélio Campilho
Discovery Radiomics for Detection of Severely Atypical Melanocytic Lesions (SAML) from Skin Imaging via Deep Residual Group Convolutional Radiomic Sequencer

The incidence of severely atypical melanocytic lesions (SAML) has been increasing year after year. Early detection of SAML by skin surveillance followed by biopsy and treatment may improve survival and reduce the burden on health care systems. Discovery radiomics can be used to analyze a variety of quantitative features present in pigmented lesions that determine which lesions demonstrate enough atypical changes to pursue medical attention. This study utilizes a novel deep residual group convolutional radiomic sequencer to assess SAML. The discovery radiomic sequencer was evaluated against over 18,000 dermoscopic images of different atypical nevi to achieve a sensitivity of 90% and specificity of 83%. Furthermore, the radiomic sequences produced using the novel deep residual group convolutional radiomic sequencer are visualized and analyzed via t-SNE analysis.

Helmut Neher, John Arlette, Alexander Wong
Identifying Diagnostically Complex Cases Through Ensemble Learning

Computer-Aided Diagnosis systems have been used as second readers in the medical imaging diagnostic process. In this study, we aim to identify cases that are hard to diagnose and lead to interpretation variability among medical experts. We propose a combination of image features and advanced machine learning classifiers to predict the degree of malignancy and determine the level of diagnostic difficulty by looking where these classifiers collectively fail. Using the NIH/NCI Lung Image Database Consortium (LIDC) dataset and four ensemble learning algorithms (bagging, random forest, AdaBoost, and a heterogeneous ensemble with decision trees, support vector machines, and k-nearest neighbors), our results show that we can not only detect difficult cases, but we are also able to identify what imaging characteristics or features make these cases hard to diagnostically interpret.

Yan Yu, Yiyang Wang, Jacob Furst, Daniela Raicu
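
The selection rule, flagging cases on which the classifiers collectively fail, can be sketched directly; the features X and malignancy labels y are assumed to be already extracted, and the model choices mirror the ensembles named above.

```python
import numpy as np
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)

models = [BaggingClassifier(), RandomForestClassifier(), AdaBoostClassifier()]
preds = np.array([m.fit(X_train, y_train).predict(X_test) for m in models])

# A case is "diagnostically complex" when every ensemble misclassifies it.
all_wrong = (preds != np.asarray(y_test)).all(axis=0)
difficult_cases = np.where(all_wrong)[0]
```
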
tCheXNet: Detecting Pneumothorax on Chest X-Ray Images Using Deep Transfer Learning

Pneumothorax (collapsed lung or dropped lung) is an urgent, potentially life-threatening situation that is mostly diagnosed from chest X-ray images. Detecting pneumothorax on chest X-ray images is challenging, as it requires the expertise of radiologists, which is time-consuming and expensive to obtain. The recent release of big labeled medical image datasets has enabled deep neural networks to be trained to detect diseases autonomously, and as the trend continues, more and more such datasets are expected to appear. However, a major limitation is that these datasets have different labels and settings. The know-how to transfer the knowledge learnt by one deep neural network to another, i.e., deep transfer learning, is therefore becoming more and more important. In this study, we explored the use of deep transfer learning to detect pneumothorax from chest X-ray images. We proposed a model architecture, tCheXNet, a deep neural network with 122 layers. Rather than training from scratch, we used a training strategy that transfers knowledge learnt by CheXNet to tCheXNet. In our experiments, tCheXNet achieved 10% better ROC performance than CheXNet on a testing set verified by three board-certified radiologists, with a training time of only 10 epochs. The source code is available at https://github.com/antoniosehk/tCheXNet .

Antonio Sze-To, Zihe Wang
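
CheXNet is built on a DenseNet-121 backbone, so the transfer strategy can be sketched as reusing such a backbone and training only a new pneumothorax head. The weight file name below is a hypothetical placeholder, and the exact 122-layer tCheXNet architecture is described in the paper itself.

```python
import tensorflow as tf
from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

base = DenseNet121(weights=None, include_top=False, input_shape=(224, 224, 3))
base.load_weights("chexnet_backbone.h5")   # hypothetical CheXNet weight file
for layer in base.layers:
    layer.trainable = False                # keep the transferred knowledge fixed

x = GlobalAveragePooling2D()(base.output)
out = Dense(1, activation="sigmoid", name="pneumothorax")(x)  # new task head
model = Model(base.inputs, out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
```
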
Improving Lesion Segmentation for Diabetic Retinopathy Using Adversarial Learning

Diabetic Retinopathy (DR) is a leading cause of blindness in working age adults. DR lesions can be challenging to identify in fundus images, and automatic DR detection systems can offer strong clinical value. Of the publicly available labeled datasets for DR, the Indian Diabetic Retinopathy Image Dataset (IDRiD) presents retinal fundus images with pixel-level annotations of four distinct lesions: microaneurysms, hemorrhages, soft exudates and hard exudates. We utilize the HEDNet edge detector to solve a semantic segmentation task on this dataset, and then propose an end-to-end system for pixel-level segmentation of DR lesions by incorporating HEDNet into a Conditional Generative Adversarial Network (cGAN). We design a loss function that adds adversarial loss to segmentation loss. Our experiments show that the addition of the adversarial loss improves the lesion segmentation performance over the baseline.

Qiqi Xiao, Jiaxu Zou, Muqiao Yang, Alex Gaudio, Kris Kitani, Asim Smailagic, Pedro Costa, Min Xu
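
The combined objective is the segmentation loss plus a weighted adversarial term. A minimal sketch of the generator-side loss follows; the weight lam is a placeholder, and disc_on_fake stands for the cGAN discriminator's output on the predicted segmentation.

```python
import torch
import torch.nn.functional as F

def generator_loss(seg_logits, target_mask, disc_on_fake, lam=0.01):
    # Pixel-wise segmentation loss ...
    seg = F.binary_cross_entropy_with_logits(seg_logits, target_mask)
    # ... plus an adversarial term that rewards segmentations the
    # discriminator judges to be real expert annotations.
    adv = F.binary_cross_entropy_with_logits(
        disc_on_fake, torch.ones_like(disc_on_fake))
    return seg + lam * adv
```
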
Context Aware Lung Cancer Annotation in Whole Slide Images Using Fully Convolutional Neural Networks

We propose a novel machine learning based methodology for detecting and annotating areas in whole slide lung images (WSI) that are affected by lung cancer. Contrary to the trend of processing WSIs in small overlapping patches to generate a heat-map, we use a much larger patch with no overlap, aiming to capture more of the context in each patch. As these larger patches are less likely to fall completely into one of the cancer/non-cancer classes, we use a pixel-level image segmentation approach consisting of a custom fully convolutional neural network (FCNN). As opposed to the trend of using very deep neural networks, we carefully design a small FCNN, avoiding trainable upsampling layers, in order to cope with small training data and inaccurate region-based labeling of WSIs. We show that such an efficient architecture achieves better accuracy than the heat-map based approach. Apart from the decent results of our small network, this study shows that FCNNs are capable of learning region-based human labeling of biomedical images that sometimes does not correspond to a texture or a bounded object as a whole, but is more like a line drawn around a region containing a scattered number of small malignant tissues.

Vahid Khanagha, Sanaz Aliari Kardehdeh
Optimized Deep Learning Architecture for the Diagnosis of Pneumonia Through Chest X-Rays

One of the most common exams done in hospitals is the chest radiograph. From the results of this exam, many illnesses can be diagnosed, such as pneumonia, which is the deadliest illness for children. The main objective of this work is to propose a convolutional neural network model that diagnoses pneumonia from chest radiographs. The model's architecture is automatically generated through hyperparameter optimization. The generated models were trained and validated with a database of chest radiographs presenting cases of viral and bacterial pneumonia. The best architecture found achieved an accuracy of 95.3% and an AUC of 94% for diagnosing pneumonia, while the best architecture for classifying the type of pneumonia attained an accuracy of 83.1% and an AUC of 80%.

Gabriel Garcez Barros Sousa, Vandécia Rejane Monteiro Fernandes, Anselmo Cardoso de Paiva
Learned Pre-processing for Automatic Diabetic Retinopathy Detection on Eye Fundus Images

Diabetic Retinopathy is the leading cause of blindness in the working-age population of the world. The main aim of this paper is to improve the accuracy of Diabetic Retinopathy detection by implementing a shadow removal and color correction step as a preprocessing stage for eye fundus images. For this, we rely on recent findings indicating that applying image dehazing in the inverted intensity domain amounts to illumination compensation. Inspired by this work, we propose a Shadow Removal Layer that allows us to learn the pre-processing function for a particular task. We show that learning the pre-processing function improves the performance of the network on the Diabetic Retinopathy detection task.

Asim Smailagic, Anupma Sharan, Pedro Costa, Adrian Galdran, Alex Gaudio, Aurélio Campilho
TriResNet: A Deep Triple-Stream Residual Network for Histopathology Grading

While microscopic analysis of histopathological slides is generally considered the gold standard for cancer diagnosis and grading, the current method of analysis is extremely time-consuming and labour-intensive, as it requires pathologists to visually inspect tissue samples in detail for the presence of cancer. As such, there has been significant recent interest in computer-aided diagnosis systems for analysing histopathological slides for cancer grading, to help pathologists perform cancer diagnosis and grading in a more efficient, accurate, and consistent manner. In this work, we investigate and explore a deep triple-stream residual network (TriResNet) architecture for the purpose of tile-level histopathology grading, the critical first step to computer-aided whole-slide histopathology grading. In particular, the design mentality behind the proposed TriResNet network architecture is to facilitate the learning of a more diverse set of quantitative features to better characterize the complex tissue characteristics found in histopathology samples. Experimental results on two widely-used computer-aided histopathology benchmark datasets (the CAMELYON16 dataset and the Invasive Ductal Carcinoma (IDC) dataset) demonstrate that the proposed TriResNet network architecture achieves noticeably improved accuracies when compared with two other state-of-the-art deep convolutional neural network architectures for histopathology grading. Based on these promising results, the hope is that TriResNet could become a useful tool to aid pathologists in increasing the consistency, speed, and accuracy of the histopathology grading process.

Rene Bidart, Alexander Wong
BEM-RCNN Segmentation Based on the Inadequately Labeled Moving Mesenchymal Stem Cells

This paper addresses the challenging task of segmenting moving mesenchymal stem cells in digital time-lapse microscopy sequences. A convolutional neural network (CNN) based pipeline is developed to segment cells automatically. To accommodate the unique nature of the data, an efficient binarization enhancement policy is proposed to increase tracing performance. Furthermore, to work with datasets with inadequate and inaccurate ground truth, a compensation algorithm is developed to enrich the annotation automatically and thus ensure the training quality of the model. Experiments show that our model surpasses the state of the art, achieving a SEG score of 0.818.

Jingxiong Li, Yaqi Wang, Qianni Zhang

Image Analysis and Recognition for Automotive Industry

Frontmatter
Inceptive Event Time-Surfaces for Object Classification Using Neuromorphic Cameras

This paper presents a novel fusion of low-level approaches for dimensionality reduction into an effective approach for high-level objects in neuromorphic camera data called Inceptive Event Time-Surfaces (IETS). IETSs overcome several limitations of conventional time-surfaces by increasing robustness to noise, promoting spatial consistency, and improving the temporal localization of (moving) edges. Combining IETS with transfer learning improves state-of-the-art performance on the challenging problem of object classification utilizing event camera data.

R. Wes Baldwin, Mohammed Almatrafi, Jason R. Kaufman, Vijayan Asari, Keigo Hirakawa
An End-to-End Deep Learning Based Gesture Recognizer for Vehicle Self Parking System

Hand gesture recognition has become versatile in numerous applications. In particular, the automotive industry has benefited from its deployment, and human-machine interface designers are using it to improve driver safety and comfort. In this paper, we investigate expanding the product segment of one of America's top three automakers through deep learning, providing increased driver convenience and comfort with the application of dynamic hand gesture recognition for vehicle self-parking. We adapt the architecture of the end-to-end solution to expand the state-of-the-art video classifier from a single image as input (fed by a monocular camera) to a multi-view 360° feed, offered by a six-camera module. Finally, we optimize the proposed solution to work on a resource-limited embedded platform of the kind used by automakers for vehicle-based features, without sacrificing the accuracy, robustness, and real-time functionality of the system.

Hassene Ben Amara, Fakhri Karray
Thermal Image SuperResolution Through Deep Convolutional Neural Network

Due to the lack of thermal image datasets, a new dataset has been acquired to develop the proposed super-resolution approach, which uses a deep convolutional neural network scheme. Different experiments were carried out: first, the proposed architecture was trained using only images from the visible spectrum, and later it was trained with images from the thermal spectrum. The results show that the network trained with thermal images obtains better results in the image enhancement process, maintaining image details and perspective. The thermal dataset is available at http://www.cidis.espol.edu.ec/es/dataset .

Rafael E. Rivadeneira, Patricia L. Suárez, Angel D. Sappa, Boris X. Vintimilla

Adaptive Methods for Ultrasound Beamforming and Motion Estimation

Frontmatter
Compensated Row-Column Ultrasound Imaging Systems with Data-Driven Point Spread Function Learning

Ultrasound imaging systems are invaluable tools used in applications ranging from medical diagnostics to non-destructive testing. The concept of row-column imaging using row-column-addressed arrays has received a lot of attention recently for 3-D ultrasound imaging. However, it suffers from a few intrinsic limitations: data sparsity, speckle noise, and a spatially varying point spread function. These limitations cannot be addressed by transducer design alone. In this research, we propose PL-UIS, a compensated ultrasound imaging system that combines physical modeling with data-driven spatially varying point spread function learning within a random field framework to address the limitations of row-column ultrasound imaging. Experimental results using the proposed ultrasound imaging system show the effectiveness of our proposed PL-UIS system compared to state-of-the-art compensated ultrasound imaging systems.

Ibrahim Ben Daya, John T. W. Yeow, Alexander Wong
Channel Count Reduction for Plane Wave Ultrasound Through Convolutional Neural Network Interpolation

Plane wave ultrasound imaging has helped to achieve high frame rate ultrasound; however, the data required to achieve frame rates over 1000 fps remains challenging to handle, as the transfer of large amounts of data represents a bottleneck for image reconstruction. This paper presents a novel method of using a fully convolutional encoder-decoder deep neural network to interpolate pre-beamformed raw RF data from ultrasound transducer elements. The network is trained on in vivo human carotid data, then tested on both carotid data and a standard ultrasound phantom. The neural network outputs are compared to linear interpolation, and the proposed method captures more meaningful patterns in the signal; the output channels are then combined with the non-interpolated channels and beamformed to form an image, showing not only significant improvement in mean-squared error compared to the alternatives, but also a 10–15 dB reduction in grating lobe artifacts. The proposed method has implications for current ultrasound research directions, with applications to real-time high frame rate ultrasound and 3D ultrasound imaging.

Di Xiao, Billy Y. S. Yiu, Adrian J. Y. Chee, Alfred C. H. Yu
Segmentation of Aliasing Artefacts in Ultrasound Color Flow Imaging Using Convolutional Neural Networks

Color flow imaging is a biomedical ultrasound modality used to visualize blood flow dynamics in the blood vessels, which are correlated with cardiovascular function and pathology. This is however done through a pulsed echo sensing mechanism and thus flow measurements can be corrupted by aliasing artefacts, hindering its application. While various methods have attempted to address these artefacts, there is still demand for a robust and flexible solution, particularly at the stage of identifying the aliased regions in the imaging view. In this paper, we investigate the application of convolutional neural networks to segment aliased regions in color flow images due to their strength in translation-invariant learning of complex features. Relevant ultrasound features including phase shifts, speckle images and optical flow were generated from ultrasound data obtained from anthropomorphic flow models. The investigated neural networks all showed strong performance in terms of precision, recall and intersection over union while revealing the important ultrasound features that improved detection. This study paves the way for sophisticated dealiasing algorithms in color flow imaging.

Hassan Nahas, Takuro Ishii, Adrian Chee, Billy Yiu, Alfred Yu
Automatic Frame Selection Using MLP Neural Network in Ultrasound Elastography

Ultrasound elastography estimates the mechanical properties of tissue from two radio-frequency (RF) frames collected before and after tissue deformation due to an external or internal force. This work focuses on strain imaging in quasi-static elastography, where the tissue undergoes slow deformations and strain images are estimated as a surrogate for the elasticity modulus. The quality of the strain image depends heavily on the underlying deformation, and even the best strain estimation algorithms cannot estimate a good strain image if the underlying deformation is not suitable. Herein, we introduce a new method for tracking the RF frames and automatically selecting the best possible pair. We achieve this by decomposing the axial displacement image into a linear combination of principal components (calculated offline) multiplied by their corresponding weights. We then use the calculated weights as the input feature vector to a multi-layer perceptron (MLP) classifier. The output is a binary decision: 1 for good frames or 0 for bad frames. Our MLP model is trained on an in-vivo dataset and tested on different datasets of both in-vivo and phantom data. Results show that our technique achieves higher-quality strain images than the traditional approach of picking pairs that are 1, 2, or 3 frames apart. The training phase of our algorithm is computationally expensive and takes a few hours, but it is only done once. The testing phase chooses the optimal pair of frames in only 1.9 ms.

Abdelrahman Zayed, Hassan Rivaz
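
The pipeline, offline principal components followed by online projection weights fed to a small MLP, can be sketched with scikit-learn. The component count, hidden size, and the displacement/label arrays below are assumptions, not values from the paper.

```python
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

# Offline: principal components of axial displacement images
# (rows of train_disp are flattened displacement fields).
pca = PCA(n_components=12).fit(train_disp)

# The projection weights, not the raw displacements, are the features.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
clf.fit(pca.transform(train_disp), labels)   # labels: 1 = good pair, 0 = bad

def is_good_pair(displacement):
    w = pca.transform(displacement.reshape(1, -1))
    return clf.predict(w)[0] == 1
```
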
Auto SVD Clutter Filtering for US Doppler Imaging Using 3D Clustering Algorithm

Blood flow visualization is a challenging task in the presence of tissue motion. Conventional clutter filtering techniques perform poorly since blood and tissue clutter echoes share similar spectral characteristics. Thus, unsuppressed tissue clutter produces flashing artefacts in ultrasound color flow images. Eigen-based filtering was recently introduced and has shown good clutter rejection performance; however, there is yet no standard approach to robustly determine the eigen components corresponding to tissue clutter. To address this issue, we propose a novel 3D clustering based singular value decomposition (SVD) clutter filtering method. The proposed technique makes use of three key spatiotemporal statistics: singular value magnitude, spatial correlation and the mean Doppler frequency of singular vectors to adaptively determine the clutter and noise clusters and their corresponding eigen rank to achieve maximal clutter and noise suppression. To test the clutter rejection performance of the proposed filter, high frame rate plane wave data was acquired in-vivo from a subject’s common carotid artery and jugular vein region induced with extrinsic tissue motion (voluntary probe motion). The flow detection efficacy of the clustering based SVD filter was statistically evaluated and compared with current eigen rank estimation methods using the receiver operating characteristic (ROC) analysis. Results show that the clustering based SVD filter yielded the highest area under the ROC curve (0.9082) in comparison with other eigen rank estimation methods, signifying its improved flow detection capability.

Saad Ahmed Waraich, Adrian Chee, Di Xiao, Billy Y. S. Yiu, Alfred Yu
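
Once the clustering step has fixed the tissue and noise ranks, the filtering itself reduces to a truncated SVD of the slow-time (Casorati) data matrix. A minimal numpy sketch, with the rank choices assumed to come from the 3D clustering stage:

```python
import numpy as np

def svd_clutter_filter(casorati, clutter_rank, noise_rank):
    # casorati: (pixels, frames) complex slow-time ensemble.
    U, s, Vh = np.linalg.svd(casorati, full_matrices=False)
    s_f = s.copy()
    s_f[:clutter_rank] = 0.0        # low-order components: tissue clutter
    if noise_rank > 0:
        s_f[-noise_rank:] = 0.0     # high-order components: noise
    return (U * s_f) @ Vh           # remaining mid-band: blood flow signal
```
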
Backmatter
Metadata
Title
Image Analysis and Recognition
Edited by
Fakhri Karray
Aurélio Campilho
Alfred Yu
Copyright year
2019
Electronic ISBN
978-3-030-27272-2
Print ISBN
978-3-030-27271-5
DOI
https://doi.org/10.1007/978-3-030-27272-2
