Detection of masses in mammograms with adaption to breast density using genetic algorithm, phylogenetic trees, LBP and SVM
Introduction
Breast cancer is the most frequent type of cancer among women and also the leading cause of cancer death in the female population (Parkin, Bray, Ferlay, & Pisani, 2005). Early diagnosis is the main means of fighting the disease.
A mammogram is a radiograph of the breast that allows the early detection of cancer, since it can display lesions at an initial stage, when they measure only a few millimeters. It is acquired with a dedicated X-ray device called a mammograph. The accuracy of a mammogram depends on several factors, such as the size and location of the lesion, the density of the breast tissue and the quality of the technical resources used. In addition, carefully interpreting a large number of cases demands time and a high degree of attention from the specialist physician.
According to Boyd et al. (2007), one of the factors that hinder the detection of masses by specialists is breast density: a breast can be dense (fibrous) or non-dense (fatty). Fatty tissue appears as dark regions in a mammogram, whereas fibrous structures (including masses) appear as brighter regions. Because masses in a dense breast are surrounded by tissue of similar brightness, it is more difficult for a radiologist to find lesions in dense breasts.
All these factors have motivated much research over the last decades aimed at developing computational systems that help the specialist physician interpret radiological images. These Computer-Aided Detection (CAD) and Computer-Aided Diagnosis (CADx) systems have gained ground in modern medicine, serving as a second source of information for specialists and increasing the rate of correct detections of serious diseases such as breast cancer (Fenton et al., 2007). However, most studies in the literature use the same techniques and configurations for both dense and non-dense breasts, even though a given technique may be better suited to a specific density.
The efficiency of CAD systems depends on the image processing techniques they employ. The literature contains well-established works that address the same problem as the methodology proposed herein, that is, the development of computational methods to aid the specialist in detecting lesions in mammograms.
The literature presents a wide variety and combination of techniques for detecting masses in mammograms. These methods usually describe the geometry and the texture of suspicious regions.
For the geometric analysis, descriptors are used to find shape patterns capable of representing masses and differentiating them from non-masses. Examples of geometric descriptors and shape-based techniques include: area, perimeter, circularity, shape factor (Dong et al., 2015), eccentricity, circular density, circular disproportion and density (Sampaio, Moraes Diniz, Corrêa Silva, Cardoso de Paiva, & Gattass, 2011), active contour (Liu, Xu, Liu, & Feng, 2011), template matching (Nunes, Silva, & de Paiva, 2009), contourlet (Moayedi, Azimifar, Boostani, & Katebi, 2010), generalized moment patterns (Deepak, Medathati, & Sivaswamy, 2012) and density-weighted contrast enhancement (Petrick, Chan, Sahiner, & Wei, 1996), among others.
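As an illustration of the simplest descriptors in this list, the sketch below computes area, perimeter and circularity for a candidate region. It assumes the region is given as a binary mask (a list of rows, 1 for region pixels), estimates the perimeter by counting pixel edges that touch the background, and uses the common definition circularity = 4πA/P². The cited works may define these descriptors differently; this is a minimal sketch, not their implementation.

```python
import math


def geometric_descriptors(mask):
    """Return (area, perimeter, circularity) of a binary region.

    mask: list of rows, 1 = region pixel, 0 = background (assumed layout).
    The perimeter is a crude 4-connected estimate: the number of pixel
    edges shared between the region and the background or image border.
    """
    rows, cols = len(mask), len(mask[0])
    area = 0
    perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            area += 1
            # Each 4-neighbour that is background (or outside the image)
            # contributes one unit of boundary length.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols) or not mask[nr][nc]:
                    perimeter += 1
    # 4*pi*A/P^2 is 1 for an ideal circle and smaller for elongated
    # or irregular shapes.
    circularity = 4 * math.pi * area / perimeter ** 2 if perimeter else 0.0
    return area, perimeter, circularity
```

For a 10×10 square the sketch gives area 100, perimeter 40 and circularity π/4 ≈ 0.785. A pixelated disc scores noticeably below 1 because the 4-connected edge count overestimates the length of curved boundaries, which is one reason published methods use more careful perimeter estimators.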
The vast majority of studies use some kind of texture analysis. These analyses usually employ statistical and geostatistical descriptors, as well as species diversity and richness indexes, with the objective of finding relations between the distributions of pixels belonging to masses and non-masses. Examples of texture descriptors include: Local Binary Patterns (LBP) (Berbar, Reyad, & Hussain, 2012), Complete Local Binary Patterns (Liu et al., 2011), statistical fusion (Bajger, Ma, Williams, & Bottema, 2010), Gray Level Co-occurrence Matrix (GLCM) (Abdalla, Dress, & Zaki, 2011; Anitha & Peter, 2015; Tai, Chen, & Tsai, 2014), first-order and second-order statistical functions (Berbar, Reyad, & Hussain, 2012; Jen & Yu, 2015), Gabor filters, image phase analysis, angular analysis of energy propagation, fractal analysis, Laws texture and Haralick descriptors (Banik, Rangayyan, & Desautels, 2011), gray-scale invariant ranklet texture (Masotti, Lanconelli, & Campanini, 2009), optical density transformation (Tai et al., 2014) and pyramid decomposition (Lin, Chang, Yeh, Liu, & Yeh, 2014). Other descriptors are obtained from the Fourier and Wavelet transforms (Agrawal, Vatsa, & Singh, 2014; Kuo, Lin, Hsu, & Cheng, 2014; Lin, Chang, Yeh, Liu, & Yeh, 2014), Phylogenetic Trees (Oliveira, Carvalho Filho, Silva, de Paiva, & Gattass, 2015) and Vector Field Convolution (Dong et al., 2015).
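To make the LBP descriptor concrete, the sketch below implements the basic 3×3 operator: each interior pixel is compared with its eight neighbours, every neighbour whose gray level is greater than or equal to the center sets one bit, and the normalized histogram of the resulting 8-bit codes serves as the texture descriptor of the ROI. The neighbour ordering and the ≥ comparison follow one common convention and are assumptions here; the uniform and rotation-invariant variants, and the configuration used in this work, are not shown.

```python
# Neighbour offsets visited clockwise from the top-left pixel;
# neighbour k sets bit k of the code (an assumed convention).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(image):
    """Basic 3x3 LBP codes for the interior pixels of a grayscale image."""
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                # Threshold the neighbour against the center pixel.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            codes.append(code)
    return codes


def lbp_histogram(image):
    """Normalized 256-bin histogram of LBP codes (the texture descriptor)."""
    codes = lbp_codes(image)
    hist = [0] * 256
    for code in codes:
        hist[code] += 1
    total = len(codes)
    return [h / total for h in hist]
```

With this convention a flat patch maps to code 255 and a local intensity peak to code 0, so the histogram summarizes how often each micro-pattern (edge, spot, flat area) occurs inside the region.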
Texture and geometry descriptors are generally used together with a machine learning technique, which decides whether the analyzed Region of Interest (ROI) is a mass or a non-mass. Examples include the Support Vector Machine (SVM) (Agrawal, Vatsa, & Singh, 2014; Berbar, Reyad, & Hussain, 2012; Dong, Lu, Ma, Guo, Ma, & Wang, 2015; Liu, Xu, Liu, & Feng, 2011; Oliveira, Carvalho Filho, Silva, de Paiva, & Gattass, 2015; Sampaio, Moraes Diniz, Corrêa Silva, Cardoso de Paiva, & Gattass, 2011), Linear Discriminant Analysis (LDA) (Abdalla, Dress, & Zaki, 2011; Bajger, Ma, Williams, & Bottema, 2010), Artificial Neural Networks (ANN) (Abdalla, Dress, & Zaki, 2011; Banik, Rangayyan, & Desautels, 2011; Lin, Chang, Yeh, Liu, & Yeh, 2014), k-Nearest Neighbors (KNN) (Berbar et al., 2012), Stepwise Discriminant Analysis (Tai et al., 2014) and Particle Swarm Optimization (PSO) (Kuo et al., 2014).
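To show what the classification step does with a descriptor vector, the sketch below trains a linear SVM with Pegasos-style stochastic sub-gradient descent on the regularized hinge loss. This training method is a deliberately simple stand-in, not the solver or kernel used by any of the cited works, which typically rely on library SVM implementations. The feature vectors and labels (+1 for mass, −1 for non-mass) are hypothetical, and the bias is handled by appending a constant feature, which is also regularized as a simplification.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic sub-gradient training of a linear SVM.

    X: list of feature vectors; y: labels in {-1, +1}.
    A constant 1.0 feature is appended so the last weight acts as a bias.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient step for the regularization term: shrink the weights.
            w = [(1 - eta * lam) * wi for wi in w]
            # Hinge-loss sub-gradient step only when the margin is violated.
            if margin < 1:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (mass) or -1 (non-mass) for a feature vector x."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

A kernel SVM replaces the dot product with a kernel function, which lets it separate classes that are not linearly separable in the descriptor space; the linear version is kept here only because it fits in a few lines.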
Considerable effort has gone into creating image databases for open use by the scientific community. Among these databases, we highlight the Digital Database for Screening Mammography (DDSM) (Abdalla, Dress, & Zaki, 2011; Bajger, Ma, Williams, & Bottema, 2010; Berbar, Reyad, & Hussain, 2012; Dong, Lu, Ma, Guo, Ma, & Wang, 2015; Jen & Yu, 2015; Liu, Xu, Liu, & Feng, 2011; Oliveira, Carvalho Filho, Silva, de Paiva, & Gattass, 2015; Sampaio, Moraes Diniz, Corrêa Silva, Cardoso de Paiva, & Gattass, 2011; Tai, Chen, & Tsai, 2014) and the Mammographic Image Analysis Society (MIAS) database (Agrawal, Vatsa, & Singh, 2014; Anitha & Peter, 2015; Deepak, Medathati, & Sivaswamy, 2012; Dong, Lu, Ma, Guo, Ma, & Wang, 2015; Kuo, Lin, Hsu, & Cheng, 2014; Moayedi, Azimifar, Boostani, & Katebi, 2010). Some studies use private databases for the same purpose (Bajger, Ma, Williams, & Bottema, 2010; Banik, Rangayyan, & Desautels, 2011; Jen & Yu, 2015; Lin, Chang, Yeh, Liu, & Yeh, 2014).
Analyzing the studies listed above, we notice that most of them use the MIAS and DDSM databases. Nevertheless, none of them adapts its methodology to the density of the breast.
In general, the cited works use supervised learning techniques to classify the ROIs. The SVM classifier, which has received much attention, is also used in this work.
This work uses two stages (segmentation and reduction of false positives using DBSCAN) that rely only on information extracted from the image under analysis, dispensing with a specific knowledge base for each stage.
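DBSCAN groups points that have at least `min_pts` neighbours within a radius `eps` into clusters and labels isolated points as noise, without needing the number of clusters in advance. This excerpt does not describe which features this work clusters or which `eps` and MinPts values it uses, so the sketch below simply clusters 2-D points, for example hypothetical centroids of candidate regions. It is a plain O(n²) implementation for illustration.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    A point is a core point if at least min_pts points (itself included)
    lie within distance eps of it; clusters grow from core points.
    """
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neigh = neighbours(j)
            if len(j_neigh) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neigh)
        cluster += 1
    return labels
```

For example, with `eps=1.5` and `min_pts=3`, four points around (0, 0) form cluster 0, three points around (10, 10) form cluster 1, and a lone point at (50, 50) is labeled −1, which is the kind of isolated response a false-positive reduction step might discard.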
The work by Oliveira et al. (2015) also extracts features using phylogenetic trees. However, it only takes into account the frequency of the gray levels of the ROIs under analysis. In our work, the phylogenetic trees use the spatial relation among gray levels as well as their occurrences, thus performing both a local and a global analysis. We stress that the work by Oliveira et al. (2015) did not include the segmentation and false-positive reduction stages, and therefore reported superior performance, since errors made in these stages were not counted.
Unlike the present work, none of the cited works attempts to optimize its training models.
Table 17 summarizes the related works presented in this section.
This work presents a CAD methodology for helping the specialist physician detect masses in mammographic images. The technique is adapted to the density of the breast. The first stage of the methodology removes artifacts outside the breast and reduces noise. The next stage classifies the breast as dense or non-dense. The segmentation stage selects the regions of the image that are likely to contain masses by means of a micro-genetic algorithm (μGA). A first reduction of false positives (RFP) uses Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and a proximity ranking of the textures extracted from the regions of interest (ROIs). In the second RFP, the textures of the ROIs are analyzed by combining Phylogenetic Trees, Local Binary Patterns and Support Vector Machines (SVM).
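The segmentation stage relies on a micro-genetic algorithm. The sketch below is a generic μGA over bit strings, not the encoding, fitness function or parameters of this work, none of which appear in this excerpt: it keeps a very small population, preserves the best individual (elitism), breeds children by binary tournament selection and uniform crossover without mutation, and restarts the population around the elite whenever it nominally converges. In a segmentation setting the bit string could, hypothetically, encode parameters of a candidate region, with the fitness scoring that region.

```python
import random


def micro_ga(fitness, n_bits, pop_size=5, generations=200, seed=0):
    """Generic micro-genetic algorithm over bit strings (maximization)."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.randint(0, 1) for _ in range(n_bits)]

    pop = [random_individual() for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        elite = max(pop, key=fitness)
        if fitness(elite) > fitness(best):
            best = elite[:]
        # Nominal convergence: fewer than 5% of the bits differ from the elite.
        diff = sum(a != b for ind in pop for a, b in zip(ind, elite))
        if diff < 0.05 * n_bits * (pop_size - 1):
            # Restart: keep the elite, replace everyone else at random.
            pop = [elite[:]] + [random_individual() for _ in range(pop_size - 1)]
            continue
        children = [elite[:]]  # elitism
        while len(children) < pop_size:
            p1 = max(rng.sample(pop, 2), key=fitness)  # binary tournament
            p2 = max(rng.sample(pop, 2), key=fitness)
            # Uniform crossover: each bit taken from either parent.
            children.append([a if rng.random() < 0.5 else b for a, b in zip(p1, p2)])
        pop = children
    return max(pop + [best], key=fitness)
```

The frequent restarts replace mutation as the source of diversity, which is what allows a μGA to work with only a handful of individuals and therefore few fitness evaluations per generation.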
The remaining sections are organized as follows. Section 2 reviews the theoretical basis necessary for understanding the methodology. Section 3 presents the five stages that compose the present work: image acquisition, density detection, pre-processing, segmentation of the regions of interest and reduction of false positives. Section 4 presents and discusses the results, including case studies of the application of the proposed methodology. Section 5 concludes the work, presenting the main contributions and the efficiency of the methods used.
Theoretical basis
This section briefly reviews Phylogenetic Trees, Local Binary Patterns (LBP), the taxonomic diversity and distinctness indexes and the micro-genetic algorithm.
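The taxonomic indexes summarize how far apart, in a tree, the individuals of a community are. With abundances x_i and tree path lengths ω_ij between species i and j, taxonomic diversity is Δ = Σ_{i<j} ω_ij x_i x_j / (n(n−1)/2), where n = Σ x_i, and taxonomic distinctness is Δ* = Σ_{i<j} ω_ij x_i x_j / Σ_{i<j} x_i x_j. The sketch below assumes, for illustration only, that species are gray levels and individuals are pixels, and that the path lengths of the phylogenetic tree have already been computed; how this work builds its trees from spatial gray-level relations is not reproduced here.

```python
from itertools import combinations


def taxonomic_indexes(abundance, distance):
    """Return (Delta, Delta*) for a community described by a tree.

    abundance[i]: number of individuals of species i
                  (assumed here: pixels with gray level i)
    distance[i][j]: path length omega_ij between species i and j in the tree
    """
    n = sum(abundance)
    weighted = 0.0  # sum over i<j of omega_ij * x_i * x_j
    pairs = 0.0     # sum over i<j of x_i * x_j
    for i, j in combinations(range(len(abundance)), 2):
        prod = abundance[i] * abundance[j]
        weighted += distance[i][j] * prod
        pairs += prod
    # Delta averages the tree distance over all pairs of individuals
    # (pairs within the same species count with distance 0).
    delta = weighted / (n * (n - 1) / 2) if n > 1 else 0.0
    # Delta* averages only over pairs of individuals from different species.
    delta_star = weighted / pairs if pairs else 0.0
    return delta, delta_star
```

For two species with two individuals each at tree distance 1, Δ = 4/6 ≈ 0.67 while Δ* = 1, showing that Δ is diluted by same-species pairs whereas Δ* depends only on the tree structure among the species present.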
Proposed methodology
The rest of this section describes each one of these stages in detail.
Results and discussion
This section presents and discusses the final results obtained in the test stage by applying the methodology described in Section 3.
Conclusion
This work presents a methodology for detecting masses in mammograms by means of image processing techniques, pattern recognition and evolutionary algorithms that automatically adapt to the density of each breast under analysis.
The proposed methodology uses images from DDSM, a public image database. Other databases can also be used, requiring only the adjustment of some parameters of the stages of the methodology. In addition to the images, extra information is necessary, such as
Acknowledgments
The authors acknowledge CAPES, CNPq and FAPEMA for financial support.
References (34)
- et al. (2014). Saliency based mass detection from screening mammograms. Signal Processing.
- et al. (2015). An efficient approach for automated mass segmentation and classification in mammograms. Journal of Digital Imaging.
- et al. (2009). Classification of breast tissues using Moran's index and Geary's coefficient as texture signatures and SVM. Computers in Biology and Medicine.
- et al. (2009). Detection of masses in mammographic images using Simpson's diversity index in circular regions and SVM. Proceedings of Machine Learning and Data Mining in Pattern Recognition.
- et al. (2011). Masses detection in digital mammogram by gray level reduction using texture coding method. JIP.
- et al. (2006). Face description with local binary patterns: Application to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- et al. (2011). Detection of masses in digital mammogram using second order statistics and artificial neural network. International Journal of Computer Science & Information Technology (IJCSIT).
- et al. (2015). Mammogram segmentation using maximal cell strength updation in cellular automata. Medical & Biological Engineering & Computing.
- et al. (2010). Mammographic mass detection with statistical region merging. 2010 International Conference on Digital Image Computing: Techniques and Applications.
- et al. (2011). Detection of architectural distortion in prior mammograms. IEEE Transactions on Medical Imaging.