Letter

Classification of human stomach cancer using morphological feature analysis from optical coherence tomography images


Published 14 August 2019 © 2019 Astro Ltd
Citation: Site Luo et al 2019 Laser Phys. Lett. 16 095602. DOI: 10.1088/1612-202X/ab3638

1612-202X/16/9/095602

Abstract

Optical coherence tomography (OCT) is radiation-free and is regarded as a tool for optical biopsy. Distinguishing normal from cancerous tissue is important for guiding surgeons. Here, we develop a morphological feature analysis-based classification (MFAC) method and combine it with machine learning to identify cancerous tissue. We extract five quantitative morphological features from each OCT image through structural analysis. Five classifiers are used: the support vector machine, K-nearest neighbor, random forest, logistic regression, and the conventional threshold method. The classifiers are evaluated and compared using sensitivity, specificity, and accuracy. We imaged ex vivo cancerous stomach tissue from patients with the OCT system. The results show that the three additional features designed specifically for stomach cancer perform markedly better than the traditional image features. The best feature achieved over 95% accuracy with all five classifiers. Features designed around the layered structure of stomach tissue are therefore effective. In the future, the MFAC method will be applied to in vivo tissue imaging in clinical settings.


1. Introduction

Optical coherence tomography (OCT) is a high-resolution (1–10 µm), real-time, cross-sectional imaging technique that, after decades of development, has shown promise as a radiation-free diagnostic tool. For instance, spectral-domain OCT and swept-source OCT (SS-OCT) have been proposed with improved speed and sensitivity [1–5]. These improvements have broadened the clinical research applications of OCT, which now include skin cancer [6, 7], skin burns, brain cancer [8–10], the oral cavity [11], the cervix [12], breast cancer [13, 14] and the gastrointestinal (GI) tract [15, 16].

In 2015 alone, there were over one million new cases of stomach cancer in China, resulting in over 70 000 deaths [17]. Typical imaging techniques used to diagnose stomach cancer include ultrasound [18], MRI [19], and CT [20]. Recently, some novel methods, such as blue laser [21], fluorescence [22], and multi-photon imaging [23], have also been demonstrated for diagnosing gastric cancer. Although a large number of OCT experiments have been performed on the human GI tract, only a few studies have used OCT to study gastric cancer [24–26]. OCT has been used to measure the permeability coefficient of a hyperosmotic agent in gastric cancer [27], and Osiac et al used Doppler OCT to image gastric tumor tissue [28]. Furthermore, OCT-based identification of tumorous tissue can be applied to optical theranostic systems for precision surgery [29].

The texture of cancerous tissue has long been an important criterion for diagnosing cancer. The morphological heterogeneity of OCT images can be characterized statistically by the variance of the intensity [30], and this approach is extensively used to diagnose dozens of diseases. Automated analysis of ophthalmologic pathologies with OCT is well developed, since OCT has been a routine ophthalmologic exam for more than a decade; it involves layer segmentation and quantification [31–35]. Furthermore, textural analysis and image classification are used in the processing of OCT images [36–39]. Similar processing has been applied to classify skin dysplasia [40, 41], ovarian tissue lesions [42, 43], esophageal cancer [44] and coronary arteries [45]. Some machine learning-based algorithms have been applied to the identification of cancerous tissue. These classifiers are frequently used [30–33, 43]; their classification performance depends mainly on the quantitative morphological features.

The feasibility of applying deep learning to OCT image analysis has also been studied over the past few years. Most of these reports concern ophthalmology, because OCT-based diagnosis of ophthalmological disease has been developed over decades and a large number of OCT images have been collected [46–48]. This illustrates how much training and test data a deep learning model requires. By comparison, OCT of stomach tumors is still at the feasibility stage. Precise identification of tumorous tissue based on its morphological structure is needed to assist surgeons in removing it.

To address this issue, we explore the morphological feature analysis-based classification (MFAC) method to identify tumorous stomach tissue. B-scan pre-processing includes image enhancement and de-noising algorithms [49]. The surface in each OCT image must be flattened so that the tissue edge is aligned with the x-axis in each B-scan. More importantly, the top portion of each image corresponds to air and carries no useful information. The region of interest (ROI) extends only to a depth of 1.2 mm in each image because deeper regions carry little information. Normal stomach tissue has a regular structure in which each layer is homogeneous. Cancerous stomach tissue is heterogeneous, with alternating high- and low-backscattering crypts that destroy this regularity. Hence, we design three specific quantitative features that describe a stomach OCT image according to the characteristics of stomach tissue. Five typical classifiers are used to classify cancerous stomach tissue: support vector machine (SVM), K-nearest neighbor (KNN), random forest (RF), logistic regression (LR) and threshold classification. Sensitivity, specificity, and accuracy are used to evaluate the MFAC method and to compare the classifiers. The ROC curve and AUC of the threshold classification are also calculated to estimate its overall performance.

2. Methods and materials

2.1. SS-OCT system

The light source is a MEMS-based wavelength-swept source (HSL-20-100-B, Santec Technologies; center wavelength: 1310 nm; scan rate: 100 kHz; output power: 20 mW; tuning range: 91.5 nm; coherence length: 13 mm). The interferometer, including a balanced photodetector, is from Thorlabs (INT-MSI-1300B). The balanced photodetector signal is collected by a data acquisition device (ATS9350, Alazar Technologies; sampling rate: 500 MS s−1), and a fast Fourier transform is used to reconstruct the depth information of the tomographic images. This system has been described in our previous works [50–52] (figure 1).
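As a rough illustration of the Fourier reconstruction step, the sketch below simulates the spectral fringe produced by a single reflector and recovers its depth bin from the magnitude of a discrete Fourier transform. The sample count, the reflector position, and the function name `depth_profile` are assumptions made for this example; the real processing chain of the system (for instance any resampling or windowing) is not described in this section and is not modeled here.

```python
import cmath
import math


def depth_profile(fringe):
    """Magnitude of the DFT of one spectral fringe (one A-scan).

    A plain O(n^2) DFT keeps the example dependency-free; an FFT gives the
    same values faster. Only the first half of the bins is returned because
    the spectrum of a real-valued signal is symmetric.
    """
    n = len(fringe)
    return [
        abs(sum(fringe[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n)))
        for m in range(n // 2)
    ]


# Hypothetical fringe: a single reflector whose path difference maps to bin 10.
N_SAMPLES = 64
REFLECTOR_BIN = 10
fringe = [math.cos(2 * math.pi * REFLECTOR_BIN * k / N_SAMPLES) for k in range(N_SAMPLES)]

profile = depth_profile(fringe)
peak_bin = max(range(len(profile)), key=profile.__getitem__)
print(peak_bin)
```

A deeper reflector produces a faster fringe and therefore a peak at a higher bin, which is how the depth axis of an A-scan arises.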


Figure 1. The schematic diagram of the swept source OCT system for the diseased stomach tissue.


Each wavelength sweep yields one A-scan, which provides the depth profile at a single lateral position. A B-scan, comprising 512 adjacent A-scans along a line, images one cross-section of the sample as a 2D tomographic image in the depth direction. Averaging multiple B-scans acquired repeatedly at the same location reduces random noise and raises the signal-to-noise ratio, which significantly improves overall image quality. The axial resolution is approximately 12 µm in air, the lateral resolution is approximately 22 µm, and the imaging depth is approximately 3 mm. Two scanning mirrors (X-Y plane) steer the beam: the X-axis mirror controls the B-scan and the Y-axis mirror controls the en face scan.
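The noise reduction obtained by averaging repeated B-scans can be illustrated with a small simulation. This is a minimal sketch on synthetic data: the frame size, the number of repeats (16), the noise level, and the function name `average_bscans` are assumptions for the example, since the number of averaged frames is not stated here.

```python
import random
import statistics


def average_bscans(frames):
    """Average repeated B-scans pixel by pixel.

    frames: list of 2D lists (rows = depth, columns = A-scans), all the same shape.
    """
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [sum(frame[r][c] for frame in frames) / n for c in range(cols)]
        for r in range(rows)
    ]


rng = random.Random(0)
TRUE_SIGNAL = 100.0  # hypothetical constant backscatter level


def noisy_frame(rows=8, cols=16, sigma=10.0):
    """One synthetic B-scan: constant signal plus zero-mean Gaussian noise."""
    return [[TRUE_SIGNAL + rng.gauss(0.0, sigma) for _ in range(cols)] for _ in range(rows)]


frames = [noisy_frame() for _ in range(16)]
averaged = average_bscans(frames)

single_noise = statistics.pstdev([v for row in frames[0] for v in row])
averaged_noise = statistics.pstdev([v for row in averaged for v in row])
print(single_noise, averaged_noise)
```

For uncorrelated noise, averaging n frames is expected to reduce the noise standard deviation by roughly a factor of the square root of n, so 16 repeats should give about a fourfold reduction.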

2.2. Sample preparation and data collection

All tissue samples were collected at the South China Hospital and South China Medical University (Guangzhou, China). Cancerous tissues were removed from patients during surgery and imaged with the OCT system immediately afterwards to preserve the integrity of the samples. We obtained 1539 OCT images of normal stomach tissue from six patients and 1567 OCT images of cancerous stomach tissue from eight patients. The scan range is approximately 5 mm × 5 mm, and the imaging procedure is shown in figure 2. Figure 2(a) is a photo of OCT scanning in progress, and figure 2(b) shows an OCT B-scan image. Figure 2(c) is a microscopic image of the stomach tissue under inspection, with the OCT imaging field marked by the box. Figure 2(d) shows the en face image of the sample surface reconstructed from the 3D volumetric data. The B-scan direction was along the long edge of the sample.


Figure 2. (a) The sample of cancerous stomach tissue being imaged by the scan probe; (b) the single B-scan image; (c) a photo of the imaging field; (d) a 3D reconstructed surface image (en face mapping) using OCT images; (e) and (g) the histology of normal tissue and cancerous tissue; (f) and (h) the OCT images of normal tissue and cancerous tissue.


After the OCT images were collected with the SS-OCT system, the samples were prepared for standard pathological analysis. Biopsy samples were fixed in 4% formalin solution, embedded in paraffin, sliced, and finally stained with hematoxylin and eosin. The histological sections are shown in figures 2(e) and (g). The preparation procedure dehydrated and deformed the tissue, which caused relative distortion between the OCT images and the histological images. A fresh strip of stomach tissue twists and curls easily. To keep the samples flat during fixation and embedding, each sample was flattened on a small cover glass and a fine needle was inserted along its length to prevent distortion. The sample and cover glass could then be immersed together in formalin solution without shape distortion, and the arrangement also provided a better surface for OCT scanning.

2.3. Feature extraction of stomach OCT images

Of the three additional features, the first is the standard deviation of the 40th-pixel depth line, the second is the standard deviation of the weighted intensity line 0.25 × 20th + 0.5 × 40th + 0.25 × 60th, and the third is the standard deviation of the sum of all 100 contour depth lines. These three additional quantitative morphological features represent the combined characteristics of the layered structures.

OCT images of normal and cancerous stomach tissue appeared remarkably different. As shown in figures 3(a) and 4(a), normal tissue shows a visibly smoother, more homogeneous intensity distribution, while cancerous tissue has a highly degenerated, fragmented, and heterogeneous texture. The low-intensity band across the top of these images corresponds to air, although it contains some pattern noise. The low-intensity region at the bottom results from the limited penetration depth of the technique, so this range carries little information. The tissue surface in the OCT image was not flat because the soft sample varied in thickness. We detected the surface and marked the boundary with a white line. A range of 100 pixels (1.2 mm) in the depth direction was taken as the ROI, which is the area between the two white lines in figures 3(b) and 4(b). The ROI was flattened by removing the air region. The surface-flattened images are shown in figures 3(c) and 4(c).
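The surface detection, flattening, and ROI cropping described above can be sketched as follows. The surface detector used here (first pixel in each column above a fixed intensity threshold) is an assumption chosen for simplicity, as are the function names and the tiny synthetic image; the detection algorithm actually used is not specified in the text. With 100 pixels spanning 1.2 mm, one pixel corresponds to about 12 µm in depth.

```python
def detect_surface(image, threshold):
    """Return, for each column (A-scan), the first row whose intensity exceeds threshold.

    Assumption: a simple intensity threshold separates air from tissue.
    Columns with no pixel above the threshold fall back to the last row.
    """
    rows, cols = len(image), len(image[0])
    return [
        next((r for r in range(rows) if image[r][c] > threshold), rows - 1)
        for c in range(cols)
    ]


def flatten_roi(image, surface, depth):
    """Shift every column so the surface sits at row 0 and keep `depth` rows.

    Pixels that would fall below the bottom of the image are padded with 0.
    """
    rows, cols = len(image), len(image[0])
    roi = [[0.0] * cols for _ in range(depth)]
    for c, top in enumerate(surface):
        for d in range(depth):
            if top + d < rows:
                roi[d][c] = image[top + d][c]
    return roi


# Synthetic B-scan with a curved surface: air is 0, tissue intensity grows with depth.
true_surface = [2, 3, 4, 3, 2]
image = [
    [10.0 * (r - s + 1) if r >= s else 0.0 for s in true_surface]
    for r in range(12)
]

surface = detect_surface(image, threshold=5.0)
roi = flatten_roi(image, surface, depth=4)
print(surface)
print(roi)
```

After flattening, every row of the ROI corresponds to the same depth below the tissue surface, which is what makes the same-depth contour lines comparable across columns.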


Figure 3. The image processing of the normal OCT images. (a) The original OCT image; (b) the pre-processed image; (c) ROI image; (d) the image of local standard deviation. (Scale bar: ~1 mm.)


Figure 4. The image processing of the cancerous OCT images. (a) The original OCT image; (b) the pre-processed image; (c) ROI image; (d) the image of local standard deviation. (Scale bar: ~1 mm.)


Visual inspection showed a significant difference in texture between the OCT images of normal and cancerous samples. In conventional image processing, the mean, the variance (σ², equation (1)), the third moment, and the entropy are used as parameters to describe image texture [43, 53]. The variance, or equivalently the standard deviation, is the most significant criterion. We used the standard deviation (σ) of the whole ROI, i.e. the square root of the variance of the region [43], as the first quantitative feature, named S1.

$$\sigma^{2} = \frac{1}{N}\sum_{i=1}^{N}\left(x_{i} - \bar{x}\right)^{2} \qquad (1)$$

where N is the number of pixels, $x_{i}$ is the intensity of the ith pixel, and $\bar{x}$ is the mean intensity of the region. The standard deviation of the whole image is a global measure and cannot capture the textural information of a specific region. The local standard deviation of each pixel was therefore used to express the degree of variation within a neighborhood of 11 × 11 pixels. The local standard deviation of each pixel is displayed as an image in figures 3(d) and 4(d). The local standard deviation was taken as the second quantitative feature, named S2. It was calculated with the MATLAB function 'stdfilt'; a window size of 15 was selected after many attempts. The texture variation of cancerous tissue was much greater than that of normal tissue, so S1 and S2 of cancerous tissue should be larger than those of normal tissue.
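A minimal sketch of the two texture features follows. It makes several assumptions that are not confirmed by the text: the population form of the standard deviation is used, windows are truncated at the image border rather than padded as MATLAB's `stdfilt` does, and the local standard deviation map is reduced to a single number by taking its mean. The function names and the small synthetic ROIs are illustrative only, and a 3 × 3 window is used to keep the demonstration small.

```python
import statistics


def global_std(roi):
    """S1-style feature: standard deviation of all pixels in the ROI."""
    return statistics.pstdev([v for row in roi for v in row])


def local_std_map(roi, window):
    """Standard deviation of the window x window neighbourhood of every pixel.

    Assumption: neighbourhoods are truncated at the borders instead of padded.
    """
    rows, cols = len(roi), len(roi[0])
    half = window // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            patch = [
                roi[i][j]
                for i in range(max(0, r - half), min(rows, r + half + 1))
                for j in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out[r][c] = statistics.pstdev(patch)
    return out


def mean_local_std(roi, window):
    """S2-style feature, assuming the map is summarised by its mean."""
    std_map = local_std_map(roi, window)
    return statistics.fmean([v for row in std_map for v in row])


# Layered (normal-like) ROI: intensity depends on depth only.
smooth = [[10.0 * r for _ in range(12)] for r in range(6)]
# Heterogeneous (cancer-like) ROI: the same layers plus lateral bright/dark blocks.
rough = [[10.0 * r + (20.0 if (c // 2) % 2 else -20.0) for c in range(12)] for r in range(6)]

s1_smooth, s1_rough = global_std(smooth), global_std(rough)
s2_smooth, s2_rough = mean_local_std(smooth, 3), mean_local_std(rough, 3)
print(s1_smooth, s1_rough)
print(s2_smooth, s2_rough)
```

Both features come out larger for the heterogeneous ROI, in line with the expectation stated above that S1 and S2 are larger for cancerous tissue.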

Generally, normal stomach tissue has a regular structure in which each layer is homogeneous, whereas cancerous stomach tissue is heterogeneous. In the flattened images (figures 3(c) and 4(c)), a given depth corresponds to the same depth below the surface in the real tissue sample. As shown in figure 4(a), we took into account the imaging depth of the OCT (approximately 1.5 mm), the thickness of the stomach tissue, and the location where cancerous tissue occurs (usually 0.3 mm below the surface). The major contribution to the morphological feature is therefore usually located about 0.3 mm below the surface. The contour depth lines at the 20th, 40th, and 60th pixels of normal and cancerous tissue are shown in figures 5(a) and 6(a). The intensities along the three lines are shown in figures 5(b)–(d) and 6(b)–(d).


Figure 5. The quantitative feature analysis of a normal OCT image. (a) ROI image and contour depth lines (scale bar: ~1 mm). (b) The intensity line of the 20th pixel. (c) The intensity line of the 40th pixel. (d) The intensity line of the 60th pixel. (e) The intensity line of the sum of the 20th, 40th and 60th pixels. (f) The intensity line of the sum of all 100 contour depth lines.


Figure 6. The quantitative feature analysis of a cancerous OCT image. (a) ROI image and contour depth lines (scale bar: ~1 mm). (b) The intensity line of the 20th pixel. (c) The intensity line of the 40th pixel. (d) The intensity line of the 60th pixel. (e) The intensity line of the sum of the 20th, 40th and 60th pixels. (f) The intensity line of the sum of all 100 contour depth lines.


Normal stomach tissue is continuous, and its texture is similar at a given depth. The intensity along the three contour depth lines was smooth in normal tissue but varied dramatically in cancerous tissue. This pattern provides a way to classify cancerous tissue and can be quantified by the standard deviation of the lines. Three additional quantitative features were designed to describe each image. The standard deviation of the 40th-pixel depth line (figures 5(c) and 6(c)) was taken as the third quantitative feature, named S3. The standard deviation of the weighted line 0.25 × 20th + 0.5 × 40th + 0.25 × 60th (figures 5(e) and 6(e)) was taken as the fourth quantitative feature, named S4. Early-stage cancerous tissue usually appears in the subsurface layer and then spreads upwards and downwards; therefore, the 40th-pixel depth line was given more weight. The standard deviation of the sum of all 100 contour depth lines (figures 5(f) and 6(f)) was taken as the fifth quantitative feature, named S5. Since stomach tissue has a regular structure, the three contour lines represent the features of different layers: S3 reflects a single layer, S4 combines three layers, and S5 combines all 100 lines.
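The three depth-line features can be written down compactly. The sketch below assumes a surface-flattened ROI stored as a list of 100 rows (depth) by an arbitrary number of columns (lateral position), treats "the 20th pixel" as the 20th row counting from 1, and uses the population standard deviation; these conventions, the function name, and the synthetic ROIs are assumptions for illustration.

```python
import statistics


def depth_line_features(roi):
    """Return (S3, S4, S5) for a flattened ROI with at least 100 depth rows."""
    line20, line40, line60 = roi[19], roi[39], roi[59]

    # S3: lateral variation along the single 40th-pixel depth line.
    s3 = statistics.pstdev(line40)

    # S4: lateral variation of the weighted combination of three depth lines.
    weighted = [0.25 * a + 0.5 * b + 0.25 * c for a, b, c in zip(line20, line40, line60)]
    s4 = statistics.pstdev(weighted)

    # S5: lateral variation of the column-wise sum over all 100 depth lines.
    summed = [sum(column) for column in zip(*roi[:100])]
    s5 = statistics.pstdev(summed)
    return s3, s4, s5


# Normal-like ROI: each depth line is laterally constant.
normal_roi = [[float(100 - d)] * 32 for d in range(100)]
# Cancer-like ROI: the same layers with alternating bright and dark lateral blocks.
cancer_roi = [
    [100.0 - d + (30.0 if (x // 4) % 2 else -30.0) for x in range(32)]
    for d in range(100)
]

print(depth_line_features(normal_roi))
print(depth_line_features(cancer_roi))
```

For a perfectly layered ROI all three features are zero, while lateral heterogeneity raises all of them; S5 is on a larger numeric scale than S3 and S4 because it is computed from a sum of 100 lines rather than a weighted average.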

2.4. MFAC-based classification of stomach normal and cancerous tissue

Five quantitative parameters were extracted from each image and used as feature vectors to classify cancerous stomach tissue with five classifiers, i.e. SVM, KNN, RF, LR, and a typical threshold classification. A 'hold-out' validation technique was used to validate the classifiers. The hold-out method divides the data into two mutually exclusive subsets: a training set and a testing set. The training set comprised 500 normal tissue images and 500 cancerous tissue images, randomly selected, while the rest of the images were used as the testing set. Keeping the training and testing sets separate guards against over-optimistic results caused by overfitting and keeps the computational cost low.
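The hold-out split can be sketched as below, using the image counts given in section 2.2 (1539 normal and 1567 cancerous images). The function name, the fixed random seed, and the use of integer placeholders instead of feature vectors are assumptions for the example. Note that this split is made at the image level, so images from the same patient can fall into both sets.

```python
import random


def holdout_split(normal, cancerous, n_train_per_class=500, seed=0):
    """Randomly pick n_train_per_class items per class for training; the rest is for testing.

    Returns two lists of (item, label) pairs, with label 0 = normal and 1 = cancerous.
    """
    rng = random.Random(seed)
    train, test = [], []
    for label, items in ((0, normal), (1, cancerous)):
        indices = list(range(len(items)))
        rng.shuffle(indices)
        chosen = set(indices[:n_train_per_class])
        for i, item in enumerate(items):
            (train if i in chosen else test).append((item, label))
    return train, test


# Placeholders standing in for per-image feature vectors.
normal_images = list(range(1539))
cancer_images = list(range(1567))

train_set, test_set = holdout_split(normal_images, cancer_images)
print(len(train_set), len(test_set))
```

With these counts the testing set contains 1039 normal and 1067 cancerous images, 2106 in total.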

The automated classification algorithms were implemented in MATLAB. The SVM was realized with the functions svmtrain and svmclassify, with the name-value pair 'kernel_function', 'quadratic'. For KNN, we used the function knnclassify with the number of nearest neighbors set to 100. For RF, we used the function TreeBagger to train the model with the parameter 'OOBPred' set to 'On'; the predict function was then applied to the testing set with the parameter 'Trees'. For LR, the functions glmfit and glmval were used for training and testing, respectively. Finally, a typical threshold classification was performed, in which an image was labeled as cancerous tissue if the quantitative parameter exceeded the threshold and as normal tissue otherwise.
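The threshold rule is simple enough to sketch directly. How the threshold was chosen is not stated here, so the example below assumes one plausible option: pick the cut-off that maximizes accuracy on the training set. The function names and the toy feature values are hypothetical.

```python
def fit_threshold(values, labels):
    """Choose the threshold that maximises training accuracy (an assumed criterion).

    values: one feature value per training image; labels: 0 = normal, 1 = cancerous.
    Candidate thresholds are the midpoints between consecutive distinct values,
    plus one below the minimum and one above the maximum.
    """
    uniq = sorted(set(values))
    candidates = (
        [uniq[0] - 1.0]
        + [(a + b) / 2 for a, b in zip(uniq, uniq[1:])]
        + [uniq[-1] + 1.0]
    )

    def accuracy(threshold):
        hits = sum((v > threshold) == bool(y) for v, y in zip(values, labels))
        return hits / len(values)

    return max(candidates, key=accuracy)


def classify(value, threshold):
    """Label an image as cancerous (1) if its feature exceeds the threshold, else normal (0)."""
    return 1 if value > threshold else 0


# Toy training data: a feature that is small for normal and large for cancerous tissue.
train_values = [1.0, 1.5, 2.0, 6.0, 7.0, 8.0]
train_labels = [0, 0, 0, 1, 1, 1]

threshold = fit_threshold(train_values, train_labels)
print(threshold, classify(5.0, threshold), classify(3.0, threshold))
```

Because each feature is a single number per image, this classifier needs no model beyond one cut-off, which makes it a useful baseline next to SVM, KNN, RF and LR.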

The outcomes were counted according to the following definitions: a true negative is an image of normal stomach tissue classified as normal, a false positive is an image of normal stomach tissue wrongly classified as cancerous, a true positive is an image of cancerous stomach tissue classified as cancerous, and a false negative is an image of cancerous stomach tissue labeled as non-cancerous (normal).

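$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (2)$$

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (3)$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (4)$$

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives as defined above.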

The classification sensitivity (equation (2)), specificity (equation (3)), and accuracy (equation (4)) were used to evaluate the classifiers and to compare them with each other. For the typical threshold classification method, the ROC curve was obtained by gradually increasing the threshold and plotting the sensitivity against 1 − specificity.
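The three evaluation measures follow directly from the four outcome counts. The sketch below uses the standard definitions, with label 1 for cancerous and 0 for normal; the function names and the eight-image toy example are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, true negatives, false positives and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) as fractions between 0 and 1."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


# Toy example: four cancerous and four normal images, one error in each class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

print(confusion_counts(y_true, y_pred))
print(evaluate(y_true, y_pred))
```

Sensitivity only looks at the cancerous images and specificity only at the normal ones, so a classifier can score well on one and poorly on the other; accuracy combines both and depends on the class balance of the testing set.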

3. Results

3.1. Quantitative feature analysis of cancerous and normal stomach tissue

Figure 7 shows boxplots and p values of the five quantitative features (S1, S2, S3, S4 and S5) for OCT images of normal and cancerous stomach tissue. For all five features, images of cancerous tissue were distinguishable from images of normal tissue. Figures 7(a)–(e) show that the mean value of each feature differs significantly between cancerous and normal tissue.


Figure 7. Boxplots of the five quantitative features of OCT images containing cancerous and normal stomach tissue. (a)–(e) The differences between normal and cancerous tissue in the five quantitative features, labeled std-1line, std-3line, std-allline, std-wholeline, and std-partline.


3.2. Results of the MFAC-based classification methods

To visualize the SVM classification, we used two features, S3 and S4, to classify the OCT images. Figure 8(a) shows the SVM classification with the features S3 and S4, and the enlarged region (figure 8(b)) shows the support vectors as outlined circles. The support vectors were determined from the training set, and the black line is the decision boundary derived from them, which was then used to classify the testing dataset.


Figure 8. (a) The training and classification results using SVM with the features S3 and S4. (b) The enlarged region, revealing the support vectors used.


The SVM was applied to each of the five quantitative features, and the results are listed in table 1. S4 gave the best result among the five features, with a sensitivity, specificity and accuracy of 95.34%, 97.21% and 96.27%, respectively. Classification based on local image information (S2, 92.56% accuracy) was slightly better than classification based on whole-image information (S1, 90.12% accuracy). S3, S4 and S5 were designed specifically for gastric cancerous tissue based on the regular structure of stomach tissue, and they achieved higher accuracy than the general morphological features (S1 and S2).

Table 1. The results of SVM.

                 S1     S2     S3     S4     S5
Sensitivity (%)  95.21  95.47  91.45  95.34  93.94
Specificity (%)  84.93  89.60  96.04  97.21  93.70
Accuracy (%)     90.12  92.56  93.72  96.27  93.82

The results of KNN, LR, RF, and threshold classification are listed in tables 2–5; they follow trends similar to those of the SVM. Across tables 1–5, the RF method gives the best results and the LR method the worst. However, there is not enough statistical evidence that RF is a better algorithm than the other four for classifying OCT images of cancerous stomach tissue. All five algorithms perform very well, and the differences among them are small. All five sets of results show that S3, S4 and S5 give higher accuracy than S1 and S2. Hence, the features designed around the layered structure of stomach tissue are more effective.

Table 2. The results of KNN.

                 S1     S2     S3     S4     S5
Sensitivity (%)  95.09  93.68  90.43  94.58  93.49
Specificity (%)  85.19  91.23  96.95  97.66  94.22
Accuracy (%)     90.18  92.47  93.66  96.10  93.58

Table 3. The results of LR.

                 S1     S2     S3     S4     S5
Sensitivity (%)  89.22  91.32  90.04  94.45  93.04
Specificity (%)  85.71  89.86  94.61  95.91  92.14
Accuracy (%)     87.48  90.60  92.31  95.17  92.59

Table 4. The results of RF.

                 S1     S2     S3     S4     S5
Sensitivity (%)  90.87  93.55  94.13  96.62  94.51
Specificity (%)  90.38  92.92  94.54  96.10  95.71
Accuracy (%)     90.63  93.24  94.33  96.36  95.11

Table 5. The results of the threshold-based classification method.

                 S1     S2     S3     S4     S5
Sensitivity (%)  89.53  93.11  94.58  97.13  94.64
Specificity (%)  89.60  91.75  92.33  95.20  92.92
Accuracy (%)     89.56  92.43  93.46  96.17  93.78

The ROC curves of the threshold classification for the five features are shown in figure 9. Each curve shows the true positive rate against the false positive rate at different thresholds. S4 performed best, S3 and S5 gave similar results, and S1 and S2 performed worse than the other three. The AUC of each ROC curve was then used to assess the classification (table 6). The AUC of S4 reached 99.65%. The ROC curves confirm that S3, S4 and S5 perform better than S1 and S2.

Table 6. The AUC results of the conventional threshold classification.

         S1     S2     S3     S4     S5
AUC (%)  96.92  98.55  98.98  99.65  99.01

Figure 9. The ROC curve of the conventional threshold classification.
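The construction of an ROC curve and its AUC for a single-feature threshold classifier can be sketched as follows. The threshold is swept over the observed feature values, each threshold gives one (1 − specificity, sensitivity) point, and the area is integrated with the trapezoidal rule. The function names and the six toy scores are assumptions for illustration and are unrelated to the values in table 6.

```python
def roc_points(scores, labels):
    """Return ROC points (false positive rate, true positive rate), from (0, 0) to (1, 1).

    scores: feature value per image; labels: 0 = normal, 1 = cancerous.
    An image is called cancerous when its score is at least the current threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy data: cancerous images mostly, but not always, have the larger feature value.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
area = auc(curve)
print(curve)
print(area)
```

In this toy case the area equals 8/9, which matches the fraction of (cancerous, normal) image pairs in which the cancerous image has the higher score; an AUC close to 1 therefore means that the feature ranks almost every cancerous image above almost every normal one.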


4. Discussion

We collected samples of normal and cancerous stomach tissue. The texture of normal and cancerous tissue differs significantly. Surface flattening and ROI definition help to exclude image regions that carry little useful information. Normal stomach tissue has a regular structure in which each layer is homogeneous, whereas cancerous tissue destroys the texture of the mucosa, resulting in texture disorder. Based on these structural differences, we define three features from same-depth lines to quantify each image. The standard deviation of the ROI and the local standard deviation are also extracted to help quantify each OCT image. Hence, five quantitative features describe each OCT image, and five classifiers label the images as normal or cancerous tissue.

The five classifiers (SVM, KNN, RF, LR and the threshold classification) all perform well in terms of sensitivity, specificity, and accuracy. All five are common methods in image identification, and the good results depend mainly on the quantitative features designed according to the characteristics of OCT images of cancerous stomach tissue. S3, S4, and S5 are designed specifically for gastric cancer tissue, based on the regular structure of stomach tissue, and they classify better than the general morphological features (S1 and S2). The best feature, S4, gives an accuracy of over 95% with all five classifiers. For the threshold-based classification method, its AUC reaches 99.65%. Human stomach cancerous tissue has different morphological features from the esophageal wall after chemo-radiation therapy [44]. In that setting, OCT differentiated between healthy tissue, fibrotic tissue, and residual cancer with a sensitivity and specificity of 79% and 67%, respectively. Our proposed method has a sensitivity and specificity of approximately 90% or more. These results suggest that the MFAC method could be a potential method for the computer-aided diagnosis of stomach cancer.

Research applying OCT to image human cancerous stomach tissue is limited, and the number of cases is small. In vivo diagnosis of stomach cancer with OCT is still a long way off; this feasibility study on classifying human stomach cancer with ex vivo OCT is a first step. A further advantage of OCT is that it is stain-free and images in real time, so it can provide guidance for histopathology. As more clinical OCT images of stomach tumor tissue are acquired, we will try to train on one set of patients and test on another. In the future, an endoscopic OCT for stomach tissue examination will be designed; it could be integrated with other operative instruments into theranostic systems [29, 54] for treating diseased tissue. The feasibility of applying deep learning to OCT image analysis has also been studied over the past few years. Most of these reports concern ophthalmology, because OCT imaging of the retina has been developed over decades and a large number of OCT images have been collected [46–48]. These studies had enough training and test data for deep learning models. OCT of cancerous stomach tissue is at a relatively early phase, and the limited data would lead to over-fitting and poor classification results. Since deep learning is generally regarded as an advanced method for many kinds of classification, it is worth trying in future research as more stomach data are obtained.

5. Conclusion

We propose the MFAC method for identifying normal and cancerous tissue, which combines quantitative feature extraction with machine learning, and apply it to ex vivo stomach tissue. The method comprises quantitative structural features of stomach tissue and supervised classification. Five quantitative features are chosen as descriptors of each OCT image, and five classifiers are used to classify the images. The results demonstrate that the MFAC method classifies well. The method has great potential for future clinical in vivo examination of diseased stomach tissue.

Acknowledgments

This work was supported in part by National Natural Science Foundation of China (No. 61575107), National Natural Science Foundation of China (Grant No. 81427803, 81771940), Beijing Municipal Science and Technology Commission (Z151100003915079), Beijing Municipal Natural Science Foundation (7172122), National Key Research and Development Program of China (2017YFC0108000), State's Key Project of Research and Development Plan (2017YFC0108300, 2017YFC0108301), and China Postdoctoral Science Foundation (Nos. 2018M643846, 2019T120982). The authors thank Hao Liu, Yu Zhu, and Guoxin Li of Southern Medical University, Nanfang Hospital, Department of General Surgery for providing the histological sections of stomach tissue, and Hui Zhao, Xin An, Huikai Xie, Peng Li, Mark A Silver, Duo Zhang, Xiao Wang, and Jianyu Tang for helping to revise the manuscript.

Competing interests

The authors have declared that no competing interests exist.
