DL is a subfield of machine learning that excels at analyzing complex data, such as images, videos, and audio. DL leverages neural network architectures with multiple interconnected layers to process and comprehend intricate patterns within data. These architectures allow DL algorithms to automatically identify and extract various levels of information from the input data. This is achieved through a hierarchical representation of features, where lower layers capture rudimentary details while higher layers progressively assemble these details into more complex, abstract representations. For instance, when applied to image analysis, DL algorithms can discern basic features like edges, corners, and textures in the lower layers. As data flows through the network, subsequent layers consolidate these simple features into more elaborate elements like shapes, objects, and context. This layer-by-layer learning enables DL models to recognize complex structures within images without explicit programming. In this section, we discuss various DL architectures and their use in analyzing thyroid nodules in US images.
Supervised DL algorithms are trained on labeled data, in which the correct label or class for each image is known beforehand. Supervised learning is a powerful approach for thyroid ultrasound image analysis, providing accurate and consistent nodule detection and classification, personalized patient care, efficient screening, and the potential for early detection. It complements the expertise of healthcare professionals and enhances the quality of thyroid-related healthcare services.
Most frequently, supervised learning algorithms are preferred for ultrasound image analysis. Supervised DL algorithms discover the relationship between input data and desired output in order to make decisions on new, unseen data. A common application of supervised DL algorithms is classification, where the objective is to determine the class or category of an input image. However, collecting a large dataset of labeled images is difficult, making the implementation of these techniques challenging.
In contrast, unsupervised DL algorithms do not require labeled data and can learn to identify patterns and structures in the data without any prior knowledge. These algorithms are frequently used for clustering, where the objective is to group similar data samples without prior knowledge of the groups.
Hybrid DL architectures combine elements of supervised and unsupervised learning, enabling models with greater flexibility and versatility. These architectures can be trained on both labeled and unlabeled data and are applicable to a variety of tasks. Hybrid DL algorithms can be especially beneficial when labeled data is scarce or when the data is complex and difficult to classify accurately.
In this study, we focus mainly on supervised learning techniques. The remainder of this section discusses the various deep learning architectures employed for three fundamental tasks in the analysis of thyroid US images: classification, segmentation, and detection.
6.2 Segmentation
Thyroid nodule segmentation is an integral part of ultrasound-based diagnosis of thyroid diseases (Russ et al. 2017). In recent years, the segmentation of additional tissues related to the thyroid gland (Ma et al. 2022) has also been investigated. A comprehensive review (Chen et al. 2020) comparing 28 studies on CAD-based and DL systems traces the evolution of thyroid gland and nodule segmentation up to 2019; it highlights the similarities and contrasts between the approaches and discusses the benefits and drawbacks of each system.
U-Net (Chu et al. 2021) is a DL architecture specifically designed for image segmentation tasks. It consists of a CNN with an encoder-decoder structure: the encoder extracts features from the input image, and the decoder reconstructs a precise segmentation of the input image. U-Net has proven highly effective and precise for ultrasound (US) segmentation of thyroid nodules because it automatically learns and extracts relevant features from the data, such as the shape and location of nodules, without requiring prior knowledge or manual feature engineering (Chu et al. 2021).
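The encoder-decoder idea with skip connections can be illustrated with a minimal sketch on a 1-D signal. This is plain Python with no deep learning framework; a real U-Net applies learned convolutions at every level, whereas here each level merely pools (encoder) or upsamples and merges a skip connection (decoder), purely to show how spatial detail is preserved through the skips.

```python
def down(x):
    """Encoder step: halve resolution by max-pooling adjacent pairs."""
    return [max(x[i], x[i + 1]) for i in range(0, len(x), 2)]

def up(x):
    """Decoder step: double resolution by nearest-neighbour upsampling."""
    return [v for v in x for _ in range(2)]

def unet_like(signal, depth=2):
    """Toy U-Net-shaped pass: contract, then expand while merging skips."""
    skips = []
    x = signal
    for _ in range(depth):              # contracting path
        skips.append(x)                 # remember high-resolution features
        x = down(x)
    for _ in range(depth):              # expanding path
        skip = skips.pop()
        x = [(u + s) / 2 for u, s in zip(up(x), skip)]  # merge skip connection
    return x

out = unet_like([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
print(len(out))  # output has the same resolution as the input: 8
```

The skip connections are what let the decoder recover precise nodule boundaries that pure downsampling would otherwise blur away.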
Building on this success, the Semantic Guided U-Net (SG-UNET) (Pan et al. 2021) was recently proposed. Its use of average pooling and leaky ReLU reduces noise and attenuates unfavorable filter responses. To reduce noise interference caused by the mirror structure of U-Net, a side network accepts high-dimensional features and transforms them into one-dimensional semantic features. The proposed architecture outperforms both U-Net and U-Net++. In addition to fully automated systems, there are semi-automated systems, such as mark-guided U-Net-based segmentation systems (Lu et al. 2022). The network proposed by Chu (Lu et al. 2022) achieved a segmentation precision of 0.9785.
Sun's TNSnet (Sun et al. 2022) is a dual network comprising two sub-networks: a form network and a regional network. The form network is responsible for determining an object's overall shape, while the regional network captures the object's finer details and characteristics. Compared with conventional single-network architectures, this dual-network design permits more precise and accurate object detection.
Recently, researchers used a two-stage network to detect medullary thyroid carcinoma, the second most prevalent (yet rare) form of thyroid cancer (Pan et al. 2022). The segmentation map is generated by a coarse-to-fine segmentation network (C2F-SegNet), a combination of CoarseNet and FineNet. A classifier based on prior knowledge then classifies the nodules; its backbone is a ResNet-34 pre-trained on ImageNet (Deng et al. 2009). The proposed network not only accurately identifies malignant nodules but also differentiates between the papillary and medullary forms of thyroid cancer. Its segmentation architecture achieves higher IoU and DSC than U-Net and U-Net++, although U-Net++ attains higher recall and precision. The classification architecture surpasses ResNest34 and ResNest50 in all aspects of evaluation.
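The IoU and DSC figures quoted throughout this section are computed directly from binary segmentation masks. A minimal sketch, using flat Python lists of 0/1 pixel labels (frameworks operate on tensors, but the formulas are identical):

```python
def iou(pred, target):
    """Intersection over Union (Jaccard index) of two binary masks."""
    inter = sum(p & t for p, t in zip(pred, target))
    union = sum(p | t for p, t in zip(pred, target))
    return inter / union if union else 1.0

def dice(pred, target):
    """Dice similarity coefficient (DSC) of two binary masks."""
    inter = sum(p & t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 2 * inter / total if total else 1.0

pred   = [1, 1, 1, 0, 0, 0]   # predicted nodule pixels (illustrative)
target = [0, 1, 1, 1, 0, 0]   # ground-truth nodule pixels
print(iou(pred, target))      # 2 / 4 = 0.5
print(dice(pred, target))     # 2*2 / 6 ≈ 0.667
```

Note that DSC is always at least as large as IoU for the same masks, which is worth remembering when comparing scores reported under different metrics.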
A weakly supervised model can help manage over- or under-segmentation of thyroid nodules (Yu et al. 2022). Semantic features are extracted using image-level classification labels. The nodule site is activated using a dual-branch soft erase module (DBSM) and a scale feature adaptation module (SFAM), and an edge self-attention module (ESAM) handles blurred edges.
For further details regarding thyroid gland and thyroid nodule segmentation methods for medical ultrasound images, see the existing comprehensive review (Chen et al. 2020).
6.3 Detection
The detection task determines the location and identity of anomalies (e.g., lesions and tumors) and other anatomical objects (e.g., fetal standard planes, organs, and tissues) in US image analysis. A new era in thyroid nodule detection began with the introduction of DL architectures such as region-based convolutional neural networks (R-CNN) (Girshick et al. 2014), Faster R-CNN (Ren et al. 2015), and You Only Look Once (YOLO) (Redmon et al. 2016).
Li et al. (2018) proposed an enhanced version of Faster R-CNN for the detection of thyroid nodules. A spatial constraint layer is added to extract characteristics of the surrounding region, and combining the shallow and deep layers of the network allows identification of small, hazy nodules. The standard Faster R-CNN with the layer concatenation technique was unable to detect solid nodules with uneven borders. Buda et al. (2019) suggest a three-part model. The first component is a Faster R-CNN with a ResNet101 backbone trained to recognize capillaries. The extracted square image of the nodule is then sent to a second network, a multitask deep convolutional neural network used to determine whether a nodule is malignant. The third component performs risk classification using the ACR TI-RADS output. The proposed network generates results comparable to the consensus of three radiologists, and the risk classification network improves the specificity of referrals for thyroid nodule biopsy.
Xie et al. (2019) proposed three fully convolutional networks based on the single-shot multi-box detector (SSD) (Liu et al. 2016), namely SSD300, SSD300 cov3, and SSD512. The base model is a typical image classification model with a fully connected layer, onto which a stack of convolutional layers is appended to extract additional features. These layers downsample the feature maps to generate features at a range of scales, and a convolutional predictor at the end of the network produces a class score for selected feature maps. The training loss combines smooth L1 loss and class-weighted cross-entropy loss. SSD300 was unable to identify tiny nodules and also missed certain dense nodules; SSD300 cov3 and SSD512 sometimes produced false positives among smaller nodules. The application of SSD showcases its suitability for capturing both the global and local features present in thyroid ultrasound images.
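The localization term of the SSD objective mentioned above is the smooth L1 (Huber-style) loss. A minimal sketch of that term, with illustrative box offsets; the full objective would add the class-weighted cross-entropy on the predicted scores:

```python
def smooth_l1(x, beta=1.0):
    """Smooth L1 loss: quadratic near zero, linear for large errors."""
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta

def localization_loss(pred_box, true_box):
    """Sum of smooth L1 over the four box offsets (x, y, w, h)."""
    return sum(smooth_l1(p - t) for p, t in zip(pred_box, true_box))

print(smooth_l1(0.5))   # 0.125 (quadratic region: small errors damped)
print(smooth_l1(2.0))   # 1.5   (linear region: robust to outlier boxes)
print(localization_loss((0.1, 0.2, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0)))
```

The quadratic region keeps gradients small for nearly correct boxes, while the linear region prevents a few badly mislocalized nodules from dominating training.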
Song et al. (2018) suggested a cascaded CNN architecture based on SSD and the multi-box framework to detect thyroid nodules. The detection layer of the SSD is reconstructed by adding several convolution layers and anchor-generation layers to extract local and global features. The two-stage network is designed to localize and classify nodules in a pyramidal structure: after the initial localization, the potential region of interest is fed into a spatial pyramid supplemented by CNNs to achieve adequate recognition of the thyroid. Although the architecture was trained on a large dataset, it was unable to detect excessively small or large nodules. To combat the lack of high-quality data, the authors proposed investigating transfer learning as future work.

Multitasking within deep learning models involves training a single architecture to perform multiple related tasks concurrently. This approach offers advantages such as efficient resource utilization, shared information across tasks, and regularization against overfitting. However, it introduces trade-offs that necessitate careful consideration: resources must be allocated so that task-specific performance is not compromised, increased model complexity can hinder interpretability and optimization, and task interference, stemming from dissimilarities between tasks or conflicting patterns, must be managed. Striking a balance between resource efficiency and task-specific performance is pivotal.
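In practice, the multitask trade-offs above come down to how the per-task losses are combined. A common sketch is a fixed weighted sum over tasks sharing one backbone; the loss values and weights below are illustrative, not figures from any cited work:

```python
def multitask_loss(task_losses, weights):
    """Weighted sum of per-task losses sharing a single backbone."""
    assert task_losses.keys() == weights.keys()
    return sum(weights[k] * task_losses[k] for k in task_losses)

# Hypothetical per-task loss values for one training batch
losses  = {"classification": 0.40, "localization": 1.20, "segmentation": 0.75}
# Hand-tuned task weights; poor choices here cause task interference
weights = {"classification": 1.0, "localization": 0.5, "segmentation": 0.5}

print(multitask_loss(losses, weights))  # 0.40 + 0.60 + 0.375 = 1.375
```

Tuning these weights is exactly the balancing act the paragraph describes: raising one task's weight improves that task at the potential expense of the others.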
Zhao et al. (2021) recently presented a two-part model. An SSD with a ResNet50 backbone is used for thyroid nodule detection, with the last few layers of ResNet50 replaced by a residual block and several additional blocks. To account for the varied aspect ratios of thyroid nodules, anchor boxes with ratios between 0.3 and 3 are generated. Nodules are cropped with a 50-pixel margin, and a 256 × 256 image is sent to the classification network; this is a reasonable preprocessing step, likely aiding in standardizing the input size. For classification, A-ResNet50-F, a modified version of ResNet50, is used. A-ResNet50-F adds an attention block and a fire block, implying a focus on capturing crucial nodule-specific features and optimizing the network for the classification task. The suggested architecture demonstrates higher average precision than SSD (VGG16 backbone), Faster R-CNN (ResNet50 backbone), and YOLO v3 (with Darknet53). For semi-solid nodules with irregular edges, the detection system produced a somewhat larger bounding box. Nevertheless, the classification network outperformed well-known networks such as VGG16 and Inception v3, and the proposed technique also beats expert radiologists at classification.

Mask R-CNN (He et al.
2017) is a two-stage network widely used for object detection and segmentation. The growing popularity of Mask R-CNN has opened a new direction in thyroid nodule detection (Abdolali et al. 2020). Mask R-CNN combines classification, segmentation, and localization losses, which has been shown to be advantageous for the detection of thyroid nodules; the combined loss function favors detection over segmentation. ResNet50 beat all other pre-trained backbone networks, including ResNet, U-Net, MobileNet, and Inception V2, owing to the smaller dataset size. Comparing Mask R-CNN against Faster R-CNN with the new loss function demonstrates a focus on both accuracy and efficiency. However, the network's suboptimal performance on primarily solid nodules reveals an area that requires attention, and the absence of thyroid parenchyma and microcystic nodules in the dataset is a relevant consideration for the generalizability of the results. The future direction outlined by the authors, reducing the complexity of Mask R-CNN and addressing overfitting, is a proactive approach to refining the model: simplifying the architecture and enhancing its generalization capabilities can contribute to robustness and usability in diverse clinical scenarios.

As a solution to inaccurate localization, Zheng et al. (2022b) propose a more efficient cascade Mask R-CNN. The structure is composed of two stages. In the first stage, an ROI is determined using an FPN (Lin et al. 2017) backbone and an RPN. The second stage is a multi-cascading detector network, which provides a more specific location for the nodule. A modified version of the L1 loss increases the gradient of the easy samples, balancing easy and difficult samples during training. Experimental results demonstrate that using more than three cascading networks degrades performance. A soft NMS is used to reduce the likelihood of an object being missed due to non-maximal suppression (NMS); the baseline with soft NMS alone did not improve localization, but a novel detector with five convolutional layers and one fully connected layer, together with L1 regularization, yields a considerable improvement. The authors claim that the proposed network is more precise than Mask R-CNN, Faster R-CNN, and Libra R-CNN (Pang et al. 2019) for medium and small nodules, although it has a lower true negative rate than Mask R-CNN.
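The soft NMS idea can be sketched in a few lines. Instead of discarding every box that overlaps a higher-scoring box (hard NMS), overlapping boxes merely have their scores decayed, which reduces the chance of missing an adjacent nodule. This linear-decay variant with illustrative thresholds is one common form; the cited work may use a different decay function:

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def soft_nms(boxes, scores, iou_thresh=0.5, score_thresh=0.001):
    """Linear soft NMS: decay overlapping scores instead of removing boxes."""
    dets = sorted(zip(boxes, scores), key=lambda d: -d[1])
    kept = []
    while dets:
        best, best_score = dets.pop(0)
        kept.append((best, best_score))
        rescored = []
        for box, score in dets:
            o = box_iou(best, box)
            if o > iou_thresh:
                score *= (1 - o)          # decay instead of discard
            if score > score_thresh:
                rescored.append((box, score))
        dets = sorted(rescored, key=lambda d: -d[1])
    return kept

boxes  = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(soft_nms(boxes, scores))  # all three boxes survive, one with a decayed score
```

Under hard NMS the second box would be removed outright; soft NMS keeps it with a reduced score, which is exactly the missed-detection behavior the cascade network exploits.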
Wang et al. (2019) proposed a modified version of YOLO to identify thyroid nodules with greater precision. YOLO is a one-stage model that reduces time complexity while enhancing precision, and detecting thyroid nodules requires precision more than speed. The model combines YOLO v2 with ResNet v2-50. Feature maps from the deep layers are blended with those from the shallow layers, capitalizing on the diverse information captured across different layers to generate more accurate and comprehensive feature maps. The suggested network is fast and is claimed to reach the same sensitivity, positive predictive value, and accuracy as a radiologist. The dataset is divided into two classes, with and without nodules, and the nodule class is further divided into benign and malignant; this structured setup likely contributes to the model's ability to differentiate between nodule types effectively. The network is effective in detecting images without nodules: in only two instances did it fail to recognize the absence of a nodule. It is also effective in localizing cancerous nodules; most malignant cases in the dataset are papillary thyroid carcinoma. The work does not explore whether radiologists' performance improves when assisted by the suggested network.
Redmon and Farhadi developed YOLO v3 (Redmon and Farhadi 2018), which is renowned for greater precision than its predecessors. Building on YOLO v3 as the base model, researchers developed YOLO-HRNet (Zhang et al. 2021). It consists of five distinct layers: input, downsampling, feature extraction, multi-scale detection, and prediction, with the downsampling layer used to scale the network's parameters. The incorporation of HRNet (Wang et al. 2020), known for extracting high-level semantic features, is a strategic choice to enhance the network's ability to capture complex visual characteristics, aligning with the objective of surpassing previous architectures. In all respects, YOLO-HRNet outperforms the YOLO v3 baseline. However, SSD beats YOLO-HRNet in terms of speed, while Faster R-CNN with a ResNet50 backbone outperforms it in terms of precision. The network generates larger bounding boxes for small hypoechoic nodules with fuzzy edges. The authors anticipate the availability of more labeled medical data and the use of thyroid antibodies for detection.
Song et al. (2022) proposed a feature-enhanced dual-branch network (FDnet) to detect thyroid nodules. The detection network includes a semantic segmentation network and a feature enhancement method, and more precise localization is achieved through an iterative training technique that blends the ground truth with the branch result. ResNet and an FPN serve as the backbone, a prudent choice given their proven effectiveness in feature extraction, and a region proposal network (RPN) is utilized to target the nodule region. An intriguing aspect of the approach is its pseudo-labeling system: the pseudo-labels generated in one epoch are used as the ground truth in the subsequent epoch, an innovative way of augmenting the training data. The proposed network exhibits strong performance even when trained on a small dataset, potentially making it practical for real-world medical applications, and it outperforms the most widely used R-CNN designs. CornerNet (Law and Deng 2018) and DETR (Zhu et al. 2020) surpass FDnet in mAP and F1 score, but require 240 and 150 training epochs, respectively. Beyond saving processing time, Song suggests the pseudo-labeling scheme as a basis for further unsupervised schemes.
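The pseudo-labeling loop described above can be sketched with a toy example. Here a trivial 1-D threshold "model" stands in for the detection network, and all data values are invented for illustration; the point is only the control flow, in which predictions from one epoch become the training targets of the next:

```python
def predict(threshold, xs):
    """Toy model: label a point positive if it exceeds the threshold."""
    return [1 if x >= threshold else 0 for x in xs]

def fit_threshold(xs, ys):
    """Toy training rule: midpoint between the two labeled classes."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (min(pos) + max(neg)) / 2 if pos and neg else 0.0

labeled_x, labeled_y = [0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]
unlabeled_x = [0.3, 0.4, 0.6, 0.7]

threshold = fit_threshold(labeled_x, labeled_y)    # train on labeled data only
for epoch in range(3):
    pseudo_y = predict(threshold, unlabeled_x)     # pseudo-labels this epoch
    # next epoch treats the pseudo-labels as ground truth
    threshold = fit_threshold(labeled_x + unlabeled_x, labeled_y + pseudo_y)
print(threshold)
```

The same loop structure applies when the "model" is a detection network and the pseudo-labels are predicted bounding boxes or masks.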
Shahroudnejad et al. (2021) developed a one-stage FPN model, TUD-Net, to gather multiscale characteristics from multiple-resolution feature maps. The architecture consists of three parallel layers of RSU designed to replace a CNN for classification and regression tasks. The model successfully recognizes heterogeneous, large, and hypoechoic nodules, outperforms models such as RetinaNet and Faster R-CNN in average precision, and performs well under different IoU thresholds.
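Evaluating a detector "under different IoU thresholds" means matching its predicted boxes to ground-truth boxes at each threshold and counting true positives, the basis of average-precision curves. A minimal sketch with invented boxes:

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def true_positives(detections, ground_truth, iou_thresh):
    """Greedy one-to-one matching of detections to ground-truth boxes."""
    unmatched = list(ground_truth)
    tp = 0
    for det in detections:                      # assumed sorted by score
        best = max(unmatched, key=lambda g: box_iou(det, g), default=None)
        if best is not None and box_iou(det, best) >= iou_thresh:
            unmatched.remove(best)              # each ground truth matched once
            tp += 1
    return tp

dets = [(0, 0, 10, 10), (18, 18, 30, 30)]
gts  = [(1, 1, 10, 10), (20, 20, 30, 30)]
print(true_positives(dets, gts, 0.5))   # both detections count at IoU 0.5
print(true_positives(dets, gts, 0.75))  # the looser second box drops out
```

Raising the threshold demands tighter localization, which is why a model that "performs well under different IoU thresholds" is producing precise boxes, not just roughly correct ones.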
Lu et al. (2022) recently applied a GAN-guided, CAM-based technique to the diagnosis of thyroid nodules. The architecture includes class activation mapping (CAM) (Zhou et al. 2016) to identify discriminatory features in thyroid nodules, and a GAN-guided (Yi et al. 2019) deformable module to capture finer-grained distinctions between benign and malignant nodules. CAM supplies the saliency map to the outer layers of the deformable network, also known as deformable convolution layers; together, the CAM and the deformable module successfully detect the subtle differences between nodules. Real samples are generated by augmenting the ground-truth mask using prior knowledge. However, the number of augmentation techniques used is limited and can easily be captured by the GAN; a simple augmentation such as zooming in on the nodule can deceive the discriminator. If, for a particular application, the nodule shape is not relevant for diagnosis, or the boundaries are regular, the GAN may prevent the deformable module from capturing meaningless features from the images.
Integrating the CAM mechanism to identify discriminatory features in thyroid nodules is a strategic step. The CAM's ability to generate saliency maps contributes to a focused understanding of relevant areas within the nodule images. A notable advancement is the subsequent collaboration with a GAN-guided deformable module to capture nuanced distinctions between benign and malignant nodules. By incorporating the deformable convolution layers with the saliency information from the CAM, the model seems well-equipped to discern subtle differences between nodules. The inclusion of real sample generation through augmented ground truth masks, based on prior knowledge, highlights a data augmentation strategy. This approach likely contributes to a more diverse training dataset, which can be particularly useful in addressing potential overfitting and enhancing the model's generalization capabilities.
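The CAM computation referenced above amounts to a weighted sum of the last convolutional feature maps, with weights taken from the classifier for the class of interest. A minimal sketch using tiny hand-made 2 × 2 grids and hypothetical class weights, purely for illustration:

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of per-channel feature maps into one saliency map."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(h):
            for j in range(w):
                cam[i][j] += weight * fmap[i][j]
    return cam

# Two 2x2 feature maps (channels) and hypothetical classifier weights
fmaps = [
    [[1.0, 0.0], [0.0, 0.0]],   # channel firing on the top-left region
    [[0.0, 0.0], [0.0, 1.0]],   # channel firing on the bottom-right region
]
weights = [0.9, 0.1]            # invented weights for one output class

print(class_activation_map(fmaps, weights))  # top-left dominates the saliency map
```

High values in the resulting map mark the image regions that drove the class decision, which is the saliency signal fed to the deformable convolution layers.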
Based on the research work cited in this review, it is evident that standardized evaluation metrics are needed. DL networks using the same base model have been evaluated with inconsistent performance metrics; for example, see (Xie et al. 2019; Song et al. 2018). It is very difficult to compare networks by looking at the reported results alone. In addition, a standardized dataset for testing all the networks is needed.