Introduction

Artificial intelligence (AI) shows great promise in revolutionizing the science and practice of dermatology [1]. Although most clinical applications of AI in dermatology to date have focused on the diagnosis of melanoma, its potential in evaluating and determining treatments for nonmelanoma skin cancers (NMSCs) [2] is also critical [3].

In this review, we define NMSC as inclusive of two of the most common human skin cancers, basal cell carcinoma (BCC) and squamous cell carcinoma (SCC), in addition to Merkel cell carcinoma (MCC), a rare but critical diagnosis [4]. We explore the current performance, risks, and benefits of AI for NMSC diagnosis and treatment and evaluate the role of policy and guidelines in guiding future development.

AI as a Diagnostic Tool

Diagnostic AI algorithms are trained on existing images of cases encountered by clinicians in the field. Image quality and the comparative “gold standard” vary, ranging from patient-generated cellular phone images with clinical diagnosis to high-quality photographs and dermatoscopic images with histologic confirmation. Although a complete account of AI development is outside the scope of this work, it is important to note that AI can be developed using machine learning (ML), which can broadly be conceptualized as a computer algorithm that trains itself based on analysis of data (training sets) that are fed to the algorithm [5••]. AI modalities currently used in dermatology are often adjunctive to diagnostic tools but have broader potential [6].

The quality of any AI is largely dependent on the number, quality, and diagnostic accuracy of the images used to train it. A recent prospective study conducted on telehealth visits during the COVID-19 pandemic evaluated the performance of AI using smartphone images versus dermoscopy to diagnose NMSC lesions [7]. The algorithm performed better on dermatoscope-acquired images than on smartphone images in both accuracy and sensitivity (95.3% versus 75.3%), with similar specificity for both [7]. This study is one of the few evaluations of real-world performance for AI in skin cancer diagnosis; the remainder are in silico, retrospective studies [5••]. These findings demonstrate the potential of AI diagnostics while underscoring the potential gap between theoretical and real-world performance.
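The performance measures compared above all derive from the same four counts of correct and incorrect calls against the reference standard. The Python sketch below shows how sensitivity, specificity, and accuracy are computed from those counts. The class and function names are illustrative, and the counts are hypothetical values chosen only to show two image sources differing in sensitivity while sharing the same specificity; they are not data from reference [7].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Lesion-level counts against the reference standard."""
    true_pos: int   # malignant lesions the algorithm flagged
    false_neg: int  # malignant lesions the algorithm missed
    true_neg: int   # benign lesions the algorithm cleared
    false_pos: int  # benign lesions the algorithm flagged


def sensitivity(c: ConfusionCounts) -> float:
    """Share of malignant lesions that were correctly flagged."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    """Share of benign lesions that were correctly cleared."""
    return c.true_neg / (c.true_neg + c.false_pos)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all lesions that were classified correctly."""
    total = c.true_pos + c.false_neg + c.true_neg + c.false_pos
    return (c.true_pos + c.true_neg) / total


# Hypothetical counts for illustration only; NOT data from reference [7].
dermoscopy = ConfusionCounts(true_pos=45, false_neg=5, true_neg=40, false_pos=10)
smartphone = ConfusionCounts(true_pos=35, false_neg=15, true_neg=40, false_pos=10)

for name, counts in (("dermoscopy", dermoscopy), ("smartphone", smartphone)):
    print(name, sensitivity(counts), specificity(counts), accuracy(counts))
```

Because accuracy pools malignant and benign lesions, a drop in sensitivity lowers accuracy even when specificity is unchanged, which is the pattern described in the study above.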

One way of addressing this challenge is the emerging concept of “translational machine learning,” which seeks to bridge the gap between ML algorithms and clinical practice [8]. This approach considers AI to be more “augmented” intelligence than “artificial” intelligence in its implementation. AI is often not meant to supplant expert clinician guidance, but rather is intended to be used in conjunction with clinician evaluation to improve accuracy in detection [9]. By improving AI diagnostic capabilities and fostering trust in clinician use of AI, the goal is to develop AI models that are both generalizable and reproducible for clinical application [10].

AI Utility in Treatment of NMSC

Beyond diagnosis, AI has potential in guiding the non-invasive diagnostic workup of NMSC, in offering a therapeutic response prediction, and in accelerating the development of new therapeutics. Multidisciplinary approaches to the evaluation of complex NMSC cases have been demonstrated to improve patient outcomes and are quickly becoming standard of care, including in the UK National Health Service [11]. In the absence of multidisciplinary teams, AI-based algorithms may have a role in guiding care. In one study, supervised ML algorithms were used to create a risk-stratification model that predicted, with 45.1% reliability, whether a multidisciplinary medical team would choose conventional treatment methods, surgical resection, or radiotherapy for complicated BCC cases, and that achieved 37.5% prediction reliability in triaging for Mohs micrographic surgery [11].

An additional use for AI in the treatment of NMSC is in treatment design and prediction of response. Computational drug design approaches have been used in the search for a cure for MCC to identify natural antiviral drug candidates capable of perturbing the possible oncogenic function of the MCPyV LT protein [4]. Another algorithm demonstrated the ability to predict response to radiotherapy following SCC diagnosis and treatment with 85.7% sensitivity, 97.6% specificity, and 91.7% accuracy [12]. Although these approaches are nascent, the potential for AI as a tool in diagnosis, evaluation, and treatment of NMSC is clear.
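Sensitivity, specificity, and accuracy are linked through the fraction of positive cases in the evaluated cohort, so the three figures quoted above can be checked against one another. The Python sketch below is a back-of-the-envelope consistency check and not an analysis reported in reference [12]. The function names are illustrative, and the implied fraction is only approximate because the published percentages are rounded.

```python
def accuracy_from_rates(sensitivity, specificity, positive_fraction):
    """Accuracy is the prevalence-weighted average of the two rates."""
    return sensitivity * positive_fraction + specificity * (1.0 - positive_fraction)


def implied_positive_fraction(sensitivity, specificity, accuracy):
    """Solve the relation above for the fraction of positive cases."""
    if sensitivity == specificity:
        raise ValueError("fraction is undetermined when the two rates are equal")
    return (specificity - accuracy) / (specificity - sensitivity)


# Rounded figures quoted in the text for the radiotherapy-response algorithm.
fraction = implied_positive_fraction(sensitivity=0.857, specificity=0.976, accuracy=0.917)
print(round(fraction, 3))
```

With these rounded inputs the implied fraction of positive cases is close to one half, so the three reported values are mutually consistent with a roughly balanced evaluation cohort. The actual class balance would need to be confirmed from the original study.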

Areas for Further Improvement in AI for NMSC

The Role of Guidelines

Despite AI’s diagnostic capabilities and multidisciplinary applications, its reliability and the methods behind its functionality are complex and poorly understood, hindering its widespread adoption [13•]. The mechanism of action of AI has been described as a “black-box,” as the algorithms themselves and the training sets they are based on are not accessible for evaluation by the physicians using them [1]. As a result, physicians may not be able to identify simple errors by the AI, such as a diagnostic decision inappropriately influenced by surgical skin markings or other artifacts found in training datasets rather than the visual characteristics actually pertaining to NMSC [13•]. In real-life applications, this can translate to possible misdiagnoses in the clinic, increased false positives or negatives in AI performance studies, and overall decreased validity [13•]. For these reasons, guidelines and standardized datasets are essential to develop AI models that can be reliably applied in the clinical setting.

Guidelines have been proposed to aid in the transparent creation and evaluation of AI tools. The CLEAR criteria, developed in 2022, propose a list of 25 items that should be reported as part of “best practices of image-based AI development and assessment in dermatology” [10]. These items, which cover the data put into the model, the algorithm development technique, methods of technical assessment, and application, seek to provide insight into the creation of otherwise inscrutable algorithms [10]. Including these checklists in publications can help differentiate high-quality algorithms from weaker ones and may help promote clinician and patient trust and acceptance of AI models in the diagnosis and treatment of dermatologic disorders [14].

Diagnosing NMSC in Skin of Color

Although NMSC has a lower incidence in patients with darker skin, its diagnosis is often delayed in patients of color, resulting in higher morbidity and mortality in those patients when compared to their lighter-skinned counterparts [15]. A 2021 study in a Hawaiian multiethnic population concluded that heterogeneity of complexion types, particularly Fitzpatrick skin types 4–6, was lacking in publicly available AI training sets, which negatively impacted AI performance when distinguishing between melanoma and nonmelanoma skin cancers on these skin types [16]. Limited incidence directly results in fewer images for algorithm training and worse AI performance for NMSC detection among darker skin tones [16, 17•]. For AI to perform accurately across all demographics, more inclusive image repositories are necessary, especially ones that include images and information on skin cancers in patients with skin of color [17•].
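A first step toward more inclusive repositories is simply measuring how a training set is distributed across skin types. The Python sketch below tallies the share of images per Fitzpatrick type and the combined share of types 4–6. It assumes each image carries an integer Fitzpatrick label from 1 to 6, which many public datasets do not provide; the function names and the example labels are hypothetical and do not describe any dataset cited here.

```python
from collections import Counter

DARKER_TYPES = (4, 5, 6)  # Fitzpatrick types grouped as in the text above


def skin_type_shares(labels):
    """Return the fraction of images for each Fitzpatrick type 1 to 6."""
    if not labels:
        raise ValueError("no labels supplied")
    counts = Counter(labels)
    unknown = set(counts) - set(range(1, 7))
    if unknown:
        raise ValueError(f"unexpected Fitzpatrick labels: {sorted(unknown)}")
    total = len(labels)
    # Counter returns 0 for types that never occur in the dataset.
    return {skin_type: counts[skin_type] / total for skin_type in range(1, 7)}


def darker_tone_share(labels):
    """Combined fraction of images labeled Fitzpatrick type 4, 5, or 6."""
    shares = skin_type_shares(labels)
    return sum(shares[skin_type] for skin_type in DARKER_TYPES)


# Hypothetical label distribution for illustration only.
example_labels = [1] * 20 + [2] * 40 + [3] * 30 + [4] * 6 + [5] * 3 + [6] * 1

print(skin_type_shares(example_labels))
print(darker_tone_share(example_labels))
```

Reporting such a breakdown alongside performance metrics makes it easier to see when an algorithm has been evaluated on very few images of darker skin.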

While long-term solutions to these problems require improved data curation, other approaches may augment the ability of AI among patients of color. A recent effort artificially darkened the skin tone of existing images of patients with light skin in order to test the ability of AI to distinguish between melanoma and BCC in skin of color [18]. After the AI tool was trained with this dataset, its diagnoses in actual patients with darker skin tones showed higher sensitivity, specificity, positive predictive value, and negative predictive value [18]. While training AI with artificially darkened photos may be a temporary solution, the ultimate goal should be to build datasets from real patients with skin of color to capture the true variability AI will be exposed to in clinical practice.
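To make the idea of tone augmentation concrete, the Python sketch below darkens an image by scaling the brightness (value) channel in HSV color space. This is only a naive illustration under stated assumptions: the method actually used in reference [18] is not described here, the function names and the `value_scale` parameter are hypothetical, and a uniform brightness change is far cruder than the variation seen in real skin of color.

```python
import colorsys
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit red, green, blue


def darken_pixel(pixel: Pixel, value_scale: float) -> Pixel:
    """Scale the HSV value channel while keeping hue and saturation."""
    if not 0.0 < value_scale <= 1.0:
        raise ValueError("value_scale must be in the interval (0, 1]")
    red, green, blue = (channel / 255.0 for channel in pixel)
    hue, saturation, value = colorsys.rgb_to_hsv(red, green, blue)
    darker = colorsys.hsv_to_rgb(hue, saturation, value * value_scale)
    return tuple(round(channel * 255) for channel in darker)


def darken_image(image: List[List[Pixel]], value_scale: float) -> List[List[Pixel]]:
    """Apply the same darkening to every pixel of a row-major image."""
    return [[darken_pixel(pixel, value_scale) for pixel in row] for row in image]


# Hypothetical 1 x 2 image for illustration only.
tiny_image = [[(200, 150, 120), (255, 255, 255)]]
print(darken_image(tiny_image, value_scale=0.5))
```

A transform of this kind changes overall brightness but not lesion morphology, pigment distribution, or erythema appearance, which is one reason synthetic images cannot replace photographs of real patients.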

Some studies have sought to mitigate disparities in clinical image collection by shifting data collection toward more quantifiable markers, such as specific genes. As an example, one study demonstrated an ability to use genetic biomarkers as prognostic and diagnostic indicators for SCC by applying a form of explainable AI to XGBoost, a gradient-boosted decision tree model, trained on binary classification datasets [19]. Binary classification datasets are those in which each sample carries one of two classes or labels [19]. In this case, genetic biomarkers may be limited to specific patient populations because only certain genetic sequences have been identified [19]. This limitation highlights the necessity of diverse data acquisition, which as discussed below is an endeavor that should ultimately begin in the clinic. Nevertheless, the approach is a step in the direction of widespread AI adoption in treating NMSC.
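XGBoost is a library that implements gradient-boosted decision trees. The Python sketch below is not XGBoost and not the model from reference [19]. It is a stripped-down stand-in, assuming depth-one trees (stumps), a plain gradient step on the logistic loss, and no regularization, meant only to show how boosting fits a binary classification dataset. The two "expression" features, the labels, and all function names are hypothetical, and counting how often each feature is used is a crude proxy for the explainability methods applied in practice.

```python
import math
from collections import Counter


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(rows, residuals):
    """Return (feature, threshold, left_value, right_value) for the single
    split that minimizes squared error on the residuals, or None."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [r for row, r in zip(rows, residuals) if row[feature] <= threshold]
            right = [r for row, r in zip(rows, residuals) if row[feature] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, feature, threshold, left_mean, right_mean)
    return None if best is None else best[1:]


def stump_output(stump, row):
    feature, threshold, left_value, right_value = stump
    return left_value if row[feature] <= threshold else right_value


def fit_boosted_stumps(rows, labels, n_rounds=20, learning_rate=0.5):
    """Fit stumps one at a time, each to the current errors of the ensemble."""
    scores = [0.0] * len(rows)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the logistic loss: label minus predicted probability.
        residuals = [y - sigmoid(s) for y, s in zip(labels, scores)]
        stump = fit_stump(rows, residuals)
        if stump is None:
            break
        feature, threshold, left_value, right_value = stump
        scaled = (feature, threshold, learning_rate * left_value, learning_rate * right_value)
        stumps.append(scaled)
        scores = [s + stump_output(scaled, row) for s, row in zip(scores, rows)]
    return stumps


def predict_probability(stumps, row):
    return sigmoid(sum(stump_output(stump, row) for stump in stumps))


def split_counts(stumps):
    """How often each feature was chosen for a split (crude importance)."""
    return Counter(stump[0] for stump in stumps)


# Hypothetical data: two expression features per sample; label 1 = SCC, 0 = control.
rows = [[0.2, 1.1], [0.4, 0.9], [0.3, 1.3], [1.5, 1.0], [1.7, 1.2], [1.9, 0.8]]
labels = [0, 0, 0, 1, 1, 1]

model = fit_boosted_stumps(rows, labels)
print(predict_probability(model, [1.6, 1.0]), split_counts(model))
```

In this toy dataset only the first feature separates the two classes, so every stump splits on it. With real expression data, such counts or more principled attribution methods indicate which biomarkers drive the prediction.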

Curating Data

The performance of any AI algorithm is ultimately determined by the quality of the training data, and individuals and clinical scenarios that fall outside of the characteristics included in training datasets may not be reliably diagnosed by AI [20]. One approach to this challenge is active investment in data curation by dermatologists. In contrast to clinicians who lack specialty expertise, dermatologists see more NMSC cases across a broader range of skin tones, have more access to information and findings in the realm of skin cancers, and can contribute more to the standardization of datasets embedded in AI algorithms [21••]. Economic models that compensate physicians for phenotyping and data curation and patients for sharing their medical information should be considered, especially if the algorithms are to be used commercially.

Conclusion

AI holds tremendous promise as more than a diagnostic tool in improving morbidity, mortality, and outcomes in patients with NMSC. Although there are still milestones left to achieve, at present AI algorithms demonstrate accuracy and overall performance in silico in line with dermatologists [6, 16]. In addition to diagnostic utility, AI may have a role in many other aspects of NMSC care. Continued emphasis on maintaining accuracy and accessibility across a range of skin tones is critical as we seek to reduce, not recreate, disparities in care with this next-generation technology. Whatever direction AI takes in the future, dermatologists should be at the forefront to help guide its application and maintain clinically reasonable standards along the way. The datasets supporting the conclusion of this article are available in the Topical Collection on Skin Cancer repository.