Background: Artificial intelligence has shown promise in numerous experimental studies, particularly in skin cancer diagnostics. Translation of these findings into the clinic is the logical next step, but it can only succeed if patients' concerns and questions are addressed suitably. We therefore conducted a survey to evaluate patients' views of artificial intelligence in melanoma diagnostics in Germany. Participants and Methods: A web-based questionnaire was designed using LimeSurvey, sent by e-mail to university hospitals and melanoma support groups, and advertised on social media. The anonymous questionnaire evaluated patients' expectations of and concerns about artificial intelligence in general as well as their attitudes towards different application scenarios. Descriptive analysis was performed, with categorical variables expressed as percentages and 95% confidence intervals. Statistical tests were performed to investigate associations between sociodemographic data and selected items of the questionnaire. Results: 298 people (154 with a melanoma diagnosis, 143 without) responded to the questionnaire. About 94% [95% CI = 0.913–0.967] of respondents supported the use of artificial intelligence in medical approaches. 88% [95% CI = 0.846–0.919] would even make their own health data anonymously available for the further development of AI-based applications in medicine. Only 41% [95% CI = 0.350–0.462] of respondents were amenable to the use of artificial intelligence as a stand-alone system, whereas 94% [95% CI = 0.917–0.969] supported its use as an assistance system for physicians. In sub-group analyses, only minor differences were detectable. Respondents with a previous history of melanoma were more amenable to the use of AI applications for early detection, even at home. They would prefer an application scenario in which physician and AI classify the lesions independently.
With respect to AI-based applications in medicine, patients were concerned about insufficient data protection, impersonality and susceptibility to errors, but expected faster, more precise and unbiased diagnostics, fewer diagnostic errors and support for physicians. Conclusions: The vast majority of participants exhibited a positive attitude towards the use of artificial intelligence in melanoma diagnostics, especially as an assistance system.
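The survey abstract reports each proportion with a 95% confidence interval. A minimal sketch of how such intervals are obtained, using the normal-approximation (Wald) formula; the count of 280 supporting respondents is a hypothetical illustration, not a figure from the study.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Wald (normal-approximation) 95% CI for a proportion.

    successes: number of respondents giving the answer of interest
    n: total number of respondents
    """
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# Hypothetical example: 280 of 298 respondents in favour (about 94%)
p, lo, hi = proportion_ci(280, 298)
```

For proportions near 0 or 1, or for small samples, a Wilson or exact interval would be preferable; the Wald form is shown only because it is the simplest.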
Summary: Malignant melanoma is the form of skin cancer that causes the most deaths. At an early stage, melanoma is highly treatable, so early detection is vital. Critics note that since the nationwide introduction of skin cancer screening in Germany, melanomas have been diagnosed more frequently, yet mortality from malignant melanoma has not declined. They attribute this primarily to overdiagnosis. One reason is the sometimes complex distinction between benign and malignant lesions. In addition, there can be transitional forms between clearly benign and clearly malignant lesions, and some malignant lesions grow so indolently that they would never have become life-threatening. In the absence of suitable biomarkers, it is not yet possible to determine which melanomas fall into this category. Likewise, the probability that an in-situ melanoma will progress to an invasive tumour cannot yet be reliably assessed. The consequences of overdiagnosed benign lesions are unnecessary psychological and physical burdens for those affected as well as avoidable treatment costs. Conversely, underdiagnosis can severely worsen patients' prognoses and necessitate (more) burdensome therapies. More precise diagnostic methods could increase the number of correct diagnoses. Studies of assistance systems based on artificial intelligence have already shown first successes here, although these still have to be translated into routine clinical and pathological practice.
Recent years have witnessed a substantial improvement in the accuracy of skin cancer classification using convolutional neural networks (CNNs). CNNs perform on par with or better than dermatologists on the classification of single images. However, in clinical practice, dermatologists also use other patient data beyond the visual aspects present in a digitized image, further increasing their diagnostic accuracy. Several pilot studies have recently investigated the effects of integrating different subtypes of patient data into CNN-based skin cancer classifiers.
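One common way to integrate patient data into a CNN-based classifier is late fusion: the image embedding and an encoded metadata vector are concatenated before a shared classification head. The sketch below is illustrative only; the shapes, the random stand-in features and the linear head are assumptions, not the architecture of any of the cited studies.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_and_score(image_features, metadata, w, b):
    """Late fusion: concatenate CNN image embeddings with encoded patient
    metadata (e.g. age, sex, lesion site) and apply a linear classifier head."""
    fused = np.concatenate([image_features, metadata], axis=1)
    return fused @ w + b

# Hypothetical shapes: 8 lesions, 512-d image embedding, 4 metadata fields
img = rng.normal(size=(8, 512))       # stand-in for CNN backbone outputs
meta = rng.normal(size=(8, 4))        # stand-in for normalized metadata
w = rng.normal(size=(516, 2))         # binary melanoma-vs-nevus head
logits = fuse_and_score(img, meta, w, np.zeros(2))
```

In a real system the backbone and head would be trained jointly; this snippet only shows where the two data modalities meet.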
A basic requirement for artificial intelligence (AI)-based image analysis systems that are to be integrated into clinical practice is high robustness: minor changes in how images are acquired, for example during routine skin cancer screening, should not change the diagnosis of such assistance systems. We aimed to quantify to what extent minor image perturbations affect convolutional neural network (CNN)-mediated skin lesion classification and to evaluate three possible solutions to this problem (additional data augmentation, test-time augmentation, anti-aliasing). We trained three commonly used CNN architectures to differentiate between dermoscopic melanoma and nevus images. Subsequently, their performance and susceptibility to minor changes ('brittleness') were tested on two distinct test sets with multiple images per lesion. For the first set, image changes such as rotations or zooms were generated artificially. The second set contained natural changes that stemmed from multiple photographs taken of the same lesions. All architectures exhibited brittleness on both the artificial and the natural test set. The three reviewed methods were able to decrease brittleness to varying degrees while maintaining performance. The observed improvement was greater for the artificial than for the natural test set, where enhancements were minor. Minor image changes, relatively inconspicuous to humans, can affect the robustness of CNNs differentiating skin lesions. The methods tested here can reduce this effect but not fully eliminate it. Thus, further research to sustain the performance of AI classifiers is needed to facilitate the translation of such systems into the clinic.
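Of the three countermeasures evaluated, test-time augmentation is the easiest to sketch: the classifier's prediction is averaged over several randomly perturbed views of the same image. The augmentations and the stand-in model below are illustrative assumptions, not the study's exact setup.

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(image):
    """One random test-time view: a rotation by a multiple of 90 degrees
    plus an optional horizontal flip (illustrative choices)."""
    image = np.rot90(image, k=int(rng.integers(4)))
    if rng.random() < 0.5:
        image = image[:, ::-1]
    return image

def predict_proba(image):
    """Stand-in for a trained CNN; returns a melanoma probability."""
    return 1 / (1 + np.exp(-image.mean()))

def tta_predict(image, n_views=8):
    """Test-time augmentation: average predictions over several views,
    which tends to stabilise the output against minor image changes."""
    return float(np.mean([predict_proba(augment(image)) for _ in range(n_views)]))

p = tta_predict(rng.normal(size=(64, 64)))
```

The trade-off is inference cost: each prediction now requires `n_views` forward passes through the network.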
BACKGROUND An increasing number of studies within digital pathology show the potential of artificial intelligence (AI) to diagnose cancer using histological whole slide images, which requires large and diverse data sets. While diversification may result in more generalizable AI-based systems, it can also introduce hidden variables. If neural networks are able to distinguish/learn hidden variables, these variables can introduce batch effects that compromise the accuracy of classification systems. OBJECTIVE The objective of the study was to analyze the learnability of an exemplary selection of hidden variables (patient age, slide preparation date, slide origin, and scanner type) that are commonly found in whole slide image data sets in digital pathology and could create batch effects. METHODS We trained four separate convolutional neural networks (CNNs) to learn four variables using a data set of digitized whole slide melanoma images from five different institutes. For robustness, each CNN training and evaluation run was repeated multiple times, and a variable was only considered learnable if the lower bound of the 95% confidence interval of its mean balanced accuracy was above 50.0%. RESULTS A mean balanced accuracy above 50.0% was achieved for all four tasks, even when considering the lower bound of the 95% confidence interval. Performance between tasks showed wide variation, ranging from 56.1% (slide preparation date) to 100% (slide origin). CONCLUSIONS Because all of the analyzed hidden variables are learnable, they have the potential to create batch effects in dermatopathology data sets, which negatively affect AI-based classification systems. Practitioners should be aware of these and similar pitfalls when developing and evaluating such systems and address these and potentially other batch effect variables in their data sets through sufficient data set stratification.
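The study's learnability criterion can be stated compactly: a hidden variable counts as learnable if the lower bound of the 95% confidence interval of the mean balanced accuracy over repeated runs lies above the 50.0% chance level. A sketch of that decision rule, using a normal-approximation interval over hypothetical run results (the study's exact CI method may differ):

```python
import math
import statistics

def is_learnable(balanced_accuracies, threshold=50.0, z=1.96):
    """Decide learnability of a hidden variable: True if the lower bound of
    the ~95% CI of the mean balanced accuracy (in %) exceeds chance level."""
    mean = statistics.mean(balanced_accuracies)
    sem = statistics.stdev(balanced_accuracies) / math.sqrt(len(balanced_accuracies))
    lower = mean - z * sem
    return lower > threshold, lower

# Hypothetical balanced accuracies (%) from five repeated training runs
learnable, lower = is_learnable([56.5, 58.0, 55.2, 57.1, 56.8])
```

A variable hovering exactly at chance, e.g. runs of [50.2, 49.8, 50.1, 50.0, 49.9], would correctly be rejected by this rule.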
The diagnosis of most cancers is made by a board-certified pathologist based on a tissue biopsy under the microscope. Recent research reveals a high discordance between individual pathologists. For melanoma, the literature reports 25–26% discordance for classifying a benign nevus versus malignant melanoma. A recent study indicated the potential of deep learning to lower these discordances. However, the performance of deep learning in classifying histopathologic melanoma images had never been compared directly to that of human experts. The aim of this study was to perform such a first direct comparison. A total of 695 lesions were classified by an expert histopathologist in accordance with current guidelines (350 nevi/345 melanomas). Only the haematoxylin & eosin (H&E) slides of these lesions were digitalised via a slide scanner and then randomly cropped. A total of 595 of the resulting images were used to train a convolutional neural network (CNN). The additional 100 H&E image sections were used to test the results of the CNN in comparison to 11 histopathologists. Three combined McNemar tests comparing the results of the CNN's test runs in terms of sensitivity, specificity and accuracy were predefined to test for significance (p < 0.05). The CNN achieved a mean sensitivity/specificity/accuracy of 76%/60%/68% over 11 test runs. In comparison, the 11 pathologists achieved a mean sensitivity/specificity/accuracy of 51.8%/66.5%/59.2%. Thus, the CNN was significantly (p = 0.016) superior in classifying the cropped images. With limited image information available, a CNN was able to outperform 11 histopathologists in the classification of histopathological melanoma images and thus shows promise for assisting human melanoma diagnosis.
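The McNemar test used here compares two classifiers on the same images by looking only at the discordant pairs (cases one classifier got right and the other got wrong). A minimal exact (binomial) version; the discordant counts in the example are hypothetical, not the study's data.

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact (binomial) McNemar test on the discordant pairs.

    b: cases classifier A got right and classifier B got wrong
    c: the reverse
    Returns the two-sided p-value under H0 that both err equally often.
    """
    n = b + c
    k = min(b, c)
    # One tail of Binomial(n, 0.5), doubled for a two-sided test
    p = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    return min(1.0, 2 * p)

# Hypothetical discordant counts for CNN vs pathologists on 100 images
p_value = mcnemar_exact(18, 6)
```

With 18 vs 6 discordant cases this yields p below 0.05; perfectly balanced disagreement (e.g. 10 vs 10) yields p = 1.0, as expected.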
Background The diagnosis of most cancers is made by a board-certified pathologist based on a tissue biopsy under the microscope. Recent research reveals a high discordance between individual pathologists. For melanoma, the literature reports 25–26% discordance for classifying a benign nevus versus malignant melanoma. Deep learning has been successfully implemented to enhance the precision of lung and breast cancer diagnoses. The aim of this study is to illustrate the potential of deep learning to assist human assessment in histopathologic melanoma diagnosis. Methods Six hundred ninety-five lesions were classified by an expert histopathologist in accordance with current guidelines (350 nevi and 345 melanomas). Only the haematoxylin and eosin stained (H&E) slides of these lesions were digitalised using a slide scanner and then randomly cropped. Five hundred ninety-five of the resulting images were used for the training of a convolutional neural network (CNN). The additional 100 H&E image sections were used to test the results of the CNN in comparison with the original class labels. Findings The total discordance with the histopathologist was 18% for melanoma (95% confidence interval [CI]: 7.4–28.6%), 20% for nevi (95% CI: 8.9–31.1%) and 19% for the full set of images (95% CI: 11.3–26.7%). Interpretation Even in the worst case, the discordance of the CNN was about the same as the discordance between human pathologists reported in the literature. Despite requiring vastly less data, diagnosis time and cost than a pathologist, our CNN achieved on-par performance. Conclusively, CNNs appear to be a valuable tool to assist human melanoma diagnosis.
Background In recent studies, convolutional neural networks (CNNs) outperformed dermatologists in distinguishing dermoscopic images of melanoma and nevi. In these studies, dermatologists and artificial intelligence were considered as opponents. However, the combination of classifiers frequently yields superior results, both in machine learning and among humans. In this study, we investigated the potential benefit of combining human and artificial intelligence for skin cancer classification. Methods Using 11,444 dermoscopic images, which were divided into five diagnostic categories, novel deep learning techniques were used to train a single CNN. Then, both 112 dermatologists of 13 German university hospitals and the trained CNN independently classified a set of 300 biopsy-verified skin lesions into those five classes. Taking into account the certainty of the decisions, the two independently determined diagnoses were combined into a new classifier with the help of a gradient boosting method. The primary end-point of the study was the correct classification of the images into five designated categories, whereas the secondary end-point was the correct classification of lesions as either benign or malignant (binary classification). Findings Regarding the multiclass task, the combination of man and machine achieved an accuracy of 82.95%. This was 1.36% higher than the best of the two individual classifiers (81.59%, achieved by the CNN). Owing to the class imbalance in the binary problem, sensitivity, but not accuracy, was examined and demonstrated to be superior (89%) to the best individual classifier (CNN with 86.1%). The specificity of the combined classifier decreased from 89.2% to 84%. However, at an equal sensitivity of 89%, the CNN achieved a specificity of only 81.5%. Interpretation Our findings indicate that the combination of human and artificial intelligence achieves superior results over the independent results of both of these systems.
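The combination step is a form of stacking: the CNN's class probabilities and the dermatologist's diagnosis become meta-features for a second-level learner. The study used gradient boosting; the sketch below substitutes a plain logistic meta-learner trained by gradient descent, and all data is randomly generated, so it shows only the wiring, not the study's results.

```python
import numpy as np

rng = np.random.default_rng(1)

# Meta-features per lesion: 5 CNN class probabilities + one-hot dermatologist
# diagnosis (certainty scores could be appended the same way).
n, k = 300, 5
cnn_probs = rng.dirichlet(np.ones(k), size=n)        # toy CNN outputs
derm_onehot = np.eye(k)[rng.integers(k, size=n)]     # toy human diagnoses
X = np.hstack([cnn_probs, derm_onehot])              # stacked meta-features
y = (rng.random(n) < 0.5).astype(float)              # toy binary target

# Logistic-regression meta-learner (stand-in for gradient boosting)
w = np.zeros(X.shape[1])
for _ in range(200):
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / n                     # gradient descent step

acc = float(np.mean(((1 / (1 + np.exp(-X @ w))) > 0.5) == y))
```

With real labels, the meta-learner can weight the two sources by how reliable each is per class, which is what allowed the combined classifier to beat both individuals.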
Background Early detection of melanoma can be lifesaving, but this remains a challenge. Recent diagnostic studies have revealed the superiority of artificial intelligence (AI) in classifying dermoscopic images of melanoma and nevi, concluding that these algorithms should assist a dermatologist's diagnoses. Objective The aim of this study was to investigate whether AI support improves the accuracy and overall diagnostic performance of dermatologists in the dichotomous image-based discrimination between melanoma and nevus. Methods Twelve board-certified dermatologists were presented with disjoint sets of 100 unique dermoscopic images of melanomas and nevi (total of 1200 unique images), and they had to classify the images based on personal experience alone (part I) and with the support of a trained convolutional neural network (CNN, part II). Additionally, dermatologists were asked to rate their confidence in their final decision for each image. Results While the dermatologists' mean specificity remained almost unchanged with AI support (70.6% vs 72.4%; P=.54), their mean sensitivity and mean accuracy increased significantly (59.4% vs 74.6%; P=.003 and 65.0% vs 73.6%; P=.002, respectively). Out of the 10% (10/94; 95% CI 8.4%-11.8%) of cases where dermatologists were correct and AI was incorrect, dermatologists on average changed to the incorrect answer for 39% (4/10; 95% CI 23.2%-55.6%) of cases. When dermatologists were incorrect and AI was correct (25/94, 27%; 95% CI 24.0%-30.1%), dermatologists changed their answers to the correct answer for 46% (11/25; 95% CI 33.1%-58.4%) of cases. Additionally, the dermatologists' average confidence in their decisions increased when the CNN confirmed their decision and decreased when the CNN disagreed, even when the dermatologists were correct. Reported values are based on the mean of all participants.
Whenever absolute values are shown, the numerator and denominator are approximations, as each dermatologist ended up rating a varying number of images owing to a quality-control step. Conclusions The findings of our study show that AI support can improve the overall accuracy of dermatologists in the dichotomous image-based discrimination between melanoma and nevus. This supports the argument for AI-based tools to aid clinicians in skin lesion classification and provides a rationale for studies of such classifiers in real-life settings, wherein clinicians can integrate additional information such as patient age and medical history into their decisions.
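The sensitivity, specificity and accuracy figures above all derive from the same confusion-matrix counts. A minimal sketch of that computation; the counts below are hypothetical illustrations chosen to resemble the reported before/after means, not the study's raw data.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts,
    with melanoma as the positive class."""
    sensitivity = tp / (tp + fn)          # melanomas correctly detected
    specificity = tn / (tn + fp)          # nevi correctly ruled out
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts for 100 rated images (50 melanomas, 50 nevi)
before = binary_metrics(tp=30, fn=20, tn=35, fp=15)    # without AI support
with_ai = binary_metrics(tp=37, fn=13, tn=36, fp=14)   # with AI support
```

The example mirrors the study's pattern: sensitivity rises substantially with AI support while specificity barely moves, so accuracy improves mainly through fewer missed melanomas.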