Background: Artificial intelligence has shown promise in numerous experimental studies, particularly in skin cancer diagnostics. Translation of these findings into the clinic is the logical next step, but it can only succeed if patients' concerns and questions are addressed suitably. We therefore conducted a survey to evaluate patients' views of artificial intelligence in melanoma diagnostics in Germany. Participants and Methods: A web-based questionnaire was designed using LimeSurvey, sent by e-mail to university hospitals and melanoma support groups, and advertised on social media. The anonymous questionnaire evaluated patients' expectations of and concerns about artificial intelligence in general as well as their attitudes towards different application scenarios. Descriptive analysis was performed, with categorical variables expressed as percentages and 95% confidence intervals. Statistical tests were performed to investigate associations between sociodemographic data and selected items of the questionnaire. Results: 298 people (154 with a melanoma diagnosis, 143 without) responded to the questionnaire. About 94% [95% CI = 0.913–0.967] of respondents supported the use of artificial intelligence in medical approaches. 88% [95% CI = 0.846–0.919] would even make their own health data anonymously available for the further development of AI-based applications in medicine. Only 41% [95% CI = 0.350–0.462] of respondents were amenable to the use of artificial intelligence as a stand-alone system, whereas 94% [95% CI = 0.917–0.969] supported its use as an assistance system for physicians. In sub-group analyses, only minor differences were detectable. Respondents with a previous history of melanoma were more amenable to the use of AI applications for early detection, even at home. They would prefer an application scenario in which physician and AI classify the lesions independently.
With respect to AI-based applications in medicine, patients were concerned about insufficient data protection, impersonality and susceptibility to errors, but expected faster, more precise and unbiased diagnostics, fewer diagnostic errors and support for physicians. Conclusions: The vast majority of participants exhibited a positive attitude towards the use of artificial intelligence in melanoma diagnostics, especially as an assistance system.
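The survey abstract reports each proportion with a 95% confidence interval. A minimal sketch of how such intervals are obtained, using the normal-approximation (Wald) formula; the count of 280 supporting respondents is a hypothetical illustration, not a figure from the study.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Wald (normal-approximation) 95% CI for a proportion.

    successes: number of respondents giving the answer of interest
    n: total number of respondents
    """
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# Hypothetical example: 280 of 298 respondents in favour (about 94%)
p, lo, hi = proportion_ci(280, 298)
```

For proportions near 0 or 1, or for small samples, a Wilson or exact interval would be preferable; the Wald form is shown only because it is the simplest.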
Summary: Malignant melanoma is the form of skin cancer that causes the most deaths. At an early stage, melanoma is highly treatable, so early detection is vital. Critics note that since the nationwide introduction of skin cancer screening in Germany, melanomas have been diagnosed more frequently, yet mortality from malignant melanoma has not declined. They attribute this primarily to overdiagnosis. One reason is the sometimes complex distinction between benign and malignant lesions. In addition, there can be transitional forms between clearly benign and clearly malignant lesions, and some malignant lesions grow so indolently that they would never have become life-threatening. In the absence of suitable biomarkers, it is not yet possible to determine which melanomas fall into this category. Likewise, the probability that an in-situ melanoma will progress to an invasive tumour cannot yet be reliably assessed. The consequences of overdiagnosed benign lesions are unnecessary psychological and physical burdens for those affected as well as avoidable treatment costs. Conversely, underdiagnosis can severely worsen patients' prognoses and necessitate (more) burdensome therapies. More precise diagnostic methods could increase the number of correct diagnoses. Studies of assistance systems based on artificial intelligence have already shown first successes here, although these still have to be translated into routine clinical and pathological practice.
Recent years have witnessed a substantial improvement in the accuracy of skin cancer classification using convolutional neural networks (CNNs). CNNs perform on par with or better than dermatologists on the classification of single images. However, in clinical practice, dermatologists also use other patient data beyond the visual aspects present in a digitized image, further increasing their diagnostic accuracy. Several pilot studies have recently investigated the effects of integrating different subtypes of patient data into CNN-based skin cancer classifiers.
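One common way to integrate patient data into a CNN-based classifier is late fusion: the image embedding and an encoded metadata vector are concatenated before a shared classification head. The sketch below is illustrative only; the shapes, the random stand-in features and the linear head are assumptions, not the architecture of any of the cited studies.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_and_score(image_features, metadata, w, b):
    """Late fusion: concatenate CNN image embeddings with encoded patient
    metadata (e.g. age, sex, lesion site) and apply a linear classifier head."""
    fused = np.concatenate([image_features, metadata], axis=1)
    return fused @ w + b

# Hypothetical shapes: 8 lesions, 512-d image embedding, 4 metadata fields
img = rng.normal(size=(8, 512))       # stand-in for CNN backbone outputs
meta = rng.normal(size=(8, 4))        # stand-in for normalized metadata
w = rng.normal(size=(516, 2))         # binary melanoma-vs-nevus head
logits = fuse_and_score(img, meta, w, np.zeros(2))
```

In a real system the backbone and head would be trained jointly; this snippet only shows where the two data modalities meet.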
A basic requirement for artificial intelligence (AI)-based image analysis systems that are to be integrated into clinical practice is high robustness: minor changes in how images are acquired, for example during routine skin cancer screening, should not change the diagnosis of such assistance systems. We aimed to quantify to what extent minor image perturbations affect convolutional neural network (CNN)-mediated skin lesion classification and to evaluate three possible solutions to this problem (additional data augmentation, test-time augmentation, anti-aliasing). We trained three commonly used CNN architectures to differentiate between dermoscopic melanoma and nevus images. Subsequently, their performance and susceptibility to minor changes ('brittleness') were tested on two distinct test sets with multiple images per lesion. For the first set, image changes such as rotations or zooms were generated artificially. The second set contained natural changes that stemmed from multiple photographs taken of the same lesions. All architectures exhibited brittleness on both the artificial and the natural test set. The three reviewed methods were able to decrease brittleness to varying degrees while maintaining performance. The observed improvement was greater for the artificial than for the natural test set, where enhancements were minor. Minor image changes, relatively inconspicuous to humans, can affect the robustness of CNNs differentiating skin lesions. The methods tested here can reduce this effect but not fully eliminate it. Thus, further research to sustain the performance of AI classifiers is needed to facilitate the translation of such systems into the clinic.
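Of the three countermeasures evaluated, test-time augmentation is the easiest to sketch: the classifier's prediction is averaged over several randomly perturbed views of the same image. The augmentations and the stand-in model below are illustrative assumptions, not the study's exact setup.

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(image):
    """One random test-time view: a rotation by a multiple of 90 degrees
    plus an optional horizontal flip (illustrative choices)."""
    image = np.rot90(image, k=int(rng.integers(4)))
    if rng.random() < 0.5:
        image = image[:, ::-1]
    return image

def predict_proba(image):
    """Stand-in for a trained CNN; returns a melanoma probability."""
    return 1 / (1 + np.exp(-image.mean()))

def tta_predict(image, n_views=8):
    """Test-time augmentation: average predictions over several views,
    which tends to stabilise the output against minor image changes."""
    return float(np.mean([predict_proba(augment(image)) for _ in range(n_views)]))

p = tta_predict(rng.normal(size=(64, 64)))
```

The trade-off is inference cost: each prediction now requires `n_views` forward passes through the network.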
BACKGROUND An increasing number of studies within digital pathology show the potential of artificial intelligence (AI) to diagnose cancer using histological whole slide images, which requires large and diverse data sets. While diversification may result in more generalizable AI-based systems, it can also introduce hidden variables. If neural networks are able to distinguish/learn hidden variables, these variables can introduce batch effects that compromise the accuracy of classification systems. OBJECTIVE The objective of the study was to analyze the learnability of an exemplary selection of hidden variables (patient age, slide preparation date, slide origin, and scanner type) that are commonly found in whole slide image data sets in digital pathology and could create batch effects. METHODS We trained four separate convolutional neural networks (CNNs) to learn four variables using a data set of digitized whole slide melanoma images from five different institutes. For robustness, each CNN training and evaluation run was repeated multiple times, and a variable was only considered learnable if the lower bound of the 95% confidence interval of its mean balanced accuracy was above 50.0%. RESULTS A mean balanced accuracy above 50.0% was achieved for all four tasks, even when considering the lower bound of the 95% confidence interval. Performance between tasks showed wide variation, ranging from 56.1% (slide preparation date) to 100% (slide origin). CONCLUSIONS Because all of the analyzed hidden variables are learnable, they have the potential to create batch effects in dermatopathology data sets, which negatively affect AI-based classification systems. Practitioners should be aware of these and similar pitfalls when developing and evaluating such systems and address these and potentially other batch effect variables in their data sets through sufficient data set stratification.
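The study's learnability criterion can be stated compactly: a hidden variable counts as learnable if the lower bound of the 95% confidence interval of the mean balanced accuracy over repeated runs lies above the 50.0% chance level. A sketch of that decision rule, using a normal-approximation interval over hypothetical run results (the study's exact CI method may differ):

```python
import math
import statistics

def is_learnable(balanced_accuracies, threshold=50.0, z=1.96):
    """Decide learnability of a hidden variable: True if the lower bound of
    the ~95% CI of the mean balanced accuracy (in %) exceeds chance level."""
    mean = statistics.mean(balanced_accuracies)
    sem = statistics.stdev(balanced_accuracies) / math.sqrt(len(balanced_accuracies))
    lower = mean - z * sem
    return lower > threshold, lower

# Hypothetical balanced accuracies (%) from five repeated training runs
learnable, lower = is_learnable([56.5, 58.0, 55.2, 57.1, 56.8])
```

A variable hovering exactly at chance, e.g. runs of [50.2, 49.8, 50.1, 50.0, 49.9], would correctly be rejected by this rule.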
The diagnosis of most cancers is made by a board-certified pathologist based on a tissue biopsy under the microscope. Recent research reveals a high discordance between individual pathologists. For melanoma, the literature reports 25–26% discordance for classifying a benign nevus versus malignant melanoma. A recent study indicated the potential of deep learning to lower these discordances. However, the performance of deep learning in classifying histopathologic melanoma images had never been compared directly to that of human experts. The aim of this study was to perform such a first direct comparison. A total of 695 lesions were classified by an expert histopathologist in accordance with current guidelines (350 nevi/345 melanomas). Only the haematoxylin & eosin (H&E) slides of these lesions were digitalised via a slide scanner and then randomly cropped. A total of 595 of the resulting images were used to train a convolutional neural network (CNN). The additional 100 H&E image sections were used to test the results of the CNN in comparison to 11 histopathologists. Three combined McNemar tests comparing the results of the CNN's test runs in terms of sensitivity, specificity and accuracy were predefined to test for significance (p < 0.05). The CNN achieved a mean sensitivity/specificity/accuracy of 76%/60%/68% over 11 test runs. In comparison, the 11 pathologists achieved a mean sensitivity/specificity/accuracy of 51.8%/66.5%/59.2%. Thus, the CNN was significantly (p = 0.016) superior in classifying the cropped images. With limited image information available, a CNN was able to outperform 11 histopathologists in the classification of histopathological melanoma images and thus shows promise for assisting human melanoma diagnosis.
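The McNemar test used here compares two classifiers on the same images by looking only at the discordant pairs (cases one classifier got right and the other got wrong). A minimal exact (binomial) version; the discordant counts in the example are hypothetical, not the study's data.

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact (binomial) McNemar test on the discordant pairs.

    b: cases classifier A got right and classifier B got wrong
    c: the reverse
    Returns the two-sided p-value under H0 that both err equally often.
    """
    n = b + c
    k = min(b, c)
    # One tail of Binomial(n, 0.5), doubled for a two-sided test
    p = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    return min(1.0, 2 * p)

# Hypothetical discordant counts for CNN vs pathologists on 100 images
p_value = mcnemar_exact(18, 6)
```

With 18 vs 6 discordant cases this yields p below 0.05; perfectly balanced disagreement (e.g. 10 vs 10) yields p = 1.0, as expected.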
Background The diagnosis of most cancers is made by a board-certified pathologist based on a tissue biopsy under the microscope. Recent research reveals a high discordance between individual pathologists. For melanoma, the literature reports 25–26% discordance for classifying a benign nevus versus malignant melanoma. Deep learning has been successfully implemented to enhance the precision of lung and breast cancer diagnoses. The aim of this study is to illustrate the potential of deep learning to assist human assessment in histopathologic melanoma diagnosis. Methods Six hundred ninety-five lesions were classified by an expert histopathologist in accordance with current guidelines (350 nevi and 345 melanomas). Only the haematoxylin and eosin stained (H&E) slides of these lesions were digitalised using a slide scanner and then randomly cropped. Five hundred ninety-five of the resulting images were used for the training of a convolutional neural network (CNN). The additional 100 H&E image sections were used to test the results of the CNN in comparison with the original class labels. Findings The total discordance with the histopathologist was 18% for melanoma (95% confidence interval [CI]: 7.4–28.6%), 20% for nevi (95% CI: 8.9–31.1%) and 19% for the full set of images (95% CI: 11.3–26.7%). Interpretation Even in the worst case, the discordance of the CNN was about the same as the discordance between human pathologists reported in the literature. Despite requiring vastly less data, diagnosis time and cost than a pathologist, our CNN achieved on-par performance. Conclusively, CNNs appear to be a valuable tool to assist human melanoma diagnosis.
Background In recent studies, convolutional neural networks (CNNs) outperformed dermatologists in distinguishing dermoscopic images of melanoma and nevi. In these studies, dermatologists and artificial intelligence were considered as opponents. However, the combination of classifiers frequently yields superior results, both in machine learning and among humans. In this study, we investigated the potential benefit of combining human and artificial intelligence for skin cancer classification. Methods Using 11,444 dermoscopic images, which were divided into five diagnostic categories, novel deep learning techniques were used to train a single CNN. Then, both 112 dermatologists of 13 German university hospitals and the trained CNN independently classified a set of 300 biopsy-verified skin lesions into those five classes. Taking into account the certainty of the decisions, the two independently determined diagnoses were combined into a new classifier with the help of a gradient boosting method. The primary end-point of the study was the correct classification of the images into five designated categories, whereas the secondary end-point was the correct classification of lesions as either benign or malignant (binary classification). Findings Regarding the multiclass task, the combination of man and machine achieved an accuracy of 82.95%. This was 1.36% higher than the best of the two individual classifiers (81.59%, achieved by the CNN). Owing to the class imbalance in the binary problem, sensitivity, but not accuracy, was examined and demonstrated to be superior (89%) to the best individual classifier (CNN with 86.1%). The specificity of the combined classifier decreased from 89.2% to 84%. However, at an equal sensitivity of 89%, the CNN achieved a specificity of only 81.5%. Interpretation Our findings indicate that the combination of human and artificial intelligence achieves superior results over the independent results of both of these systems.
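The combination step is a form of stacking: the CNN's class probabilities and the dermatologist's diagnosis become meta-features for a second-level learner. The study used gradient boosting; the sketch below substitutes a plain logistic meta-learner trained by gradient descent, and all data is randomly generated, so it shows only the wiring, not the study's results.

```python
import numpy as np

rng = np.random.default_rng(1)

# Meta-features per lesion: 5 CNN class probabilities + one-hot dermatologist
# diagnosis (certainty scores could be appended the same way).
n, k = 300, 5
cnn_probs = rng.dirichlet(np.ones(k), size=n)        # toy CNN outputs
derm_onehot = np.eye(k)[rng.integers(k, size=n)]     # toy human diagnoses
X = np.hstack([cnn_probs, derm_onehot])              # stacked meta-features
y = (rng.random(n) < 0.5).astype(float)              # toy binary target

# Logistic-regression meta-learner (stand-in for gradient boosting)
w = np.zeros(X.shape[1])
for _ in range(200):
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / n                     # gradient descent step

acc = float(np.mean(((1 / (1 + np.exp(-X @ w))) > 0.5) == y))
```

With real labels, the meta-learner can weight the two sources by how reliable each is per class, which is what allowed the combined classifier to beat both individuals.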
Background Early detection of melanoma can be lifesaving, but this remains a challenge. Recent diagnostic studies have revealed the superiority of artificial intelligence (AI) in classifying dermoscopic images of melanoma and nevi, concluding that these algorithms should assist a dermatologist's diagnoses. Objective The aim of this study was to investigate whether AI support improves the accuracy and overall diagnostic performance of dermatologists in the dichotomous image-based discrimination between melanoma and nevus. Methods Twelve board-certified dermatologists were presented with disjoint sets of 100 unique dermoscopic images of melanomas and nevi (total of 1200 unique images), and they had to classify the images based on personal experience alone (part I) and with the support of a trained convolutional neural network (CNN, part II). Additionally, dermatologists were asked to rate their confidence in their final decision for each image. Results While the dermatologists' mean specificity remained almost unchanged with AI support (70.6% vs 72.4%; P=.54), their mean sensitivity and mean accuracy increased significantly (59.4% vs 74.6%; P=.003 and 65.0% vs 73.6%; P=.002, respectively). Out of the 10% (10/94; 95% CI 8.4%-11.8%) of cases where dermatologists were correct and AI was incorrect, dermatologists on average changed to the incorrect answer for 39% (4/10; 95% CI 23.2%-55.6%) of cases. When dermatologists were incorrect and AI was correct (25/94, 27%; 95% CI 24.0%-30.1%), dermatologists changed their answers to the correct answer for 46% (11/25; 95% CI 33.1%-58.4%) of cases. Additionally, the dermatologists' average confidence in their decisions increased when the CNN confirmed their decision and decreased when the CNN disagreed, even when the dermatologists were correct. Reported values are based on the mean of all participants.
Whenever absolute values are shown, the numerator and denominator are approximations, as each dermatologist ended up rating a varying number of images owing to a quality-control step. Conclusions The findings of our study show that AI support can improve the overall accuracy of dermatologists in the dichotomous image-based discrimination between melanoma and nevus. This supports the argument for AI-based tools to aid clinicians in skin lesion classification and provides a rationale for studies of such classifiers in real-life settings, wherein clinicians can integrate additional information such as patient age and medical history into their decisions.
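The sensitivity, specificity and accuracy figures above all derive from the same confusion-matrix counts. A minimal sketch of that computation; the counts below are hypothetical illustrations chosen to resemble the reported before/after means, not the study's raw data.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts,
    with melanoma as the positive class."""
    sensitivity = tp / (tp + fn)          # melanomas correctly detected
    specificity = tn / (tn + fp)          # nevi correctly ruled out
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts for 100 rated images (50 melanomas, 50 nevi)
before = binary_metrics(tp=30, fn=20, tn=35, fp=15)    # without AI support
with_ai = binary_metrics(tp=37, fn=13, tn=36, fp=14)   # with AI support
```

The example mirrors the study's pattern: sensitivity rises substantially with AI support while specificity barely moves, so accuracy improves mainly through fewer missed melanomas.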