Lung cancer represents a significant global health problem, accounting for more than 1.7 million deaths worldwide in 2021.[1] Despite advances in cancer treatment over the last decade, the 5-year survival rate is still around 50% for surgically resected non-small-cell lung cancer (NSCLC). Even
Lung cancer is the leading cause of cancer death. Estimates for 2021 project more than 230,000 new lung cancer cases and more than 131,000 deaths. Improving survival rates and patients' quality of life both depend in part on a common element: treatment. Collective knowledge about cancer treatment recommendations is typically captured in clinical guidelines, which are intended to optimize patient care and assist clinicians in treating lung cancer. These guidelines define a set of treatment paths in which recommendations depend on characteristics of the disease and on the individual features of each patient. Although oncologists are expected to follow clinical guidelines, inter- and intra-patient variability in response to the possible treatment combinations makes it necessary to personalize treatment patterns in certain cases. Additionally, clinical guidelines are not frequently updated with new findings, or lack a consistent methodology when they are. For these reasons, analysing treatment patterns in patients treated both within and outside the standard of care would make it possible to validate clinical guidelines and identify potential new treatment recommendations. In this work, we analyse whether the treatments actually prescribed to lung cancer patients follow clinical guidelines. Using Apriori, a machine learning method that outputs association rules, we identify treatment patterns by cancer stage. These preliminary results show that the treatment patterns found mostly match clinical guideline recommendations, validating the information included in the consulted guidelines.
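To make the association-rule step concrete, the following is a minimal sketch of Apriori followed by rule extraction. It is not the authors' implementation: the six patient records, the item labels such as `stage=I`, and the support and confidence thresholds are all invented for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates are the individual items.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that are frequent at this level.
        level = {}
        for candidate in current:
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and prune any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        current = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                current.add(union)
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already in the dictionary.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append(
                        (antecedent, itemset - antecedent, support, confidence)
                    )
    return found


# Hypothetical records: one set of items per patient (stage plus treatments).
records = [
    {"stage=I", "surgery"},
    {"stage=I", "surgery"},
    {"stage=I", "surgery", "chemotherapy"},
    {"stage=IV", "chemotherapy"},
    {"stage=IV", "chemotherapy", "immunotherapy"},
    {"stage=IV", "immunotherapy"},
]

frequent = apriori(records, min_support=0.3)
found = association_rules(frequent, min_confidence=0.9)
for antecedent, consequent, support, confidence in found:
    print(sorted(antecedent), "->", sorted(consequent), support, confidence)
```

In a setting like the one described, rules whose antecedent is a stage and whose consequent is a treatment could then be compared with the guideline recommendation for that stage.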
Circadian rhythms impose daily rhythms on a remarkable variety of metabolic and physiological functions, such as cell proliferation, inflammation, and the DNA damage response. Accumulating epidemiological and genetic evidence indicates that disruption of circadian rhythms may be linked to cancer. Integrating circadian biology into cancer research may offer new options for increasing the effectiveness of cancer treatment and would encompass the prevention, diagnosis, and treatment of this disease. In recent years, multi-modal sensors for monitoring physical activity, sleep, and circadian rhythms have been developed and adopted widely. For the first time, this allows accurate sleep monitoring to be scaled to epidemiological research linking sleep patterns to disease, and to wellness applications. This review highlights the role of the circadian clock in tumorigenesis and the hallmarks of cancer, and introduces the state of the art in sleep-monitoring technologies, discussing how these insights might eventually be applied in clinical settings and cancer research.
The vision of the iASiS project is to turn the wave of big biomedical data heading our way into actionable knowledge for decision makers. This is achieved by integrating data from disparate sources, including genomics, electronic health records and bibliography, and applying advanced analytics methods to discover useful patterns. The goal is to turn the large amounts of available data into actionable information that authorities can use to plan public health activities and policies. Integrating and analysing these heterogeneous sources of information will enable better decisions to be made, allowing diagnosis and treatment to be personalised to each individual. The project offers a common representation schema for the heterogeneous data sources. The iASiS infrastructure is able to convert clinical notes into usable data, combine them with genomic data, related bibliography, image data and more, and create a global knowledge base. This facilitates the use of intelligent methods to discover useful patterns across different resources. Semantic integration of the data makes it possible to generate information that is rich, auditable and reliable. This information can be used to provide better care, reduce errors and create more confidence in sharing data, thus providing more insights and opportunities. Data resources for two disease categories, dementia and lung cancer, are explored within the iASiS use cases.
The COVID-19 pandemic has changed the usual working of many hospitalization units (or wards). Few studies have used electronic nursing clinical notes (ENCN) and their unstructured text to identify alterations in patients' feelings and therapeutic procedures of interest. This study aimed to analyze positive and negative sentiments through inspection of the free text of the ENCN, compare sentiments in ENCN from hospitalized patients with and without COVID-19, carry out a temporal analysis of patients' sentiments during the start of the first wave of the COVID-19 pandemic, and identify the topics in ENCN. This is a descriptive study with analysis of the text content of ENCN. All ENCN recorded between January and June 2020 at Guadarrama Hospital (Madrid, Spain) and extracted from the CGM Selene Electronic Health Records System were included. Two groups of ENCN were analyzed: one from patients hospitalized in post-intensive care units for COVID-19 and a second from hospitalized patients without COVID-19. A sentiment analysis was performed on the lemmatized text, using the National Research Council of Canada, AFINN, and Bing dictionaries. A polarity analysis of the sentences was performed using the Bing dictionary, SO Dictionaries V1.11, and Spa dictionary as amplifiers and decrementators. Machine learning techniques were applied to evaluate the presence of significant differences in the ENCN between the groups of patients with and without COVID-19. Finally, a structural analysis of thematic models was performed to study the abstract topics that occur in the ENCN, using Latent Dirichlet Allocation topic modeling. A total of 37,564 electronic health records were analyzed. Sentiment analysis of the ENCN showed that patients with subacute COVID-19 had a higher proportion of positive sentiments than those without COVID-19. There were also significant differences in polarity between the two groups (Z=5.532, P<.001), with a polarity of 0.108 (SD 0.299) in patients with COVID-19 versus 0.09 (SD 0.301) in those without COVID-19. In the machine learning analysis, all models presented high values, but the neural network presented the best indicators (>0.8), with significant P values between the two groups. Through Structural Topic Modeling analysis, a final model containing 10 topics was selected. High correlations were noted among topics 2, 5, and 8 (pressure ulcer and pharmacotherapy treatment), topics 1, 4, 7, and 9 (incidences related to fever and well-being state, and baseline oxygen saturation) and topics 3 and 10 (blood glucose level and pain). The ENCN may help in the development and implementation of more effective programs that allow patients with COVID-19 to return to their prepandemic lifestyle faster. Topic modeling could help identify specific clinical problems in patients and better target the care they receive.
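The sentence-level polarity analysis with amplifiers and decrementators can be illustrated with a small lexicon-based scorer. This is only a sketch under strong assumptions: the word lists are tiny English stand-ins rather than the dictionaries used in the study, the `shift` weight is arbitrary, and dividing by the square root of the sentence length is one possible normalization, not necessarily the one applied to the ENCN.

```python
import re

# Hypothetical toy lexicons, invented for this example only.
POLARITY = {
    "stable": 1.0, "comfortable": 1.0, "improved": 1.0,
    "pain": -1.0, "fever": -1.0, "agitated": -1.0,
}
AMPLIFIERS = {"very", "markedly"}    # strengthen the following word
DOWNTONERS = {"slightly", "mildly"}  # weaken the following word
NEGATORS = {"no", "not", "without"}  # flip the sign of the following word


def sentence_polarity(sentence, shift=0.8):
    """Score one sentence; positive means favourable, negative unfavourable."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    if not tokens:
        return 0.0
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in POLARITY:
            continue
        score = POLARITY[token]
        previous = tokens[i - 1] if i > 0 else ""
        if previous in AMPLIFIERS:
            score *= 1 + shift
        elif previous in DOWNTONERS:
            score *= 1 - shift
        elif previous in NEGATORS:
            score = -score
        total += score
    # Normalize so that long sentences do not dominate the comparison.
    return total / len(tokens) ** 0.5


for note in ["Patient is comfortable", "Patient very comfortable",
             "Patient without fever", "Patient reports pain"]:
    print(note, "->", round(sentence_polarity(note), 3))
```

Averaging such sentence scores per group would give group-level polarity figures of the kind reported above, which could then be compared with a significance test.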
The wide adoption of electronic health records (EHRs) offers a potential source of data to support research. Lung cancer is one of the most common cancers in the world. Although several tools have been developed to automatically extract concepts from oncology clinical notes, there is still a gap between concept extraction and concept understanding. The high number of clinical notes for the same patient, the use of negation, and the need for proper date annotations lie at the root of the problem. In this paper, we propose an approach for accurately extracting lung cancer diagnoses from clinical notes written in Spanish. The approach includes the disambiguation process required to extract the correct date and diagnosis for a patient who has hundreds of clinical notes and, consequently, hundreds of annotations. Results obtained on an annotated database of 1000 patients show an F-score of 90%.
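The disambiguation problem can be sketched as reducing many per-note annotations to a single diagnosis and date per patient. The heuristic below (drop negated mentions, take the most frequently affirmed concept, date it by its earliest affirmed mention) is an assumption made for illustration and is not the approach proposed in the paper; the `Annotation` type and the example notes are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Annotation:
    """One diagnosis mention extracted from one clinical note (hypothetical)."""
    concept: str
    note_date: date
    negated: bool


def resolve_diagnosis(annotations: List[Annotation]) -> Optional[Tuple[str, date]]:
    """Pick a single (diagnosis, date) pair from all of a patient's mentions."""
    # Negated mentions ("no evidence of ...") do not support a diagnosis.
    affirmed = [a for a in annotations if not a.negated]
    if not affirmed:
        return None
    counts = Counter(a.concept for a in affirmed)
    top = max(counts.values())
    candidates = [c for c, n in counts.items() if n == top]
    # Date each candidate by its earliest affirmed mention; on a tie in
    # frequency, prefer the concept that was mentioned first.
    first_seen = {
        c: min(a.note_date for a in affirmed if a.concept == c)
        for c in candidates
    }
    concept = min(candidates, key=lambda c: first_seen[c])
    return concept, first_seen[concept]


notes = [
    Annotation("adenocarcinoma", date(2019, 3, 12), negated=False),
    Annotation("squamous cell carcinoma", date(2019, 1, 5), negated=True),
    Annotation("adenocarcinoma", date(2019, 2, 20), negated=False),
    Annotation("small cell carcinoma", date(2019, 4, 1), negated=False),
]
print(resolve_diagnosis(notes))
```

Comparing the resolved pairs with the manually annotated diagnoses would give the precision and recall from which an F-score is computed.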
PURPOSE Stratifying patients with cancer according to risk of relapse can personalize their care. In this work, we provide an answer to the following research question: How to use machine learning to estimate probability of relapse in patients with early-stage non–small-cell lung cancer (NSCLC)? MATERIALS AND METHODS For predicting relapse in 1,387 patients with early-stage (I-II) NSCLC from the Spanish Lung Cancer Group data (average age 65.7 years, female 24.8%, male 75.2%), we train tabular and graph machine learning models. We generate automatic explanations for the predictions of such models. For models trained on tabular data, we adopt SHapley Additive exPlanations local explanations to gauge how each patient feature contributes to the predicted outcome. We explain graph machine learning predictions with an example-based method that highlights influential past patients. RESULTS Among the machine learning models trained on tabular data, the random forest model reaches 76% accuracy at predicting relapse, evaluated with 10-fold cross-validation (the model was trained 10 times with different independent sets of patients in test, train, and validation sets, and the reported metrics are averaged over these 10 test sets). Graph machine learning reaches 68% accuracy over a held-out test set of 200 patients, calibrated on a held-out set of 100 patients. CONCLUSION Our results show that machine learning models trained on tabular and graph data can enable objective, personalized, and reproducible prediction of relapse and, therefore, disease outcome in patients with early-stage NSCLC. With further prospective and multisite validation, and additional radiological and molecular data, this prognostic model could potentially serve as a predictive decision support tool for deciding the use of adjuvant treatments in early-stage lung cancer.
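SHAP local explanations are built on Shapley values, which split the gap between a model's prediction for one patient and its prediction for a baseline input across the individual features. The sketch below computes exact Shapley values by enumerating feature coalitions, which is feasible only for a handful of features. The `toy_risk` model, its coefficients, and the feature names are invented for illustration and are unrelated to the study's random forest.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley value of each feature for one prediction.

    Features outside a coalition are set to their baseline value.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in names}
        return predict(x)

    phi = {}
    for feature in names:
        others = [f for f in names if f != feature]
        total = 0.0
        for r in range(len(others) + 1):
            # Weight of a coalition of size r in the Shapley formula.
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                s = set(subset)
                total += weight * (value(s | {feature}) - value(s))
        phi[feature] = total
    return phi


def toy_risk(x):
    """Hypothetical relapse-risk model with one interaction term."""
    return (0.1 + 0.2 * x["stage_II"] + 0.1 * x["smoker"]
            + 0.1 * x["stage_II"] * x["smoker"])


patient = {"stage_II": 1, "smoker": 1, "age_over_70": 0}
reference = {"stage_II": 0, "smoker": 0, "age_over_70": 0}

phi = shapley_values(toy_risk, patient, reference)
for name, contribution in phi.items():
    print(name, round(contribution, 3))
```

The contributions sum to the difference between the patient's predicted risk and the baseline risk, which is the property that makes a per-feature reading of a single prediction possible.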