Contrastive learning has demonstrated promising performance in the image and text domains, in both self-supervised and supervised settings. In this work, we extend the supervised contrastive learning framework to clinical risk prediction problems based on longitudinal electronic health records (EHR). We propose a general supervised contrastive loss, $\mathcal{L}_{\text{Contrastive Cross Entropy}} + \lambda \mathcal{L}_{\text{Supervised Contrastive Regularizer}}$, for learning both binary classification (e.g., in-hospital mortality prediction) and multi-label classification (e.g., phenotyping) in a unified framework. Our supervised contrastive loss implements the key idea of contrastive learning, namely pulling similar samples closer and pushing dissimilar ones apart, through its two components: $\mathcal{L}_{\text{Contrastive Cross Entropy}}$ contrasts samples with learned anchors that represent the positive and negative clusters, and $\mathcal{L}_{\text{Supervised Contrastive Regularizer}}$ contrasts samples with each other according to their supervised labels. We propose two versions of this loss, and our experiments on real-world EHR data demonstrate that the proposed loss functions improve the performance of strong baselines and even state-of-the-art models on benchmark tasks for clinical risk prediction. The loss functions work well with extremely imbalanced data, which are common in clinical risk prediction problems, and can easily replace the (binary or multi-label) cross-entropy loss adopted in existing clinical predictive models. The PyTorch code is released at https://github.com/calvin-zcx/SCEHR.
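The two-component loss described in this abstract can be illustrated with a small, framework-free sketch. This is not the released SCEHR code: the abstract does not give the exact formulas, so the anchor-based term below is assumed to be a softmax over cosine similarities to one positive and one negative anchor, and the regularizer is assumed to follow the common supervised contrastive form. The function names, the temperature `tau`, and the fixed anchors (which would be learned in practice) are all hypothetical.

```python
import math

def _unit(v):
    # Scale a vector to unit length (all-zero vectors are left unchanged).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def _cos(u, v):
    # Cosine similarity between two vectors.
    return sum(a * b for a, b in zip(_unit(u), _unit(v)))

def _logsumexp(values):
    # Numerically stable log(sum(exp(values))).
    m = max(values)
    return m + math.log(sum(math.exp(x - m) for x in values))

def contrastive_cross_entropy(embeddings, labels, pos_anchor, neg_anchor, tau=0.1):
    # ASSUMED form: each sample is contrasted with two anchors, and the loss
    # is the negative log-softmax of the similarity to its own class anchor.
    total = 0.0
    for z, y in zip(embeddings, labels):
        s_pos = _cos(z, pos_anchor) / tau
        s_neg = _cos(z, neg_anchor) / tau
        own = s_pos if y == 1 else s_neg
        total += _logsumexp([s_pos, s_neg]) - own
    return total / len(labels)

def supervised_contrastive_regularizer(embeddings, labels, tau=0.1):
    # ASSUMED form: for each sample, same-label samples are positives and
    # every other sample appears in the softmax denominator.
    n = len(embeddings)
    total, used = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # a sample with no same-label partner contributes nothing
        sims = {j: _cos(embeddings[i], embeddings[j]) / tau
                for j in range(n) if j != i}
        log_denom = _logsumexp(list(sims.values()))
        total += sum(log_denom - sims[j] for j in positives) / len(positives)
        used += 1
    return total / used if used else 0.0

def combined_loss(embeddings, labels, pos_anchor, neg_anchor, lam=0.5, tau=0.1):
    # Anchor-based term plus lambda times the sample-to-sample regularizer.
    return (contrastive_cross_entropy(embeddings, labels, pos_anchor, neg_anchor, tau)
            + lam * supervised_contrastive_regularizer(embeddings, labels, tau))
```

Under these assumptions, embeddings that cluster by label yield a smaller combined loss than embeddings whose labels are mixed across clusters, which is the behavior the abstract describes.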
Incidence estimates of post-acute sequelae of SARS-CoV-2 infection, also known as long-COVID, have varied across studies and changed over time. We estimated long-COVID incidence among adult and pediatric populations in three nationwide research networks of electronic health records (EHR) participating in the RECOVER Initiative, using different classification algorithms (computable phenotypes). This EHR-based retrospective cohort study included adult and pediatric patients with documented acute SARS-CoV-2 infection and two control groups: contemporary COVID-19-negative patients and historical patients (2019). We examined the proportion of individuals identified as having symptoms or conditions consistent with probable long-COVID within 30-180 days after COVID-19 infection (incidence proportion). Each network (the National COVID Cohort Collaborative (N3C), the National Patient-Centered Clinical Research Network (PCORnet), and PEDSnet) implemented its own long-COVID definition. We introduced a harmonized definition for adults in a supplementary analysis. Overall, 4% of children and 10-26% of adults developed long-COVID, depending on the computable phenotype used. Excess incidence among SARS-CoV-2 patients was 1.5% in children and ranged from 5% to 6% among adults, representing a lower-bound incidence estimate based on our control groups. Temporal patterns were consistent across networks, with peaks associated with the introduction of new viral variants. Our findings indicate that preventing and mitigating long-COVID remains a public health priority. Examining temporal patterns and risk factors of long-COVID incidence informs our understanding of etiology and can improve prevention and management.
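The incidence proportion and excess incidence used in this abstract reduce to simple arithmetic, sketched below. The function names and the counts in the usage note are hypothetical and only illustrate the calculation; they are not data from the study.

```python
def incidence_proportion(n_cases, n_at_risk):
    # Share of the at-risk cohort that meets the outcome definition
    # within the follow-up window.
    if n_at_risk <= 0:
        raise ValueError("n_at_risk must be positive")
    return n_cases / n_at_risk

def excess_incidence(cases_infected, n_infected, cases_control, n_control):
    # Incidence proportion among infected patients minus the incidence
    # proportion among controls (the background rate).
    return (incidence_proportion(cases_infected, n_infected)
            - incidence_proportion(cases_control, n_control))
```

For example, with made-up counts of 400 cases among 10,000 infected patients and 250 cases among 10,000 controls, `excess_incidence(400, 10_000, 250, 10_000)` is the difference between a 4% and a 2.5% incidence proportion.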
What is the growth pattern of social networks, like Facebook and WeChat? Does it truly exhibit exponential early growth, as predicted by textbook models like the Bass model, SI, or the Branching Process? How about the count of links, over time, for which there are few published models?
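As a point of reference for the "exponential early growth" that textbook models predict, here is a minimal discrete-time sketch of the SI model applied to network membership. The function name, the parameter names, and the values in the usage note are hypothetical, and the abstract does not specify any particular discretization.

```python
def simulate_si(population, initial_members, beta, steps):
    # Discrete-time SI model: each step, current members recruit new members
    # at rate beta, scaled by the fraction of the population not yet joined.
    members = [float(initial_members)]
    for _ in range(steps):
        current = members[-1]
        new_members = beta * current * (1.0 - current / population)
        members.append(current + new_members)
    return members
```

While membership is far below the population size, each step multiplies the count by roughly `1 + beta`, which is the exponential early phase; growth then slows as the count approaches the population size. A call such as `simulate_si(1_000_000, 1, 0.5, 100)` shows both regimes.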
Abstract. Background: Patients infected with SARS-CoV-2 may develop newly incident conditions in the post-acute infection period. These conditions, denoted as the post-acute sequelae of SARS-CoV-2 infection (PASC), are highly heterogeneous and involve a diverse set of organ systems. Few studies have investigated the predictability of these conditions and their associated risk factors. Method: In this retrospective cohort study, we investigated two large-scale PCORnet clinical research networks, INSIGHT and OneFlorida+, which include 11 million patients in the New York City area and 16.8 million patients from Florida, to develop machine learning prediction models for patients at risk of newly incident PASC and to identify factors associated with newly incident PASC conditions. Adult patients (aged 20 years or older) with and without recorded SARS-CoV-2 infection between March 1, 2020, and November 30, 2021, were used to identify factors associated with incident PASC after removing background associations. The predictive models were developed on infected adults. Results: We found that several incident PASC conditions, e.g., malnutrition, COPD, dementia, and acute kidney failure, were associated with severe acute SARS-CoV-2 infection, defined by hospitalization and ICU stay. Older age and extremes of weight were also associated with these incident conditions. These conditions were the most predictable (C-index > 0.8). Moderately predictable conditions included diabetes and thromboembolic disease (C-index 0.7-0.8); these were associated with a wider variety of baseline conditions. Less predictable conditions included fatigue, anxiety, sleep disorders, and depression (C-index around 0.6).
Conclusions: This observational study suggests that a set of likely risk factors for different PASC conditions is identifiable from EHRs, that the predictability of different PASC conditions is heterogeneous, and that machine learning-based predictive models might help identify patients at risk of developing incident PASC.
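The C-index values quoted in this abstract measure how well a model ranks patients by risk. The abstract does not state which variant was used; the sketch below assumes Harrell's concordance index for right-censored data, and the function and argument names are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    # Harrell's C (assumed variant): over all comparable pairs, the fraction
    # in which the patient with the earlier observed event also has the
    # higher predicted risk. Tied risk scores count as half concordant.
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly and 0.5 corresponds to random ranking, so the reported range from about 0.6 to above 0.8 spans weak to strong discrimination.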
The rapid accumulation of large-scale Electronic Health Records (EHR) presents considerable opportunities to generate real-world evidence that informs clinical decision-making and accelerates drug development. However, the complexity of EHR has made them a formidable testing ground for cutting-edge AI algorithms. Furthermore, a significant gap still exists between algorithm development in the computer science community and clinical translation within the healthcare community. This tutorial aims to bridge this divide and foster mutual understanding between the two communities by discussing how advanced machine learning and data mining technologies can be tailored to real-world healthcare challenges, including 1) using EHR and trial emulation to understand Long Covid and to repurpose drugs for Alzheimer's disease, and 2) risk prediction and the associated issues of fairness, interpretability, and generalizability. We will conclude the tutorial by discussing potential opportunities for future research and the prospects of a career as a health data scientist.
Recent studies have investigated post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) using real-world patient data such as electronic health records (EHR). Prior studies have typically been conducted on cohorts drawn from specific patient populations, which makes their generalizability unclear. This study aims to characterize PASC using the EHR data warehouses of two large Patient-Centered Clinical Research Networks (PCORnet), INSIGHT and OneFlorida+, which include 11 million patients in the New York City (NYC) area and 16.8 million patients in Florida, respectively. With a high-throughput screening pipeline based on propensity scores and inverse probability of treatment weighting, we identified a broad list of diagnoses and medications that exhibited significantly higher incidence risk in patients 30-180 days after laboratory-confirmed SARS-CoV-2 infection than in non-infected patients. We identified more PASC diagnoses in NYC than in Florida under our screening criteria, and conditions including dementia, hair loss, pressure ulcers, pulmonary fibrosis, dyspnea, pulmonary embolism, chest pain, abnormal heartbeat, malaise, and fatigue were replicated across both cohorts. Our analyses highlight potentially heterogeneous risks of PASC in different populations.
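The weighting step named in this abstract, inverse probability of treatment weighting (IPTW), can be sketched as follows. The sketch assumes that propensity scores (each patient's estimated probability of being in the infected group) have already been fitted elsewhere; the function names are hypothetical, and the study's actual pipeline is not described at this level of detail.

```python
def iptw_weights(exposed, propensity):
    # Standard IPTW: exposed patients get weight 1/e and unexposed patients
    # get weight 1/(1 - e), where e is the propensity score.
    weights = []
    for is_exposed, e in zip(exposed, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 / e if is_exposed else 1.0 / (1.0 - e))
    return weights

def weighted_incidence(outcomes, weights, in_group):
    # Weighted share of patients with the outcome within one exposure group.
    numerator = sum(w * y for y, w, g in zip(outcomes, weights, in_group) if g)
    denominator = sum(w for w, g in zip(weights, in_group) if g)
    return numerator / denominator
```

Comparing `weighted_incidence` between the infected and non-infected groups gives a covariate-adjusted contrast for one candidate condition; a screening pipeline would repeat this for each diagnosis or medication.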
A social network is an ecosystem, and one of its ultimate goals is to sustain itself, namely to keep users generating information and staying informed. However, why some social ecosystems remain self-sustaining while others end up inactive or dead is largely unknown.
ABSTRACT. Importance: The frequency and characteristics of post-acute sequelae of SARS-CoV-2 infection (PASC) may vary by SARS-CoV-2 variant. Objective: To characterize PASC-related conditions among individuals likely infected by the ancestral strain in 2020 and individuals likely infected by the Delta variant in 2021. Design: Retrospective cohort study of electronic medical record data for approximately 27 million patients from March 1, 2020, to November 30, 2021. Setting: Healthcare facilities in New York and Florida. Participants: Patients who were at least 20 years old and had diagnosis codes that included at least one SARS-CoV-2 viral test during the study period. Exposure: Laboratory-confirmed COVID-19 infection, classified by the most common variant prevalent in those regions at the time. Main Outcome(s) and Measure(s): Relative risk (estimated by adjusted hazard ratio [aHR]) and absolute risk difference (estimated by adjusted excess burden) of new conditions, defined as new documentation of symptoms or diagnoses, in persons 31-180 days after a positive COVID-19 test compared with persons with only negative tests during the 31-180 days after the last negative test. Results: We analyzed data from 560,752 patients. The median age was 57 years; 60.3% were female, 20.0% non-Hispanic Black, and 19.6% Hispanic. During the study period, 57,616 patients had a positive SARS-CoV-2 test; 503,136 did not. For infections during the ancestral strain period, pulmonary fibrosis, edema (excess fluid), and inflammation had the largest aHR comparing those with a positive test to those with a negative test (aHR 2.32 [95% CI 2.09, 2.57]), and dyspnea (shortness of breath) carried the largest excess burden (47.6 more cases per 1,000 persons).
For infections during the Delta period, pulmonary embolism had the largest aHR comparing those with a positive test to those with a negative test (aHR 2.18 [95% CI 1.57, 3.01]), and abdominal pain carried the largest excess burden (85.3 more cases per 1,000 persons). Conclusions and Relevance: We documented a substantial relative risk of pulmonary embolism and a large absolute risk difference of abdomen-related symptoms after SARS-CoV-2 infection during the Delta variant period. As new SARS-CoV-2 variants emerge, researchers and clinicians should monitor patients for changing symptoms and conditions that develop after infection. STATEMENTS AND ACKNOWLEDGEMENTS: Authorship has been determined by ICMJE recommendation. Disclosures to be obtained at time of submission. The content is solely the responsibility of the authors and does not necessarily represent the official views of the RECOVER Program, the NIH, or other funders. We would like to thank the National Community Engagement Group (NCEG), all patient, caregiver, and community representatives, and all the participants enrolled in the RECOVER Initiative.
How do people make friends dynamically in social networks? What are the temporal patterns by which an individual increases his or her social connectivity? What are the basic mechanisms governing the formation of these temporal patterns? Whether a social system is cyber or physical, its structure and dynamics are mainly driven by the connectivity dynamics of each individual. However, due to the lack of empirical data, little is known about the dynamic patterns of social connectivity at the microscopic level, let alone the regularities or models governing these microscopic dynamics.