BACKGROUND There is a growing interest in using person-generated wearable device data for biomedical research, but there are also concerns regarding the quality of data such as missing or incorrect data. This emphasizes the importance of assessing data quality before conducting research. In order to perform data quality assessments, it is essential to define what data quality means for person-generated wearable device data by identifying the data quality dimensions. OBJECTIVE This study aims to identify data quality dimensions for person-generated wearable device data for research purposes. METHODS This study was conducted in 3 phases: literature review, survey, and focus group discussion. The literature review was conducted following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guideline to identify factors affecting data quality and its associated data quality challenges. In addition, we conducted a survey to confirm and complement results from the literature review and to understand researchers’ perceptions on data quality dimensions that were previously identified as dimensions for the secondary use of electronic health record (EHR) data. We sent the survey to researchers with experience in analyzing wearable device data. Focus group discussion sessions were conducted with domain experts to derive data quality dimensions for person-generated wearable device data. On the basis of the results from the literature review and survey, a facilitator proposed potential data quality dimensions relevant to person-generated wearable device data, and the domain experts accepted or rejected the suggested dimensions. RESULTS In total, 19 studies were included in the literature review, and 3 major themes emerged: device- and technical-related, user-related, and data governance–related factors. The associated data quality problems were incomplete data, incorrect data, and heterogeneous data. A total of 20 respondents answered the survey. 
The major data quality challenges faced by researchers were completeness, accuracy, and plausibility. The importance ratings on data quality dimensions in an existing framework showed that the dimensions for secondary use of EHR data are applicable to person-generated wearable device data. There were 3 focus group sessions with domain experts in data quality and wearable device research. The experts concluded that intrinsic data quality features, such as conformance, completeness, and plausibility, and contextual and fitness-for-use data quality features, such as completeness (breadth and density) and temporal data granularity, are important data quality dimensions for assessing person-generated wearable device data for research purposes. CONCLUSIONS In this study, intrinsic and contextual and fitness-for-use data quality dimensions for person-generated wearable device data were identified. The dimensions were adapted from data quality terminologies and frameworks for the secondary use of EHR data with a few modifications. Further research on how data quality can be assessed with respect to each dimension is needed.
Neural networks are an advanced set of algorithms designed to determine patterns from data. Since 1959, researchers have speculated that neural networks could support clinical decision making [1. Ledley R.S., Lusted L.B. Reasoning foundations of medical diagnosis; symbolic logic, probability, and value theory aid our understanding of how physicians reason. Science. 1959; 130: 9-21]. Despite the conceivable value of neural networks, many remain unclear about how these methods work and what function they may serve. Although they may vary in their details, all neural networks share a foundational architecture. A neural net is composed of layers of nodes (sometimes called neurons). The node is the basic unit of computation in neural nets. All nodes in any single layer are connected to all nodes in the adjacent layers. With any net, data are fed into the network through the input layer. Each node that results from the input layer has an associated weight and is connected to an unobservable layer of nodes, called the hidden layer. In the hidden layer, each node calculates the weighted sum of its input nodes and passes the sum, adjusted for a bias, through an activation function that allows the net to solve nontrivial (nonlinear) problems. The calculations from the hidden layer then connect to the output layer, which finalizes computations and returns the result [2. Goodfellow I., Bengio Y., Courville A. Deep Learning. MIT Press, Cambridge, Massachusetts; 2016]. Historically, the use of neural networks in health care and other domains was discouraged due to weak processing power, but advances in computational ability in the past 20 years have thrust these methods back into the spotlight as a relevant, viable tool for a number of tasks.
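The layer computation described above (a weighted sum of inputs, adjusted for a bias, passed through an activation function) can be illustrated with a minimal sketch. The sigmoid activation, the function names, and all weights below are assumptions chosen for illustration only; they are not taken from any of the cited works.

```python
import math


def sigmoid(z):
    """One common nonlinear activation; chosen here only as an example."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases, activation):
    """Compute one fully connected layer.

    weights[j][i] is the weight on the connection from input i to node j;
    biases[j] is the bias of node j.
    """
    outputs = []
    for node_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(node_weights, inputs))
        outputs.append(activation(weighted_sum + bias))
    return outputs


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> one hidden layer -> output layer."""
    hidden = dense_layer(inputs, hidden_w, hidden_b, sigmoid)
    return dense_layer(hidden, output_w, output_b, sigmoid)


# Hypothetical, untrained weights for a net with 2 inputs, 2 hidden nodes, 1 output.
x = [0.5, -1.0]
hidden_w = [[0.1, 0.4], [-0.3, 0.2]]
hidden_b = [0.0, 0.1]
output_w = [[0.7, -0.5]]
output_b = [0.05]

prediction = forward(x, hidden_w, hidden_b, output_w, output_b)
print(prediction)
```

In practice the weights and biases would be learned from data rather than set by hand; the sketch shows only how a result is computed once they are fixed.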
This new computational power not only encouraged the use of artificial neural networks for a wider variety of tasks but also encouraged the development of increasingly complex neural network architectures. One class of complex architectures, known as deep learning models, is a subgroup of artificial neural networks that is characterized by an increasing number of hidden layers to improve abstraction and prediction from data [2], [3. Greenspan H., van Ginneken B., Summers R.M. Guest editorial. Deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Trans Med Imaging. 2016; 35: 1153-1159]. Deep learning models, or deep neural nets, are able to efficiently represent complex data through a larger set of activation functions than is typical in a neural net with a single hidden layer. This makes deep neural nets particularly well suited for multifarious tasks, such as natural language processing and image analysis [2]. The recent article by Kolachalama et al. [4. Kolachalama V.B., Singh P., Lin C.Q., et al. Association of pathological fibrosis with renal survival using deep neural networks. Kidney Int Rep. 2018; 3: 464-475] leveraged deep learning to classify phenotypes of chronic kidney disease from patient-specific histological images. This research used one of the most popular deep learning architectures, the convolutional neural network (CNN). CNNs, like the traditional neural net, are composed of layers of nodes; however, unlike the traditional net, CNNs assume that the input is a multichanneled image. The image is composed of patches of neurons that overlay each other to represent the input image.
This architecture exploits spatial invariance of parts of the image; that is, an object in the image that is transformed or otherwise distorted would still be recognized by the CNN [2]. The authors wanted to explore the utility of CNNs for classifying chronic kidney disease severity from kidney biopsy images. They used a training set of 171 renal biopsies with varying degrees of interstitial fibrosis. This research used Google's Inception-v3, a CNN architecture pretrained on millions of images from the ImageNet dataset. With this architecture as a starting point, the top layers were trained to predict 1 of 6 outcomes: (i) chronic kidney disease stage (1–5) based on estimated glomerular filtration rate, (ii) high gender-specific serum creatinine levels, (iii) nephrotic range proteinuria at the time of biopsy, (iv) 1-year renal survival, (v) 3-year renal survival, and (vi) 5-year renal survival. For each of the 6 classification outcomes, the authors developed separate models, including linear discriminant analysis, Naïve Bayes, and support vector machine classifiers, to relate pathologist-estimated fibrosis scores to the outcome, as a basis for comparison. The CNN models outperformed the pathologist-estimated fibrosis scores for all classification tasks, with CNN-reported areas under the curve ranging from 8% to 24% higher than their pathologist-estimated fibrosis score counterparts. The most interesting of the outcomes are those that pertain to renal survival. Renal biopsies contain histological clues of renal function, as there are observable differences in the tissue between stable patients and those patients with impaired or declining kidney function. In the past, assessments of kidney health from biopsies were done by clinical experts (pathologists), which is likely a lengthy process that is prone to subjectivity and human error.
A failure to recognize a histological abnormality could result in a missed opportunity to seek targeted, potentially life-saving or kidney-saving treatments for patients. Thus, the prognostic value of neural nets for kidney survival presented by the authors represents real promise of these techniques for clinical decision support. The use of machine learning and artificial intelligence methods, such as neural networks, to support medical image analysis is not a novel concept. Models similar to that presented by Kolachalama et al. [4] have been used to support oncology, radiology, ophthalmology, and cardiovascular medicine, with moderate success [5. Lisboa P.J.G. A review of evidence of health benefit from artificial neural networks in medical intervention. Neural Netw. 2002; 15: 11-39]. However, the adoption of these methods lags far behind their utility. Despite the demonstration of their predictive power and reliability in many domains, artificial intelligence tools are relegated to the sidelines, often cited as disruptive to workflow [6. Coiera E. Technology, cognition and error. BMJ Qual Saf. 2015; 24: 417-422]. This is at odds with the fact that artificial intelligence tools will be beneficial only when integrated into standard clinical protocol [5], [6], [7. Bates D.W., Saria S., Ohno-Machado L., et al. Big data in health care: using analytics to identify and manage high-risk and high-cost patients. Health Aff (Millwood). 2014; 33: 1123-1131].
When considering the attributes of artificial intelligence tools, the disruption to clinical practice has historically outweighed their advantages. Nevertheless, these methods persist. Researchers such as Kolachalama et al. [4] continue to develop models to support clinical decision making despite the lackluster adoption of similar methods in the past. This is likely driven by the impressive performance of these methods. However, to bring these models into practice, researchers and clinicians should not be solely preoccupied with speed and accuracy, but should assess how these methods integrate into the medical setting.
AJA has a financial interest in Vitalis Pharmaceuticals. The other author declared no competing interests.
Abstract Background The 2022–2023 United States influenza season had unusually early influenza activity with high hospitalization rates. Vaccine-matched A(H3N2) viruses predominated, with lower levels of A(H1N1)pdm09 activity also observed. Methods Using the test-negative design, we evaluated influenza vaccine effectiveness (VE) during the 2022–2023 season against influenza A–associated emergency department/urgent care (ED/UC) visits and hospitalizations from October 2022 to March 2023 among adults (aged ≥18 years) with acute respiratory illness (ARI). VE was estimated by comparing odds of seasonal influenza vaccination among case-patients (influenza A test positive by molecular assay) and controls (influenza test negative), applying inverse-propensity-to-be-vaccinated weights. Results The analysis included 85 389 ED/UC ARI encounters (17.0% influenza A positive; 37.8% vaccinated overall) and 19 751 hospitalizations (9.5% influenza A positive; 52.8% vaccinated overall). VE against influenza A–associated ED/UC encounters was 44% (95% confidence interval [CI], 40%–47%) overall and 45% and 41% among adults aged 18–64 and ≥65 years, respectively. VE against influenza A–associated hospitalizations was 35% (95% CI, 27%–43%) overall and 23% and 41% among adults aged 18–64 and ≥65 years, respectively. Conclusions VE was moderate during the 2022–2023 influenza season, a season characterized by an increased burden of influenza and co-circulation with other respiratory viruses. Vaccination is likely to substantially reduce morbidity, mortality, and strain on healthcare resources.
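As a rough illustration of the test-negative calculation described in the Methods, the sketch below derives a crude VE from a 2×2 table as (1 − odds ratio) × 100. It is a simplification under stated assumptions: it is unweighted and unadjusted (the inverse-propensity-to-be-vaccinated weights used in the study are omitted), the function names are hypothetical, and the counts are invented for illustration rather than taken from the study.

```python
def odds_ratio(vacc_cases, unvacc_cases, vacc_controls, unvacc_controls):
    """Odds of vaccination among cases divided by odds among controls."""
    odds_cases = vacc_cases / unvacc_cases
    odds_controls = vacc_controls / unvacc_controls
    return odds_cases / odds_controls


def vaccine_effectiveness(or_value):
    """VE (%) = (1 - odds ratio) x 100."""
    return (1.0 - or_value) * 100.0


# Hypothetical counts, NOT study data:
# test-positive cases: 300 vaccinated, 600 unvaccinated
# test-negative controls: 500 vaccinated, 500 unvaccinated
crude_or = odds_ratio(300, 600, 500, 500)
crude_ve = vaccine_effectiveness(crude_or)
print(f"crude OR = {crude_or:.2f}, crude VE = {crude_ve:.0f}%")
```

A weighted or adjusted analysis would replace the crude odds ratio with one estimated from a regression model; the final conversion from odds ratio to VE is unchanged.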
Measurement concepts are essential to observational healthcare research; however, a lack of concept harmonization limits the quality of research that can be done on multisite research networks. We developed five methods that used a combination of automated, semi-automated and manual approaches for generating measurement concept sets. We validated our concept sets by calculating their frequencies in cohorts from the Columbia University Irving Medical Center (CUIMC) database. For heart transplant patients, the preoperative frequencies of basic metabolic panel concept sets, which we generated by a semi-automated approach, were greater than 99%. We also made concept sets for lumbar puncture and coagulation panels, by automated and manual methods respectively.
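The validation step (calculating how often a concept set appears in a cohort) can be sketched as the share of cohort patients with at least one measurement whose concept belongs to the set. This is an assumed reading of "frequency"; the function name, patient identifiers, and concept identifiers below are hypothetical, and the restriction to a preoperative time window is not modeled.

```python
def concept_set_frequency(cohort, measurements, concept_set):
    """Fraction of cohort patients with >= 1 measurement in the concept set.

    cohort: set of patient identifiers
    measurements: iterable of (patient_id, concept_id) pairs
    concept_set: set of concept identifiers treated as equivalent
    """
    if not cohort:
        return 0.0
    patients_with_concept = {
        patient_id
        for patient_id, concept_id in measurements
        if patient_id in cohort and concept_id in concept_set
    }
    return len(patients_with_concept) / len(cohort)


# Hypothetical data: patient 5 is outside the cohort and is ignored.
cohort = {1, 2, 3, 4}
measurements = [(1, "A"), (2, "B"), (2, "A"), (3, "C"), (5, "A")]
panel_concepts = {"A", "B"}

frequency = concept_set_frequency(cohort, measurements, panel_concepts)
print(f"{frequency:.0%} of the cohort has a measurement in the concept set")
```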
Abstract Background In this study we phenotyped individuals hospitalised with coronavirus disease 2019 (COVID-19) in depth, summarising entire medical histories, including medications, as captured in routinely collected data drawn from databases across three continents. We then compared individuals hospitalised with COVID-19 to those previously hospitalised with influenza. Methods We report demographics, previously recorded conditions and medication use of patients hospitalised with COVID-19 in the US (Columbia University Irving Medical Center [CUIMC], Premier Healthcare Database [PHD], UCHealth System Health Data Compass Database [UC HDC], and the Department of Veterans Affairs [VA OMOP]), in South Korea (Health Insurance Review & Assessment [HIRA]), and Spain (The Information System for Research in Primary Care [SIDIAP] and HM Hospitales [HM]). These patients were then compared with patients hospitalised with influenza in 2014-19. Results 34,128 (US: 8,362, South Korea: 7,341, Spain: 18,425) individuals hospitalised with COVID-19 were included. Between 4,811 (HM) and 11,643 (CUIMC) unique aggregate characteristics were extracted per patient, with all summarised in an accompanying interactive website (http://evidence.ohdsi.org/Covid19CharacterizationHospitalization/). Patients were majority male in the US (CUIMC: 52%, PHD: 52%, UC HDC: 54%, VA OMOP: 94%) and Spain (SIDIAP: 54%, HM: 60%), but were predominantly female in South Korea (HIRA: 60%). Age profiles varied across data sources. Prevalence of asthma ranged from 4% to 15%, diabetes from 13% to 43%, and hypertensive disorder from 24% to 70% across data sources. Between 14% and 33% were taking drugs acting on the renin-angiotensin system in the 30 days prior to hospitalisation. Compared to 81,596 individuals hospitalised with influenza in 2014-19, patients admitted with COVID-19 were more typically male, younger, and healthier, with fewer comorbidities and lower medication use.
Conclusions We provide a detailed characterisation of patients hospitalised with COVID-19. Protecting groups known to be vulnerable to influenza is a useful starting point to minimise the number of hospital admissions needed for COVID-19. However, such strategies will also likely need to be broadened so as to reflect the particular characteristics of individuals hospitalised with COVID-19.
Abstract Background Pregnant people have an increased risk of severe COVID-19, including hospitalization and critical illness. Currently, pregnant people are recommended to receive the same vaccinations as non-pregnant people of the same age and underlying health status (i.e., with or without immunocompromising conditions); however, additional data are needed to inform policy decisions around the potential need for an extra dose during pregnancy to protect mother and infant. Our goal was to estimate effectiveness of COVID-19 vaccination against medically attended COVID-19 among pregnant people during predominance of the Omicron variant. Methods The VISION Network conducted a test-negative, case-control study including emergency department/urgent care (ED/UC) encounters December 2021–April 2023 among immunocompetent pregnant people aged 18–45 years. Encounters were included if COVID-19-like illness (CLI) was documented and the pregnant person underwent SARS-CoV-2 testing within 14 days prior to the encounter. We compared the odds of vaccination among pregnant persons who tested positive with the odds of vaccination among those who tested negative. Monovalent vaccine effectiveness (VE) was calculated as (1 − adjusted odds ratio) × 100. Results Among 10,631 eligible CLI-associated ED/UC encounters, 2,022 (19%) were SARS-CoV-2 positive (Table). Of these, 52% of cases and 44% of controls were unvaccinated and 14% of cases and 23% of controls had received a primary series with at least 1 booster. VE of a complete monovalent primary series was 19% (95% CI: 8-29%); VE of a primary series plus a monovalent booster was 39% (95% CI: 27-50%). Median time since last dose was 349 and 219 days, respectively (Figure). VE of bivalent doses and time since last dose within the VISION Network will be available in the coming months. Conclusion Monovalent COVID-19 vaccines helped provide protection against medically attended COVID-19 among pregnant people.
Pregnant people should stay up to date with all recommended vaccinations. Disclosures Gabriela Vazquez-Benitez, PhD, MSc, Abbvie: Grant/Research Support|Sanofi Pasteur: Grant/Research Support
The Omicron variant (B.1.1.529) of SARS-CoV-2, the virus that causes COVID-19, was first identified in the United States in November 2021, with the BA.1 sublineage (including BA.1.1) causing the largest surge in COVID-19 cases to date. Omicron sublineages BA.2 and BA.2.12.1 emerged later and, by late April 2022, accounted for most cases. Estimates of COVID-19 vaccine effectiveness (VE) can be reduced by newly emerging variants or sublineages that evade vaccine-induced immunity (1), protection from previous SARS-CoV-2 infection in unvaccinated persons (2), or increasing time since vaccination (3). Real-world data comparing VE during the periods when BA.1 and BA.2/BA.2.12.1 predominated (BA.1 period and BA.2/BA.2.12.1 period, respectively) are limited. The VISION network examined 214,487 emergency department/urgent care (ED/UC) visits and 58,782 hospitalizations with a COVID-19-like illness diagnosis across 10 states during December 18, 2021-June 10, 2022, to evaluate VE of 2, 3, and 4 doses of mRNA COVID-19 vaccines (BNT162b2 [Pfizer-BioNTech] or mRNA-1273 [Moderna]) compared with no vaccination among adults without immunocompromising conditions. VE against COVID-19-associated hospitalization 7-119 days and ≥120 days after receipt of dose 3 was 92% (95% CI = 91%-93%) and 85% (95% CI = 81%-89%), respectively, during the BA.1 period, compared with 69% (95% CI = 58%-76%) and 52% (95% CI = 44%-59%), respectively, during the BA.2/BA.2.12.1 period. Patterns were similar for ED/UC encounters. Among adults aged ≥50 years, VE against COVID-19-associated hospitalization ≥120 days after receipt of dose 3 was 55% (95% CI = 46%-62%) and ≥7 days (median = 27 days) after a fourth dose was 80% (95% CI = 71%-85%) during BA.2/BA.2.12.1 predominance.
Immunocompetent persons should receive recommended COVID-19 booster doses to prevent moderate to severe COVID-19, including a first booster dose for all eligible persons and a second booster dose for adults aged ≥50 years at least 4 months after an initial booster dose. Booster doses should be obtained immediately when persons become eligible.
Abstract The recognition, disambiguation, and expansion of medical abbreviations and acronyms are of utmost importance to prevent medically dangerous misinterpretation in natural language processing. To support recognition, disambiguation, and expansion, we present the Medical Abbreviation and Acronym Meta-Inventory, a deep database of medical abbreviations. A systematic harmonization of eight source inventories across multiple healthcare specialties and settings identified 104,057 abbreviations with 170,426 corresponding senses. Automated cross-mapping of synonymous records using state-of-the-art machine learning reduced redundancy, which simplifies future application. Additional features include semi-automated quality control to remove errors. The Meta-Inventory demonstrated high completeness or coverage of abbreviations and senses in new clinical text, a substantial improvement over the next largest repository (6–14% increase in abbreviation coverage; 28–52% increase in sense coverage). To our knowledge, the Meta-Inventory is the most complete compilation of medical abbreviations and acronyms in American English to date. The multiple sources and high coverage support application in varied specialties and settings. This allows for cross-institutional natural language processing, which previous inventories did not support. The Meta-Inventory is available at https://bit.ly/github-clinical-abbreviations.