In epidemiological studies, finding the best subset of factors is challenging when the number of explanatory variables is large. Our study had two aims. First, we aimed to identify essential depression-associated factors from big survey data (the Korea National Health and Nutrition Examination Survey, 2012-2016) using the extreme gradient boosting (XGBoost) machine learning algorithm. Second, we aimed to achieve a comprehensive understanding of the multifactorial features of depression using network analysis. An XGBoost model was trained and tested to classify "current depression" versus "no lifetime depression" in a data set of 120 variables for 12,596 cases. The optimal XGBoost hyperparameters were set with an automated machine learning tool (TPOT), and a high-performance sparse model was obtained by feature selection based on XGBoost feature importance values. We performed statistical tests on the model and nonmodel factors using survey-weighted multiple logistic regression and constructed a correlation network among the factors. We also tested for confounding or interaction effects among selected risk factors when the network suggested them. The XGBoost-derived depression model consisted of 18 factors, with an area under the weighted receiver operating characteristic curve of 0.86. Using the model factors, two additional nonmodel factors were identified, and the factors were classified as direct (P<.05) or indirect (P≥.05) according to the statistical significance of their association with depression. Perceived stress and asthma were the most remarkable risk factors, and urine specific gravity was a novel protective factor. The depression-factor network showed clusters of socioeconomic status and quality-of-life factors and suggested that educational level and sex might be predisposing factors. Indirect factors (eg, diabetes, hypercholesterolemia, and smoking) were involved in confounding or interaction effects of direct factors. Triglyceride level was a confounder of hypercholesterolemia and diabetes, smoking was a significant risk factor in females, and weight gain was associated with depression in conjunction with diabetes. XGBoost and network analysis were useful for discovering depression-related factors and their relationships and can be applied to epidemiological studies using big survey data.
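The model above is evaluated by the area under a *weighted* ROC curve. One common way to compute such a quantity, assumed here rather than taken from the study, is the weighted Mann-Whitney form: the weighted share of (case, noncase) pairs in which the case receives the higher score, with ties counted as one half. The function name `weighted_auc` and the use of survey weights as pair weights are illustrative assumptions; this is a minimal sketch, not the authors' pipeline.

```python
def weighted_auc(scores, labels, weights):
    """Weighted probability that a random positive outranks a random negative.

    Assumption: each pair (positive i, negative j) is weighted by w_i * w_j,
    and tied scores count as one half (the Mann-Whitney definition of AUC).
    """
    pos = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    neg = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    numerator = 0.0
    for s_pos, w_pos in pos:
        for s_neg, w_neg in neg:
            if s_pos > s_neg:
                numerator += w_pos * w_neg
            elif s_pos == s_neg:
                numerator += 0.5 * w_pos * w_neg
    # Normalize by the total weight of all positive-negative pairs.
    total = sum(w for _, w in pos) * sum(w for _, w in neg)
    return numerator / total
```

The pairwise loop is quadratic in the number of cases; for survey data of this size a sorted, cumulative-weight implementation would be preferable, but the quadratic form keeps the definition easy to check.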
BACKGROUND In the past 20 years, various methods have been introduced to construct disease networks. However, established disease networks have had limited clinical utility to date because they do not account for differences in demographic factors or for the temporal order and intensity of disease-disease associations. OBJECTIVE This study sought to investigate the overall patterns of associations among diseases; network properties, such as clustering, degree, and strength; and the relationship between the structure of disease networks and demographic factors. METHODS We used National Health Insurance Service-National Sample Cohort (NHIS-NSC) data from the Republic of Korea, which include time-series insurance information on 1 million of 50 million Koreans (approximately 2%) obtained between 2002 and 2013. After setting the observation and outcome periods, we selected, for statistical validity, only the 520 most prevalent Korean Classification of Disease, sixth revision codes, which together accounted for approximately 80% of the cases. Using these data, we constructed a directional and weighted temporal network that considered both demographic factors and network properties. RESULTS Our disease network contained 294 nodes and 3085 edges, each with a relative risk greater than 4 and a false discovery rate-adjusted P value <.001. Interestingly, the network presented four large clusters. Analysis of the network topology revealed a stronger correlation between in-strength and out-strength than between in-degree and out-degree. Further, the mean age of each disease population was related to its position along the regression line of the out-/in-strength plot. Clustering analysis further showed that the four large clusters differed in sex, age, and disease category. CONCLUSIONS We constructed a directional and weighted disease network that visualizes demographic factors. We expect the proposed disease network model to be a valuable tool for early clinical researchers seeking to explore the relationships among diseases.
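The edge filter described above (relative risk > 4, false discovery rate-adjusted P < .001) and the in-/out-strength comparison can be sketched as follows. Several details are assumptions for illustration, not the study's specification: the relative-risk definition (risk of a later diagnosis B among patients with an earlier diagnosis A versus among those without A), the use of relative risk as the edge weight, the Benjamini-Hochberg procedure as the FDR method, and all function names. P values are taken as inputs rather than computed.

```python
from collections import defaultdict


def relative_risk(n_a_then_b, n_a, n_b_without_a, n_total):
    """Assumed RR of later diagnosis B given earlier diagnosis A."""
    risk_exposed = n_a_then_b / n_a
    risk_unexposed = n_b_without_a / (n_total - n_a)
    return risk_exposed / risk_unexposed


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def build_network(candidate_edges, rr_min=4.0, q_max=0.001):
    """Keep directed edges with RR > rr_min and adjusted p < q_max.

    candidate_edges: list of dicts with keys 'src', 'dst', 'rr', 'p'.
    Returns (kept_edges, in_strength, out_strength), using RR as edge weight.
    """
    q_values = benjamini_hochberg([e["p"] for e in candidate_edges])
    kept = [e for e, q in zip(candidate_edges, q_values)
            if e["rr"] > rr_min and q < q_max]
    in_strength = defaultdict(float)
    out_strength = defaultdict(float)
    for e in kept:
        out_strength[e["src"]] += e["rr"]
        in_strength[e["dst"]] += e["rr"]
    return kept, dict(in_strength), dict(out_strength)
```

Correlating each node's in-strength with its out-strength, and in-degree with out-degree, then gives the comparison reported in the RESULTS.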
Background Dry eye disease (DED) is a complex disease of the ocular surface, and its associated factors are important for understanding and effectively treating DED. Objective This study aimed to provide an integrative and personalized model of DED by building an explanatory model that uses as many factors as possible from the Korea National Health and Nutrition Examination Survey (KNHANES) data. Methods Using KNHANES data for 2012 (4391 sample cases), we created a point-based scoring system for ranking factors associated with DED and assessing patient-specific DED risk. First, decision trees were used to categorize continuous factors, and lasso was used to select important factors. Next, a survey-weighted multiple logistic regression was trained using these factors, and points were assigned based on the regression coefficients. Finally, network graphs of partial correlations between factors were used to study the interrelatedness of DED-associated factors. Results The point-based model achieved an area under the curve of 0.70 (95% CI 0.61-0.78), and 13 of the 78 factors considered were selected. Important factors included sex (+9 points for women), corneal refractive surgery (+9 points), current depression (+7 points), cataract surgery (+7 points), stress (+6 points), age (54-66 years; +4 points), rhinitis (+4 points), lipid-lowering medication (+4 points), and intake of omega-3 (0.43%-0.65% kcal/day; −4 points). Among these, the 54-66 years age group had high centrality in the network, whereas omega-3 intake had low centrality. Conclusions An integrative understanding of DED was possible using the machine learning–based model and network-based factor analysis. This method for identifying important risk factors and patient-specific risk could be applied to other multifactorial diseases.
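Assigning points from regression coefficients can be done in several ways. The sketch below assumes one simple convention: scale every log-odds coefficient so that the largest absolute coefficient maps to a chosen maximum number of points, then round to integers. A patient's risk score is the sum of the points for the factors they have. The coefficient values and factor names are hypothetical placeholders, not the study's estimates.

```python
def assign_points(coefficients, max_points=9):
    """Map log-odds coefficients to integer points.

    Assumed convention: the coefficient with the largest absolute value
    receives +/- max_points, and the others are scaled proportionally.
    Python's round() rounds exact halves to the nearest even integer.
    """
    scale = max_points / max(abs(b) for b in coefficients.values())
    return {name: round(beta * scale) for name, beta in coefficients.items()}


def patient_score(points, patient_factors):
    """Sum the points of the factors present in one patient (unknown factors ignored)."""
    return sum(points[f] for f in patient_factors if f in points)


# Hypothetical coefficients (log-odds), NOT the study's estimates.
example_coefficients = {"female": 0.9, "depression": 0.7, "omega3_mid": -0.4}
example_points = assign_points(example_coefficients)
```

Rounding discards some precision; a larger `max_points` tracks the coefficients more closely at the cost of a less convenient scale.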
Two mRNA COVID-19 vaccines (mRNA-1273, Moderna; and BNT162b2, Pfizer-BioNTech) and one viral vector vaccine (JNJ-78436735, Janssen/Johnson and Johnson) are authorized in the US to prevent COVID-19. We analyzed severe and common adverse events following COVID-19 vaccination using real-world data from the Vaccine Adverse Event Reporting System (VAERS). From 14 December 2020 to 30 September 2021, 481,172 individuals (mean age 50.7 ± 17.5 years; 27.89% male; 12.35 per 100,000 people) reported adverse events (AEs). The median time to severe AEs was 2 days after injection. The risk of severe AEs following the viral vector vaccine (OR = 1.044, 95% CI = 1.005–1.086) was significantly higher than that following the two mRNA vaccines, and the risk among males (OR = 1.374, 95% CI = 1.342–1.406) was higher than among females, except for anaphylaxis. For common AEs, however, the risk among males (OR = 0.621, 95% CI = 0.612–0.63) was lower than among females. In conclusion, by characterizing AEs using real-world data, we provide medical insight and clinical guidance on vaccine types. In particular, the COVID-19 mRNA vaccines were safer than the viral vector vaccine with regard to coagulation disorders, whereas inflammation-related AEs were less frequent with the viral vector vaccine. The risk–benefit ratio of vaccines should be carefully considered, and close monitoring and management of severe AEs are needed.
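The abstract reports odds ratios with 95% CIs but does not state how they were estimated. As a simpler, hedged illustration of how such an interval is formed, the sketch below computes a crude odds ratio from a 2×2 table with the Woolf (log-scale) confidence interval. The function name and the cell counts used in the check are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude OR and Woolf CI from a 2x2 table.

    a: exposed with outcome, b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; apply a continuity correction first")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

An adjusted OR from a regression model is formed the same way on the log scale, exp(β ± 1.96·SE), with β and SE taken from the fitted model instead of the table.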
In recent years, several network models have been introduced to elucidate the relationships between diseases. However, important risk factors that contribute to many human diseases, such as age, gender, and prior diagnoses, have not been considered in most networks. Here, we construct a diagnosis progression network of human diseases using large-scale claims data and analyze the associations between diagnoses. Our network is scale-free, meaning that a small number of diagnoses have a large number of links, while most diagnoses have few associations. Moreover, we provide strong evidence that gender, age, and disease class are major factors in determining the structure of the disease network. In practice, our network offers a methodology not only for identifying new connections that are not found in genome-based disease networks but also for estimating the directionality, strength, and progression time of transitions between diseases while accounting for gender, age, and incidence. Thus, our network can guide future research and contribute to achieving precision medicine.
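A scale-free network is usually characterized by a degree distribution that approximately follows a power law, P(k) ∝ k^(−α). As an illustrative sketch, not the authors' analysis, the code below counts each node's total degree (in-degree plus out-degree) from a directed edge list and estimates α with the discrete maximum-likelihood approximation of Clauset, Shalizi, and Newman (2009), α ≈ 1 + n / Σ ln(k_i / (k_min − 1/2)) over the n degrees k_i ≥ k_min. The function names are hypothetical, k_min (an integer ≥ 1) is left to the caller, and a full assessment of scale-freeness would also need a goodness-of-fit test.

```python
import math
from collections import Counter


def total_degrees(edges):
    """Total degree (in-degree + out-degree) of each node in a directed edge list."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return dict(degree)


def powerlaw_alpha(degrees, k_min):
    """Approximate discrete power-law MLE for the exponent alpha (Clauset et al. 2009)."""
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    return 1 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)
```

The approximation is most accurate when k_min is not very small; in practice k_min is itself chosen from the data.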
We investigated the association of changes in moderate-to-vigorous physical activity (MVPA) with SARS-CoV-2 infection. Drawing on 6,396,500 patients, we performed a nested case-control study of those who participated in both biennial check-ups. Adjusted odds ratios (aORs) and 95% confidence intervals (CIs) were calculated using multivariable logistic regression. Among patients who were physically inactive in period I, the odds increased for those who engaged in MVPA 1-2, 3-4, or ≥5 times/week in period II. This study found that MVPA was directly associated with SARS-CoV-2 infection.
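A nested case-control design typically samples controls from each case's risk set, that is, from cohort members still under follow-up, and not yet cases, at the time the case occurred. The sketch below shows this incidence-density sampling in its simplest form. The record fields (`id`, `time`, `case`), the number of controls per case, and the absence of matching on covariates such as age and sex are assumptions, not details from the study.

```python
import random


def risk_set_sample(cohort, controls_per_case, seed=0):
    """Incidence-density sampling for a nested case-control study.

    cohort: list of dicts with keys 'id', 'time' (event or censoring time),
    and 'case' (True if the subject became a case at 'time').
    Returns a list of (case_id, [control_ids]) in cohort order.
    Later cases remain eligible as controls for earlier cases.
    """
    rng = random.Random(seed)
    matched = []
    for case in (p for p in cohort if p["case"]):
        # Still at risk at the case's event time: followed longer, or followed
        # equally long without becoming a case at that same time.
        risk_set = [
            p for p in cohort
            if p["id"] != case["id"]
            and (p["time"] > case["time"]
                 or (p["time"] == case["time"] and not p["case"]))
        ]
        k = min(controls_per_case, len(risk_set))
        controls = rng.sample(risk_set, k)
        matched.append((case["id"], [c["id"] for c in controls]))
    return matched
```

Conditional logistic regression on the matched sets would then yield odds ratios that approximate hazard ratios from the full cohort.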
Background Cardiovascular disease (CVD) is a significant contributor to morbidity and mortality worldwide, and both CVD and post-acute COVID-19-associated CVD are increasing. It remains unknown whether COVID-19 patients who gain weight are at higher risk of CVD events. Therefore, the primary objective of this study is to investigate the association between weight control and the risk of CVD following COVID-19. Methods The study included 2,024,728 adults who participated in two rounds of health screening between 2017 and 2020. The final cohort included 70,996 participants in the COVID-19 group and 212,869 participants in the control group. Adjusted hazard ratios for CVD risk according to BMI change were calculated using Cox proportional hazards regression. Results We identified a total of 2869 CVD events (861 in the COVID-19 group and 2008 in the control group). Compared with individuals with a stable BMI, COVID-19 patients without obesity had an increased risk of CVD (adjusted hazard ratio [aHR] = 2.28; 95% confidence interval [CI], 1.15–4.53; p-value = 0.018). Additionally, non-COVID-19 patients with obesity exhibited a higher risk of CVD (aHR = 1.58; 95% CI, 1.01–2.47; p-value = 0.046). Conclusion People who gained weight during the pandemic, regardless of their weight category, had a significantly higher risk of COVID-19-associated CVD than those who maintained their prepandemic weight.
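The study estimated adjusted hazard ratios with Cox proportional hazards regression. To show what such a model maximizes, here is a minimal single-covariate sketch that fits the partial likelihood by Newton-Raphson, using the Breslow approximation for tied event times. It is unadjusted, uses a hypothetical binary exposure `x`, and the function name is illustrative; an analysis like the one described would use a dedicated survival library with multiple covariates.

```python
import math


def cox_single_covariate(times, events, x, iterations=25):
    """Fit beta in a one-covariate Cox model (Breslow ties) by Newton-Raphson.

    times: follow-up times; events: 1 if the event occurred, 0 if censored;
    x: covariate values. The hazard ratio is exp(beta).
    """
    n = len(times)
    beta = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for _ in range(iterations):
        score = 0.0  # first derivative of the log partial likelihood
        info = 0.0   # negative second derivative (observed information)
        for t in event_times:
            risk = [i for i in range(n) if times[i] >= t]
            w = [math.exp(beta * x[i]) for i in risk]
            s0 = sum(w)
            s1 = sum(wi * x[i] for wi, i in zip(w, risk))
            s2 = sum(wi * x[i] ** 2 for wi, i in zip(w, risk))
            failed = [i for i in range(n) if times[i] == t and events[i]]
            d = len(failed)
            score += sum(x[i] for i in failed) - d * s1 / s0
            info += d * (s2 / s0 - (s1 / s0) ** 2)
        if info == 0:
            break  # covariate carries no information (e.g., constant x)
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    return beta
```

Because the log partial likelihood is concave in β, Newton-Raphson from β = 0 usually converges in a few iterations unless the likelihood is monotone, for example when every exposed subject fails before every unexposed one.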
Introduction: It remains unknown whether patients with pre-existing depressive conditions are at high risk of severe COVID-19. Therefore, this study aims to investigate the association between pre-existing depressive conditions and severe COVID-19. Method: This study is part of the Korea Disease Control and Prevention Agency-COVID19-National Health Insurance Service cohort study, an ongoing large-scale health screening survey of adults aged 18 years and older residing in South Korea. Pre-existing depression status was measured in 552,860 patients who participated in a biennial health screening from 2019 to 2020. Finally, 29,106 patients with confirmed COVID-19 were enrolled and followed up for severe clinical events within 1 month of their diagnosis date. Adjusted odds ratios (AORs) and 95% confidence intervals (CIs) were calculated using multivariable-adjusted logistic regression analysis. Results: We identified 2868 COVID-19 patients with severe clinical events and 26,238 COVID-19 patients without severe clinical events. The group with moderate-to-severe depressive symptoms showed elevated odds of severe COVID-19 outcomes (AOR, 1.46; 95% CI, 1.25–1.72), both among unvaccinated patients (AOR, 1.32; 95% CI, 1.08–1.61) and among fully vaccinated patients (AOR, 1.76; 95% CI, 1.18–2.63). In addition, patients who had been diagnosed with depression and had depressive symptoms at the health screening showed an increased risk of severe COVID-19 outcomes (AOR, 2.22; 95% CI, 1.22–4.05). Conclusion: Moderate-to-severe depressive symptoms were associated with higher odds of severe COVID-19 events in both the unvaccinated and fully vaccinated groups. Patients with depressive symptoms may be at higher risk of severe COVID-19 outcomes.
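Reported AORs and 95% CIs can be checked for internal consistency. Assuming Wald intervals computed on the log-odds scale (a common convention that the abstract does not state), the standard error can be recovered from the CI width, and a two-sided p-value follows from the normal approximation. The worked example uses the reported AOR of 1.46 (95% CI, 1.25–1.72); the function names are hypothetical.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """SE of log(OR) implied by a Wald CI that is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def wald_p_value(odds_ratio, se):
    """Two-sided p-value for H0: OR = 1 under the normal approximation."""
    z = math.log(odds_ratio) / se
    # 2 * (1 - Phi(|z|)) expressed with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


# Reported: AOR 1.46 (95% CI, 1.25-1.72) for moderate-to-severe depressive symptoms.
se = se_from_ci(1.25, 1.72)
p = wald_p_value(1.46, se)
implied_aor = math.sqrt(1.25 * 1.72)  # geometric midpoint of the CI
```

For these numbers the recovered standard error is about 0.081, and the geometric midpoint of the CI, √(1.25 × 1.72) ≈ 1.47, agrees with the reported AOR up to rounding.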
Chronic obstructive pulmonary disease (COPD) is a major cause of death worldwide, and various studies have been conducted on its early diagnosis. We developed and validated a scoring system for predicting COPD and implemented the predictive model. Participants who underwent health screening between 2017 and 2020 were extracted from the Korea National Health and Nutrition Examination Survey (KNHANES) database. COPD was defined as a prebronchodilator ratio of forced expiratory volume in 1 s to forced vital capacity (FEV1/FVC) < 0.7 in individuals aged 40 years or older. Logistic regression was performed, and the C-index was used for variable selection. Receiver operating characteristic (ROC) curves with area under the curve (AUC) values were generated for evaluation. Age, sex, waist circumference, and diastolic blood pressure were used to predict COPD and to develop a COPD score based on a multivariable model. A simplified COPD model was validated, with an AUC of 0.780 from the ROC curves. In addition, we evaluated the association of the derived score with cardiovascular disease (CVD). The COPD score showed significant performance in predicting COPD, and it also showed good diagnostic ability for CVD risk. In the future, studies comparing the diagnostic accuracy of the derived score with standard diagnostic tests are needed.
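The spirometric case definition and the ROC/AUC evaluation described above can be sketched as follows. `is_copd` encodes the stated definition (age ≥ 40 years and prebronchodilator FEV1/FVC < 0.7); `roc_points` and `trapezoid_auc` accept any risk score (for example, one built from age, sex, waist circumference, and diastolic blood pressure) and compute the empirical ROC curve and its area. The function names are hypothetical, and the study's own score weights are not reproduced here.

```python
def is_copd(age, fev1, fvc):
    """Case definition from the text: age >= 40 and prebronchodilator FEV1/FVC < 0.7."""
    return age >= 40 and fev1 / fvc < 0.7


def roc_points(scores, labels):
    """Empirical ROC curve as (FPR, TPR) points, sweeping thresholds from high to low."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative cases")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Predict "COPD" when the score is at or above the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

With thresholds taken at every unique score, the trapezoidal area equals the Mann-Whitney AUC with ties counted as one half, which is also how a C-index for a binary outcome is usually computed.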