Abstract
Background: Diabetic kidney disease (DKD) is a serious complication of diabetes mellitus (DM), with patients typically remaining asymptomatic until reaching an advanced stage. We aimed to develop and validate a predictive model for DKD in patients with an initial diagnosis of type 2 diabetes mellitus (T2DM) using real-world data.
Methods: We retrospectively examined data from 3,291 patients (1,740 men, 1,551 women) newly diagnosed with T2DM at Ningbo Municipal Hospital of Traditional Chinese Medicine (2011–2023). The dataset was randomly divided into training and validation cohorts. Forty-six readily available medical characteristics recorded in the electronic medical record at the initial diagnosis of T2DM were used to develop prediction models based on linear, non-linear, and SuperLearner approaches. Model performance was evaluated using the area under the curve (AUC). SHapley Additive exPlanation (SHAP) was used to interpret the best-performing models.
Results: Among the 3,291 participants, 563 (17.1%) were diagnosed with DKD during a median follow-up of 2.53 years. The SuperLearner model exhibited the highest AUC (0.7138, 95% confidence interval: [0.673, 0.7546]) on the holdout internal validation set for predicting any DKD stage. The top-ranked features were WBC_Cnt, Neut_Cnt, Hct, and Hb. High WBC_Cnt, low Neut_Cnt, high Hct, and low Hb levels were associated with an increased risk of DKD.
Conclusions: We developed and validated a DKD risk prediction model for patients with newly diagnosed T2DM. Using routinely available clinical measurements, the SuperLearner model could predict DKD during hospital visits. Prediction accuracy and SHAP-based model interpretability may help improve early detection, targeted interventions, and prognosis of patients with DM.
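The abstract describes a SuperLearner (stacked) ensemble evaluated by AUC and interpreted with SHAP. Below is a minimal sketch of that workflow in Python, using scikit-learn's StackingClassifier as a stand-in for the SuperLearner and a model-agnostic SHAP explainer; the feature names, synthetic data, and base learners are illustrative assumptions, not the paper's pipeline.

```python
# Sketch: SuperLearner-style stacking + SHAP attribution on synthetic data.
# Column names (WBC_Cnt, Neut_Cnt, Hct, Hb) are borrowed from the abstract; values are fake.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(1000, 4)),
                 columns=["WBC_Cnt", "Neut_Cnt", "Hct", "Hb"])
y = (X["WBC_Cnt"] - X["Hb"] + rng.normal(scale=0.5, size=1000) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Combine a linear and a non-linear base learner with a logistic meta-learner.
super_learner = StackingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(n_estimators=200, random_state=0))],
    final_estimator=LogisticRegression(),
    cv=5,
)
super_learner.fit(X_tr, y_tr)
print("hold-out AUC:", roc_auc_score(y_te, super_learner.predict_proba(X_te)[:, 1]))

# Model-agnostic SHAP values for the stacked model (small background sample for speed).
explainer = shap.KernelExplainer(lambda d: super_learner.predict_proba(d)[:, 1],
                                 shap.sample(X_tr, 50))
shap_values = explainer.shap_values(X_te.iloc[:20], nsamples=100)
```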
Wilson's disease, also known as hepatolenticular degeneration, is a rare autosomal recessive disorder of copper metabolism in humans. Its clinical manifestations are diverse, and diagnosis and treatment are often delayed. The purpose of this study was to establish a new predictive diagnostic model of Wilson's disease, built by multivariate regression analysis of serological indicators that are minimally invasive to obtain, accurate, inexpensive, and quantifiable, and to evaluate its predictive efficacy, in order to identify Wilson's disease early, improve the diagnosis rate, and clarify the treatment plan.
A retrospective analysis was performed on 127 patients with Wilson's disease admitted to the First People's Hospital of Yunnan Province from January 2003 to May 2022 as the experimental group and 73 patients with normal serological indicators who were not diagnosed with Wilson's disease as the control group. SPSS version 26.0 was used for univariate screening and multivariate binary logistic regression analysis to identify independent factors. R version 4.1.0 was used to build an intuitive nomogram prediction model from the independent influencing factors included. The accuracy of the nomogram prediction model was evaluated and quantified by calculating the concordance index (C-index) and drawing the calibration curve. The area under the receiver operating characteristic (ROC) curve (AUC) was calculated for both the nomogram prediction model and the Leipzig score to compare their ability to predict Wilson's disease.
Alanine aminotransferase (ALT), aspartate aminotransferase (AST), alkaline phosphatase (AKP), albumin (ALB), uric acid (UA), serum calcium (Ca), serum phosphorus (P), and hemoglobin (HGB) were closely related to the occurrence of Wilson's disease (p < 0.1). The final prediction model contained seven independent predictors: ALT, AST, AKP, ALB, UA, Ca, and P. The AUC of the prediction model was 0.971 and the C-index was 0.972, and the calibration curve fitted the ideal curve well, indicating that the nomogram prediction model had a good predictive effect on the occurrence of Wilson's disease. The ROC curve of the Leipzig score was also drawn and its AUC was 0.969, indicating that both the prediction model and the scoring system had predictive value, with the nomogram prediction model performing better in the study population of this center.
ALT, AST, AKP, ALB, UA, Ca, and P are independent predictors of Wilson's disease and can be used as early predictors. Based on the nomogram prediction model, the optimal threshold was determined to be 0.698, an important reference value for judging Wilson's disease. Compared with the Leipzig score, the nomogram prediction model has relatively high sensitivity and specificity and good clinical application value.
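The workflow above is multivariate logistic regression followed by ROC analysis and an "optimal threshold". A hedged sketch of that core step is given below in Python (the study itself used SPSS and R); the predictor names come from the abstract, the synthetic data and the Youden-index threshold rule are illustrative assumptions.

```python
# Sketch: multivariate logistic regression on the seven serological predictors,
# apparent AUC, and a Youden-index cut-off (the paper reports 0.698 for its cohort).
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(1)
cols = ["ALT", "AST", "AKP", "ALB", "UA", "Ca", "P"]
X = pd.DataFrame(rng.normal(size=(200, len(cols))), columns=cols)
y = (0.8 * X["ALT"] - 0.6 * X["ALB"] + rng.normal(scale=1.0, size=200) > 0).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
probs = model.predict_proba(X)[:, 1]
print("apparent AUC:", roc_auc_score(y, probs))

# Youden's J picks the probability cut-off maximising sensitivity + specificity - 1.
fpr, tpr, thresholds = roc_curve(y, probs)
best = thresholds[np.argmax(tpr - fpr)]
print("optimal threshold (Youden):", round(float(best), 3))
```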
With the continuous development of the inter-bank foreign exchange (FX) market, FX trading has evolved from a single electronic trading platform to multiple electronic platforms covering a variety of trading products and trading mechanisms. FX trading is a specialized domain, and its application systems share substantial commonality in business requirements, architecture, and concrete implementation, so the domain exhibits a degree of cohesion and stability. In this paper, an improved FODA (Feature-Oriented Domain Analysis) method is used to analyze the domain and application requirements of FX derivatives. To obtain reusable domain components, a DSSA (Domain-Specific Software Architecture) is first designed to identify the common and variable parts of the FX derivatives field. Then, based on the analysis of the feature model, a domain model is adopted to address the reuse of FX derivatives components. After that, an FSM (Finite State Machine) is designed to guarantee the idempotence of the service. Finally, the paper presents the design and implementation of the FX derivatives system. A proof-of-concept domain-based FX derivatives trading system demonstrates the resulting improvement in software development efficiency and quality.
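To make the FSM-based idempotence guarantee concrete, here is a small illustrative sketch (not taken from the paper): a trade-state machine in which replayed messages are recognized by message id, so processing the same event twice leaves the state unchanged. The states, events, and field names are assumptions.

```python
# Illustrative FSM for idempotent processing of trade lifecycle events.
from enum import Enum, auto

class TradeState(Enum):
    NEW = auto()
    CONFIRMED = auto()
    SETTLED = auto()
    CANCELLED = auto()

# Allowed transitions: (current_state, event) -> next_state
TRANSITIONS = {
    (TradeState.NEW, "confirm"): TradeState.CONFIRMED,
    (TradeState.CONFIRMED, "settle"): TradeState.SETTLED,
    (TradeState.NEW, "cancel"): TradeState.CANCELLED,
}

class TradeFSM:
    def __init__(self) -> None:
        self.state = TradeState.NEW
        self.seen_messages: set[str] = set()   # message ids already processed

    def handle(self, event: str, message_id: str) -> TradeState:
        # Idempotence guard: a duplicate message id is a no-op.
        if message_id in self.seen_messages:
            return self.state
        next_state = TRANSITIONS.get((self.state, event))
        if next_state is not None:
            self.state = next_state
        self.seen_messages.add(message_id)
        return self.state

fsm = TradeFSM()
fsm.handle("confirm", "msg-1")
fsm.handle("confirm", "msg-1")   # replayed message: state stays CONFIRMED
print(fsm.state)                 # TradeState.CONFIRMED
```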
Aircraft Engine Health Management Data Mining Tools is a project led by NASA Glenn Research Center in support of the NASA Aviation Safety Program's Aviation System Monitoring and Modeling Thrust. The objective of the Glenn-led effort is to develop enhanced aircraft engine health management prognostic and diagnostic methods by applying data mining technologies to operational data and maintenance records. This will lead to improved air transportation safety, optimized engine maintenance scheduling, and optimized engine usage. This paper presents a roadmap for achieving these goals.
Abstract In the process of constructing a power grid supervision knowledge graph, structured and unstructured multi-source data must be sorted and integrated for knowledge extraction and reasoning. To address quality problems such as redundancy and errors in multi-source data, this paper proposes a multi-source data quality evaluation system that performs multi-dimensional quality evaluation of power grid supervision data, including maintenance, defect, measurement, alarm, and oil chromatography records. Data preprocessing methods are first applied to text and numerical data separately to perform noise reduction, filtering, and gap filling. A practical calculation example verifies the practicability and effectiveness of the preprocessing methods and the quality evaluation system. As a result, the data value of the power grid supervision knowledge graph is significantly improved, which helps comprehensively improve equipment state perception.
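A minimal sketch of multi-dimensional quality scoring is shown below; the dimensions (completeness, validity, timeliness), their weights, and the sample records are illustrative assumptions and not the evaluation system defined in the paper.

```python
# Rough sketch: weighted multi-dimensional quality score for grid measurement records.
import numpy as np
import pandas as pd

records = pd.DataFrame({
    "device_id": ["T01", "T01", "T02", "T02"],
    "oil_temp_c": [55.2, np.nan, 61.7, 400.0],     # 400 degrees C is out of plausible range
    "reported_at": pd.to_datetime(
        ["2023-05-01 10:00", "2023-05-01 10:15", "2023-05-01 10:00", "2023-05-01 09:00"]),
})

def completeness(df: pd.DataFrame) -> float:
    return 1.0 - df.isna().to_numpy().mean()

def validity(df: pd.DataFrame, col: str, lo: float, hi: float) -> float:
    vals = df[col].dropna()
    return float(((vals >= lo) & (vals <= hi)).mean())

def timeliness(df: pd.DataFrame, now: pd.Timestamp, max_age: pd.Timedelta) -> float:
    return float((now - df["reported_at"] <= max_age).mean())

now = pd.Timestamp("2023-05-01 10:20")
scores = {
    "completeness": completeness(records),
    "validity": validity(records, "oil_temp_c", lo=-20, hi=120),
    "timeliness": timeliness(records, now, pd.Timedelta("30min")),
}
weights = {"completeness": 0.4, "validity": 0.4, "timeliness": 0.2}
overall = sum(weights[k] * scores[k] for k in scores)
print(scores, "overall:", round(overall, 3))
```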
The divide and conquer strategy, which breaks a massive data set into a series of manageable data blocks and then combines the independent results of the data blocks to obtain a final decision, has been recognized as a state-of-the-art method for overcoming the challenges of massive data analysis. In this paper, we merge the divide and conquer strategy with local average regression methods to infer the regression relationship of input-output pairs from a massive data set. After theoretically analyzing the pros and cons, we find that although divide and conquer local average regression can reach the optimal learning rate, the restriction on the number of data blocks is rather strong, which makes it feasible only for a small number of data blocks. We then propose two variants to lessen (or remove) this restriction. Our results show that these variants can achieve the optimal learning rate under a much milder restriction (or without such a restriction). Extensive experimental studies are carried out to verify our theoretical assertions.
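For intuition, here is a minimal sketch of plain divide and conquer local average regression: split the sample into m blocks, fit a Nadaraya-Watson kernel estimator on each block, and average the block-wise predictions. The bandwidth, block count, and synthetic data are illustrative choices, and the paper's two proposed variants are not implemented here.

```python
# Sketch: divide-and-conquer Nadaraya-Watson regression on synthetic 1-D data.
import numpy as np

def nw_predict(x_query, x_block, y_block, h):
    """Nadaraya-Watson estimate at x_query with a Gaussian kernel of bandwidth h."""
    w = np.exp(-0.5 * ((x_query[:, None] - x_block[None, :]) / h) ** 2)
    return (w @ y_block) / np.clip(w.sum(axis=1), 1e-12, None)

rng = np.random.default_rng(0)
n, m, h = 10_000, 10, 0.05                      # sample size, number of blocks, bandwidth
x = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

x_query = np.linspace(0, 1, 200)
blocks = np.array_split(rng.permutation(n), m)  # random, roughly equal-size blocks
block_preds = [nw_predict(x_query, x[idx], y[idx], h) for idx in blocks]
y_hat = np.mean(block_preds, axis=0)            # final decision: average over blocks

print("max abs error vs. true regression:", np.max(np.abs(y_hat - np.sin(2 * np.pi * x_query))))
```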
Abstract
Context: Intraoperative hemodynamic instability (HI) deteriorates the surgical outcomes of patients with normotensive pheochromocytoma (NP).
Objective: To characterize the hemodynamics of NP and to develop and externally validate a prediction model for intraoperative HI.
Methods: Data on 117 patients with NP (derivation cohort) and 40 patients with normotensive adrenal myelolipoma (NAM) who underwent laparoscopic adrenalectomy from January 2011 to November 2021 were retrospectively collected. Data on 22 patients with NP (independent validation cohort) were collected from another hospital during the same period. The hemodynamic characteristics of patients with NP and NAM were compared, and machine learning models were used to identify risk factors associated with HI. The final model was visualized as a nomogram.
Results: Forty-eight (41%) of the 117 patients experienced HI, significantly more than in the NAM group. A multivariate logistic regression including age, tumor size, fasting plasma glucose, and preoperative systolic blood pressure showed good discrimination, with areas under the curve of 0.8286 (95% CI 0.6875-0.9696) and 0.7667 (95% CI 0.5386-0.9947) for predicting HI in the internal and independent validation cohorts, respectively. The sensitivities and positive predictive values were 0.6667 and 0.7692 for the internal validation and 0.9167 and 0.6111 for the independent validation, respectively. The final model, visualized as a nomogram, yielded net benefits across a wide range of risk thresholds in decision curve analysis.
Conclusion: Patients with NP experienced HI during laparoscopic adrenalectomy. The nomogram can be used for individualized prediction of intraoperative HI in patients with NP.
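The decision curve analysis mentioned in the results evaluates a model by its net benefit, net benefit(t) = TP/n - FP/n * t/(1 - t), across risk thresholds t. The sketch below illustrates that calculation on synthetic risk scores; the event rate, sample size, and score construction are assumptions, not the study's data.

```python
# Sketch: net-benefit calculation used in decision curve analysis, on synthetic data.
import numpy as np

def net_benefit(y_true: np.ndarray, p_hat: np.ndarray, t: float) -> float:
    n = len(y_true)
    pred_pos = p_hat >= t
    tp = np.sum(pred_pos & (y_true == 1))
    fp = np.sum(pred_pos & (y_true == 0))
    return tp / n - fp / n * t / (1 - t)

rng = np.random.default_rng(2)
y = (rng.uniform(size=117) < 0.41).astype(int)       # ~41% event rate, mirroring the cohort
p = np.clip(y * 0.3 + rng.uniform(size=117), 0, 1)   # crude synthetic risk scores

for t in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"threshold {t:.1f}: net benefit {net_benefit(y, p, t):.3f}")
```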
The vulnerability of machine learning models to Membership Inference Attacks (MIAs) has garnered considerable attention in recent years. These attacks determine whether a data sample belongs to the model's training set or not. Recent research has focused on reference-based attacks, which leverage difficulty calibration with independently trained reference models. While empirical studies have demonstrated its effectiveness, there is a notable gap in our understanding of the circumstances under which it succeeds or fails. In this paper, we take a further step towards a deeper understanding of the role of difficulty calibration. Our observations reveal inherent limitations in calibration methods, leading to the misclassification of non-members and suboptimal performance, particularly on high-loss samples. We further identify that these errors stem from an imperfect sampling of the potential distribution and a strong dependence of membership scores on the model parameters. By shedding light on these issues, we propose RAPID: a query-efficient and computation-efficient MIA that directly Re-leverAges the original membershiP scores to mItigate the errors in Difficulty calibration. Our experimental results, spanning 9 datasets and 5 model architectures, demonstrate that RAPID outperforms previous state-of-the-art attacks (e.g., LiRA and Canary offline) across different metrics while remaining computationally efficient. Our observations and analysis challenge the current de facto paradigm of difficulty calibration in high-precision inference, encouraging greater attention to the persistent risks posed by MIAs in more practical scenarios.
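For readers unfamiliar with difficulty calibration, the sketch below shows the basic idea: the calibrated membership score subtracts the average reference-model loss from the target-model loss. The final line only gestures at RAPID's idea of re-using the raw membership signal alongside the calibrated score; the exact combination used in the paper is not reproduced here, and all losses below are synthetic.

```python
# Simplified illustration of difficulty calibration for membership inference.
import numpy as np

def calibrated_score(target_loss: np.ndarray, reference_losses: np.ndarray) -> np.ndarray:
    """target_loss: (n,) sample losses under the target model.
    reference_losses: (k, n) losses of the same samples under k reference models."""
    return target_loss - reference_losses.mean(axis=0)

rng = np.random.default_rng(3)
n, k = 1000, 4
target_loss = rng.exponential(scale=1.0, size=n)
reference_losses = rng.exponential(scale=1.2, size=(k, n))

cal = calibrated_score(target_loss, reference_losses)
score_calibrated = -cal                         # lower loss => more likely a member
score_rapid_like = -cal - 0.5 * target_loss     # also re-uses the raw membership signal
print(score_calibrated[:5], score_rapid_like[:5])
```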