Background Characterizing and assessing the prevalence, awareness, and treatment patterns of patients with isolated diastolic hypertension (IDH) can generate new knowledge and highlight opportunities to improve their care. Methods and Results We used data from the China PEACE (Patient‐centered Evaluative Assessment of Cardiac Events) Million Persons Project, which screened 2 351 035 participants aged 35 to 75 years between 2014 and 2018. IDH was defined as systolic blood pressure of <140 mm Hg and diastolic blood pressure of ≥90 mm Hg; awareness as self‐reported diagnosis of hypertension; and treatment as current use of antihypertensive medications. Of the 2 310 184 participants included (mean age 55.7 years; 59.5% women), 73 279 (3.2%) had IDH, of whom 63 112 (86.1%) were untreated, and only 6512 (10.3%) of the untreated were aware of having hypertension. Compared with normotensive participants, those who were <60 years old, were men, were at least college educated, had a body mass index of >28 kg/m², consumed alcohol, had diabetes mellitus, and had prior cardiovascular events were more likely to have IDH (all P<0.01). Among those with IDH, a higher likelihood of awareness was associated with older age, female sex, college education, body mass index of >28 kg/m², higher income, diabetes mellitus, prior cardiovascular events, and residence in the Central or Eastern region (all P<0.05). Most treated participants with IDH reported taking only 1 class of antihypertensive medication. Conclusions IDH affects a substantial number of people in China; however, few are aware of having hypertension, and most treated participants are poorly managed, which suggests the need to improve the diagnosis and treatment of people with IDH.
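The three definitions in the abstract above (IDH, awareness, treatment) are simple enough to write down as code. The following is a minimal Python sketch, not the study's analysis code: the `Participant` record, its field names, and the `summarize` helper are hypothetical and exist only to illustrate how the reported counts relate to each other.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Participant:
    # Hypothetical record layout; field names are assumptions for illustration.
    sbp: float  # systolic blood pressure, mm Hg
    dbp: float  # diastolic blood pressure, mm Hg
    self_reported_hypertension: bool  # "awareness" in the abstract
    on_antihypertensives: bool  # "treatment" in the abstract


def has_idh(p: Participant) -> bool:
    """IDH as defined in the abstract: SBP < 140 and DBP >= 90 mm Hg."""
    return p.sbp < 140 and p.dbp >= 90


def summarize(participants: Iterable[Participant]) -> Dict[str, int]:
    """Count IDH cases, untreated IDH cases, and untreated-but-aware cases."""
    idh = [p for p in participants if has_idh(p)]
    untreated = [p for p in idh if not p.on_antihypertensives]
    aware_untreated = [p for p in untreated if p.self_reported_hypertension]
    return {
        "idh": len(idh),
        "untreated": len(untreated),
        "aware_untreated": len(aware_untreated),
    }
```

With the counts reported in the abstract, the same nesting gives 63 112 / 73 279 ≈ 86.1% untreated among those with IDH and 6512 / 63 112 ≈ 10.3% aware among the untreated.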
Large Language Models (LLMs) have demonstrated impressive capabilities in code completion tasks, where they assist developers by predicting and generating new code in real time. However, existing LLM-based code completion systems primarily rely on the immediate context of the file being edited, often missing valuable repository-level information, user behavior, and edit history that could improve suggestion accuracy. Additionally, challenges such as efficiently retrieving relevant code snippets from large repositories, incorporating user behavior, and balancing accuracy with low-latency requirements in production environments remain unresolved. In this paper, we propose ContextModule, a framework designed to enhance LLM-based code completion by retrieving and integrating three types of contextual information from the repository: user behavior-based code, similar code snippets, and critical symbol definitions. By capturing user interactions across files and leveraging repository-wide static analysis, ContextModule improves the relevance and precision of generated code. We implement performance optimizations, such as index caching, to ensure the system meets the latency constraints of real-world coding environments. Experimental results and industrial practice demonstrate that ContextModule significantly improves code completion accuracy and user acceptance rates.
Recent years have seen the development of LLM-based code generation. Empirically, incremental code edits are more frequent than generating code in a software project. Emerging code editing approaches usually formulate the problem as generating an edit based on known relevant prior edits and context. However, practical code edits can be more complicated. First, an editing session can include multiple (ir)relevant edits to the code under edit. Second, inferring the subsequent edits is non-trivial, as the scope of their ripple effect can be the whole project. In this work, we propose CoEdPilot, an LLM-driven solution to recommend code edits by discriminating the relevant edits, exploring their interactive natures, and estimating their ripple effect in the project. Specifically, CoEdPilot orchestrates multiple neural transformers to identify what and how to edit in the project regarding both edit location and edit content. When a user accomplishes an edit with an optional editing description, a Subsequent Edit Analysis first reports the most relevant files in the project, along with the types of edits (e.g., keep, insert, and replace) that can happen for each line of their code. Next, an Edit-content Generator generates concrete edit options for the lines of code, with regard to the relevant prior changes reported by an Edit-dependency Analyzer. Lastly, both the Subsequent Edit Analysis and the Edit-content Generator capture relevant prior edits as feedback to readjust their recommendations. We train our models by collecting over 180K commits from 471 open-source projects in 5 programming languages. Our extensive experiments show that CoEdPilot predicts edits well (i.e., predicting edit location with an accuracy of 70.8%-85.3%, and the edit content with an exact match rate of 41.8% and BLEU4 score of 60.7)...
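The per-line edit types (keep, insert, replace) and the exact match rate mentioned above can be made concrete with a short sketch. This is not CoEdPilot's pipeline: it assumes, for illustration only, that line labels are derived from a before/after pair with Python's `difflib`, that a deletion counts as a replace with empty content, that "insert" marks the line after which new lines appear, and that exact match compares whitespace-stripped strings. The function names are hypothetical.

```python
import difflib
from typing import List


def label_lines(before: List[str], after: List[str]) -> List[str]:
    """Assign keep / insert / replace to every line of the old version."""
    labels = ["keep"] * len(before)
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            # Assumption: a deleted line is a replace with empty content.
            for i in range(i1, i2):
                labels[i] = "replace"
        elif tag == "insert" and i1 > 0 and labels[i1 - 1] == "keep":
            # New lines appear after old line i1 - 1. Insertions before the
            # first line have no anchor line and are ignored in this sketch.
            labels[i1 - 1] = "insert"
    return labels


def exact_match_rate(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions identical to their reference after stripping."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)
```

For example, changing `b = 2` to `b = 3` and appending a line after `print(a)` yields the labels `keep`, `replace`, `insert` for a three-line file.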
Self-supervised learning (SSL) has empirically shown its data representation learnability in many downstream tasks. There are only a few theoretical works on data representation learnability, and many of those focus on the final data representation, treating the nonlinear neural network as a "black box". However, the accurate learning results of neural networks are crucial for describing the data distribution features learned by SSL models. Our paper is the first to analyze the learning results of the nonlinear SSL model accurately. We consider a toy data distribution that contains two features: the label-related feature and the hidden feature. Unlike previous work in the linear setting, which depends on closed-form solutions, we use the gradient descent algorithm to train a 1-layer nonlinear SSL model with a certain initialization region and prove that the model converges to a local minimum. Furthermore, in contrast to complex iterative analyses, we propose a new analysis process that uses the exact version of the Inverse Function Theorem to accurately describe the features learned by the local minimum. With this local minimum, we prove that the nonlinear SSL model can capture the label-related feature and the hidden feature at the same time. In contrast, the nonlinear supervised learning (SL) model can only learn the label-related feature. We also present the learning processes and results of the nonlinear SSL and SL models via simulation experiments.
Introduction Adenocarcinoma in situ (AIS) and minimally invasive adenocarcinoma (MIA) are considered preinvasive forms of lung adenocarcinoma (LUAD) with a 5-year recurrence-free survival of 100%. We investigated genomic profiles in early tumorigenesis and distinguished the mutational features of preinvasive and invasive adenocarcinoma (IAC) for early diagnosis. Methods Molecular information was obtained from a 689-gene panel in 90 Chinese patients with early-stage LUAD using next-generation sequencing. Gene signatures were identified between pathology subtypes, including AIS/MIA (n=31) and IAC (n=59) in this cohort. Mutational and clinicopathological information was also obtained from The Cancer Genome Atlas (TCGA) as a comparison cohort. Results A higher mutation frequency of TP53, RBM10, MUC1, CSMD, MED1, LRP1B, GLI1, MAP3K, and RYR2 was observed in the IAC group than in the AIS/MIA group. The AIS/MIA group showed higher mutation frequencies of ERBB2, BRAF, GRIN2A, and RB1. Comparable mutation rates for mutually exclusive genes (EGFR and KRAS) across cohorts highlight the critical transition to invasive LUAD. Compared with the TCGA cohort, EGFR, KRAS, TP53, and RBM10 were frequently mutated in both cohorts. Despite limited gene mutation overlap between cohorts, we observed variant mutation types in invasive LUAD. Additionally, the tumor mutation burden (TMB) values were significantly lower in the AIS/MIA group than in the IAC group in both the Chinese cohort (P=0.0053) and the TCGA cohort (P<0.01). Conclusion These findings highlight the importance of distinguishing preinvasive from invasive LUAD in the early stages of LUAD and of considering both pathology and molecular features in clinical practice, revealing genomic tumor heterogeneity and population differences.
e16004 Background: Esophageal cancer ranks as the sixth most common cause of cancer-related deaths worldwide. The occurrence and development of esophageal cancer is a complex process that involves multiple genetic variants, including changes in gene function, which could be a crucial factor in its development. Hence, studying the molecular genetic changes specific to esophageal cancer is essential for understanding its mechanism and guiding clinical treatment. In this study, we analyzed and documented genetic variations and clinical information of 82 esophageal cancer patients to provide new insights into the correlation between genes and clinical symptoms. Methods: The tissue and blood-based DNA samples from 82 patients were analyzed using 680-gene targeted capture sequencing for CNV, fusion, SNV, indel, MSI, and TMB. Sequencing coverage was 2200X for tissue, 20000X for cfDNA, and 10000X for control gDNA. Additionally, the level of PD-L1 expression was assessed in the tissue samples. TMB was classified as TMB-H (>10 mutations/megabase) and TMB-L (<10). MSI-H was defined as instability at ≥20% of the 117 sites assessed, while <20% was classified as MSS. Results: The analysis revealed 786 SNV/indel mutations in 82 patients, of which 346 (44.02%) were detected in both tissue and plasma, 296 (37.66%) were found only in tissue, and 144 (18.32%) were found only in plasma. A total of 378 CNV mutations were detected, with 181 (47.88%) found in both tissue and blood, 186 (49.21%) found only in tissue, and 11 (2.91%) found only in plasma. The high-frequency mutation genes identified in this study were TP53, CCND1, FGF19, FGF3/4, NOTCH1, and PIK3CA, which are also listed as high-frequency mutated genes in the IntoGen database (7 cohorts, 945 samples). However, the genes CCND1, FGF19, FGF3/4, and PIK3CA in the CNV results were not in the IntoGen high-frequency mutated gene list. The results of tTMB vs bTMB were consistent in 80 out of 82 patients (97.56%).
Among the 82 samples, 79 had low expression of PD-L1, while 3 had high PD-L1 expression together with TMB-L and MSS status. Conclusions: The fibroblast growth factor (FGF) signaling system may play an important role in tumorigenesis and progression. Increased expression of FGFs and fibroblast growth factor receptor (FGFR) genes might lead to genetic alterations in their protein structures and activation of the FGFs-FGFR signaling pathway, which may contribute to the onset of esophageal cancer. In conclusion, the results of our study suggest that the detection of tissue mutations in the blood of esophageal cancer patients is challenging and cannot be reliably determined by a single blood sample. The incidence of PD-L1 expression was low in all esophageal cancer tissues. Further investigation into the correlation between CCND1, FGF19, FGF3/4, and PIK3CA genes and esophageal cancer is warranted.
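The TMB and MSI cut-offs and the tissue/plasma percentages in the abstract above follow directly from the stated thresholds and counts, which the following Python sketch reproduces. It is an illustration rather than the study's pipeline: the function names are hypothetical, and because the abstract defines TMB-H as >10 and TMB-L as <10 mutations/megabase without saying how a value of exactly 10 is handled, the sketch labels that case "unclassified" as an explicit assumption.

```python
from typing import Dict


def classify_tmb(mutations_per_mb: float) -> str:
    """TMB-H if > 10 mutations/Mb, TMB-L if < 10, as stated in the abstract."""
    if mutations_per_mb > 10:
        return "TMB-H"
    if mutations_per_mb < 10:
        return "TMB-L"
    # Assumption: the abstract leaves a value of exactly 10 undefined.
    return "unclassified"


def classify_msi(unstable_sites: int, total_sites: int = 117) -> str:
    """MSI-H if at least 20% of the assessed sites are unstable, else MSS."""
    return "MSI-H" if unstable_sites / total_sites >= 0.20 else "MSS"


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of the total for each category, rounded to 2 decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


# Counts reported in the abstract for SNV/indel and CNV detections.
snv_indel = shares({"both": 346, "tissue_only": 296, "plasma_only": 144})
# → {'both': 44.02, 'tissue_only': 37.66, 'plasma_only': 18.32}
cnv = shares({"both": 181, "tissue_only": 186, "plasma_only": 11})
# → {'both': 47.88, 'tissue_only': 49.21, 'plasma_only': 2.91}
```

With 117 sites, the 20% threshold means 24 or more unstable sites give MSI-H, while 23 or fewer give MSS.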
With the growing reliance on automated code completion tools in software development, the need for robust evaluation benchmarks has become critical. However, existing benchmarks focus more on code generation tasks at the function and class level and provide rich text descriptions to prompt the model. By contrast, such descriptive prompts are commonly unavailable in real development, and code completion can occur in a wider range of situations, such as in the middle of a function or a code block. These limitations make the evaluation align poorly with the practical scenarios of code completion tools. In this paper, we propose RepoMasterEval, a novel benchmark for evaluating code completion models constructed from real-world Python and TypeScript repositories. Each benchmark datum is generated by masking a code snippet (ground truth) from one source code file with existing test suites. To improve the test accuracy of model-generated code, we employ mutation testing to measure the effectiveness of the test cases, and we manually craft new test cases for those test suites with a low mutation score. Our empirical evaluation on 6 state-of-the-art models shows that test augmentation is critical in improving the accuracy of the benchmark and that RepoMasterEval is able to report differences in model performance in real-world scenarios. The deployment of RepoMasterEval in a collaborating company for one month also revealed that the benchmark is useful for giving accurate feedback during model training and that the score is highly correlated with the model's performance in practice. Based on our findings, we call for the software engineering community to build more LLM benchmarks tailored for code generation tools, taking the practical and complex development environment into consideration.