Hoang Nguyen

FPT University

Author Statistics

Papers

Citation

H-Index

i-10 index

Co-Authors

Douglas M. Ruderfer

Vanderbilt University Medical Center

Patrick F. Sullivan

University of North Carolina at Chapel Hill

Michael O’Donovan

Cardiff University

Pamela Sklar

Icahn School of Medicine at Mount Sinai

Eli A. Stahl

Broad Institute

Weiqing Wang

Ruijin Hospital

Shaun Purcell

Harvard University

Andrew M. McIntosh

University of Edinburgh

Sarah E. Bergen

Karolinska Institutet

Fernando S. Goes

Johns Hopkins University

Cooperative Institutions

Massachusetts General Hospital: 153

Genomics (United Kingdom): 138

Virginia Commonwealth University: 122

University of Utah: 107

Harvard University

Broad Institute

Huntsman (United States)

Triangle

Icahn School of Medicine at Mount Sinai

National Center on Birth Defects and Developmental Disabilities

Research Field

Enhancing Cross-lingual Transfer via Phonemic Transcription Integration

Findings of the Association for Computational Linguistics: ACL 2023 (2023)

Hoang Nguyen, Chenwei Zhang, Tao Zhang, Eugene Rohrbaugh, Philip S. Yu

Previous cross-lingual transfer methods are restricted to orthographic representation learning via textual scripts. This limitation hampers cross-lingual transfer and is biased towards languages sharing similar well-known scripts. To alleviate the gap between languages with different writing scripts, we propose PhoneXL, a framework incorporating phonemic transcriptions as an additional linguistic modality beyond the traditional orthographic transcriptions for cross-lingual transfer. In particular, we propose unsupervised alignment objectives to capture (1) local one-to-one alignment between the two different modalities, (2) alignment via multi-modality contexts to leverage information from additional modalities, and (3) alignment via multilingual contexts where additional bilingual dictionaries are incorporated. We also release the first phonemic-orthographic alignment dataset on two token-level tasks (Named Entity Recognition and Part-of-Speech Tagging) among the understudied but interconnected Chinese-Japanese-Korean-Vietnamese (CJKV) languages. Our pilot study reveals that phonemic transcription provides essential information beyond the orthography to enhance cross-lingual transfer and bridge the gap among CJKV languages, leading to consistent improvements on cross-lingual token-level tasks over orthographic-based multilingual PLMs.

Leverage (statistics)

10.18653/v1/2023.findings-acl.583

Citations (2)

Genome-wide association study identifies 30 Loci Associated with Bipolar Disorder

bioRxiv (Cold Spring Harbor Laboratory) (2017)

Eli A. Stahl, Gerome Breen, Andreas J. Forstner, Andrew McQuillin, Stephan Ripke

Bipolar disorder is a highly heritable psychiatric disorder that features episodes of mania and depression. We performed the largest genome-wide association study to date, including 20,352 cases and 31,358 controls of European descent, with follow-up analysis of 822 sentinel variants at loci with P < 1×10⁻⁴ in an independent sample of 9,412 cases and 137,760 controls. In the combined analysis, 30 loci reached genome-wide significant evidence for association, of which 20 were novel. These significant loci contain genes encoding ion channels and neurotransmitter transporters (CACNA1C, GRIN2A, SCN2A, SLC4A1), synaptic components (RIMS1, ANK3), immune and energy metabolism components. Bipolar disorder type I (depressive and manic episodes; ~73% of our cases) is strongly genetically correlated with schizophrenia whereas bipolar disorder type II (depressive and hypomanic episodes; ~17% of our cases) is more strongly correlated with major depressive disorder. These findings address key clinical questions and provide potential new biological mechanisms for bipolar disorder.

Genome-wide Association Study

Genetic Association

10.1101/173062

Citations (88)

Investigating rare pathogenic/likely pathogenic exonic variation in bipolar disorder

Molecular Psychiatry (2021)

Xiaoming Jia, Fernando S. Goes, Adam E. Locke, Duncan S. Palmer, Weiqing Wang

Bipolar disorder (BD) is a serious mental illness with substantial common variant heritability. However, the role of rare coding variation in BD is not well established. We examined the protein-coding (exonic) sequences of 3,987 unrelated individuals with BD and 5,322 controls of predominantly European ancestry across four cohorts from the Bipolar Sequencing Consortium (BSC). We assessed the burden of rare, protein-altering, single nucleotide variants classified as pathogenic or likely pathogenic (P-LP) both exome-wide and within several groups of genes with phenotypic or biologic plausibility in BD. While we observed an increased burden of rare coding P-LP variants within 165 genes identified as BD GWAS regions in 3,987 BD cases (meta-analysis OR = 1.9, 95% CI = 1.3–2.8, one-sided p = 6.0 × 10⁻⁴), this enrichment did not replicate in an additional 9,929 BD cases and 14,018 controls (OR = 0.9, one-sided p = 0.70). Although BD shares common variant heritability with schizophrenia (SCZ), in the BSC sample we did not observe a significant enrichment of P-LP variants in SCZ GWAS genes, in two classes of neuronal synaptic genes (RBFOX2 and FMRP) associated with SCZ, or in loss-of-function intolerant genes. In this study, the largest analysis of exonic variation in BD, individuals with BD do not carry a replicable enrichment of rare P-LP variants across the exome or in any of several groups of genes with biologic plausibility. Moreover, despite a strong shared susceptibility between BD and SCZ through common genetic variation, we do not observe an association between BD risk and rare P-LP coding variants in genes known to modulate risk for SCZ.

Genome-wide Association Study

Exome

Genetic Association

Missing heritability problem

Candidate gene

10.1038/s41380-020-01006-9

Citations (20)

Gene expression imputation across multiple brain regions reveals schizophrenia risk throughout development

bioRxiv (Cold Spring Harbor Laboratory) (2017)

Laura M. Huckins, Amanda Dobbyn, Douglas M. Ruderfer, Gabriel E. Hoffman, Weiqing Wang

Transcriptomic imputation approaches offer an opportunity to test associations between disease and gene expression in otherwise inaccessible tissues, such as brain, by combining eQTL reference panels with large-scale genotype data. These genic associations could elucidate signals in complex GWAS loci and may disentangle the role of different tissues in disease development. Here, we use the largest eQTL reference panel for the dorso-lateral pre-frontal cortex (DLPFC), collected by the CommonMind Consortium, to create a set of gene expression predictors and demonstrate their utility. We applied these predictors to 40,299 schizophrenia cases and 65,264 matched controls, constituting the largest transcriptomic imputation study of schizophrenia to date. We also computed predicted gene expression levels for 12 additional brain regions, using publicly available predictor models from GTEx. We identified 413 genic associations across 13 brain regions. Stepwise conditioning across the genes and tissues identified 71 associated genes (67 outside the MHC), with the majority of associations found in the DLPFC, and of which 14/67 genes did not fall within previously genome-wide significant loci. We identified 36 significantly enriched pathways, including hexosaminidase-A deficiency, and multiple pathways associated with porphyric disorders. We investigated developmental expression patterns for all 67 non-MHC associated genes using BRAINSPAN, and identified groups of genes expressed specifically pre-natally or post-natally.

Genome-wide Association Study

Imputation (statistics)

Genetic Association

10.1101/222596

Citations (16)

Dynamic Semantic Matching and Aggregation Network for Few-shot Intent Detection

Hoang Nguyen, Chenwei Zhang, Congying Xia, Philip S. Yu

Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances. Although recent works demonstrate that multi-level matching plays an important role in transferring learned knowledge from seen training classes to novel testing classes, they rely on a static similarity measure and overly fine-grained matching components. These limitations inhibit generalizing capability towards Generalized Few-shot Learning settings where both seen and novel classes are co-existent. In this paper, we propose a novel Semantic Matching and Aggregation Network where semantic components are distilled from utterances via multi-head self-attention with additional dynamic regularization constraints. These semantic components capture high-level information, resulting in more effective matching between instances. Our multi-perspective matching method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances. We also propose a more challenging evaluation setting that considers classification on the joint all-class label space. Extensive experimental results demonstrate the effectiveness of our method. Our code and data are publicly available.

Regularization

Semantic Matching

10.18653/v1/2020.findings-emnlp.108

Citations (23)

Slot Induction via Pre-trained Language Model Probing and Multi-level Contrastive Learning

Hoang Nguyen, Chenwei Zhang, Ye Liu, Philip S. Yu

Recent advanced methods in Natural Language Understanding for Task-oriented Dialogue (TOD) Systems (e.g., intent detection and slot filling) require a large amount of annotated data to achieve competitive performance. In reality, token-level annotations (slot labels) are time-consuming and difficult to acquire. In this work, we study the Slot Induction (SI) task, whose objective is to induce slot boundaries without explicit knowledge of token-level slot annotations. We propose leveraging Unsupervised Pre-trained Language Model (PLM) Probing and a Contrastive Learning mechanism to exploit (1) unsupervised semantic knowledge extracted from the PLM, and (2) additional sentence-level intent label signals available from TOD. Our approach is shown to be effective on the SI task and capable of bridging the gap with token-level supervised models on two NLU benchmark datasets. When generalized to emerging intents, our SI objectives also provide enhanced slot label representations, leading to improved performance on the Slot Filling tasks.

Benchmark (surveying)

Natural language understanding

Factor (programming language)

10.18653/v1/2023.sigdial-1.44

Citations (1)

Correction: Investigating rare pathogenic/likely pathogenic exonic variation in bipolar disorder (Molecular Psychiatry, (2021), 26, 9, (5239-5250), 10.1038/s41380-020-01006-9)

Molecular Psychiatry (2021)

Xiaoming Jia, Fernando S. Goes, Adam E. Locke, Duncan S. Palmer, Weiqing Wang

Variation (astronomy)
