Nguyen Vo

Google (United States)

Author Statistics

Papers

Citation

H-Index

i-10 index

Research Trends

Author Order

Document Type

Co-Authors

Kyumin Lee

Worcester Polytechnic Institute

Chitta Baral

Arizona State University

Sandeep Tata

Google (United States)

Yonggwan Won

Chonnam National University

Minh-Tuan T. Hoang

Chonnam National University

Ying Sheng

Google (United States)

Hieu Trung Huynh

Industrial University of Ho Chi Minh City

Thanh Tran

Worcester Polytechnic Institute

James B. Wendt

Google (United States)

Yichao Zhou

Nanjing University of Science and Technology

Cooperative Institutions

Arizona State University

Google (United States)

Decision Systems (United States)

La Roche College

University of California, Berkeley

University of California, San Diego

Chonnam National University

The University of Texas at Austin

Nanjing University of Science and Technology

University of California, Los Angeles

Author Statistics

Papers

Citation

H-Index

i-10 index

Research Field

An Approach to Solve Winograd Schema Challenge Using Automatically Extracted Commonsense Knowledge

National Conference on Artificial Intelligence (2015)

Arpit Sharma Nguyen Vo Shruti Gaur Chitta Baral

The Winograd Schema Challenge has recently been proposed as an alternative to the Turing test. A Winograd Schema consists of a sentence and question pair such that the answer to the question depends on the resolution of a definite pronoun in the sentence. The answer is fairly intuitive for humans but is difficult for machines because it requires commonsense knowledge about words or concepts in the sentence. In this paper we propose a novel technique which semantically parses the text, hunts for the needed commonsense knowledge and uses that knowledge to answer the given question.

Commonsense knowledge

Schema (genetic algorithms)

Turing test

Commonsense reasoning

Source

Cite

Citations (7)

RiSER: Learning Better Representations for Richly Structured Emails

Furkan Kocayusufoglu Ying Sheng Nguyen Vo James B. Wendt Qi Zhao

Recent studies show that an overwhelming majority of emails are machine-generated and sent by businesses to consumers. Many large email services are interested in extracting structured data from such emails to enable intelligent assistants. This allows experiences like being able to answer questions such as "What is the address of my hotel in New York?" or "When does my flight leave?". A high-quality email classifier is a critical piece in such a system. In this paper, we argue that the rich formatting used in business-to-consumer emails contains valuable information that can be used to learn better representations. Most existing methods focus only on textual content and ignore the rich HTML structure of emails. We introduce RiSER (Richly Structured Email Representation) - an approach for incorporating both the structure and content of emails. RiSER projects the email into a vector representation by jointly encoding the HTML structure and the words in the email. We then use this representation to train a classifier. To our knowledge, this is the first description of a neural technique for combining formatting information along with the content to learn improved representations for richly formatted emails. Experimenting with a large corpus of emails received by users of Gmail, we show that RiSER outperforms strong attention-based LSTM baselines. We expect that these benefits will extend to other corpora with richly formatted documents. We also demonstrate with examples where leveraging HTML structure leads to better predictions.

10.1145/3308558.3313720

Cite

Citations (12)

Recognizing Social Constructs from Textual Conversation

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2015)

Somak Aditya Chitta Baral Nguyen Vo Joohyung Lee Jieping Ye

Somak Aditya, Chitta Baral, Nguyen Ha Vo, Joohyung Lee, Jieping Ye, Zaw Naung, Barry Lumpkin, Jenny Hastings, Richard Scherl, Dawn M. Sweet, Daniela Inclezan. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015.

Computational linguistics

10.3115/v1/n15-1141

Cite

Citations (1)

Learning Transferable Node Representations for Attribute Extraction from Web Documents

Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining (2022)

Yichao Zhou Ying Sheng Nguyen Vo Nick Edmonds Sandeep Tata

Given a web page, extracting an object along with various attributes of interest (e.g. price, publisher, author, and genre for a book) can facilitate a variety of downstream applications such as large-scale knowledge base construction, e-commerce product search, and personalized recommendation. Prior approaches have either relied on computationally expensive visual feature engineering or required large amounts of training data to get to an acceptable precision. In this paper, we propose a novel method, LeArNing TransfErable node RepresentatioNs for Attribute Extraction (LANTERN), to tackle the problem. We model the problem as a tree node tagging task. The key insight is to learn a contextual representation for each node in the DOM tree where the context explicitly takes into account the tree structure of the neighborhood around the node. Experiments on the SWDE public dataset show that LANTERN outperforms the previous state-of-the-art (SOTA) by 1.44% (F1 score) with a dramatically simpler model architecture. Furthermore, we report that utilizing data from a different domain (for instance, using training data about web pages with cars to extract book objects) is surprisingly useful and helps beat the SOTA by a further 1.37%.

Tree (set theory)

Document Object Model

Feature Learning

10.1145/3488560.3498424

Cite

Citations (6)

Learning from Fact-checkers

Nguyen Vo Kyumin Lee

In fighting against fake news, many fact-checking systems comprised of human-based fact-checking sites (e.g., snopes.com and politifact.com) and automatic detection systems have been developed in recent years. However, online users still keep sharing fake news even when it has been debunked. It means that early fake news detection may be insufficient and we need another complementary approach to mitigate the spread of misinformation. In this paper, we introduce a novel application of text generation for combating fake news. In particular, we (1) leverage online users named fact-checkers, who cite fact-checking sites as credible evidences to fact-check information in public discourse; (2) analyze linguistic characteristics of fact-checking tweets; and (3) propose and build a deep learning framework to generate responses with fact-checking intention to increase the fact-checkers' engagement in fact-checking activities. Our analysis reveals that the fact-checkers tend to refute misinformation and use formal language (e.g. few swear words and Internet slangs). Our framework successfully generates relevant responses, and outperforms competing models by achieving up to 30% improvements. Our qualitative study also confirms that the superiority of our generated responses compared with responses generated from the existing models.

Misinformation

Leverage (statistics)

Fake News

False accusation

10.1145/3331184.3331248

Cite

Citations (51)

Extension of General Convergence Framework with Significant Samples

ITC-CSCC :International Technical Conference on Circuits Systems, Computers and Communications (2007)

Nguyen Vo Yonggwan Won

Single Class Classification is the problem of distinguishing one class of data (called positive class) from the universal set of multiple classes (negative class). In this paper, we proposed an improvement of Extended General Mapping Convergence framework using extreme learning machine, a recently developed machine learning algorithm. This proposed method keeping the high accuracy in classification while improving the high speed of old method.

One-class classification

Source

Cite

Citations (0)

Attributed Multi-Relational Attention Network for Fact-checking URL Recommendation

Di You Nguyen Vo Kyumin Lee Qiang Liu

To combat fake news, researchers mostly focused on detecting fake news and journalists built and maintained fact-checking sites (e.g., Snopes.com and Politifact.com). However, fake news dissemination has been greatly promoted via social media sites, and these fact-checking sites have not been fully utilized. To overcome these problems and complement existing methods against fake news, in this paper we propose a deep-learning based fact-checking URL recommender system to mitigate impact of fake news in social media sites such as Twitter and Facebook. In particular, our proposed framework consists of a multi-relational attentive module and a heterogeneous graph attention network to learn complex/semantic relationship between user-URL pairs, user-user pairs, and URL-URL pairs. Extensive experiments on a real-world dataset show that our proposed framework outperforms eight state-of-the-art recommendation models, achieving at least 3$\sim$5.3% improvement. Our source code and dataset are available at \urlhttps://web.cs.wpi.edu/~kmlee/data.html .

Code (set theory)

Social network (sociolinguistics)

Complement

Fake News

10.1145/3357384.3358006

Cite

Citations (20)

Where Are the Facts? Searching for Fact-checked Information to Alleviate the Spread of Fake News

Nguyen Vo Kyumin Lee

Although many fact-checking systems have been developed in academia and industry, fake news is still proliferating on social media. These systems mostly focus on fact-checking but usually neglect online users who are the main drivers of the spread of misinformation. How can we use fact-checked information to improve users’ consciousness of fake news to which they are exposed? How can we stop users from spreading fake news? To tackle these questions, we propose a novel framework to search for fact-checking articles, which address the content of an original tweet (that may contain misinformation) posted by online users. The search can directly warn fake news posters and online users (e.g. the posters’ followers) about misinformation, discourage them from spreading fake news, and scale up verified content on social media. Our framework uses both text and images to search for fact-checking articles, and achieves promising results on real-world datasets. Our code and datasets are released at https://github.com/nguyenvo09/EMNLP2020.

Misinformation

Fake News

Code (set theory)

10.18653/v1/2020.emnlp-main.621

Cite

Citations (61)

Identifying Novel Drug Indications through Automated Reasoning

PLoS ONE (2012)

Luis Tari Nguyen Vo Shanshan Liang Jagruti Patel Chitta Baral

With the large amount of pharmacological and biological knowledge available in literature, finding novel drug indications for existing drugs using in silico approaches has become increasingly feasible. Typical literature-based approaches generate new hypotheses in the form of protein-protein interactions networks by means of linking concepts based on their cooccurrences within abstracts. However, this kind of approaches tends to generate too many hypotheses, and identifying new drug indications from large networks can be a time-consuming process.In this work, we developed a method that acquires the necessary facts from literature and knowledge bases, and identifies new drug indications through automated reasoning. This is achieved by encoding the molecular effects caused by drug-target interactions and links to various diseases and drug mechanism as domain knowledge in AnsProlog, a declarative language that is useful for automated reasoning, including reasoning with incomplete information. Unlike other literature-based approaches, our approach is more fine-grained, especially in identifying indirect relationships for drug indications.To evaluate the capability of our approach in inferring novel drug indications, we applied our method to 943 drugs from DrugBank and asked if any of these drugs have potential anti-cancer activities based on information on their targets and molecular interaction types alone. A total of 507 drugs were found to have the potential to be used for cancer treatments. Among the potential anti-cancer drugs, 67 out of 81 drugs (a recall of 82.7%) are indeed known cancer drugs. In addition, 144 out of 289 drugs (a recall of 49.8%) are non-cancer drugs that are currently tested in clinical trials for cancer treatments. These results suggest that our method is able to infer drug indications (original or alternative) based on their molecular targets and interactions alone and has the potential to discover novel drug indications for existing drugs.

DrugBank

10.1371/journal.pone.0040946

Cite

Citations (17)

Enhancing Incremental Summarization with Structured Representations

arXiv (Cornell University) (2024)

EunJeong Hwang Yichao Zhou James B. Wendt Beliz Gunel Nguyen Vo

Large language models (LLMs) often struggle with processing extensive input contexts, which can lead to redundant, inaccurate, or incoherent summaries. Recent methods have used unstructured memory to incrementally process these contexts, but they still suffer from information overload due to the volume of unstructured data handled. In our study, we introduce structured knowledge representations ($GU_{json}$), which significantly improve summarization performance by 40% and 14% across two public datasets. Most notably, we propose the Chain-of-Key strategy ($CoK_{json}$) that dynamically updates or augments these representations with new information, rather than recreating the structured memory for each new source. This method further enhances performance by 7% and 4% on the datasets.

10.48550/arxiv.2407.15021

Cite

Citations (0)