Abstract 5110: An online tool for summarizing and searching human cancer-genomic publications

2019 
PubMed catalogs over 33,000 articles published with the keywords “cancer” and “gene” in the past year alone. The large volume of cancer genomic publications necessitates the development of text-mining tools to help cancer researchers navigate and summarize articles efficiently. Here we present a Cancer Publication Portal (CPP) that allows a researcher to search and summarize cancer genomic literature based on a gene of interest. CPP integrates data from several sources, including PubTator, which uses robust text-mining tools to identify associations between articles and biological concepts; the Medical Subject Headings (MeSH) database; the HUGO Gene Nomenclature Committee human gene name database; PubMed, a database of biomedical literature citations; and the National Cancer Institute (NCI) Thesaurus. To begin a search, a user selects a gene of interest. CPP then summarizes the relevant cancer-related publications mentioning this gene through tabular and graphical summaries showing the frequency of articles stratified by references to any (1) cancer type, (2) pharmacological agent, (3) genomic mutation, and (4) additional human gene. Additionally, CPP catalogs and summarizes articles based on mentions of >30 additional cancer-related terms, based on an analysis of terms that are significantly more likely to occur in cancer-related abstracts. CPP allows users to quickly obtain statistics, e.g., to find the frequency of articles mentioning EGFR and Erlotinib across cancer types. Interactive summaries allow users to narrow in on a topic of interest by applying additional filters, in which case the summaries are updated and the process can be repeated. At any point, the user can browse through the current set of article abstracts and citation information to get more information about an article. Finally, we introduce a Citation Analysis module for helping users identify “important” articles. This module identifies articles which cite articles in a provided list. 
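As a rough illustration of the citation step, the sketch below finds articles that cite any article in a provided list and records each link as a (citing, cited) pair. It is written in Python for brevity and is not the authors' R/Shiny code; the function names, the toy article IDs, and the in-memory citation map are all hypothetical. The `Source`/`Target` header follows the column names that Gephi's spreadsheet importer recognizes for edge tables.

```python
import csv
import io


def build_edge_list(citations, seed_ids):
    """Return sorted (citing, cited) pairs whose cited article is in seed_ids.

    citations: dict mapping an article ID to the list of article IDs it cites
               (a hypothetical stand-in for a real citation source).
    seed_ids:  the user-provided list of articles of interest.
    """
    seeds = set(seed_ids)
    edges = set()
    for citing, cited_list in citations.items():
        for cited in cited_list:
            if cited in seeds:
                edges.add((citing, cited))
    return sorted(edges)


def edges_to_csv(edges):
    """Serialize edges as CSV text with a Source,Target header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return buf.getvalue()


# Toy data: the article IDs are made up for illustration only.
citations = {
    "A": ["X", "Y"],
    "B": ["Y"],
    "C": ["Z"],
}
seed_ids = ["X", "Y"]

edges = build_edge_list(citations, seed_ids)
csv_text = edges_to_csv(edges)
print(csv_text)
```

In such a graph, seed articles that receive many incoming edges are natural candidates for closer reading, although how CPP itself ranks articles is not specified here.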
The module then generates an edge list that can be imported into network analysis tools such as Gephi for visualization and analysis. CPP currently includes information for ~1.1 million cancer-related publications associated with >19,000 human genes. The underlying data are stored in a MySQL database, and the web interface is implemented in R/Shiny. CPP is available at https://gdancik.github.io/bioinformatics/CPP/

Citation Format: Kevin Williams, Myron Zhang, Anayancy Ramos, Andrew Johnson, Stefanos Stravoravdis, Megan Heenehan, Garrett M. Dancik. An online tool for summarizing and searching human cancer-genomic publications [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 5110.