Detection of Fusion Genes from Human Breast Cancer Cell-Line RNA-Seq Data Using Shifted Short Read Clustering

2018 
Fusion genes are one of the mechanisms of tumorigenesis, and their identification by RNA-Seq has attracted attention. Various methods for detecting fusion genes have been proposed, but their accuracy is not sufficient. One cause of this problem is the relatively short read length of RNA-Seq data. We therefore propose a preprocessing method, based on shifted short-read clustering (SSC), that is applied before mapping: it identifies shifted reads of the same origin and extends them into representative sequences. We hypothesized that this would increase the percentage of uniquely mapped reads and, in turn, improve the detection rate of fusion genes. To verify these hypotheses, we applied the SSC method to RNA-Seq data from three breast cancer cell lines (BT-474, MCF-7, and SKBR-3). When only one base was shifted, the average read lengths of BT-474, MCF-7, and SKBR-3 were extended from 201 to 223 bases (111%), 201 to 214 bases (106%), and 201 to 213 bases (106%), respectively. We further demonstrated the effectiveness of the SSC method by comparing the results of a fusion gene detection tool, STAR-Fusion, with and without SSC preprocessing of the reads. The percentages of uniquely mapped reads for BT-474, MCF-7, and SKBR-3 improved from 88% to 93%, 88% to 94%, and 92% to 95%, respectively. Finally, the fusion gene detection rates for BT-474, MCF-7, and SKBR-3 increased from 48% to 57%, 49% to 53%, and 50% to 53%, respectively. These results suggest that the SSC method is effective not only for improving the percentage of uniquely mapped reads but also for fusion gene detection.
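The idea of extending shifted reads of the same origin can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, so what follows is an assumption for illustration only, not the authors' implementation: two reads are treated as shifted copies when one equals the other displaced by exactly `shift` bases with no mismatches, and reads are chained greedily in both directions. The names `shifted_overlap` and `extend_reads` are hypothetical.

```python
from typing import List


def shifted_overlap(left: str, right: str, shift: int = 1) -> bool:
    """True if `right` starts exactly `shift` bases after `left` starts
    and the two reads agree on every overlapping base (assumed rule)."""
    overlap = len(left) - shift
    return 0 < overlap <= len(right) and left[shift:] == right[:overlap]


def extend_reads(reads: List[str], shift: int = 1) -> List[str]:
    """Greedily chain shifted reads into longer representative sequences.

    Illustrative only: exact matching, no sequencing-error tolerance,
    no reverse complements, no quality scores.
    """
    pool = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    representatives = []
    while pool:
        contig = pool.pop(0)
        head = tail = contig  # reads currently at the two ends of the contig
        extended = True
        while extended:
            extended = False
            for read in pool:
                if shifted_overlap(tail, read, shift):
                    # read continues the contig to the right
                    contig += read[len(tail) - shift:]
                    tail = read
                elif shifted_overlap(read, head, shift):
                    # read continues the contig to the left
                    contig = read[:shift] + contig
                    head = read
                else:
                    continue
                pool.remove(read)
                extended = True
                break  # restart the scan with the updated ends
        representatives.append(contig)
    return representatives


if __name__ == "__main__":
    source = "ACGTTGCAAGGCTA"
    # Four 8-base reads, each shifted by one base, plus one unrelated read.
    shifted = [source[i:i + 8] for i in range(4)]
    reads = [shifted[2], "GGGGCCCC", shifted[0], shifted[3], shifted[1]]
    print(extend_reads(reads, shift=1))
    # prints ['ACGTTGCAAGG', 'GGGGCCCC']
```

In this toy input, four 8-base reads collapse into one 11-base representative, while the unrelated read is left unchanged. A practical implementation would need to tolerate sequencing errors and index reads for speed; the linear scan here is kept only for readability.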