Semi-automated annotation of known and novel cancer long noncoding RNAs with the Cancer LncRNA Census 2 (CLC2)
2020
Long noncoding RNAs (lncRNAs) can promote or repress the cellular hallmarks of cancer. Understanding their molecular roles and realising their therapeutic potential depend on high-quality catalogues of cancer lncRNA genes. Presently, such catalogues rely on labour-intensive curation of heterogeneous data with permissive criteria, resulting in unknown numbers of genes without direct functional evidence. Here, we present an approach for semi-automated curation focused exclusively on pathogenic functionality. The result is Cancer LncRNA Census 2 (CLC2), comprising 492 gene loci in 33 cancer types. To complement manual literature curation, we develop an automated pipeline, CLIO-TIM, to identify novel cancer lncRNAs based on functional evolutionary conservation with mouse. This yields 95 novel lncRNAs, which display characteristics of known cancer genes and include LINC00570 (ncRNA-a5), which we demonstrate experimentally to promote cell proliferation. The clinical importance and curation accuracy of CLC2 lncRNAs are highlighted by a range of features, including evolutionary selection, expression in tumours, and both somatic and germline polymorphisms. The entire dataset is available in a highly curated format facilitating the widest range of downstream applications. In summary, we show how manual and automated methods can be integrated to catalogue known and novel functional cancer lncRNAs with unique genomic and clinical properties.