Reordering Genomic Sequences for Enhanced Classification via Compression Analytics

Christina L. Ting, Renee Gooding, Richard V. Field, Jacob Caswell

2019

The full implications of sharing genomic information are still largely unknown. Understanding what attributes can be inferred from available information is therefore a critical part of genomic privacy and security. We show that compression analytics are successful at classifying, or inferring, unknown attributes of genomic sequences without the need for a predefined feature set and with very little training data. Compression analytics perform best when predictable elements within a sequence are local; however, long-range dependencies are ubiquitous in the human genome. We therefore consider a variety of schemes to reorder genomic sequences so as to localize predictable elements and improve the performance of compression analytics. Compression analytics on both native and reordered sequences are shown to outperform more traditional, feature-based machine learning approaches.
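The abstract does not name a compressor or a scoring rule, so the following is only a minimal sketch under stated assumptions. It uses DEFLATE (Python's zlib at level 9) and assigns a query to the class whose training data makes the query cheapest to compress, measured as C(reference + query) - C(reference). This is a common compression-based classification rule, not necessarily the authors' method. The motif-built "classes", the helpers `csize`, `conditional_cost`, `classify` and `motif_seq`, and the block-move "reordering" are all hypothetical illustrations, not the paper's data or reordering schemes. The second half shows why locality matters: DEFLATE can only refer back 32 KiB, so a repeat placed farther away is wasted until it is moved next to its first copy.

```python
from __future__ import annotations

import random
import zlib

BASES = "ACGT"


def csize(data: bytes) -> int:
    """Size in bytes of the DEFLATE encoding of data (zlib, level 9)."""
    return len(zlib.compress(data, 9))


def conditional_cost(reference: bytes, query: bytes) -> int:
    """Extra compressed bytes the query costs once the reference has been seen."""
    return csize(reference + query) - csize(reference)


def classify(query: bytes, training: dict[str, bytes]) -> str:
    """Return the label whose training data makes the query cheapest to compress."""
    return min(training, key=lambda label: conditional_cost(training[label], query))


def random_seq(rng: random.Random, n: int) -> str:
    """Uniformly random DNA string of length n."""
    return "".join(rng.choice(BASES) for _ in range(n))


def motif_seq(rng: random.Random, motifs: list[str], count: int) -> bytes:
    """Hypothetical class-specific sequence: a random concatenation of motifs."""
    return "".join(rng.choice(motifs) for _ in range(count)).encode()


rng = random.Random(0)
# Two hypothetical classes, each defined by its own 8 random 50-base motifs.
motifs = {label: [random_seq(rng, 50) for _ in range(8)] for label in ("A", "B")}
# About 15 kB of training data per class, inside DEFLATE's 32 KiB window.
training = {label: motif_seq(rng, m, 300) for label, m in motifs.items()}
query_a = motif_seq(rng, motifs["A"], 40)
query_b = motif_seq(rng, motifs["B"], 40)
print(classify(query_a, training), classify(query_b, training))

# Locality: DEFLATE only looks back 32 KiB, so a repeat 45 kB away is invisible
# to it. A toy reordering that moves the repeat next to its first copy lets the
# compressor encode the second copy as cheap back-references.
block = random_seq(rng, 5_000)
filler = random_seq(rng, 40_000)
native = (block + filler + block).encode()
reordered = (block + block + filler).encode()
print(csize(native), csize(reordered))
```

On this toy data, a query built from class A's motifs should be assigned to A (and likewise for B), and the reordered string should compress to noticeably fewer bytes than the native one. Real genomic sequences, other compressors with larger or unbounded context, and the paper's actual reordering schemes may behave quite differently.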
