Haplotype tagging using support vector machines

2006 
Constructing a complete human haplotype map can help in associating complex diseases with SNPs (single nucleotide polymorphisms). Unfortunately, the number of SNPs is very large and it is costly to sequence many individuals. It is therefore desirable to reduce the number of SNPs that must be sequenced to a small number of informative representatives called tag SNPs. Depending on the application, tagging can either save budget by inferring non-tag SNPs from tag SNPs or shorten lengthy, difficult-to-handle SNP sequences obtained from the Affymetrix Map Array. Tagging first chooses which SNPs to use as tags and then predicts the unknown non-tag SNPs from the known tags. In this paper we propose a new SNP prediction method based on a robust classification tool, the Support Vector Machine (SVM). For tag selection we use a fast stepwise tag selection algorithm. An extensive experimental study on various datasets, including 3 regions from HapMap, shows that tag selection based on SVM SNP prediction can reach the same prediction accuracy as the methods of Halldorsson et al. (7) on the LPL dataset using significantly fewer tags. For example, our method reaches 90% SNP prediction accuracy using only 3 tags for the Daly et al. (6) dataset with 103 SNPs. The proposed tagging method is also more accurate (but considerably slower) than the multivariate linear regression method of He et al. (12).
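The two-stage pipeline described above (choose tags, then predict each non-tag SNP from the tags with an SVM) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes biallelic SNPs coded 0/1, swaps in a small linear SVM trained by batch sub-gradient descent on the hinge loss in place of a full SVM package, uses a simple greedy forward search as the stepwise selection, and scores accuracy on the training haplotypes rather than by cross-validation. All function names, parameters, and the toy data are hypothetical.

```python
def to_pm1(allele):
    """Map a 0/1 allele to the -1/+1 coding used by the SVM."""
    return 1.0 if allele == 1 else -1.0


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Batch sub-gradient descent on the L2-regularised hinge loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        g = [0.0] * d
        for xi, yi in zip(X, y):
            # Only samples inside the margin contribute to the sub-gradient.
            if yi * sum(wj * xj for wj, xj in zip(w, xi)) < 1.0:
                for j in range(d):
                    g[j] += yi * xi[j] / n
        w = [wj - eta * (lam * wj - gj) for wj, gj in zip(w, g)]
    return w


def predict_allele(w, x):
    """Predict allele 1 when the decision value is non-negative, else 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else 0


def snp_accuracy(haps, tags, target):
    """Training accuracy when predicting SNP `target` from the tag SNPs."""
    X = [[to_pm1(h[t]) for t in tags] + [1.0] for h in haps]  # 1.0 = bias term
    y = [to_pm1(h[target]) for h in haps]
    w = train_linear_svm(X, y)
    correct = sum(predict_allele(w, x) == h[target] for x, h in zip(X, haps))
    return correct / len(haps)


def mean_accuracy(haps, tags):
    """Average prediction accuracy over all non-tag SNPs."""
    non_tags = [s for s in range(len(haps[0])) if s not in tags]
    if not non_tags:
        return 1.0
    return sum(snp_accuracy(haps, tags, s) for s in non_tags) / len(non_tags)


def stepwise_select(haps, k):
    """Greedy forward selection: add the SNP that most improves accuracy."""
    tags = []
    for _ in range(k):
        best, best_acc = None, -1.0
        for s in range(len(haps[0])):
            if s in tags:
                continue
            acc = mean_accuracy(haps, tags + [s])
            if acc > best_acc:  # strict comparison keeps the first of any ties
                best, best_acc = s, acc
        tags.append(best)
    return tags


# Toy data (hypothetical): SNPs 0 and 1 always agree, as do SNPs 2 and 3.
haps = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
tags = stepwise_select(haps, 2)
print("selected tags:", tags)
print("mean accuracy on non-tag SNPs:", mean_accuracy(haps, tags))
```

In this toy example one tag per correlated pair is enough to recover the remaining SNPs. On real data the accuracy would normally be estimated on held-out haplotypes, and a kernel SVM from an established library would replace the hand-written trainer.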