High-accuracy haplotype imputation using unphased genotype data as the references

Wenzhi Li, Wei Xu, Guoxing Fu, Li Ma, Jendai Richards, Weinian Rao, Tameka N Bythwood, Shiwen Guo, Qing Song

2015

Rapidly growing genomic datasets pose a new challenge for missing-data imputation, a notoriously resource-demanding task. Haplotype imputation requires ethnicity-matched references. To date, however, haplotype references are unavailable for the majority of the world's populations. We explored using existing unphased genotype datasets as references; if successful, this approach would cover almost all of the world's populations. The results showed that our HiFi software achieves 99.43% accuracy with unphased genotype references. Our method provides a cost-effective way to break through the bottleneck of limited reference availability for haplotype imputation in the big data era.

Keywords:

Big data
Biology
Genetics
Missing data
Imputation (statistics)
Imputation (genetics)
Haplotype
Bottleneck
Genotype
Haplotype estimation
Missing data imputation

