Complete genome sequencing and variant analysis of a Pakistani individual

2013 
We sequenced the genome of a Pakistani male at 25.5x coverage using massively parallel sequencing technology. More than 90% of the sequence reads were mapped to the human reference genome. In subsequent analysis, we identified 3224311 single-nucleotide polymorphisms (SNPs), of which 388532 (12% of the total SNPs) had not been previously recorded in the single-nucleotide polymorphism database (dbSNP) or the 1000 Genomes Project database. The 5991 non-synonymous coding variants were screened for deleterious or disease-associated SNPs. Analysis of genes with deleterious SNPs identified ‘retinoic acid signaling’ and ‘regulation of transcription’ as the enriched Gene Ontology terms. Scanning of non-synonymous SNPs against the OMIM database revealed several disease- and phenotype-associated variants in the Pakistani genome. Comparative analysis with an Indian genome sequence revealed >1.8 million shared SNPs, 32% of which were annotated in ~14000 genes. Gene Ontology (GO) term analysis of these genes identified ‘response to jasmonic acid stimulus’, ‘aminoglycoside antibiotic metabolic process’ and ‘glycoside metabolic process’ with considerable enrichment. A total of 59558 small indels (1–5 bp) and 16063 large structural variations were found, 54% of which were novel. The substantial number of novel structural variations discovered in the Pakistani genome reinforced previous inferences that (a) structural variations are a major type of variation in the genome and (b) compared with SNPs, they putatively exhibit equivalent or superior functional roles. This genome sequence information will be an important reference for population-wide genomics studies of the ethnically diverse South Asian subcontinent.

Journal of Human Genetics advance online publication, 11 July 2013; doi:10.1038/jhg.2013.72

Keywords: human genome; Pakistan; variant analysis; whole-genome sequencing

INTRODUCTION

South Asia is home to over 1.5 billion humans, representing almost one-quarter of the world population.
Early migration to this region from Africa occurred ~50000–70000 years before present. In recent years, genomic markers have been used to study the migration patterns and relationships among different Asian ethnic groups. These efforts provided clues for two major waves of migration to South Asia from the Middle East. One wave followed a southern coastal route, around the rim of the Indian subcontinent, and continued across Malaysia, Indonesia and the Philippines, whereas a distinct wave of immigrants traveled east across the Eurasian plains and turned south through the Asian mainland.
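
As a quick arithmetic check on the SNP figures quoted in the abstract, the novel fraction can be recomputed from the two reported counts. This is only an illustrative sketch: the variable and function names are hypothetical, and it does not reproduce the authors' analysis pipeline.

```python
# Counts as reported in the abstract.
total_snps = 3_224_311   # all SNPs identified in the Pakistani genome
novel_snps = 388_532     # SNPs absent from dbSNP and the 1000 Genomes Project


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# SNPs already present in at least one of the two databases.
known_snps = total_snps - novel_snps

# Share of SNPs that were novel; the abstract rounds this to 12%.
novel_pct = percent(novel_snps, total_snps)

print(f"Known SNPs: {known_snps}")
print(f"Novel SNPs: {novel_pct:.2f}% of all SNPs")
```

Rounded to the nearest whole percent, the result agrees with the 12% stated in the abstract.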