Efficient discovery of single-nucleotide polymorphisms in coding regions of human genes.

2002 
Single nucleotide polymorphisms in protein coding regions (cSNPs) are of great interest for their effects on phenotype and their potential for mapping disease genes. We have identified 5400 novel exonic SNPs from alignments of public EST data to the draft human genome sequence, and approximately 12 000 more novel exonic SNPs from EST cluster alignments. We found 82% of the genomic-aligned SNPs and 63% of the EST-only SNPs to be detectably polymorphic in 20 Finnish DNA samples. Of the SNPs, 37% mapped to known protein coding regions, yielding 6500 distinct, novel cSNPs from the two datasets. These data reveal selection against mutations that alter protein structure, and distinct classes of genes under strong positive vs. negative selection for amino acid replacement (detected by the K_A/K_S ratio). We have screened these cSNPs for compatibility with the amino acid profile at each site and for structural impact on protein core stability.