The evolution of duplicate genes has been a topic of broad interest. Here, we propose that the conservation of gene family size is a good indicator of the rate of sequence evolution and some other biological properties. By comparing the human-chimpanzee-macaque orthologous gene families with and without family size conservation, we demonstrate that genes with family size conservation evolve more slowly than those without family size conservation. Our results further demonstrate that both family expansion and contraction events may accelerate gene evolution, resulting in elevated evolutionary rates in the genes without family size conservation. In addition, we show that the duplicate genes with family size conservation evolve significantly more slowly than those without family size conservation. Interestingly, the median evolutionary rate of singletons falls in between those of the above two types of duplicate gene families. Our results thus suggest that the controversy on whether duplicate genes evolve more slowly than singletons can be resolved when family size conservation is taken into consideration. Furthermore, we also observe that duplicate genes with family size conservation have the highest level of gene expression/expression breadth, the highest proportion of essential genes, and the lowest gene compactness, followed by singletons and then by duplicate genes without family size conservation. Such a trend accords well with our observations of evolutionary rates. Our results thus point to the importance of family size conservation in the evolution of duplicate genes.
It has been suggested that insertions and deletions (indels) have contributed to the sequence divergence between the human and chimpanzee genomes more than do nucleotide changes (3% vs. 1.2%). However, although there have been studies of large indels between the two genomes, no systematic analysis of small indels (i.e., indels ≤ 100 bp) has been published. In this study, we first estimated that the false-positive rate of small indels inferred from human–chimpanzee pairwise sequence alignments is quite high, suggesting that the chimpanzee genome draft is not sufficiently accurate for our purpose. We have therefore inferred only human-specific indels using multiple sequence alignments of mammalian genomes. We identified >840,000 “small” indels, which affect >7000 UCSC-annotated human genes (>11,000 transcripts). These indels, however, amount to only ∼0.21% sequence change in the human lineage for the regions compared, whereas in pseudogenes indels contribute to a sequence divergence of 1.40%, suggesting that most of the indels that occurred in genic regions have been eliminated. Functional analysis reveals that the genes whose coding exons have been affected by human-specific indels are enriched in transcription and translation regulatory activities but are underrepresented in catalytic and transporter activities, cellular and physiological processes, and extracellular region/matrix. This functional bias suggests that human-specific indels might have contributed to human unique traits by causing changes at the RNA and protein level.