Deterministic protein inference for shotgun proteomics data provides new insights into Arabidopsis pollen development and function

2009 
The plant life cycle alternates between a diploid and a haploid generation: the spore-producing sporophyte and the gamete-producing gametophyte, respectively (Supplemental Fig. S1). Unlike in animals, where meiotic products differentiate directly into gametes, the haploid spores of plants undergo several mitotic divisions to form multicellular gametophytes, which in turn produce the gametes. In the anther, microspores initiate male gametophyte (pollen) development through an asymmetric division that forms a large vegetative cell and a smaller generative cell (McCormick 2004). The generative cell becomes engulfed within the cytoplasm of the vegetative cell and divides again to form the two sperm cells (Fig. 1A,B). The mature pollen is released from the anther and, after deposition on the stigma, the pollen grain germinates and grows a pollen tube that transports the sperm cells to the female gametes, where double fertilization ensues. Consequently, the pollen grain, although in a quiescent state, must be poised for these rapid physiological changes. Because pollen is a vehicle for dispersal, it is largely dehydrated and must survive harsh environmental conditions before it reaches a fertilization partner.
Figure 1. Arabidopsis thaliana male gametophyte (pollen grain). (A) Schematic representation of a mature pollen grain, which contains two sperm cells enclosed in the cytoplasm of a vegetative cell. Characteristics of the vegetative cell include a large vacuole ...
Mature pollen represents a largely autonomous, highly simplified organism that is specialized for the dispersal and transport of the male gametes. It is ideally suited for studying cell growth and morphogenesis, as well as the processes underlying dehydration and prolonged survival (Hepler et al. 2001; Boavida et al. 2005). Because pollen is the main allergen source in type I allergy, and more than 400 million people suffer from seasonal asthma or hay fever, pollen biology is also of major medical interest (Taylor et al. 2007).
Most of our knowledge about pollen development and function is based on genetic analyses and transcriptomic studies in a few plant model systems such as Arabidopsis thaliana and Zea mays. Expression evidence for ∼12,000 genes has been reported from various stages of pollen development (Becker et al. 2003; Honys and Twell 2003, 2004; Pina et al. 2005; Schmid et al. 2005). Among the 79 Arabidopsis tissues characterized in the AtGenExpress data set, mature pollen had the lowest transcriptome complexity (6500 expressed gene models) and stood out for its very broad distribution of expression levels, which included a prominent fraction of weakly expressed genes as well as a minor fraction of highly expressed genes (Schmid et al. 2005). Transcription is not essential in the mature pollen grain or during pollen tube growth (Mascarenhas 1965; Onodera et al. 2008), suggesting that significant control is exercised at the post-transcriptional level. Because the correlation between pollen transcript and protein levels is not known, and the ATH1 array used in transcriptome studies covers only 83% of the protein-coding gene models in the Arabidopsis reference database TAIR7, proteomic studies promise additional insights. However, our knowledge of the pollen proteome is very limited: 2D gel electrophoresis approaches have collectively identified 266 distinct proteins (Holmes-Davis et al. 2005; Noir et al. 2005; Sheoran et al. 2006). Mature pollen is a difficult system for proteomics, because sufficient quantities of protein have to be collected for sample preparation. Moreover, the extensive genome duplication in higher plants, combined with the expectation (based on transcriptomic data) that a large percentage of proteins can be identified by only a single peptide, poses a significant data analysis challenge.
The peptide-centric nature of shotgun proteomics means that identified peptides often cannot be unambiguously assigned to a single protein. This makes subsequent biological interpretation of the data very difficult and calls for strategies that extract the maximum of unambiguous protein evidence. To address this issue, we devised a novel deterministic peptide classification and protein inference scheme for shotgun proteomics data, which differs from existing approaches such as ProteinProphet (Nesvizhskii et al. 2003), EBP (Price et al. 2007), and IDPicker (Zhang et al. 2007) in three respects: (1) our deterministic classification is the only approach that considers the gene model–protein sequence–protein accession relationships and classifies each peptide sequence according to its information content (Fig. 2B,C). It thus distinguishes unique peptides from those shared by several proteins, whether encoded by the same gene model or by distinct gene models; (2) in contrast to probabilistic approaches, it considers only peptides that pass a defined confidence threshold after peptide-spectrum matching and disregards lower-scoring peptides, hence the name deterministic. By filtering out less informative, ambiguous peptides, it generates a conservative cumulative protein list with a minimal number of false or ambiguous protein assignments, allowing researchers to draw firm conclusions from the final data set; (3) by considering the protein–gene model relationship, our classification scheme facilitates seamless integration with transcriptomic data sets. Using shotgun proteomics, we identified ∼3500 proteins, expanding the known mature pollen proteome by a factor of 13.
Figure 2. Overall workflow and peptide classification scheme. (A) Our workflow integrates the in silico analysis of the Arabidopsis reference protein database (TAIR7; Supplemental Fig. S3) to generate an identifiable proteome index (open boxes); the extraction, ...
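As an illustration only, the following Python sketch shows one way the peptide classification of point (1) and the threshold-based inference of point (2) could be implemented. It is not the authors' implementation: the class labels (`unique_sequence`, `shared_within_gene_model`, `shared_across_gene_models`), the `ProteinEntry` record, the 0.95 score threshold, and the toy database are all hypothetical, and peptides are matched by plain substring search without enzyme cleavage rules or isobaric-residue handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinEntry:
    accession: str   # protein accession (hypothetical naming, e.g. "geneA.1")
    gene_model: str  # gene model encoding this protein
    sequence: str    # amino-acid sequence


def classify_peptide(peptide: str, database: list[ProteinEntry]) -> str:
    """Classify a peptide by the protein sequences and gene models it maps to.

    Labels are illustrative, not the paper's class names. Matching is a plain
    substring search (no cleavage rules, no I/L equivalence).
    """
    hits = [p for p in database if peptide in p.sequence]
    if not hits:
        return "unmatched"
    sequences = {p.sequence for p in hits}
    gene_models = {p.gene_model for p in hits}
    if len(sequences) == 1:
        # A single protein sequence, possibly carried by several accessions.
        return "unique_sequence"
    if len(gene_models) == 1:
        # Several distinct sequences (isoforms) from one gene model.
        return "shared_within_gene_model"
    # Sequences encoded by distinct gene models.
    return "shared_across_gene_models"


def infer_proteins(
    psms: list[tuple[str, float]],
    database: list[ProteinEntry],
    threshold: float = 0.95,
) -> dict[tuple[str, ...], set[str]]:
    """Keep peptide-spectrum matches scoring >= threshold, then report each
    protein sequence (as its group of accessions) that has at least one
    unique_sequence peptide. Lower-scoring peptides are ignored entirely."""
    confident = {peptide for peptide, score in psms if score >= threshold}
    inferred: dict[tuple[str, ...], set[str]] = {}
    for peptide in sorted(confident):
        if classify_peptide(peptide, database) == "unique_sequence":
            accessions = tuple(sorted(p.accession for p in database if peptide in p.sequence))
            inferred.setdefault(accessions, set()).add(peptide)
    return inferred


# Hypothetical toy database: geneC.1 and geneC.2 encode identical sequences.
TOY_DB = [
    ProteinEntry("geneA.1", "geneA", "MKTAYIAKQR"),
    ProteinEntry("geneA.2", "geneA", "MKTAYIAKQRGGLLEK"),
    ProteinEntry("geneB.1", "geneB", "PEPTIDEKLLSAYIAK"),
    ProteinEntry("geneC.1", "geneC", "PEPTIDEKWWR"),
    ProteinEntry("geneC.2", "geneC", "PEPTIDEKWWR"),
]
# Hypothetical (peptide, score) pairs from peptide-spectrum matching.
TOY_PSMS = [("GGLLEK", 0.99), ("MKTAYIAK", 0.97), ("PEPTIDEK", 0.98),
            ("WWR", 0.96), ("LLSAYIAK", 0.50)]

for pep, _ in TOY_PSMS:
    print(pep, classify_peptide(pep, TOY_DB))
print(infer_proteins(TOY_PSMS, TOY_DB))
```

In this toy run, the only unique peptide of `geneB.1` scores below the threshold, so `geneB.1` is not reported even though the peptide itself would be informative; this is the conservative trade-off described in point (2). Accessions with identical sequences (`geneC.1`, `geneC.2`) are reported as one group, because no peptide can distinguish them.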
Manual validation of all unambiguous single-hit protein identifications enabled us to eliminate a large number of false-positive identifications and to provide a high-quality reference data set. Integration of our proteomics data with published transcriptomic data sets allowed us to report >500 proteins that had not previously been identified in mature pollen. Functional analysis of the mature pollen proteome provided novel insights into pollen function and development related to dehydration, prolonged survival, protein stability, post-transcriptional control, and rapid tip growth.
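Because each protein is linked to its gene model (point (3) of the classification scheme), comparing identified proteins with transcriptome data reduces to a join on gene model identifiers. The Python sketch below assumes, hypothetically, that each identified accession has already been mapped to its gene model and that two sets are available: the gene models represented on the expression array (for ATH1, about 83% of TAIR7 protein-coding models) and those called expressed in the pollen transcriptome data. The function name, group labels, and toy identifiers are illustrative and are not taken from the authors' pipeline.

```python
def integrate_with_transcriptome(
    protein_to_gene_model: dict[str, str],
    array_gene_models: set[str],
    expressed_gene_models: set[str],
) -> dict[str, set[str]]:
    """Group identified protein accessions by transcript evidence for their gene model."""
    groups: dict[str, set[str]] = {
        "transcript_detected": set(),    # gene model called expressed
        "on_array_not_detected": set(),  # probed on the array, but not called expressed
        "not_on_array": set(),           # no probe set, so no array evidence is possible
    }
    for accession, gene_model in protein_to_gene_model.items():
        if gene_model in expressed_gene_models:
            groups["transcript_detected"].add(accession)
        elif gene_model in array_gene_models:
            groups["on_array_not_detected"].add(accession)
        else:
            groups["not_on_array"].add(accession)
    return groups


# Hypothetical toy inputs.
identified = {"geneA.2": "geneA", "geneC.1": "geneC", "geneD.1": "geneD", "geneE.1": "geneE"}
on_array = {"geneA", "geneC", "geneD"}
expressed = {"geneA"}

groups = integrate_with_transcriptome(identified, on_array, expressed)
# Identified proteins lacking transcript evidence in this toy example:
print(sorted(groups["on_array_not_detected"] | groups["not_on_array"]))
```

Keeping "not on array" separate from "on array but not detected" matters: the first group reflects a gap in array coverage, whereas the second reflects a genuine discrepancy between transcript and protein detection.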