Quality Control and Peak Finding for Proteomics Data Collected from Nipple Aspirate Fluid by Surface-Enhanced Laser Desorption and Ionization

2003 
Background: Recently, researchers have been using mass spectroscopy to study cancer. For use of proteomics spectra in a clinical setting, stringent quality-control procedures will be needed. Methods: We pooled samples of nipple aspirate fluid from healthy breasts and breasts with cancer to prepare a control sample. Aliquots of the control sample were used on two spots on each of three IMAC ProteinChip ® arrays (Ciphergen Biosystems, Inc.) on 4 successive days to generate 24 SELDI spectra. In 36 subsequent experiments, the control sample was applied to two spots of each ProteinChip array, and the resulting spectra were analyzed to determine how closely they agreed with the original 24 spectra. Results: We describe novel algorithms that ( a ) locate peaks in unprocessed proteomics spectra and ( b ) iteratively combine peak detection with baseline correction. These algorithms detected ∼200 peaks per spectrum, 68 of which are detected in all 24 original spectra. The peaks were highly correlated across samples. Moreover, we could explain 80% of the variance, using only six principal components. Using a criterion that rejects a chip if the Mahalanobis distance from both control spectra to the center of the six-dimensional principal component space exceeds the 95% confidence limit threshold, we rejected 5 of the 36 chips. Conclusions: Mahalanobis distance in principal component space provides a method for assessing the reproducibility of proteomics spectra that is robust, effective, easily computed, and statistically sound.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    12
    References
    187
    Citations
    NaN
    KQI
    []