Data Mining Crystallization Databases: Knowledge-Based Approaches to Optimize Protein Crystal Screens

2003 
Protein crystallization is a major bottleneck in protein X-ray crystallography, the workhorse of most structural proteomics projects. Because the principles that govern protein crystallization are too poorly understood to allow them to be used in a strongly predictive sense, the most common crystallization strategy entails screening a wide variety of solution conditions to identify the small subset that will support crystal nucleation and growth. We tested the hypothesis that more efficient crystallization strategies could be formulated by extracting useful patterns and correlations from the large data sets of crystallization trials created in structural proteomics projects. A database of crystallization conditions was constructed for 755 different proteins purified and crystallized under uniform conditions. Forty-five percent of the proteins formed crystals. Data mining identified the conditions that crystallize the most proteins, revealed that many conditions are highly correlated in their behavior, and showed that the crystallization success rate is markedly dependent on the organism from which proteins derive. Of the proteins that crystallized in a 48-condition experiment, 60% could be crystallized in as few as 6 conditions and 94% in 24 conditions. Consideration of the full range of information coming from crystal screening trials allows one to design screens that are maximally productive while consuming minimal resources, and also suggests further useful conditions for extending existing screens. Proteins 2003;51:562-568.
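
The abstract does not say how the reduced 6- and 24-condition subsets were chosen. As an illustration only, the sketch below uses a greedy set-cover heuristic, which is an assumption here and not the authors' stated method: at each step it picks the condition that crystallizes the most proteins not yet covered. The function name `greedy_screen`, the condition labels, and the protein IDs are all hypothetical, and the toy data is invented for the example.

```python
from typing import Dict, List, Set, Tuple


def greedy_screen(hits: Dict[str, Set[str]], max_conditions: int) -> List[Tuple[str, float]]:
    """Pick up to max_conditions conditions by greedy set cover.

    hits maps a condition label to the set of protein IDs that crystallized
    in that condition. Returns (condition, cumulative fraction of the
    crystallizable proteins covered) in the order the conditions were chosen.
    """
    all_proteins: Set[str] = set().union(*hits.values()) if hits else set()
    covered: Set[str] = set()
    remaining = dict(hits)
    chosen: List[Tuple[str, float]] = []

    while remaining and len(chosen) < max_conditions and covered != all_proteins:
        # Condition adding the most new proteins; iterating in sorted order
        # makes ties resolve to the alphabetically first label.
        best = max(sorted(remaining), key=lambda c: len(remaining[c] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # no remaining condition adds anything new
        covered |= gain
        chosen.append((best, len(covered) / len(all_proteins)))
    return chosen


# Hypothetical toy data: four conditions, five proteins that crystallized somewhere.
toy_hits = {
    "cond_A": {"p1", "p2", "p3"},
    "cond_B": {"p3", "p4"},
    "cond_C": {"p1", "p2"},
    "cond_D": {"p5"},
}

screen = greedy_screen(toy_hits, max_conditions=3)
for condition, fraction in screen:
    print(f"{condition}: {fraction:.0%} of crystallizable proteins covered")
```

In the toy data, `cond_C` is never selected because every protein it crystallizes is already covered by `cond_A`, which mirrors the abstract's observation that many conditions are highly correlated in their behavior. Greedy selection does not guarantee the smallest possible screen; it is only a common approximation for this kind of coverage problem.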