Protein crystallization image classification with elastic net

2014 
Protein crystallization plays a crucial role in pharmaceutical research by supporting the investigation of a protein’s molecular structure through X-ray diffraction of its crystal. Because crystals occur rarely, images must be manually inspected, which is a laborious process. We develop a solution incorporating a regularized logistic regression model for automatically evaluating these images. Standard image features, such as shape context, Gabor filters and Fourier transforms, are first extracted to represent the heterogeneous appearance of our images. The proposed solution then uses Elastic Net to select relevant features. Its L1 regularization yields a sparse model, which mitigates the effects of our large dataset, and its L2 regularization ensures proper operation when the number of features exceeds the number of samples. A two-tier cascade classifier based on naive Bayes and random forest algorithms then categorizes the images. To validate the proposed method, we experimentally compare it with naive Bayes, linear discriminant analysis, random forest, and their two-tier cascade classifiers, using 10-fold cross-validation. Our experimental results demonstrate a 3-category accuracy of 74%, outperforming the other models. In addition, Elastic Net better reduces the false negatives that carry a high, domain-specific risk. To the best of our knowledge, this is the first attempt to apply Elastic Net to classifying protein crystallization images. Performance measured on a large pharmaceutical dataset also compares well with results reported in previous studies, and the reduction in high-risk false negatives is promising.
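To make the feature-selection step concrete, the following is a minimal sketch of elastic-net-regularized logistic regression fitted by proximal gradient descent. It is not the authors' implementation: the binary (rather than 3-category) setting, the toy data, the function names, and the hyperparameters `lam`, `l1_ratio`, `lr` and `n_iter` are all assumptions chosen for illustration. The L1 part of the penalty is applied through soft-thresholding, which can set weights to exactly zero, and the L2 part is added to the smooth gradient.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_elastic_net_logistic(X, y, lam=0.1, l1_ratio=0.5, lr=0.1, n_iter=500):
    """Binary logistic regression with an elastic net penalty (illustrative).

    Penalty: lam * (l1_ratio * ||w||_1 + (1 - l1_ratio) / 2 * ||w||_2^2).
    The bias term is left unpenalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        for j in range(d):
            # Gradient of the smooth part: logistic loss plus the L2 term.
            smooth = grad_w[j] / n + lam * (1.0 - l1_ratio) * w[j]
            # Gradient step, then the L1 proximal (soft-thresholding) step.
            w[j] = soft_threshold(w[j] - lr * smooth, lr * lam * l1_ratio)
    return w, b


# Hypothetical toy data: feature 0 determines the label, feature 1 does not.
X = [[-2.0, 1.0], [-2.0, -1.0], [-1.0, 1.0], [-1.0, -1.0],
     [1.0, 1.0], [1.0, -1.0], [2.0, 1.0], [2.0, -1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_elastic_net_logistic(X, y)
# Features with a nonzero weight are the ones the model "selects".
selected = [j for j, wj in enumerate(w) if wj != 0.0]
predictions = [
    1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5 else 0
    for xi in X
]
```

In this toy setting only the informative feature keeps a nonzero weight, so `selected` lists the columns that would be passed on to a downstream classifier such as the cascade described above. A real pipeline would more likely rely on an established library implementation and tune the penalty strength by cross-validation.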