A Bayesian approach to accurate and robust signature detection on LINCS L1000 data

2019 
The LINCS L1000 dataset, produced by the L1000 assay, contains a large number of cellular gene expression profiles induced by large sets of perturbagens. Although it is an invaluable resource for drug discovery and for understanding disease mechanisms, severe noise in the dataset makes it difficult to detect reliable gene expression signals. Existing peak deconvolution methods, based on either k-means or Gaussian mixture models, fail to recover accurate gene expression levels in many cases, which limits their robust application in biomedical studies. Here, we have developed a novel Bayesian deconvolution algorithm that gives unbiased likelihood estimates for peak positions and characterizes each peak with a probability-based z-score. Based on this algorithm, we built a pipeline that processes raw data from the L1000 assay into signatures representing the features of each perturbagen. The performance of the proposed pipeline is rigorously evaluated using the similarity between bio-replicates and between drugs with shared targets. The results show that the signatures derived from the proposed algorithm give a substantially more reliable and informative representation of perturbagens than existing methods. Thus, our Bayesian peak deconvolution and z-score calculation method may significantly boost the performance of L1000 data in downstream applications such as drug repurposing, disease modeling, and gene function prediction.
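To make the peak deconvolution problem concrete, the following is a minimal sketch of the k-means style baseline mentioned above, not the Bayesian algorithm proposed in the paper. It assumes, as background on the assay that the abstract does not state, that two genes share one bead type in an unequal (roughly 2:1) proportion, so the larger cluster is attributed to the more abundant gene. The function name `kmeans_two_peaks`, the synthetic intensities, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def kmeans_two_peaks(values, n_iter=50):
    """Split 1-D intensities into two clusters with plain Lloyd's k-means (k=2).

    Illustrative baseline only; this is not the paper's Bayesian method.
    """
    values = np.asarray(values, dtype=float)
    # Initialise the two centres at the lower and upper quartiles.
    centres = np.percentile(values, [25, 75])
    for _ in range(n_iter):
        # Assign every bead to its nearest centre.
        labels = np.abs(values[:, None] - centres[None, :]).argmin(axis=1)
        new_centres = centres.copy()
        for k in range(2):
            if np.any(labels == k):
                new_centres[k] = values[labels == k].mean()
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    # Final assignment against the converged centres.
    labels = np.abs(values[:, None] - centres[None, :]).argmin(axis=1)
    sizes = np.bincount(labels, minlength=2)
    return centres, sizes


# Synthetic log2 bead intensities (assumed values): 60 beads for one gene,
# 30 beads for the other, mimicking an unequal mixing proportion.
rng = np.random.default_rng(0)
beads = np.concatenate([rng.normal(8.0, 0.3, 60), rng.normal(11.0, 0.3, 30)])

centres, sizes = kmeans_two_peaks(beads)
major = centres[sizes.argmax()]  # peak attributed to the more abundant gene
minor = centres[sizes.argmin()]  # peak attributed to the less abundant gene
print(f"major peak: {major:.2f}, minor peak: {minor:.2f}, sizes: {sizes}")
```

With well-separated peaks this hard assignment works, but when the peaks overlap or bead counts are low the cluster sizes no longer identify the genes reliably, which is the kind of failure the abstract attributes to existing methods.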