Comparing Dissimilarity Metrics for Clustering Genes into Functional Modules using Machine Learning

2020 
Clustering is widely used in biological analyses to group genes into functional modules. Any clustering method requires a measure of dissimilarity. The two most commonly used dissimilarity metrics are the Manhattan distance and the Euclidean distance. In addition, 1 - (correlation coefficient) is commonly used to define dissimilarity. Here, we use transcriptomic data from yeast across multiple environments to cluster genes and evaluate the performance of four dissimilarity metrics. We designed two metrics that use 1-abs(correlation), with either the Pearson or the Spearman correlation. We found that 1-abs(Pearson correlation) works best in two test cases, identifying genes involved in ethanol metabolism and in galactose metabolism, and we built a clustering model based on this metric. We propose that this dissimilarity metric be used in future studies that cluster genes by expression level. Such information, combined with further collection of transcriptomic data across environments, will improve our understanding of gene clustering and modularity when exploring unknown species.
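The abstract names the metrics but does not give formulas. The following is a minimal sketch of how they could be computed for a pair of expression profiles, assuming the standard textbook definitions of each distance and correlation. The function names and the example profiles are hypothetical and are not taken from the paper, and the clustering step itself is not shown.

```python
import math

def manhattan(x, y):
    # Sum of absolute differences between the two profiles.
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    # Square root of the sum of squared differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    # Pearson correlation coefficient (assumes neither profile is constant).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    # 1-based ranks; tied values share the average of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))

def one_minus_abs_pearson(x, y):
    return 1 - abs(pearson(x, y))

def one_minus_abs_spearman(x, y):
    return 1 - abs(spearman(x, y))

# Made-up expression profiles across four environments (illustration only).
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [8.0, 6.0, 4.0, 2.0]  # falls exactly as gene_a rises
gene_c = [1.0, 3.0, 2.0, 4.0]

for name, metric in [
    ("Manhattan", manhattan),
    ("Euclidean", euclidean),
    ("1-abs(Pearson)", one_minus_abs_pearson),
    ("1-abs(Spearman)", one_minus_abs_spearman),
]:
    print(name, metric(gene_a, gene_b), metric(gene_a, gene_c))
```

In this toy example, gene_a and gene_b are far apart under the Manhattan and Euclidean distances but have a dissimilarity near zero under 1-abs(Pearson correlation), because taking the absolute value treats a strong negative correlation the same as a strong positive one. Whether that property explains the results reported in the abstract is not stated in the source.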