PEARSON VERSUS SPEARMAN, KENDALL'S TAU CORRELATION ANALYSIS ON STRUCTURE-ACTIVITY RELATIONSHIPS OF BIOLOGIC ACTIVE COMPOUNDS

2005 
Correlation coefficients and their associated squared values are examined for the validation of estimates of the activity of biologically active compounds when a molecular descriptors family is used in the framework of structure-activity relationship (SAR) methods [1]. Starting with the assumption that the measured activity of a biologically active compound is a semi-quantitative outcome, we examined the Pearson, Spearman, and Kendall correlation coefficients. Toxicity data for sixty-seven biologically active compounds were analyzed by applying the molecular descriptors family in SAR modeling. The correlation between the measured toxicity and the toxicity estimated by the best performing model was investigated by applying the Pearson, Spearman, and Kendall's τa, τb, and τc correlation coefficients. The results are expressed as squared correlation coefficients, 95% confidence intervals (CI) of the correlation coefficients, Student's t or Z test values, and their associated p-values. They were as follows: Pearson: rPrs² = 0.90577, 95% CI [0.9223, 0.9701], tPrs = 24.99 (p < 0.0001); Spearman: ρSpm² = 0.86064, 95% CI [0.8846, 0.9550], tSpm = 20.03 (p < 0.0001); Kendall's τa: τKen,a² = 0.61294, 95% CI [0.6683, 0.8611], ZKen,τa = 9.37 (p < 0.0001); Kendall's τb: τKen,b² = 0.61769, 95% CI [0.6726, 0.8631], ZKen,τb = 9.37 (p < 0.0001); Kendall's τc: τKen,c² = 0.59478, 95% CI [0.6517, 0.8533], ZKen,τc = 9.23 (p < 0.0001). We remark that the toxicity of biologically active compounds is a semi-quantitative variable and that its determination may depend on various external factors, e.g. the type of equipment used, the researcher's skills and performance, and the type and class of chemicals used. Under those circumstances, a rank correlation coefficient would provide a more reliable estimate of the association than the parametric Pearson coefficient.
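The five coefficients compared above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names and the toy `measured`/`estimated` values are hypothetical, Spearman's ρ is computed as Pearson's r on average ranks, and Kendall's τa, τb, and τc follow the usual textbook definitions based on concordant and discordant pairs. Inputs are assumed to be non-constant.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(v):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho as Pearson's r applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall(x, y):
    """Return (tau_a, tau_b, tau_c) from pair counts."""
    n = len(x)
    conc = disc = tie_x = tie_y = 0
    for i, j in combinations(range(n), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0:
            tie_x += 1
        if dy == 0:
            tie_y += 1
        if dx * dy > 0:
            conc += 1
        elif dx * dy < 0:
            disc += 1
    n0 = n * (n - 1) / 2                      # total number of pairs
    m = min(len(set(x)), len(set(y)))         # smaller count of distinct values
    tau_a = (conc - disc) / n0
    tau_b = (conc - disc) / sqrt((n0 - tie_x) * (n0 - tie_y))
    tau_c = 2 * (conc - disc) / (n * n * (m - 1) / m)
    return tau_a, tau_b, tau_c

# Hypothetical toy data, NOT the toxicity values from the study.
measured = [1.2, 2.5, 2.5, 3.1, 4.8, 5.0]
estimated = [1.0, 2.9, 2.2, 3.5, 4.4, 5.3]

r = pearson(measured, estimated)
rho = spearman(measured, estimated)
tau_a, tau_b, tau_c = kendall(measured, estimated)
for name, value in [("Pearson r", r), ("Spearman rho", rho),
                    ("Kendall tau_a", tau_a), ("Kendall tau_b", tau_b),
                    ("Kendall tau_c", tau_c)]:
    print(f"{name}: {value:.4f} (squared: {value ** 2:.4f})")
```

Because τa divides by the total number of pairs while τb corrects the denominator for ties, τb is at least as large as τa in absolute value, which agrees with the ordering of the reported τa and τb values.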
Our study shows that all five computational methods used to evaluate the squared correlation coefficients yielded statistically significant results (p < 0.0001 in every case). As expected, lower values of the squared correlation coefficients were obtained with Kendall's methods, and the 95% CI associated with the correlation coefficients overlapped. Looking at the correlation coefficients and their 95% CI calculated with the Pearson and Spearman formulas, and at how they overlap with those of Kendall's τa, τb, and τc, we suggest that there are no significant differences between them. More research on other classes of biologically active compounds may reveal whether it is appropriate to analyze the activity estimated by SAR methods based on the molecular descriptors family using the Pearson correlation coefficient, or whether a rank correlation coefficient must be applied.
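The abstract does not state how the test statistics and confidence intervals were obtained. As a hedged check, the sketch below assumes the Student's t statistic t = r·sqrt((n − 2)/(1 − r²)) and a confidence interval from the Fisher z-transformation with standard error 1/sqrt(n − 3). Under these assumptions, with n = 67, the output comes close to the reported Pearson and Spearman values, up to rounding in the last digit. The function name `t_and_fisher_ci` is hypothetical, and a positive correlation is assumed when taking the square root.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

def t_and_fisher_ci(r_squared, n, level=0.95):
    """Return (t, ci_low, ci_high) for a positive correlation r.

    Assumptions (not stated in the abstract): Student's t with n - 2
    degrees of freedom and a Fisher z-transform confidence interval.
    """
    r = sqrt(r_squared)
    t = r * sqrt((n - 2) / (1 - r_squared))
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    half_width = z_crit / sqrt(n - 3)
    z = atanh(r)
    return t, tanh(z - half_width), tanh(z + half_width)

N = 67  # number of compounds reported in the abstract
for name, r2 in [("Pearson", 0.90577), ("Spearman", 0.86064)]:
    t, lo, hi = t_and_fisher_ci(r2, N)
    print(f"{name}: t = {t:.2f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```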