Reproducible experiments on word and sentence similarity measures for the biomedical domain

Alicia Lara-Clares, Juan José Lastra Díaz, Ana García Serrano

2021

Measuring semantic similarity between sentences is an important task in Natural Language Processing (NLP), Information Retrieval (IR), and biomedical text mining, among other fields. HESML (Half-Edge Semantic Measures Library) is a self-contained experimentation platform for word and sentence similarity and relatedness. It is especially well suited to running large experimental surveys because it supports the automatic execution of reproducible experiments defined in an XML-based file format. The HESML library was developed in Java 8 using NetBeans 8. This dataset introduces HESML V2R1, which implements the protocol described in [1]. It is the sixth release of HESML and is based on HESML V1R5 [2]. HESML V2R1 is a linearly scalable and efficient Java software library of word and sentence semantic similarity measures. This latest release allows the evaluation and comparison of most sentence similarity methods for the biomedical domain, as well as the study of the impact of different pre-processing configurations on the performance of those methods.

[1] Lara-Clares A, Lastra-Diaz JJ, Garcia-Serrano A. Protocol for a reproducible experimental survey on biomedical sentence similarity. PLoS One. 2021;16: e0248663. doi:10.1371/journal.pone.0248663
[2] Lastra-Diaz JJ, Lara-Clares A, Garcia-Serrano A. HESML V1R5 Java software library of ontology-based semantic similarity measures and information content models. e-cienciaDatos, V2. 2021. https://doi.org/10.21950/1RRAWJ
