Evaluation Datasets for Cross-lingual Semantic Textual Similarity

2021 
Semantic textual similarity (STS) systems estimate the degree of the meaning similarity between two sentences. Cross-lingual STS systems estimate the degree of the meaning similarity between two sentences, each in a different language. State-of-the-art algorithms usually employ a strongly supervised, resource-rich approach difficult to use for poorly-resourced languages. However, any approach needs to have evaluation data to confirm the results. In order to simplify the evaluation process for poorly-resourced languages (in terms of STS evaluation datasets), we present new datasets for cross-lingual and monolingual STS for languages without this evaluation data. We also present the results of several state-of-the-art methods on these data which can be used as a baseline for further research. We believe that this article will not only extend the current STS research to other languages, but will also encourage competition on this new evaluation data.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    6
    References
    0
    Citations
    NaN
    KQI
    []