Using Wiktionary as a resource for WSD : the case of French verbs.

Vincent Segonne, Marie Candito, Benoît Crabbé

2019

As opposed to word sense induction, word sense disambiguation (WSD), whether supervised or semi-supervised, has the advantage of using interpretable senses, but requires annotated data, which are quite rare for most languages except English (Miller et al., 1993). In this paper, we investigate which strategy to adopt to achieve WSD for languages lacking data annotated specifically for the task, focusing on the particular case of verb disambiguation in French. We first study the usability of Eurosense (Bovi et al., 2017), a multilingual corpus extracted from Europarl (Koehn, 2005) and automatically annotated with BabelNet (Navigli and Ponzetto, 2010) senses. Such a resource opened up the way to supervised and semi-supervised WSD for resourceless languages like French. While this perspective looked promising, our evaluation showed that the quality of the annotated senses was not sufficient for supervised WSD on French verbs. Instead, we propose to use Wiktionary, a collaboratively edited, multilingual online dictionary, as a new resource for WSD. Wiktionary provides both a sense inventory and manually sense-tagged examples, which can be used to train supervised and semi-supervised WSD systems. Yet, because the distribution of senses in lexicographic examples such as those found in Wiktionary differs from that of natural text, we then focus on studying the impact on WSD of the training data size and of the sense distribution. Using state-of-the-art semi-supervised systems, we report experiments on Wiktionary-based WSD for French verbs, evaluated on FrenchSemEval (FSE), a new dataset of French verbs manually annotated with Wiktionary senses.
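
To make concrete the idea of training a WSD system from sense-tagged dictionary examples, the following is a minimal sketch and not the systems evaluated in the paper. It assumes a toy sense inventory for the verb "lever" with invented example sentences (the sense labels, sentences, and function names are all hypothetical), and it swaps in a simple bag-of-words centroid classifier with cosine similarity in place of the semi-supervised systems mentioned above.

```python
import math
from collections import Counter

# Hypothetical toy inventory for the verb "lever": sense label -> example sentences.
# Labels and sentences are invented for illustration, not taken from Wiktionary.
EXAMPLES = {
    "lever_1_raise": ["il lève la main pour parler", "elle lève son verre"],
    "lever_2_adjourn": ["le président lève la séance", "on lève la réunion à midi"],
}


def bag(text):
    """Lowercased whitespace-tokenized bag of words."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_centroids(examples):
    """Sum the bags of all examples of a sense into one vector per sense."""
    centroids = {}
    for sense, sentences in examples.items():
        total = Counter()
        for sentence in sentences:
            total.update(bag(sentence))
        centroids[sense] = total
    return centroids


def disambiguate(sentence, centroids):
    """Return the sense whose centroid is most similar to the sentence."""
    vector = bag(sentence)
    # Iterating over sorted labels makes tie-breaking deterministic.
    return max(sorted(centroids), key=lambda s: cosine(vector, centroids[s]))


if __name__ == "__main__":
    centroids = build_centroids(EXAMPLES)
    print(disambiguate("le juge lève la séance", centroids))
```

With so few examples per sense, the classifier depends heavily on which senses happen to have examples and how many, which is one way to see why the training data size and sense distribution matter.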
