Selecting training data for unsupervised domain adaptation in word sense disambiguation

Kanako Komiya, Minoru Sasaki, Hiroyuki Shinnou, Yoshiyuki Kotani, Manabu Okumura

2016

This paper describes a method of domain adaptation, the task of adapting a classifier trained on source data to target data. From source data spanning multiple domains, we automatically select the training set best suited to the target data. The approach is a form of unsupervised domain adaptation for Japanese word sense disambiguation (WSD). Experiments showed that WSD accuracy was higher when the training set was selected automatically using two criteria, the degree of confidence and the leave-one-out (LOO)-bound score, than when the classifier was trained on all the source data.
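The abstract does not spell out the selection procedure, so the following is only a rough sketch of the confidence criterion under stated assumptions, not the authors' method. It assumes that candidate training sets are unions of source domains, that "degree of confidence" means the classifier's mean highest sense probability over the unlabeled target instances, and that the classifier is a multinomial naive Bayes model with add-one smoothing. All three choices are assumptions, and the function names (`train_nb`, `posterior`, `mean_confidence`, `select_domains`) are hypothetical. Because a training set containing a single sense would trivially yield confidence 1.0, the sketch skips such subsets; that guard is also an assumption. The LOO-bound criterion is not shown.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def train_nb(examples):
    """Train a multinomial naive Bayes model (assumed classifier).

    examples: list of (features, sense) pairs, where features is a list of strings.
    """
    sense_counts = Counter(sense for _, sense in examples)
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, sense in examples:
        feat_counts[sense].update(feats)
        vocab.update(feats)
    return sense_counts, feat_counts, vocab


def posterior(model, feats):
    """Return P(sense | feats) using add-one smoothing."""
    sense_counts, feat_counts, vocab = model
    total = sum(sense_counts.values())
    v = len(vocab) + 1  # +1 reserves probability mass for unseen features
    log_scores = {}
    for sense, n in sense_counts.items():
        denom = sum(feat_counts[sense].values()) + v
        score = math.log(n / total)
        for f in feats:
            score += math.log((feat_counts[sense][f] + 1) / denom)
        log_scores[sense] = score
    # Normalize in log space to avoid underflow.
    m = max(log_scores.values())
    exp = {s: math.exp(x - m) for s, x in log_scores.items()}
    z = sum(exp.values())
    return {s: e / z for s, e in exp.items()}


def mean_confidence(model, target):
    """Assumed confidence measure: mean of the top sense probability per target instance."""
    return sum(max(posterior(model, feats).values()) for feats in target) / len(target)


def select_domains(source_by_domain, target):
    """Try every non-empty union of source domains; keep the most confident one."""
    best = None
    domains = sorted(source_by_domain)
    for k in range(1, len(domains) + 1):
        for subset in combinations(domains, k):
            examples = [ex for d in subset for ex in source_by_domain[d]]
            # Assumed guard: a one-sense training set is trivially "confident".
            if len({sense for _, sense in examples}) < 2:
                continue
            score = mean_confidence(train_nb(examples), target)
            if best is None or score > best[1]:
                best = (subset, score)
    if best is None:
        raise ValueError("no candidate training set covers at least two senses")
    return best


if __name__ == "__main__":
    source = {
        "A": [(["loan", "money"], "finance"), (["loan", "bank"], "finance"),
              (["river", "water"], "shore"), (["river", "bank"], "shore")],
        "B": [(["x"], "finance"), (["y"], "shore"), (["z"], "shore")],
    }
    print(select_domains(source, [["loan"], ["river"]]))
```

Exhaustive search over domain subsets grows exponentially with the number of domains; it is used here only because it makes the selection criterion easy to see. Note also that high confidence does not guarantee correct predictions, so a confidence score is at best a proxy for target accuracy.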

Keywords:

Artificial intelligence
Training set
Pattern recognition
Domain adaptation
Machine learning
Word-sense disambiguation
Computer science
Classifier (linguistics)
Source data
Data selection
Speech recognition
Degree of confidence
Natural language processing
