Cross-language information retrieval by reduced k-means
2018
Cross-language information retrieval aims at retrieving relevant documents in one language for a query formulated in another language. Here we propose a new approach to cross-language information retrieval based on factorizing a term-document matrix with an iterative method, Reduced k-means clustering. Reduced k-means performs a simultaneous reduction of objects (documents) and variables (index terms). The proposed method is compared with standard machine learning techniques for cross-language information retrieval, namely latent semantic indexing and canonical correlation analysis. The motivation for applying Reduced k-means to cross-language information retrieval comes from the observation that, in the semantic space obtained by latent semantic indexing, documents are clustered primarily by their language rather than by their topic. Since Reduced k-means aims to preserve the clustering structure of the data, the idea is that the proposed method could address this problem.
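To make the "simultaneous reduction of objects and variables" concrete, the following is a minimal sketch of the generic Reduced k-means objective, not the authors' implementation. It assumes documents are the rows of a column-centered term-document matrix X and minimizes ||X - U Ybar A^T||^2, where U is a binary document-to-cluster membership matrix, A holds Q orthonormal term loadings, and Ybar holds the cluster centroids in the reduced space. The function name `reduced_kmeans` and its parameters are hypothetical, and how the paper arranges the two languages inside X is not specified in this abstract.

```python
import numpy as np


def reduced_kmeans(X, n_clusters, n_components, n_iter=100, seed=0):
    """Alternating least-squares sketch of Reduced k-means.

    Minimizes ||X - U @ Ybar @ A.T||_F^2 over a binary membership matrix U
    (n x K), column-orthonormal loadings A (J x Q) and reduced-space
    centroids Ybar (K x Q). X is assumed to be column-centered, with
    documents as rows and index terms as columns.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    labels = rng.integers(n_clusters, size=n)       # random initial partition
    for _ in range(n_iter):
        U = np.eye(n_clusters)[labels]              # n x K one-hot membership
        sizes = U.sum(axis=0)
        sizes[sizes == 0] = 1.0                     # avoid dividing by zero for empty clusters
        means = (U.T @ X) / sizes[:, None]          # K x J cluster means in term space
        # Step 1: for fixed U, A holds the top-Q eigenvectors of the
        # between-cluster scatter X^T P_U X, where P_U X = U @ means.
        _, eigvecs = np.linalg.eigh(X.T @ (U @ means))
        A = eigvecs[:, ::-1][:, :n_components]      # eigh sorts ascending, so reverse
        # Step 2: for fixed A, move each document to the nearest centroid
        # in the Q-dimensional reduced space.
        Y = X @ A                                   # n x Q document scores
        Ybar = means @ A                            # K x Q centroids
        dists = ((Y[:, None, :] - Ybar[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):      # partition stable: converged
            break
        labels = new_labels
    return labels, A, Ybar
```

Here `n_clusters` (K) controls the reduction of documents and `n_components` (Q) the reduction of index terms. As with ordinary k-means, the result depends on the random initial partition, so in practice one would typically run several seeds and keep the solution with the lowest loss.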