Learning Document Similarity Using Natural Language Processing

Paola Merlo, James Henderson, Gerold Schneider, Eric Wehrli

2013

The considerable recent growth in the amount of easily available online text has brought to the foreground the need for large-scale natural language processing tools for text data mining. In this paper we address two related problems: organizing documents into meaningful groups according to their content, and visualizing a text collection so as to give an overview of the range of documents and of their relationships, making the collection easier to browse. We use Self-Organizing Maps (SOMs) (Kohonen 1984). Creating these maps poses considerable efficiency challenges. We study linguistically motivated ways of reducing the representation of a document to increase efficiency, and ways of disambiguating the words in the documents.
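To make the idea of a Self-Organizing Map concrete, here is a minimal sketch of a generic Kohonen SOM in plain Python, assuming documents have already been turned into fixed-length term-frequency vectors. It is not the authors' implementation: the grid size, the linear learning-rate and radius schedules, the Gaussian neighbourhood, the toy vocabulary, and the function names `train_som` and `best_matching_unit` are all illustrative assumptions. Each document is assigned to its best-matching map node, and training pulls that node and its grid neighbours toward the document, so similar documents end up on the same or nearby nodes.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid node (row, col) whose weight vector is closest to x (Euclidean)."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(docs, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols Kohonen map on equal-length document vectors.

    Hypothetical helper: returns a dict mapping each grid node (row, col)
    to its weight vector.
    """
    rng = random.Random(seed)
    dim = len(docs[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # One weight vector per map node, initialised uniformly at random in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(docs)
    step = 0
    for _ in range(epochs):
        order = list(range(len(docs)))
        rng.shuffle(order)  # present documents in a new random order each epoch
        for i in order:
            x = docs[i]
            progress = step / total_steps
            # Assumed schedules: learning rate and neighbourhood radius shrink linearly.
            lr = lr0 * (1.0 - progress)
            sigma = max(sigma0 * (1.0 - progress), 0.5)
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighbourhood on the grid: nodes near the BMU move most.
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Toy term-frequency vectors over a hypothetical vocabulary
# ["neuron", "network", "stock", "market"]: two documents per topic.
docs = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
]
som = train_som(docs, rows=3, cols=3)
for d in docs:
    print(d, "->", best_matching_unit(som, d))
```

Each training step in this sketch touches every map node and every vector dimension, so its cost grows with `rows * cols * dim`. Shrinking the document representation, as the abstract proposes, reduces `dim` and therefore the cost of every update proportionally.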

Keywords:

Self-organizing map
Natural language processing
Information retrieval
Data mining
Computer science
Artificial intelligence
Document similarity
