Identifying Temporal Trends Based on Perplexity and Clustering: Are We Looking at Language Change?

Sidsel Boldsen, Manex Agirrezabal, Patrizia Paggio

2019

In this work we propose a data-driven methodology for identifying temporal trends in a corpus of medieval charters. We have used perplexities derived from RNN language models as a distance measure between documents and then performed clustering on those distances. We argue that perplexities calculated by such language models are representative of temporal trends. The clusters produced by the K-Means algorithm give insight into the differences in language across time periods, differences that are at least partly due to language change. We suggest that the temporal distribution of the individual clusters may provide a more nuanced picture of temporal trends than discrete time bins, and may therefore yield better results when used in a classification task.
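The abstract describes the pipeline only at a high level. Below is a minimal sketch of one plausible reading, which is an assumption and not the authors' published setup. Each document is represented by the vector of its perplexities under language models trained on every document. K-Means is then run on those vectors, and each cluster is summarised by the date range of its members instead of a fixed time bin. To stay self-contained, the sketch swaps the RNNs for add-one smoothed character-bigram models and uses a hand-written K-Means with deterministic farthest-first seeding. All function names, toy strings and years are hypothetical.

```python
import math
import statistics
from collections import Counter

# ASSUMPTION: an add-one smoothed character-bigram model stands in for the
# RNN language models of the abstract, to keep the sketch dependency-free.


def train_bigram(text):
    """Return (bigram counts, context counts); '^' marks the document start."""
    padded = "^" + text
    return Counter(zip(padded, padded[1:])), Counter(padded[:-1])


def perplexity(model, text, vocab_size):
    """Per-character perplexity of `text` under an add-one smoothed bigram model."""
    bigrams, contexts = model
    padded = "^" + text
    log_prob = sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
        for prev, cur in zip(padded, padded[1:])
    )
    return math.exp(-log_prob / len(text))


def perplexity_features(docs):
    """Row i: log-perplexity of document i under the model trained on each document j."""
    vocab_size = len(set("".join(docs)))
    models = [train_bigram(d) for d in docs]
    return [[math.log(perplexity(m, d, vocab_size)) for m in models] for d in docs]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-Means; deterministic farthest-first seeding replaces random init."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_date_summary(labels, years):
    """(earliest, median, latest) year of the documents in each cluster."""
    by_cluster = {}
    for lab, year in zip(labels, years):
        by_cluster.setdefault(lab, []).append(year)
    return {lab: (min(ys), statistics.median(ys), max(ys))
            for lab, ys in sorted(by_cluster.items())}


# Toy strings and dates, invented for illustration only.
docs = ["aaab aaab aaab", "aab aaab aab", "xyyz xyz xyyz", "xyz xyyz xyz"]
years = [1300, 1320, 1450, 1470]
labels = kmeans(perplexity_features(docs), k=2)
print(labels)                               # [0, 0, 1, 1]
print(cluster_date_summary(labels, years))  # {0: (1300, 1310.0, 1320), 1: (1450, 1460.0, 1470)}
```

Working on log-perplexities rather than raw values is a choice made here to damp large cross-document values. The per-cluster (earliest, median, latest) summary illustrates how a cluster's temporal spread, rather than a predefined bin, could describe a period.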

Keywords:

Cluster analysis
Natural language processing
Artificial intelligence
Language change
Perplexity
Computer science
