A Study of English Neologisms Through Large-Scale Probabilistic Indexing of Bentham’s Manuscripts

2019 
Probabilistic indexes (PIs) are obtained from untranscribed handwritten text images by means of recently introduced lexicon-free, query-by-string, probabilistic keyword spotting techniques. PIs have proven to be a powerful tool that allows efficient, free-text searching in very large collections of handwritten historical documents. PIs convey uncertain information about the textual contents of the document images. However, this uncertainty is accurately modeled by the associated lexical probability distributions, which can be conveniently exploited in many applications. As an example of such applications, we study here the dating of a number of English neologisms in the large collection of Bentham's manuscripts, which encompasses 90,000 images. The statistical techniques used for neologism dating are theoretically motivated, and experiments on this collection are reported. Among other contributions, this study provides sound evidence that some commonly assumed neologism introduction dates need to be revised.
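The abstract does not spell out the dating statistics, so the following is only a minimal sketch of how lexical probabilities in a PI can be aggregated without committing to any single transcription. It assumes a hypothetical PI format in which each entry records an image, a word, a relevance probability, and a known manuscript year (`PIEntry` and its fields are illustrative, not the authors' data format). By linearity of expectation, summing the relevance probabilities of a word's entries gives the expected number of indexed regions containing that word, assuming each entry refers to a distinct region. The `earliest_attestation` threshold rule is a hypothetical stand-in, not the method used in the study.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PIEntry:
    # Hypothetical PI record: one word hypothesis spotted in one image region.
    image_id: str
    word: str
    prob: float  # relevance probability that `word` is written in this region
    year: int    # assumed known date of the manuscript image


def expected_counts_by_year(index: List[PIEntry], word: str) -> Dict[int, float]:
    """Expected number of regions per year containing `word`.

    By linearity of expectation, this is the sum of the relevance
    probabilities of the word's entries (one entry per distinct region assumed).
    """
    counts: Dict[int, float] = defaultdict(float)
    for entry in index:
        if entry.word == word:
            counts[entry.year] += entry.prob
    return dict(counts)


def earliest_attestation(index: List[PIEntry], word: str,
                         threshold: float = 0.5) -> Optional[int]:
    """Hypothetical dating rule: first year whose expected count reaches `threshold`."""
    counts = expected_counts_by_year(index, word)
    qualifying = sorted(year for year, c in counts.items() if c >= threshold)
    return qualifying[0] if qualifying else None


# Toy, made-up index (placeholder words and dates, not Bentham data).
toy_index = [
    PIEntry("img1", "wordA", 0.25, 1780),
    PIEntry("img2", "wordA", 0.5, 1781),
    PIEntry("img3", "wordA", 0.75, 1781),
    PIEntry("img4", "wordB", 1.0, 1815),
]
print(expected_counts_by_year(toy_index, "wordA"))  # {1780: 0.25, 1781: 1.25}
print(earliest_attestation(toy_index, "wordA"))      # 1781
```

The threshold trades off sensitivity against false alarms: a low value dates a word by a single faint, possibly spurious spot, while a higher value requires the accumulated probability mass to indicate genuine use.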