A quantitative analysis of concepts and semantic structure in written language: Long range correlations in dynamics of texts

2008 
Understanding texts requires memory: the reader has to keep in mind enough words to create meaning. This calls for a relation between the memory of the reader and the structure of the text. To investigate this interaction, we first identify a connectivity matrix defined by co-occurrence of words in the text. A vector space of words characterizing the text is spanned by the principal directions of this matrix. It is useful to think of these weighted combinations of words as representing “concepts”. As the reader follows the text, the set of words in her window of attention moves dynamically among these concepts. We observe long range power law correlations in this trajectory. By explicitly constructing surrogate hierarchical texts, we demonstrate that the power law originates from the structural organization of texts into subunits such as chapters and paragraphs.
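The pipeline the abstract describes (a co-occurrence matrix, its principal directions as "concepts", the trajectory of the reader's window through that concept space, and the correlation function of that trajectory) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' method: counting co-occurrence only between distinct tokens inside a fixed-width sliding window, taking the leading eigenvectors of the symmetric matrix as concepts, projecting raw word counts onto them, and fitting a log-log slope to the autocorrelation are choices made here for illustration. All function names, the window width, and the chapter-structured toy surrogate are hypothetical.

```python
import numpy as np

def cooccurrence_matrix(tokens, window):
    """Symmetric matrix: C[i, j] counts how often distinct words i and j
    fall within `window` consecutive tokens (assumed definition)."""
    vocab = sorted(set(tokens))
    index = {w: k for k, w in enumerate(vocab)}
    ids = [index[w] for w in tokens]
    C = np.zeros((len(vocab), len(vocab)))
    for pos, i in enumerate(ids):
        # Pair each token with the next window-1 tokens; self-pairs are skipped.
        for j in ids[pos + 1 : pos + window]:
            if i != j:
                C[i, j] += 1
                C[j, i] += 1
    return C, vocab, ids

def concept_basis(C, n_concepts):
    """Leading eigenvectors of the symmetric matrix C, one concept per column."""
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_concepts]
    return eigvecs[:, order]

def concept_trajectory(ids, vocab_size, basis, window):
    """Project the word counts of each sliding window onto the concepts."""
    steps = len(ids) - window + 1
    traj = np.empty((steps, basis.shape[1]))
    for t in range(steps):
        counts = np.bincount(ids[t : t + window], minlength=vocab_size)
        traj[t] = counts @ basis
    return traj

def autocorrelation(x, max_lag):
    """Normalized autocorrelation for lags 0..max_lag (assumes x is not constant)."""
    x = x - x.mean()
    var = np.dot(x, x) / len(x)
    return np.array([np.dot(x[: len(x) - lag], x[lag:]) / len(x) / var
                     for lag in range(max_lag + 1)])

def power_law_exponent(corr):
    """Fit corr ~ lag**(-alpha) on a log-log scale; return alpha.
    Uses only lags > 0 with positive correlation."""
    lags = np.arange(len(corr))
    mask = (lags > 0) & (corr > 0)
    slope, _ = np.polyfit(np.log(lags[mask]), np.log(corr[mask]), 1)
    return -slope

# Hypothetical surrogate text: 4 "chapters", each drawing words
# from its own small topic vocabulary.
rng = np.random.default_rng(0)
topics = [[f"t{c}w{k}" for k in range(5)] for c in range(4)]
tokens = [w for c in range(4) for w in rng.choice(topics[c], size=200)]

C, vocab, ids = cooccurrence_matrix(tokens, window=10)
basis = concept_basis(C, n_concepts=4)
traj = concept_trajectory(ids, len(vocab), basis, window=10)
corr = autocorrelation(traj[:, 0], max_lag=100)
print([round(float(corr[lag]), 3) for lag in (1, 10, 100)])
```

The eigendecomposition uses `np.linalg.eigh` because the co-occurrence matrix is symmetric by construction. Replacing the toy surrogate with a real tokenized text, or nesting paragraphs inside chapters, changes only the input to `cooccurrence_matrix`. A power law would show up as a roughly straight line when `corr` is plotted against lag on log-log axes, which is what `power_law_exponent` fits.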