Features Identification and Selection

2020 
As a first approach, it is assumed that stylistic markers can be detected by considering words, or more precisely, the most frequent ones. This chapter explores several other ways to define useful stylistic traces left by the author. Instead of considering only isolated words, one can explore the usefulness of short sequences of words (called word n-grams). After applying a part-of-speech (POS) tagger, the resulting tags, or sequences of them, may help discriminate between distinct styles. Letters and letter n-grams may also reflect differences between authors. In addition, various feature selection functions have been proposed to select the best subset of stylistic markers for describing a given writer or category (e.g., men vs. women). Each of these solutions has advantages and drawbacks, which this chapter presents and illustrates. Finally, two methods for extracting the overused terms and expressions of a given author or category are discussed, with examples illustrating the required computation.
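
As a rough illustration of the candidate features named above, the following minimal sketch extracts word n-grams and letter n-grams from a text and ranks them by frequency. The function names (`word_ngrams`, `char_ngrams`, `most_frequent`), the whitespace tokenization, and the sample sentence are assumptions made for this example only; they are not taken from the chapter.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return all word n-grams (as tuples) found in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return all letter n-grams (as strings), spaces included."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def most_frequent(items, k):
    """Return the k most frequent items with their counts."""
    return Counter(items).most_common(k)


# Hypothetical sample text; a real study would use a full corpus
# and a proper tokenizer rather than a whitespace split.
sample = "the cat sat on the mat and the cat slept"
tokens = sample.lower().split()

top_words = most_frequent(tokens, 2)                    # isolated words
top_bigrams = most_frequent(word_ngrams(tokens, 2), 1)  # word 2-grams
char_counts = Counter(char_ngrams(sample, 3))           # letter 3-grams
```

The same counting step applies unchanged to POS tags: replacing `tokens` with the tag sequence produced by a tagger would yield tag n-gram frequencies.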