Heuristic Methods for Filtering Newly Coined Profanities Using Phylogenetic Analysis

2010 
We previously proposed a smart filtering system for newly coined profanities based on approximate string searching and sequence alignment. However, the number of coined profanities is large: the game portal Nexon, for example, maintains a forbidden-word list of 60,000 words, so even our system requires too much computation time for use in a real-time chat system. We therefore need to manage the profanity database by discarding redundant entries and dividing the remaining elements into groups by priority. In this paper, we propose a management algorithm for a profanity database. We apply phylogenetic analysis to build evolutionary trees and classify profanities, and then compare each input word with the root of a group. With this approach, we reduce the number of database elements from 6302 to 2229.
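
The idea of comparing an input word with group roots instead of every database entry can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: greedy grouping by Levenshtein distance stands in for the phylogenetic tree construction, and the function names, the `max_dist` threshold, and the mild placeholder words are all assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def group_by_root(words, max_dist=1):
    """Greedy grouping (assumed stand-in for tree building):
    a word joins the first root within max_dist, else becomes a new root."""
    groups = {}
    for w in words:
        for root in groups:
            if edit_distance(w, root) <= max_dist:
                groups[root].append(w)
                break
        else:
            groups[w] = [w]
    return groups


def is_blocked(word, groups, max_dist=1):
    """Compare the input only with group roots, not with every entry."""
    return any(edit_distance(word, root) <= max_dist for root in groups)


# Placeholder word list; a real database would hold the forbidden words.
words = ["darn", "d4rn", "daarn", "heck", "h3ck"]
groups = group_by_root(words)
```

Here the five entries collapse into two roots, so an input word is checked against two strings instead of five. The trade-off in this simplified version is that a variant close to a group member but far from its root may go undetected, which is one reason a tree-based grouping could be preferable.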