Content Mining and Network Analysis of Microblog Spam

2010 
The number of microblog users is growing rapidly, and spam is increasing along with it. We first give a formal definition of a microblog and then divide spam into two types: news and advertisements. We collect 1,760,314 microblog news items (188 MB) for content mining. Using ROST Content Mining, we carry out macro-level topology statistics, time series mining, and related analyses. We find that the microblog user group exhibits small-world characteristics. Its degree correlation (assortativity) coefficient is negative, the probability for followers of news microblogs is 0.0002, and the rate of second spread is 0.011. We propose a recursive filtering method to estimate the rate of spread on multiple occasions, and we introduce a cross-relation method that converts nodes that are difficult to analyze into forms suited to social network analysis.
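The topology statistics mentioned in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' ROST Content Mining workflow: the function names and the star-shaped toy follower graph are assumptions made for illustration, and the "coefficient with the same degree" is read here as the degree assortativity coefficient. The sketch computes the two quantities usually checked for small-world structure (average clustering and average shortest path length) together with the degree correlation.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of unique (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_assortativity(edges, adj):
    """Pearson correlation of the degrees at the two ends of each edge.

    Each undirected edge is counted in both directions, so both ends
    share the same mean and variance.
    """
    xs, ys = [], []
    for u, v in edges:
        du, dv = len(adj[u]), len(adj[v])
        xs += [du, dv]
        ys += [dv, du]
    n = len(xs)
    mean = sum(xs) / n
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var if var else float("nan")


def average_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over reachable ordered pairs, via BFS."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Hypothetical toy graph: node 0 is a heavily followed account, 1..5 follow it.
toy_edges = [(0, leaf) for leaf in range(1, 6)]
toy_adj = build_adjacency(toy_edges)

print("assortativity:", degree_assortativity(toy_edges, toy_adj))
print("avg clustering:", average_clustering(toy_adj))
print("avg path length:", average_path_length(toy_adj))
```

On this toy star the degree correlation is strongly negative, because the high-degree hub connects only to low-degree followers. That is the same sign the abstract reports, although the toy data says nothing about the magnitude found in the real dataset.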