A Spam Detection Study of Tweets in Indian Healthcare

Sramana Mukherjee, Arijit Sarkar, Saptarsi Goswami, Amit Kumar Das

2016

Twitter, one of the most rapidly growing social networks, has been infiltrated by large amounts of spam. Twitter has many potential applications across diverse areas; however, spam makes the signal-to-noise ratio very low, which is a major obstacle to meaningful analysis and action. Spam detection is a well-studied problem for email, but it is comparatively less researched for tweets. In this paper we set up a focused study of nearly 5000 tweets related to Indian healthcare. Six classifiers are evaluated and compared for spam detection. A simple term-frequency-based feature selection technique is shown to reduce model building time significantly. An ensemble method based on the top five classifiers improves both the accuracy and the stability of the results.
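As a rough illustration of the two ideas named in the abstract, the sketch below shows one possible form of term-frequency-based feature selection and a majority-vote ensemble. It is not the authors' implementation: the whitespace tokenizer, the `min_count` threshold, the function names, and the use of simple majority voting are all assumptions made for this example, since the abstract does not specify them.

```python
from collections import Counter


def select_terms_by_frequency(tweets, min_count=2):
    """Keep terms whose total count over all tweets is at least min_count.

    The threshold and the whitespace tokenizer are assumptions for this sketch.
    """
    counts = Counter(tok for tweet in tweets for tok in tweet.lower().split())
    return sorted(term for term, c in counts.items() if c >= min_count)


def vectorize(tweet, vocabulary):
    """Map a tweet to term counts over the selected vocabulary only."""
    tokens = Counter(tweet.lower().split())
    return [tokens[term] for term in vocabulary]


def majority_vote(predictions):
    """Combine the labels predicted by several classifiers for one tweet.

    With an odd number of voters and two classes, no tie can occur.
    """
    return Counter(predictions).most_common(1)[0][0]


# Toy data, invented for illustration only.
tweets = ["free pills buy now", "buy free pills", "hospital opens new ward"]
vocabulary = select_terms_by_frequency(tweets, min_count=2)
features = vectorize("buy buy pills today", vocabulary)
label = majority_vote(["spam", "ham", "spam", "spam", "ham"])
```

Dropping rare terms shrinks the feature vectors that each classifier must process, which is one plausible reason such a filter would shorten model building time.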

Keywords:

Feature selection
Ensemble learning
Obstacle
Data mining
Social network
Signal-to-noise ratio
Computer science
Health care
Model building
