Can One Tamper with the Sample API?: Toward Neutralizing Bias from Spam and Bot Content

2016 
While social media mining continues to be an active area of research, obtaining data for research is a perennial problem. Even more, obtaining unbiased data is a challenge for researchers who wish to study human behavior, and not technical artifacts induced by the sampling algorithm of a social media site. In this work, we evaluate one social media data outlet that gives data to its users in the form of a stream: Twitter's Sample API. We show that in its current form, this API can be poisoned by bots or spammers who wish to promote their content, jeopardizing the credibility of the data collected through this API. We design a proof-of-concept algorithm that shows how malicious users could increase the probability of their content appearing in the Sample API, thus biasing the content towards spam and bot content and harming the representativity of this data outlet.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    3
    References
    18
    Citations
    NaN
    KQI
    []