RedTweet: Recommendation Engine for Reddit

2015 
As social media becomes an increasingly popular source of data, there is a growing need for productive ways to use that data. Our objective is to form an interest profile from a user's tweets and use it to recommend loosely related Reddit threads that the reader is most likely to be interested in. We approach the problem as genre classification. Given a tweet, we want to deduce which genre(s) it would fall under if the words in the tweet were used in official texts. From there, we keep track of how many tweets fall under each genre and generate a list of Reddit threads that fall under the same genres, in proportion to the interests of the user. Due to the complexity of genre classification, we chose an ensemble approach. Our ensemble uses three classifiers: 1) a classic Naive Bayesian classifier, 2) a Naive Bayesian classifier trained only on the parts of speech of sentences, and 3) a Naive Bayesian classifier that only makes a decision if the probability P(x) ≥ 0.9. We measured the success of our classifiers by comparing the accuracy, precision, and recall of each model. Classifiers 1 and 2 had higher accuracy than classifier 3, but classifier 3 had much higher precision and recall. After building the classifier, we were able to form interest profiles for well-known people, comparing one with a small number of tweets against one with a much larger number, and to compile a list of recommended articles for each. The genres tagged to each person seemed to match their public personas, and most of the articles chosen fit these genres. Our results are a valuable beginning for what constitutes a much larger project.
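To make the third classifier and the interest profile concrete, the following is a minimal sketch, not the implementation used in this work. It assumes a bag-of-words multinomial Naive Bayes with add-one smoothing and whitespace tokenisation, and it assumes that tweets on which the classifier abstains are simply left out of the profile. The names `NaiveBayes`, `classify_with_threshold`, and `interest_profile`, as well as the toy training sentences, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (assumed variant)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            # Assumption: lowercase, whitespace-separated tokens.
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict_proba(self, doc):
        # Words never seen in training are ignored.
        words = [w for w in doc.lower().split() if w in self.vocab]
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            log_scores[label] = score
        # Normalise in log space to avoid numerical underflow.
        top = max(log_scores.values())
        unnorm = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(unnorm.values())
        return {k: v / z for k, v in unnorm.items()}


def classify_with_threshold(model, doc, threshold=0.9):
    """Return the most probable genre, or None if its probability is below the threshold."""
    probs = model.predict_proba(doc)
    label, p = max(probs.items(), key=lambda kv: kv[1])
    return label if p >= threshold else None


def interest_profile(model, tweets, threshold=0.9):
    """Count how many tweets fall under each genre, skipping abstentions (assumption)."""
    counts = Counter()
    for tweet in tweets:
        genre = classify_with_threshold(model, tweet, threshold)
        if genre is not None:
            counts[genre] += 1
    return counts


# Invented toy data, for illustration only.
train_docs = [
    "goal match team score",
    "team player match win",
    "election vote senate policy",
    "vote policy president senate",
]
train_labels = ["sports", "sports", "politics", "politics"]
model = NaiveBayes().fit(train_docs, train_labels)
profile = interest_profile(model, ["team match win", "vote senate policy", "hello world"])
```

In this sketch the resulting `profile` counts could then be normalised to decide what share of the recommended Reddit threads to draw from each genre. A classifier that abstains below the threshold labels fewer tweets, which is one plausible reason for it to trade coverage for precision.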