Gender prediction on a real life blog data set using LSI and KNN

2017 
Gender prediction on social media data sets is usually treated as a text classification problem and can be solved with machine learning methods such as the K-nearest neighbor (KNN) algorithm. However, KNN is computationally costly because of its lazy learning pattern, and it does not perform well when the dimension of the feature space is high. Dimension reduction methods are therefore integrated into KNN to reduce computation time. In this paper we propose an approach that combines Latent Semantic Indexing (LSI) with KNN to predict gender from a real-life collection of posts on actual blog pages. Experimental results demonstrate its effectiveness in processing large-scale, high-dimensional data.
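The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes raw term counts as features, a truncated SVD as the LSI step, and cosine similarity with majority voting for KNN. The toy posts, the labels attached to them, and all function names are invented for illustration.

```python
from collections import Counter

import numpy as np


def build_vocab(docs):
    """Map each distinct lowercase token to a column index."""
    tokens = sorted({w for d in docs for w in d.lower().split()})
    return {t: i for i, t in enumerate(tokens)}


def to_counts(docs, vocab):
    """Build a documents-by-terms count matrix; unknown tokens are ignored."""
    X = np.zeros((len(docs), len(vocab)))
    for row, doc in enumerate(docs):
        for word, count in Counter(doc.lower().split()).items():
            if word in vocab:
                X[row, vocab[word]] = count
    return X


def fit_lsi(X, k):
    """LSI step: keep the top-k right singular vectors as a terms-by-k projection."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k].T


def knn_predict(Z_train, y_train, Z_query, n_neighbors=3):
    """Majority vote over the n_neighbors most cosine-similar training rows."""
    def unit(m):
        norms = np.linalg.norm(m, axis=1, keepdims=True)
        return m / np.maximum(norms, 1e-12)  # guard against zero vectors

    sims = unit(Z_query) @ unit(Z_train).T
    preds = []
    for row in sims:
        nearest = np.argsort(-row)[:n_neighbors]
        votes = Counter(y_train[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds


# Hypothetical toy data: the labels are arbitrary placeholders, not real findings.
train_posts = [
    "hiking trail mountain camp",
    "trail mountain camp river",
    "mountain camp hiking river",
    "piano concert music violin",
    "concert music violin song",
    "music violin piano song",
]
train_labels = ["female", "female", "female", "male", "male", "male"]
query_posts = ["mountain trail river", "song violin concert"]

vocab = build_vocab(train_posts)
V = fit_lsi(to_counts(train_posts, vocab), k=2)   # 10 terms reduced to 2 dimensions
Z_train = to_counts(train_posts, vocab) @ V       # training posts in latent space
Z_query = to_counts(query_posts, vocab) @ V       # queries use the same projection
predictions = knn_predict(Z_train, train_labels, Z_query, n_neighbors=3)
print(predictions)
```

The saving comes from the neighbor search: distances are computed over k latent dimensions instead of the full vocabulary, while the SVD is paid for once on the training matrix.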