Unsupervised classification of online community input to advance transportation services

2018 
Transit agencies have traditionally assessed their performance through regular rider surveys, with metrics that include safety, timeliness, efficiency, and cleanliness. However, as the public, including users of public transit, increasingly uses online social media, it has become possible to summarize riders’ opinions of transit services automatically through statistical analysis of the words used in social media messages. This work describes a machine learning system that summarizes social media messages about transportation in California. The tool is intended to reveal factors that matter to transportation users but may not be evident to transit agencies, and that would therefore not be collected by rider surveys. The system uses an unsupervised statistical topic modeling algorithm (latent Dirichlet allocation) to cluster public messages related to transportation on the Twitter social media platform into distinct “topics.” Sentiment analysis then assigns a polarity (positive, negative, or neutral) to each message, and the sentiment is aggregated by topic, so the system can summarize the sentiment toward each automatically identified topic. The approach was applied to a set of 10,400 tweets containing transportation-related words, downloaded over a period of three weeks in 2016. The system was evaluated by varying the parameters of the topic modeling algorithm and studying their effect on the interpretability of the results. The quality of topic identification was found to depend on the size of the dataset, the number of topics that must be specified to the topic modeling algorithm, and the positive/negative thresholds of the sentiment analysis algorithm.
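The pipeline described above (LDA topic clustering, thresholded sentiment polarity, and per-topic aggregation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a plain collapsed Gibbs sampler for LDA, assigns each tweet to its single most probable topic, and assumes each tweet already has a sentiment score in [-1, 1] from some scorer. The function names, the ±0.05 thresholds, the alpha/beta hyperparameters, and the toy tweets and scores are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic distributions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # Count tables: document-topic, topic-word, and per-topic token totals.
    n_dk = [[0] * num_topics for _ in docs]
    n_kw = [[0] * V for _ in range(num_topics)]
    n_k = [0] * num_topics
    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the current token from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    # Smoothed document-topic proportions (each row sums to 1).
    return [
        [(n_dk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]


def label_polarity(score, pos_threshold=0.05, neg_threshold=-0.05):
    """Map a sentiment score to a polarity using hypothetical thresholds."""
    if score >= pos_threshold:
        return "positive"
    if score <= neg_threshold:
        return "negative"
    return "neutral"


def sentiment_by_topic(theta, scores, pos_threshold=0.05, neg_threshold=-0.05):
    """Count polarity labels per dominant topic."""
    summary = defaultdict(Counter)
    for dist, score in zip(theta, scores):
        topic = max(range(len(dist)), key=dist.__getitem__)
        summary[topic][label_polarity(score, pos_threshold, neg_threshold)] += 1
    return summary


# Hypothetical toy tweets (already tokenized) and precomputed sentiment scores.
tweets = [
    "bus late again driver rude".split(),
    "train late delay station crowded".split(),
    "clean new train smooth ride".split(),
    "bus clean friendly driver".split(),
]
scores = [-0.6, -0.4, 0.7, 0.5]
theta = lda_gibbs(tweets, num_topics=2, iterations=100, seed=1)
for topic, counts in sorted(sentiment_by_topic(theta, scores).items()):
    print(topic, dict(counts))
```

Both knobs the abstract identifies appear here as explicit arguments: `num_topics` must be chosen up front, and widening the gap between `pos_threshold` and `neg_threshold` moves more messages into the neutral class, which changes the per-topic summaries.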