Which Portland is it? A Machine Learning Approach

2021 
This paper reviews several approaches to toponym resolution for news articles that mention 'Portland.' We train four models to distinguish Portland, Maine from Portland, Oregon, generating features from the article text alone. The data consists of articles pulled from NewsStand. The labels, which come from NewsStand's own interpretation of each article, allow a supervised learning approach. We apply Natural Language Processing (NLP) and data cleaning techniques to process the articles, perform feature reduction, and then feed the resulting features to the models. Of the four models we test, logistic regression performs best. We also demonstrate that it learns a more robust representation of the two classes than the other three models do.
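The pipeline summarized above (clean the text, reduce the feature set, train a classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the paper's method: the six example sentences are invented and are not NewsStand data, the stopword list is hypothetical, a document-frequency cutoff stands in for the unspecified feature reduction step, and the logistic regression is a plain stochastic gradient descent implementation with arbitrary hyperparameters.

```python
import math
import re
from collections import Counter

# Invented toy examples (NOT NewsStand data). Label 1 = Oregon, 0 = Maine.
TRAIN = [
    ("Portland Trail Blazers win at home near the Willamette River", 1),
    ("Oregon governor visits Portland to discuss the Willamette bridge", 1),
    ("Portland Oregon food carts draw crowds along the river", 1),
    ("Lobster boats return to Portland harbor on Casco Bay", 0),
    ("Maine lawmakers meet in Portland to discuss lobster fishing", 0),
    ("Portland Maine lighthouse on Casco Bay reopens to visitors", 0),
]

# Hypothetical stopword list; "portland" appears in every article, so it
# carries no signal for telling the two cities apart.
STOPWORDS = {"the", "to", "at", "on", "in", "a", "of", "and", "by",
             "near", "along", "portland"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_vocab(docs, min_df=2):
    """Feature reduction stand-in: keep terms found in at least min_df docs."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return sorted(term for term, n in df.items() if n >= min_df)


def vectorize(text, vocab):
    """Binary bag-of-words vector over the reduced vocabulary."""
    tokens = set(tokenize(text))
    return [1.0 if term in tokens else 0.0 for term in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=500, lr=0.5):
    """Logistic regression fitted by stochastic gradient descent on log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict_proba(text, vocab, w, b):
    """Estimated probability that the article refers to Portland, Oregon."""
    x = vectorize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


texts = [text for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
vocab = build_vocab(texts)
X = [vectorize(text, vocab) for text in texts]
w, b = train_logreg(X, labels)

for term, weight in sorted(zip(vocab, w), key=lambda pair: pair[1]):
    print(f"{term:12s} {weight:+.2f}")
```

Printing the learned weights shows which terms pull a prediction toward each city. A linear model exposes this directly, which makes it easy to inspect what the classifier has picked up from the text.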