Which Portland is it? A Machine Learning Approach

Nicole R. Schneider, Hanan Samet

2021

This paper reviews several approaches to toponym resolution for news articles that refer to "Portland." We train four models to distinguish Portland, Maine, from Portland, Oregon, generating features from the article text alone. The data consist of articles pulled from NewsStand. The labels come from NewsStand's own interpretation of each article, which makes a supervised learning approach possible. We apply Natural Language Processing (NLP) and data cleaning techniques to the article text, perform feature reduction, and then feed the resulting features to the models. We show that logistic regression performs best of the four models we test. We also demonstrate that it learns a more robust representation of the two classes than the other three models do.
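The pipeline described in the abstract (clean the text, reduce the feature set, train a logistic regression classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the toy sentences are invented and are not NewsStand data, the stopword list is hypothetical, feature reduction is approximated by a simple document-frequency cutoff (`min_df`), and the classifier is a plain stochastic gradient descent logistic regression written with the Python standard library only.

```python
import math
import re
from collections import Counter

# Invented toy sentences, NOT NewsStand data. Label 1 = Oregon, 0 = Maine.
TOY_ARTICLES = [
    ("Trail Blazers win at home near the Willamette River", 1),
    ("Oregon food carts and breweries draw crowds by Mount Hood", 1),
    ("Oregon lawmakers visit the Willamette waterfront", 1),
    ("Lobster boats return to the Casco Bay harbor", 0),
    ("Old Port lobster shacks reopen on Casco Bay", 0),
    ("Maine lighthouse tours resume along the harbor", 0),
]

# Hypothetical stopword list used for the cleaning step.
STOPWORDS = {"the", "at", "and", "by", "to", "on", "near", "along"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_vocab(docs, min_df=2):
    """Assumed feature reduction: keep terms seen in at least min_df documents."""
    doc_freq = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return sorted(term for term, n in doc_freq.items() if n >= min_df)


def vectorize(text, vocab):
    """Bag-of-words count vector restricted to the reduced vocabulary."""
    counts = Counter(tokenize(text))
    return [float(counts[term]) for term in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, epochs=500, lr=0.5):
    """Logistic regression fitted with per-example gradient steps."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(text, vocab, w, b):
    """Return 1 for Portland, Oregon and 0 for Portland, Maine."""
    x = vectorize(text, vocab)
    score = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if score >= 0.5 else 0


texts = [text for text, _ in TOY_ARTICLES]
labels = [label for _, label in TOY_ARTICLES]
vocab = build_vocab(texts)
weights, bias = train([vectorize(t, vocab) for t in texts], labels)
```

Calling `predict("Lobster prices rise in Casco Bay", vocab, weights, bias)` classifies a new sentence using only the terms that survived the cutoff. Inspecting `weights` alongside `vocab` shows which terms the model associates with each city, which is one simple way to examine what a linear model has learned.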

Keywords:

Resolution (logic)
Machine learning
Supervised learning
Process (engineering)
Reduction (complexity)
Artificial intelligence
Feature (machine learning)
Computer science
Geotagging
Interpretation (logic)
Representation (mathematics)
