A machine learning-based approach for classifying tourists and locals using geotagged photos: the case of Tokyo

2021 
In tourism-dependent cities, investigating the spatiotemporal distribution and dynamics of tourist flows is crucial for better urban planning in both steady and perturbed states. In recent years, researchers have started relying more on photo-based, geotagged social data, which offer insights about tourists, popular hotspots, and mobility patterns. However, distinguishing between tourists and locals in these data is difficult because residence information is often not provided. While previous studies rely on heuristic (e.g., period of stay) and probabilistic (Shannon entropy) approaches, this paper proposes a method for classifying tourists and residents using machine learning (ML) algorithms and features that could explain the variability between the two groups (e.g., weather, mobility, and photo content). This approach was applied to Flickr users’ geotagged photos taken in Tokyo’s 23 special wards from July 2008 to December 2019. The results show that stacked ensemble (SE) models outperform models based on five supervised-learning algorithms: gradient boosting machine (GBM), generalized linear model (GLM), distributed random forest (DRF), deep learning (DL), and extremely randomized trees (XRT). Temporal entropy (TEN), mobility on workdays, and frequent visits to amusement venues and crowded places influenced how users were classified. While the temporal distributions of the two groups showed similar monthly and hourly patterns, their spatial distributions differed. The proposed approach might pave the way for scholars to carry out future tourism research on different topics and subsequently support policymakers in the decision-making process, specifically in urban settings.
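To make the temporal entropy feature more concrete, the following is a minimal sketch of a normalized Shannon entropy computed over a user's photo timestamps. The abstract does not define how TEN is binned or normalized, so the calendar-month binning, the normalization by log(12), the function name `temporal_entropy`, and the two example users are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter
from datetime import datetime

N_MONTHS = 12  # assumed binning: one bin per calendar month


def temporal_entropy(timestamps):
    """Normalized Shannon entropy of a user's photo counts per calendar month.

    Returns a value in [0, 1]: 0 when all photos fall in a single month,
    1 when photos are spread evenly over all twelve months.
    The binning and normalization are illustrative assumptions.
    """
    if not timestamps:
        return 0.0
    counts = Counter(ts.month for ts in timestamps)
    total = sum(counts.values())
    # H = sum over occupied bins of p * log(1 / p)
    h = sum((c / total) * math.log(total / c) for c in counts.values())
    return h / math.log(N_MONTHS)


# Hypothetical users: one posts during a single short trip,
# the other posts once a month throughout the year.
short_stay_user = [datetime(2019, 4, day) for day in (1, 2, 3, 4)]
year_round_user = [datetime(2019, month, 15) for month in range(1, 13)]

short_stay_entropy = temporal_entropy(short_stay_user)
year_round_entropy = temporal_entropy(year_round_user)
```

Under these assumptions, activity concentrated in one month yields an entropy of zero, while activity spread evenly across the year yields a value at the top of the range, which is the kind of contrast a classifier could use as one input among others.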