On Evaluating Data Preprocessing Methods for Machine Learning Models for Flight Delays

2018 
Flight delays cause various inconveniences for airlines, airports, and passengers. According to data provided by the Brazilian National Civil Aviation Agency (ANAC), about 22% of domestic flights in Brazil between 2009 and 2015 were delayed by more than 15 minutes. Predicting these delays is fundamental to mitigating their occurrence and optimizing the decision-making process of an air transport system. In particular, airlines, airports, and users may be more interested in when delays are likely to occur than in the accurate prediction of the absence of delays. This paper focuses on the unbalanced distribution of the delay classes (presence and absence) by performing an experimental evaluation of several preprocessing methods for the development of machine-learning flight delay classification models. Those models were built from a dataset that integrates national flight operations with the meteorological conditions of airports. Our results indicate that the models that applied the balancing techniques performed much better in predicting the occurrence of delays, reaching a hit rate of about 60%.