Robust causal dependence mining in big data network and its application to traffic flow predictions

2015 
Abstract In this paper, we focus on a problem in transportation studies that stems from the so-called “Big Data” challenge: how can we build concise yet accurate traffic flow prediction models from the massive data collected by different sensors? The size of the data, the hidden causal dependence, and the complexity of traffic time series are among the obstacles to making reliable forecasts at a reasonable cost in both time and computation. To better prepare the data for traffic modeling, we introduce a multi-step strategy that processes the raw “Big Data” into compact time series better suited to regression and causality analysis. First, we use Granger causality to define and determine the potential dependence among the data, producing a much more condensed set of time series that are also highly dependent. Next, we deploy a decomposition algorithm to separate the daily-similar trend and the nonstationary burst components of the traffic flow time series retained by the Granger test. The decomposition results are then treated with two rounds of Lasso regression: the standard Lasso method is first used to quickly filter out most of the irrelevant data, followed by a robust Lasso method that further removes the disturbance caused by the burst components and recovers the strongest dependence among the remaining data. Test results show that the proposed method significantly reduces the cost of building prediction models. Moreover, the resulting causal dependence graph reveals the relationship between the structure of road networks and the correlations among traffic time series. All these findings are useful for building better traffic flow prediction models.
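To make the first Lasso round concrete, the following is a minimal sketch of how an L1 penalty filters out irrelevant candidate series. It is not the paper's implementation: the coordinate-descent solver, the penalty value `lam=0.05`, and the synthetic "sensor" signals are all assumptions chosen for illustration, and the Granger test, the trend/burst decomposition, and the robust Lasso round are not shown.

```python
import math


def soft_threshold(rho, lam):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = len(y), len(X[0])
    coef = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                others = sum(X[i][k] * coef[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            coef[j] = soft_threshold(rho / n, lam) / (z / n)
    return coef


# Hypothetical data: 10 days of hourly readings from three upstream sensors.
# Row i holds the candidate readings at hour i; y[i] is the target flow one
# step later, which here depends on sensor 0 only.
n = 240
X = [
    [
        math.sin(2 * math.pi * i / 24),  # sensor 0: drives the target
        math.cos(2 * math.pi * i / 24),  # sensor 1: irrelevant
        math.sin(2 * math.pi * i / 12),  # sensor 2: irrelevant
    ]
    for i in range(n)
]
y = [0.8 * row[0] for row in X]

coef = lasso_cd(X, y, lam=0.05)
print(coef)  # first coefficient is about 0.7, the other two are exactly 0.0
```

The surviving coefficient is shrunk from 0.8 to about 0.7, which is the usual bias of the L1 penalty; the two irrelevant candidates are set exactly to zero, which is the filtering behavior the abstract relies on.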