Informed driving is becoming a key feature for increasing the sustainability of taxi companies. The sensors installed in each vehicle provide new opportunities for automatically discovering knowledge, which, in turn, delivers information for real-time decision making. Intelligent transportation systems for taxi dispatching and for finding time-saving routes are already exploiting these sensing data. This paper introduces a novel methodology for predicting the spatial distribution of taxi passengers over a short-term time horizon using streaming data. First, the information was aggregated into a histogram time series. Then, three time-series forecasting techniques were combined to produce a prediction. Experimental tests were conducted using the online data transmitted by 441 vehicles of a fleet running in the city of Porto, Portugal. The results demonstrated that the proposed framework can provide effective insight into the spatiotemporal distribution of taxi-passenger demand for a 30-min horizon.
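The two steps named in the abstract above (aggregating events into a histogram time series, then combining three forecasters) can be sketched as follows. This is a minimal illustration rather than the paper's method: the abstract does not name the three techniques or the combination rule, so the naive, moving-average, and exponential-smoothing forecasters, the inverse-error weighting, and all function names are assumptions.

```python
from collections import Counter


def histogram_series(event_minutes, bin_minutes=30, n_bins=None):
    """Count events per fixed-width time bin (times in minutes since start)."""
    counts = Counter(int(t // bin_minutes) for t in event_minutes)
    if n_bins is None:
        n_bins = max(counts) + 1 if counts else 0
    return [counts.get(i, 0) for i in range(n_bins)]


# Three simple one-step-ahead forecasters (assumed, not taken from the paper).
def naive(series):
    return float(series[-1])


def moving_average(series, window=3):
    tail = series[-window:]
    return sum(tail) / len(tail)


def exp_smoothing(series, alpha=0.5):
    level = float(series[0])
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def combine(series, models, warmup=2, eps=1e-9):
    """Weight each model by the inverse of its accumulated absolute
    one-step error on the history, then average the next-step forecasts."""
    errors = [0.0] * len(models)
    for t in range(warmup, len(series)):
        for i, model in enumerate(models):
            errors[i] += abs(model(series[:t]) - series[t])
    inverse = [1.0 / (e + eps) for e in errors]
    total = sum(inverse)
    weights = [w / total for w in inverse]
    forecast = sum(w * m(series) for w, m in zip(weights, models))
    return forecast, weights


# Hypothetical pickup times (minutes) at one taxi stand, 30-minute bins.
demand = histogram_series([0, 10, 35, 61, 62, 89], bin_minutes=30)
prediction, weights = combine(demand, [naive, moving_average, exp_smoothing])
```

In a streaming setting the accumulated errors would be updated incrementally as each new bin closes instead of being recomputed over the whole history.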
Online recommender systems often deal with continuous, potentially fast and unbounded flows of data. Ensemble methods for recommender systems have been used in the past with batch algorithms; however, they have never been studied with incremental algorithms that are capable of processing those data streams on the fly. We propose online bagging, using an incremental matrix factorization algorithm for positive-only data streams. Using prequential evaluation, we show that bagging is able to improve accuracy by more than 35% over the baseline with small computational overhead.
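A minimal sketch of the idea in the abstract above, under stated assumptions. Online bagging is commonly implemented by showing each incoming example to each base model k times, with k drawn from a Poisson(1) distribution; whether the paper uses exactly this scheme is not stated in the abstract. The `IncrementalMF` base learner, its hyperparameters, and the score averaging are hypothetical stand-ins, not the authors' algorithm.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


class IncrementalMF:
    """Toy SGD matrix factorization for positive-only feedback (target = 1)."""

    def __init__(self, n_factors=4, lr=0.05, reg=0.01, rng=None):
        self.k, self.lr, self.reg = n_factors, lr, reg
        self.rng = rng or random.Random(0)
        self.users, self.items = {}, {}

    def _vec(self, table, key):
        # Lazily create a small random factor vector for unseen users/items.
        if key not in table:
            table[key] = [self.rng.gauss(0, 0.1) for _ in range(self.k)]
        return table[key]

    def score(self, user, item):
        if user not in self.users or item not in self.items:
            return 0.0
        return sum(a * b for a, b in zip(self.users[user], self.items[item]))

    def update(self, user, item):
        p, q = self._vec(self.users, user), self._vec(self.items, item)
        err = 1.0 - sum(a * b for a, b in zip(p, q))
        for f in range(self.k):
            pf, qf = p[f], q[f]
            p[f] += self.lr * (err * qf - self.reg * pf)
            q[f] += self.lr * (err * pf - self.reg * qf)


class OnlineBagging:
    """Each model sees every event Poisson(1) times; scores are averaged."""

    def __init__(self, n_models=8, seed=0):
        self.rng = random.Random(seed)
        self.models = [IncrementalMF(rng=random.Random(seed + 1 + i))
                       for i in range(n_models)]

    def update(self, user, item):
        for model in self.models:
            for _ in range(poisson(1.0, self.rng)):
                model.update(user, item)

    def score(self, user, item):
        return sum(m.score(user, item) for m in self.models) / len(self.models)
```

In a prequential loop, each event would first be scored by `score` to evaluate the current recommendation and only then passed to `update`.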
In the last decade, real-time vehicle location systems have attracted widespread attention for the new kind of rich spatio-temporal information they provide. The fast processing of this large amount of information is a rapidly growing challenge. Taxi companies are already exploiting such information for efficient taxi dispatching and time-saving route finding. In this paper, we propose a novel methodology to produce online short-term predictions of the spatial distribution of passenger demand over 63 taxi stands in the city of Porto, Portugal. We did so by applying time series forecasting techniques to the processed events continuously communicated by 441 taxi vehicles. Our tests, using 4 months of real data, demonstrated that this model is a major contribution to driver mobility intelligence: 76% of the 86411 demanded taxi services were accurately forecast within a 30-minute time horizon.
This chapter presents an adaptive predictive model for a student modeling prediction task in the context of an adaptive educational hypermedia system (AEHS). The task, which consists of determining what kinds of learning resources are most appropriate for a particular learning style, presents two critical issues. The first is the uncertainty of the information about the student’s learning style acquired by psychometric instruments. The second is the change over time in the student’s preferences (concept drift). To approach this task, we propose a probabilistic adaptive predictive model that includes a method to handle concept drift based on statistical quality control. We claim that our approach is able to adapt quickly to changes in the student’s preferences and that it can be used successfully in similar user modeling prediction tasks where uncertainty and concept drift are present.
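One way to read "concept drift handling based on statistical quality control" is as a control chart over the model's online error rate. The sketch below follows that reading and is an assumption, not the chapter's stated method: it tracks the error rate p and its standard deviation s, remembers the lowest p + s seen so far, and signals a warning or a drift when the current value exceeds that minimum by 2 or 3 standard deviations. The class name, thresholds, and `min_samples` value are hypothetical.

```python
import math


class DriftMonitor:
    """Control-chart style monitor over a stream of prediction errors."""

    def __init__(self, min_samples=30, warn=2.0, drift=3.0):
        self.min_samples, self.warn, self.drift = min_samples, warn, drift
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """Feed True if the model mispredicted this example, else False.
        Returns 'stable', 'warning', or 'drift'."""
        self.n += 1
        self.errors += int(error)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        # Remember the best (lowest) operating point observed so far.
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift * self.s_min:
            self.reset()  # the caller would rebuild or re-adapt the model here
            return "drift"
        if p + s > self.p_min + self.warn * self.s_min:
            return "warning"
        return "stable"


monitor = DriftMonitor()
# Hypothetical stream: 10% errors, then the student's preferences change.
statuses = [monitor.update(i % 10 == 9) for i in range(200)]
statuses += [monitor.update(True) for _ in range(50)]
```

On a `drift` signal, an adaptive model would typically discard or down-weight the examples seen before the change and relearn from recent ones.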
Wide-area sensor infrastructures, remote sensors, RFIDs, phasor measurements, and wireless sensor networks yield massive volumes of disparate, dynamic, and geographically distributed data. With the recent proliferation of smartphones and similar GPS-enabled mobile devices with several onboard sensors, the collection of sensor data is no longer limited to scientific communities but has reached the general public. As such sensors become ubiquitous, a set of broad requirements is beginning to emerge across high-priority applications, including adaptability to national or homeland security, critical infrastructure monitoring, smart grids, disaster preparedness and management, greenhouse emissions and climate change, and transportation. The raw data from sensors need to be efficiently managed and transformed into usable information through data fusion, which in turn must be converted into predictive insights via knowledge discovery, ultimately facilitating automated or human-induced tactical decisions or strategic policy based on decision sciences and decision support systems. The challenges for the knowledge discovery community are expected to be immense. On the one hand are dynamic data streams or events that require real-time analysis methodologies and systems, while on the other hand are static data that require high-end computing for generating offline predictive insights, which in turn can facilitate real-time analysis. Online and real-time knowledge discovery implies immediate opportunities as well as intriguing short- and long-term challenges for practitioners and researchers in knowledge discovery. The opportunities lie in developing new data mining approaches and adapting traditional and emerging knowledge discovery methodologies to the requirements of the emerging problems.
In addition, emerging societal problems require knowledge discovery solutions that are designed to investigate anomalies, rare events, hotspots, changes, extremes and nonlinear processes, and departures from the normal.