We describe a new methodology for supervised learning with sparse data, i.e., when the number of input features (d) is (much) larger than the number of training samples (n). Under the proposed approach, all d available input features are split into t subsets, effectively resulting in a larger number (t*n) of labeled training samples in a lower-dimensional input space (of dimensionality d/t). This modified training data is then used to estimate a classifier for making predictions in the lower-dimensional space. In this paper, a standard SVM is used for training the classifier. During testing (prediction), the group of t predictions made by the SVM classifier must be combined via intelligent post-processing rules in order to make a prediction for a test input in the original d-dimensional space. The novelty of our approach lies in the design and empirical validation of these post-processing rules under the Group Learning setting. We demonstrate that such post-processing rules effectively reflect general (common-sense) a priori knowledge about the application data. Specifically, we propose two different post-processing schemes and demonstrate their effectiveness for two real-life application domains: handwritten digit recognition and seizure prediction from iEEG signals. These empirical results show superior performance of the Group Learning approach for sparse data under both balanced and unbalanced classification settings.
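The feature-splitting step and the combination of t predictions can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes d is divisible by t, splits features into contiguous blocks, lets every sub-vector inherit its parent's label, and uses a hypothetical vote-counting rule (`group_predict` with a `min_votes` parameter) as a stand-in for the paper's post-processing schemes, which are not specified here. `toy_classifier` is a placeholder for a trained SVM.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def split_features(X: Sequence[Vector], y: Sequence[int], t: int) -> Tuple[List[List[float]], List[int]]:
    """Turn n samples of dimension d into t*n samples of dimension d // t.

    Assumptions (not from the paper): d is divisible by t, features are split
    into contiguous blocks, and each sub-vector inherits its parent's label.
    """
    d = len(X[0])
    if d % t != 0:
        raise ValueError("d must be divisible by t in this sketch")
    m = d // t
    X_sub: List[List[float]] = []
    y_sub: List[int] = []
    for x, label in zip(X, y):
        for k in range(t):
            X_sub.append(list(x[k * m:(k + 1) * m]))
            y_sub.append(label)
    return X_sub, y_sub


def group_predict(x: Vector, t: int, classify: Callable[[Vector], int],
                  min_votes: Optional[int] = None) -> int:
    """Combine t sub-vector predictions (labels +1/-1) into one prediction.

    Hypothetical post-processing rule: predict +1 when at least `min_votes`
    sub-vectors are classified as +1 (simple majority by default).
    """
    m = len(x) // t
    votes = [classify(x[k * m:(k + 1) * m]) for k in range(t)]
    positives = sum(1 for v in votes if v == 1)
    threshold = min_votes if min_votes is not None else t // 2 + 1
    return 1 if positives >= threshold else -1


def toy_classifier(z: Vector) -> int:
    """Placeholder for an SVM trained in the d/t-dimensional space."""
    return 1 if sum(z) > 5 else -1


X = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
y = [1, -1]
X_sub, y_sub = split_features(X, y, t=2)
print(X_sub)  # [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(y_sub)  # [1, 1, -1, -1]
print(group_predict(X[0], 2, toy_classifier))               # -1 (only 1 of 2 votes is +1)
print(group_predict(X[0], 2, toy_classifier, min_votes=1))  # 1
```

Lowering `min_votes` below a simple majority is one way such a rule could encode prior knowledge, e.g., to favor sensitivity for a rare positive class in unbalanced settings; the two schemes proposed in the paper may work differently.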
We describe a novel system for online prediction of lead seizures from long-term intracranial electroencephalogram (iEEG) recordings for canines with naturally occurring epilepsy. This study adopts a new specification of lead seizures, reflecting strong clustering of seizures in the observed data. This clustering results in fewer lead seizures (~7 lead seizures per dog), and hence new challenges for online seizure prediction that are addressed in the proposed system. In particular, the machine learning part of the system is implemented using the Group Learning method, which is suitable for modeling sparse and noisy seizure data. In addition, several modifications to the proposed system are introduced to cope with the non-stationarity of the noisy iEEG signal. They include: (1) periodic re-training of the SVM classifier using the most recent training data; (2) removing samples with noisy labels from the training data; and (3) a new adaptive post-processing technique for combining many predictions made for 20-second windows into a single prediction for a 4 h segment. Application of the proposed system requires only 2 lead seizures for training the initial model, and results in high prediction performance for all four dogs (mean sensitivity of 0.84, time-in-warning of 0.27, and false-positive rate of 0.78 per day). The proposed system achieves accurate prediction of lead seizures during long-term test periods (3–16 lead seizures during test periods of 169–364 days), whereas earlier studies did not differentiate between lead and non-lead seizures and used much shorter test periods (~a few days long).
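Why clustering reduces the number of usable events can be illustrated by filtering seizure onsets down to lead seizures. The sketch below is hypothetical: the paper's exact specification of lead seizures is not given here, so it assumes a seizure counts as a lead seizure when at least `min_gap_h` hours (a made-up parameter) separate it from the previous seizure of any kind, with onsets in hours from recording start and the first recorded seizure counted as a lead seizure.

```python
from typing import List, Optional, Sequence


def lead_seizures(onsets_h: Sequence[float], min_gap_h: float) -> List[float]:
    """Keep onsets preceded by at least `min_gap_h` hours without any seizure.

    Assumptions (hypothetical, not the paper's definition): the gap is measured
    from the previous seizure of any kind, and the first recorded seizure is a
    lead seizure.
    """
    lead: List[float] = []
    previous: Optional[float] = None
    for onset in sorted(onsets_h):
        if previous is None or onset - previous >= min_gap_h:
            lead.append(onset)
        previous = onset
    return lead


# A cluster of three seizures, then a pair, then an isolated seizure.
onsets = [0.0, 1.0, 2.5, 30.0, 31.0, 100.0]
print(lead_seizures(onsets, min_gap_h=24.0))  # [0.0, 30.0, 100.0]
```

Six recorded seizures collapse to three lead seizures here, which mirrors (in a toy way) why clustered data leave few positive examples for training and testing.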
Objective: This paper describes a data-analytic modeling approach for the prediction of epileptic seizures from intracranial electroencephalogram (iEEG) recordings of brain activity. Even though it is widely accepted that statistical characteristics of the iEEG signal change prior to seizures, robust seizure prediction remains a challenging problem due to the subject-specific nature of data-analytic modeling. Methods: Our work emphasizes understanding the clinical considerations important for iEEG-based seizure prediction, and the proper translation of these clinical considerations into data-analytic modeling assumptions. Several design choices during preprocessing and postprocessing are considered and investigated for their effect on seizure prediction accuracy. Results: Our empirical results show that the proposed support vector machine-based seizure prediction system can achieve robust prediction of preictal and interictal iEEG segments from dogs with epilepsy. The sensitivity is about 90-100%, and the false-positive rate is about 0-0.3 per day. The results also suggest that good prediction is subject-specific (dog or human), in agreement with earlier studies. Conclusion: Good prediction performance is possible only if the training data contain sufficiently many seizure episodes, i.e., at least 5-7 seizures. Significance: The proposed system uses subject-specific modeling and unbalanced training data. This system also utilizes three different time scales during the training and testing stages.
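The two reported figures of merit, sensitivity and false positives per day, can be computed from segment-level outputs roughly as follows. This is a sketch under assumptions not stated in the abstract: labels and predictions are 1 (preictal) or 0 (interictal), sensitivity is counted per preictal segment, every interictal segment predicted as preictal counts as one false alarm, and all segments share a fixed duration `segment_hours`.

```python
from typing import Sequence, Tuple


def segment_metrics(labels: Sequence[int], preds: Sequence[int],
                    segment_hours: float) -> Tuple[float, float]:
    """Return (sensitivity, false positives per day) from segment-level outputs.

    Assumptions: 1 = preictal, 0 = interictal; sensitivity is per preictal
    segment; each interictal segment predicted as 1 is one false alarm.
    """
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have equal length")
    preictal = [p for lab, p in zip(labels, preds) if lab == 1]
    interictal = [p for lab, p in zip(labels, preds) if lab == 0]
    sensitivity = sum(preictal) / len(preictal) if preictal else float("nan")
    interictal_hours = len(interictal) * segment_hours
    false_alarms = sum(interictal)
    # Scale false alarms per interictal hour to false alarms per 24 h day.
    fp_per_day = false_alarms * 24.0 / interictal_hours if interictal_hours else float("nan")
    return sensitivity, fp_per_day


labels = [1, 1, 0, 0, 0, 0]
preds = [1, 0, 1, 0, 0, 0]
print(segment_metrics(labels, preds, segment_hours=4))  # (0.5, 1.5)
```

Because false alarms are normalized by interictal time only, this metric is unaffected by how many preictal segments the (unbalanced) test set contains.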
Examining how in-game behavior preferences map onto real-world demographics provides important, empirically derived insights into how to match game-based mechanisms to target demographic segments. We provide this mapping using multiple regression analyses of behavioral and demographic data from 1,037 World of Warcraft players. Given current interest in "gamifying" applications, we believe these findings are relevant for both gaming and non-gaming research.
University of Minnesota Ph.D. dissertation. October 2017. Major: Electrical/Computer Engineering. Advisor: Vladimir Cherkassky. 1 computer file (PDF); ix, 109 pages.
Exploiting additional information to improve traditional inductive learning is an active research area in machine learning. In many supervised-learning applications, data can be naturally separated into several groups, or tasks, and incorporating this information into learning may improve generalization. Many Multi-Task Learning (MTL) techniques for classification have recently been proposed in machine learning. This paper focuses on the analysis and comparison of two recent SVM-based MTL techniques: regularized MTL (rMTL) and SVM+ based MTL (SVM+MTL). In particular, our analysis shows how these two methods can be implemented using standard SVM software. Further, we present extensive empirical comparisons between these two methods, which relate the advantages and limitations of each method to statistical characteristics of the training data.
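One well-known way to run regularized MTL with standard SVM software is to fold the task index into the kernel, K((x, s), (z, t)) = (rho + [s == t]) * k(x, z), where k is an ordinary kernel and rho >= 0 controls how strongly tasks share a common model. The sketch below assumes this construction and an RBF base kernel; it is illustrative only, the paper's own derivation may differ, and it does not cover SVM+MTL.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, z: Vector, gamma: float = 0.5) -> float:
    """Gaussian (RBF) base kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def multitask_gram(X: Sequence[Vector], tasks: Sequence[int], rho: float = 1.0,
                   base_kernel: Callable[[Vector, Vector], float] = rbf_kernel) -> List[List[float]]:
    """Gram matrix of K((x, s), (z, t)) = (rho + [s == t]) * k(x, z).

    rho = 0 makes the matrix block-diagonal by task (no sharing through the
    kernel); a large rho pushes all tasks toward one shared model.
    """
    if rho < 0:
        raise ValueError("rho must be non-negative to keep the kernel PSD")
    n = len(X)
    return [[(rho + (1.0 if tasks[i] == tasks[j] else 0.0)) * base_kernel(X[i], X[j])
             for j in range(n)]
            for i in range(n)]


X = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
tasks = [0, 1, 0]
K = multitask_gram(X, tasks, rho=1.0)
print(K[0][0], K[0][1])  # 2.0 1.0
```

With scikit-learn installed, such a matrix could be passed to `sklearn.svm.SVC(kernel="precomputed")`; at prediction time, the kernel between test and training points must be built with the same task-coupling rule.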
We describe a novel system for online prediction of lead seizures from long-term intracranial electroencephalogram (iEEG) recordings for canines with naturally occurring epilepsy. This study adopts a new specification of lead seizures, reflecting strong clustering of seizures in the observed data. This clustering results in fewer lead seizures (~7 lead seizures per dog), and hence new challenges for online seizure prediction that are addressed in the proposed system. In particular, the machine learning part of the system is implemented using the group learning method, which is suitable for modeling sparse and noisy seizure data. In addition, several modifications to the proposed system are introduced to cope with the non-stationarity of a noisy iEEG signal. They include: (1) periodic retraining of the SVM classifier using the most recent training data; (2) removing samples with noisy labels from the training data; and (3) introducing a new adaptive post-processing technique for combining many predictions made for 20 s windows into a single prediction for a 4 h segment. Application of the proposed system requires only two lead seizures for training the initial model, and results in high prediction performance for all four dogs (mean sensitivity of 0.84, time-in-warning of 0.27, and false-positive rate of 0.78 per day). The proposed system achieves accurate prediction of lead seizures during long-term test periods (3-16 lead seizures during test periods of 169-364 days), whereas earlier studies did not differentiate between lead and non-lead seizures and used much shorter test periods (~a few days long).
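Step (3) combines 4 h / 20 s = 720 window-level predictions into one segment-level decision. The adaptive rule itself is not described in the abstract, so the sketch below assumes a simple hypothetical version: a segment raises a warning when its fraction of positive (+1) windows exceeds a threshold, and `adapt_threshold` (a made-up helper) sets that threshold as an upper quantile of the positive fractions seen on recent interictal training segments.

```python
from typing import Sequence

WINDOW_SECONDS = 20
SEGMENT_SECONDS = 4 * 3600
WINDOWS_PER_SEGMENT = SEGMENT_SECONDS // WINDOW_SECONDS  # 720 windows per 4 h segment


def positive_fraction(window_preds: Sequence[int]) -> float:
    """Fraction of 20 s windows labeled +1 (preictal) by the classifier."""
    if not window_preds:
        raise ValueError("segment has no window predictions")
    return sum(1 for p in window_preds if p == 1) / len(window_preds)


def adapt_threshold(interictal_fractions: Sequence[float], quantile: float = 0.95) -> float:
    """Hypothetical adaptive rule: an upper quantile of the positive fractions
    observed on recent interictal training segments."""
    if not interictal_fractions:
        raise ValueError("need at least one interictal training segment")
    ranked = sorted(interictal_fractions)
    index = min(len(ranked) - 1, int(quantile * len(ranked)))
    return ranked[index]


def segment_warning(window_preds: Sequence[int], threshold: float) -> bool:
    """Warn for the whole 4 h segment if its positive fraction exceeds the threshold."""
    return positive_fraction(window_preds) > threshold


threshold = adapt_threshold([0.10, 0.20, 0.30, 0.40], quantile=0.5)  # 0.3
print(segment_warning([1] * 300 + [-1] * 420, threshold))  # True  (300/720 ~ 0.42)
print(segment_warning([1] * 100 + [-1] * 620, threshold))  # False (100/720 ~ 0.14)
```

Recomputing the threshold each time the classifier is retrained on recent data is one plausible way such a rule could track a non-stationary signal; the paper's technique may differ.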
There is growing interest in data-analytic modeling for prediction and/or detection of epileptic seizures from EEG recordings of brain activity [1-10]. Even though there is clear evidence that many patients have changes in the EEG signal prior to seizures, development of robust seizure prediction methods remains elusive [1]. We argue that the main obstacle to developing effective EEG-based predictive models is an apparent disconnect between clinical considerations and data-analytic modeling assumptions. We present an SVM-based system for seizure prediction, where design choices and performance metrics are clearly related to clinical objectives and constraints. This system achieves very accurate prediction of preictal and interictal EEG segments in dogs with naturally occurring epilepsy. However, our empirical results suggest that good prediction performance may be possible only if the training data set has sufficiently many preictal segments, i.e., at least 6-7 seizure episodes.
During periods of extreme market volatility, such as that experienced during the COVID-19 pandemic, advised investors may make impulsive and inappropriate investment decisions, such as moving all assets to cash. Financial advisors, through proactive behavioral coaching, can help their clients avoid such decisions. But which clients need the most help? A predictive model that better identifies the clients most likely to react to market volatility can be an invaluable tool for financial advisors. Such a model requires insight into the investors’ mindset. In previous work, the authors focused on the perspective of the financial advisor and used natural language processing to mine advisors’ summary notes for such investor insights. They then used this novel data source as input for a machine-learning model to predict the investors most in need of intervention during volatile market periods. In this article, the authors further expand the model to include a unique dataset of investors’ digital activity, including investor-initiated contacts (via web, email, and phone) and web activity (page views and browsing history), to better reveal investor intention. Using machine-learning techniques, the authors build a model using this novel dataset as well as advisor notes, transaction activity, and a market volatility index to identify advised investors most in need of proactive intervention. The authors further describe the implications of this work for both traditional and robo-advisory service models.