    Development and external validation of a machine learning model to predict the initial dose of vancomycin for targeting an area under the concentration–time curve of 400–600 mg∙h/L
LncRNAs play important roles in many biological processes and in disease progression by binding to related proteins. However, experimental methods for studying lncRNA-protein interactions are time-consuming and expensive. Although a few models have been designed to predict ncRNA-protein interactions, they share common drawbacks that limit their predictive performance. In this study, we present HLPI-Ensemble, a model designed specifically for human lncRNA-protein interactions. HLPI-Ensemble adopts an ensemble strategy based on three mainstream machine learning algorithms, Support Vector Machines (SVM), Random Forests (RF), and Extreme Gradient Boosting (XGB), to generate HLPI-SVM Ensemble, HLPI-RF Ensemble, and HLPI-XGB Ensemble, respectively. In 10-fold cross-validation, HLPI-SVM Ensemble, HLPI-RF Ensemble, and HLPI-XGB Ensemble achieved AUCs of 0.95, 0.96, and 0.96, respectively, on the test dataset. Furthermore, we compared the performance of the HLPI-Ensemble models with previous models on an external validation dataset. The results show that the HLPI-Ensemble models produce far fewer false positives (FPs) than the previous models, and that their other evaluation metrics are also higher. This further demonstrates that the HLPI-Ensemble models are superior to previous models at predicting human lncRNA-protein interactions. HLPI-Ensemble is publicly available at: http://ccsipb.lnu.edu.cn/hlpiensemble/ .
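The abstract does not give implementation details, but the core ensemble step it describes can be sketched as averaging the interaction probabilities of several base classifiers and thresholding the mean. This is a minimal illustration with hypothetical probabilities standing in for the SVM, RF, and XGB outputs, not the actual HLPI-Ensemble code:

```python
# Minimal sketch of probability-averaging ensemble prediction.
# The three input probabilities are hypothetical stand-ins for the
# outputs of base models such as SVM, RF, and XGB.

def ensemble_predict(base_probs, threshold=0.5):
    """Average base-classifier probabilities and apply a decision threshold."""
    mean_prob = sum(base_probs) / len(base_probs)
    return (1 if mean_prob >= threshold else 0), mean_prob

# Hypothetical probabilities from three base models for one lncRNA-protein pair
label, prob = ensemble_predict([0.7, 0.9, 0.6])
print(label, round(prob, 2))  # -> 1 0.73
```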
Random Forest is a supervised machine learning algorithm. In the data mining domain, machine learning algorithms are used extensively to analyze data and generate predictions from it. As an ensemble algorithm, Random Forest builds multiple decision trees as base classifiers and applies majority voting to combine their outcomes. The strength of the individual decision trees and the correlation among them are the key factors that determine the generalization error of Random Forest classifiers. In terms of accuracy, Random Forest classifiers are on par with existing ensemble techniques such as bagging and boosting. This research work attempts to improve the performance of Random Forest classifiers in terms of accuracy and of the time required for learning and classification. To achieve this, five new approaches are proposed. The empirical analysis and the outcomes of the experiments carried out in this work lead to effective learning and classification with the Random Forest algorithm.
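The combination step described above, majority voting over the base trees, can be sketched in a few lines. Tree training itself is omitted; the votes are illustrative labels:

```python
from collections import Counter

# Sketch of Random Forest's combination step: each base tree casts a
# class vote and the forest returns the majority class.

def majority_vote(predictions):
    """Return the most common class label among the base trees' votes."""
    return Counter(predictions).most_common(1)[0][0]

# Illustrative votes from five base trees
votes = ["spam", "ham", "spam", "spam", "ham"]
print(majority_vote(votes))  # -> spam
```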
Construction and demolition waste (DW) generation information has been recognized as a tool for providing useful information for waste management. Recently, numerous researchers have actively applied artificial intelligence technology to establish accurate waste generation information. This study investigated the development of machine learning predictive models that can perform well on small datasets composed of categorical variables. To this end, the random forest (RF) and gradient boosting machine (GBM) algorithms were adopted. To develop the models, 690 building datasets were established using data preprocessing and standardization, and hyperparameter tuning was performed for the RF and GBM models. Model performance was evaluated using the leave-one-out cross-validation technique. The study demonstrated that, for small datasets comprising mainly categorical variables, predictions from the bagging technique (RF) were more stable and accurate than those from the boosting technique (GBM), although GBM models showed excellent predictive performance for some DW predictive models. Furthermore, the RF and GBM predictive models performed significantly differently across different types of DW. Certain RF and GBM models showed relatively low predictive performance, but the remaining models all performed well, with R² values > 0.6 and R values > 0.8. Such differences arise mainly from the characteristics of the features used in model development; we expect the application of additional features to improve the performance of the predictive models. The 11 DW predictive models developed in this study will be useful for establishing detailed DW management strategies.
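The leave-one-out cross-validation scheme used above can be sketched as a loop in which each sample serves exactly once as the test set while the model trains on the rest. The "model" here is a trivial 1-nearest-neighbour stand-in on 1-D data, chosen only to make the evaluation loop self-contained; it is not the RF/GBM setup of the study:

```python
# Sketch of leave-one-out cross-validation (LOOCV).
# Dataset entries are (feature, label) pairs; the predictor is a toy
# 1-nearest-neighbour stand-in, not the study's actual models.

def nearest_neighbour_predict(train, x):
    """Predict the label of the closest training sample."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def loocv_accuracy(dataset):
    """Hold out each sample once; train on the rest; report accuracy."""
    correct = 0
    for i, (x, y) in enumerate(dataset):
        train = dataset[:i] + dataset[i + 1:]
        if nearest_neighbour_predict(train, x) == y:
            correct += 1
    return correct / len(dataset)

data = [(1.0, "A"), (1.2, "A"), (5.0, "B"), (5.3, "B")]
print(loocv_accuracy(data))  # -> 1.0
```

LOOCV is attractive for small datasets like the 690-building one above because every sample contributes to both training and evaluation.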
Phishing websites are characterized by distinctive visual, address, domain, and embedded features, which help identify and defend against such threats. Yet phishing website detection is challenged by the overlap between these features and those of legitimate websites. As the inter-class variance between legitimate and phishing websites becomes low, commonly used machine learning algorithms perform poorly in such overlapping-feature cases. Ensemble learning, which combines multiple predictions to address low inter-class variation in the classified data, improves performance in these cases; it uses multiple classifiers of similar or different types trained on multiple variations of the training data. This paper develops a framework based on random forest ensemble techniques. A limitation of the random forest is its inability to capture high correlation between features and their joint dependency on the label, so the random forest is combined with k-means clustering to capture feature correlation. The framework is evaluated for phishing detection on a dataset of 5000 samples. The results showed that the proposed framework outperformed the random forest classifier, all other ensemble classifiers, and the conventional classification algorithms, achieving an accuracy of 98.64%, precision of 0.986, recall of 0.987, and F-measure of 0.986.
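One way to read the RF + k-means combination above is as feature augmentation: cluster the samples and feed the cluster id to the classifier as an extra feature. This sketch uses a tiny 1-D k-means on a hypothetical "URL length" feature; the data, feature, and two-cluster setup are illustrative assumptions, not the paper's pipeline:

```python
# Sketch of cluster-based feature augmentation with a tiny 1-D k-means.
# The "URL length" values and the choice of two clusters are illustrative.

def kmeans_1d(values, centers, iters=10):
    """Lloyd's algorithm on scalars: assign to nearest center, recompute means."""
    for _ in range(iters):
        groups = {i: [] for i in range(len(centers))}
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in groups.items()]
    return centers

def assign(v, centers):
    """Index of the nearest cluster center."""
    return min(range(len(centers)), key=lambda i: abs(v - centers[i]))

url_lengths = [12, 14, 13, 80, 85, 90]        # hypothetical feature values
centers = kmeans_1d(url_lengths, [12, 80])
# Augment each sample with its cluster id before classification
augmented = [(v, assign(v, centers)) for v in url_lengths]
print(augmented)  # -> [(12, 0), (14, 0), (13, 0), (80, 1), (85, 1), (90, 1)]
```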
Determining the right selling price for a car can be a challenge for car sales companies. The selling price of a car is strongly influenced by characteristics such as brand, type, year of production, fuel type, and mileage. This research therefore aims to develop a more accurate car price prediction model using a stacking ensemble technique that combines Random Forest and an ANN. Random Forest is effective at handling outliers and reducing the risk of overfitting, while an ANN has the advantage of capturing complex nonlinear patterns. The results show that the stacking ensemble model combining the ANN and Random Forest can predict car sales prices with an R² value of 0.97. These results can help distributors make the right decisions about car sales prices. To improve the generalization of the model, future research is recommended to try combinations of different ensemble methods and larger, more diverse datasets.
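The stacking idea above can be sketched as follows: base models each produce a price estimate, and a meta-model combines them. The base models and the linear meta-model weights here are toy stand-ins (the study's actual RF, ANN, and meta-learner are not specified in this abstract):

```python
# Sketch of stacking: base predictions are combined by a meta-model.
# Both base models and the linear blend are hypothetical stand-ins.

def base_rf(features):
    """Stand-in for the Random Forest price estimate."""
    return 10_000 + 0.5 * features["mileage_adjustment"]

def base_ann(features):
    """Stand-in for the ANN price estimate."""
    return 9_500 + 0.6 * features["mileage_adjustment"]

def stacked_predict(features, w_rf=0.5, w_ann=0.5, bias=0.0):
    """Meta-learner: a (hypothetical) linear blend of the base predictions."""
    return w_rf * base_rf(features) + w_ann * base_ann(features) + bias

car = {"mileage_adjustment": 1_000}
print(stacked_predict(car))  # -> 10300.0
```

In a real stacking setup the meta-model's weights are themselves learned, on out-of-fold predictions of the base models, rather than fixed as here.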
Abstract The purpose of this work is to detect lying using different ensemble machine learning algorithms and to identify a better classification model through comparison. Random Forest (RF) handles both classification and regression problems efficiently; in this paper, we propose Random Forest-based ensemble learning, which combines RF with SVM, GLM, KNNs, and GBM to improve model performance. The dataset fitted to the machine learning models is the Miami University Deception Detection Database (MU3D), a free resource containing 320 videos of Black and White targets, female and male, telling truths and lies. We fit the MU3D video-level dataset to Random Forest-based ensemble learning models, including RF + SVM.Linear, RF + SVM.Poly, RF + GLM, RF + KNNs, RF + GBM (Stochastic Gradient Boosting), and RF + WSRF (Weighted Subspace Random Forest). Based on a comprehensive comparison of model performance, we conclude that our new combinations of algorithms perform better than traditional machine learning models. Our contribution is a robust classification method that improves predictive performance while avoiding model overfitting.
In this study, a weighted ensemble method for numerical weather prediction by ensemble models is applied to the PyeongChang area. The post-processing method combines and calibrates forecasts from different numerical models, assigning greater weight to the ensemble models that exhibit better performance. Three numerical models, the European Centre Medium-Range Weather Forecast, the Ensemble Prediction System for Global, and the Limited Area Ensemble Prediction System, were used in the post-processing. We compared the outputs from the weighted combination of ensembles with those from the Ensemble Model Output Statistics (EMOS) model for each raw ensemble model. The results showed that the weighted ensemble method can significantly improve post-processing performance compared with the raw ensemble method of the numerical models.
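The weighting scheme above, giving better-performing models more influence, can be sketched by weighting each model inversely to its historical error. The inverse-MAE rule and all numbers are illustrative assumptions; the paper's actual weighting procedure is not specified in this abstract:

```python
# Sketch of skill-weighted forecast combination.
# Weights are proportional to the inverse of each model's past error;
# the error values and forecasts below are illustrative.

def skill_weights(errors):
    """Normalize inverse historical errors into weights summing to 1."""
    inv = [1.0 / e for e in errors]
    total = sum(inv)
    return [w / total for w in inv]

def weighted_forecast(forecasts, weights):
    """Weighted average of the individual model forecasts."""
    return sum(f * w for f, w in zip(forecasts, weights))

past_mae = [1.0, 2.0, 4.0]            # hypothetical historical errors
weights = skill_weights(past_mae)     # -> [4/7, 2/7, 1/7]
temps = [-3.0, -2.0, -1.0]            # each model's temperature forecast
print(round(weighted_forecast(temps, weights), 3))  # -> -2.429
```

The most accurate model (lowest past error) pulls the combined forecast toward its own value, which is the behaviour the post-processing method relies on.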
A classification ensemble is a learning method that aggregates different classifiers to obtain more accurate class predictions. Among the many methods developed, Random Forest is known as one of the most accurate ensemble methods; it combines many randomized decision trees using a simple majority voting scheme. Wave, a weighted voting algorithm, has been shown to outperform simple majority voting when combined with bagging. In this paper, we investigated whether random forest using the wave voting scheme can further improve classification accuracy. Experiments show that as the ensemble size grows, it becomes more accurate than other methods, including a single tree, bagging, AdaBoost, and random forest with simple majority voting. The results also show that random forest with wave is more accurate than bagging with wave when the ensemble size is large enough.
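The contrast between simple majority voting and a weighted scheme like the one described above can be sketched generically: each base classifier's vote counts in proportion to a weight (for example, its training accuracy). This is a generic weighted-voting sketch with illustrative weights, not the Wave algorithm's actual weight computation:

```python
from collections import defaultdict

# Generic weighted-voting sketch. Weights are illustrative; the Wave
# algorithm derives its weights differently.

def weighted_vote(votes_with_weights):
    """votes_with_weights: list of (class_label, weight) pairs."""
    totals = defaultdict(float)
    for label, weight in votes_with_weights:
        totals[label] += weight
    return max(totals, key=totals.get)

# Two weaker trees vote "A", one stronger tree votes "B".
votes = [("A", 0.55), ("A", 0.55), ("B", 0.90)]
print(weighted_vote(votes))  # -> A  (0.55 + 0.55 = 1.10 > 0.90)
# With weights (0.51, 0.51, 1.20) the weighted scheme flips to "B",
# while simple majority voting would still pick "A".
```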
Ensemble learning is a popular and intensively studied field in machine learning and pattern recognition for increasing classification performance. Random Forest is important because it gives fast and effective results, while Rotation Forest can achieve better performance than Random Forest. In this study, we present a meta-ensemble classifier, called Random Rotation Forest, that combines the advantages of the two classifiers (Rotation Forest and Random Forest). In the experimental studies, we use three base learners (J48, REPTree, and Random Forest) and two meta-learners (Bagging and Rotation Forest) for ensemble classification on five datasets from the UCI Machine Learning Repository. The experimental results indicate that Random Rotation Forest gives promising results relative to the base learners and bagging ensemble approaches in terms of accuracy, AUC, precision, and recall. Our method can be used for image/pattern recognition and machine learning problems.
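The rotation idea behind Rotation Forest is that each base learner is trained on a rotated copy of the feature space, which diversifies the ensemble. This sketch shows only the rotation step for a 2-D feature vector; Rotation Forest actually derives its rotations from PCA on feature subsets, whereas the random angles here are an illustrative simplification:

```python
import math
import random

# Sketch of feature-space rotation for ensemble diversity.
# Real Rotation Forest uses PCA-based rotations; random angles are a
# simplification for illustration. Tree training is omitted.

def rotate_2d(point, angle):
    """Rotate a 2-D feature vector by `angle` radians."""
    x, y = point
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))

random.seed(0)
sample = (1.0, 0.0)
# One rotated view of the sample per base learner
rotated_views = [rotate_2d(sample, random.uniform(0, math.pi / 2))
                 for _ in range(3)]
print(rotated_views)
```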