    Development and external validation of a machine learning model to predict the initial dose of vancomycin for targeting an area under the concentration–time curve of 400–600 mg∙h/L
LncRNAs play important roles in many biological processes and in disease progression by binding to related proteins. However, experimental methods for studying lncRNA-protein interactions are time-consuming and expensive. Although a few models have been designed to predict ncRNA-protein interactions, they share common drawbacks that limit their predictive performance. In this study, we present HLPI-Ensemble, a model designed specifically for human lncRNA-protein interactions. HLPI-Ensemble adopts an ensemble strategy based on three mainstream machine learning algorithms, Support Vector Machines (SVM), Random Forests (RF), and Extreme Gradient Boosting (XGB), to generate HLPI-SVM Ensemble, HLPI-RF Ensemble, and HLPI-XGB Ensemble, respectively. In 10-fold cross-validation, HLPI-SVM Ensemble, HLPI-RF Ensemble, and HLPI-XGB Ensemble achieved AUCs of 0.95, 0.96, and 0.96, respectively, on the test dataset. Furthermore, we compared the performance of the HLPI-Ensemble models with previous models on an external validation dataset. The results show that the HLPI-Ensemble models produce far fewer false positives (FPs) than the previous models, and that their other evaluation metrics are also higher. This further demonstrates that the HLPI-Ensemble models are superior to previous models at predicting human lncRNA-protein interactions. HLPI-Ensemble is publicly available at: http://ccsipb.lnu.edu.cn/hlpiensemble/ .
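The abstract does not give implementation details, but the core ensemble step it describes can be sketched as averaging the interaction probabilities of several base classifiers and thresholding the mean. This is a minimal illustration with hypothetical probabilities standing in for the SVM, RF, and XGB outputs, not the actual HLPI-Ensemble code:

```python
# Minimal sketch of probability-averaging ensemble prediction.
# The three input probabilities are hypothetical stand-ins for the
# outputs of base models such as SVM, RF, and XGB.

def ensemble_predict(base_probs, threshold=0.5):
    """Average base-classifier probabilities and apply a decision threshold."""
    mean_prob = sum(base_probs) / len(base_probs)
    return (1 if mean_prob >= threshold else 0), mean_prob

# Hypothetical probabilities from three base models for one lncRNA-protein pair
label, prob = ensemble_predict([0.7, 0.9, 0.6])
print(label, round(prob, 2))  # -> 1 0.73
```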
Random Forest is a supervised machine learning algorithm. In the data mining domain, machine learning algorithms are used extensively to analyze data and generate predictions from it. As an ensemble algorithm, Random Forest builds multiple decision trees as base classifiers and applies majority voting to combine their outcomes. The strength of the individual decision trees and the correlation among them are the key factors that determine the generalization error of Random Forest classifiers. In terms of accuracy, Random Forest classifiers are on par with existing ensemble techniques such as bagging and boosting. This research work attempts to improve the performance of Random Forest classifiers in terms of accuracy and of the time required for learning and classification. To achieve this, five new approaches are proposed. The empirical analysis and the outcomes of the experiments carried out in this work lead to effective learning and classification with the Random Forest algorithm.
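The combination step described above, majority voting over the base trees, can be sketched in a few lines. Tree training itself is omitted; the votes are illustrative labels:

```python
from collections import Counter

# Sketch of Random Forest's combination step: each base tree casts a
# class vote and the forest returns the majority class.

def majority_vote(predictions):
    """Return the most common class label among the base trees' votes."""
    return Counter(predictions).most_common(1)[0][0]

# Illustrative votes from five base trees
votes = ["spam", "ham", "spam", "spam", "ham"]
print(majority_vote(votes))  # -> spam
```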
Construction and demolition waste (DW) generation information has been recognized as a tool for providing useful information for waste management. Recently, numerous researchers have actively applied artificial intelligence technology to establish accurate waste generation information. This study investigated the development of machine learning predictive models that can perform well on small datasets composed of categorical variables. To this end, the random forest (RF) and gradient boosting machine (GBM) algorithms were adopted. To develop the models, 690 building datasets were established using data preprocessing and standardization, and hyperparameter tuning was performed for the RF and GBM models. Model performance was evaluated using the leave-one-out cross-validation technique. The study demonstrated that, for small datasets comprising mainly categorical variables, predictions from the bagging technique (RF) were more stable and accurate than those from the boosting technique (GBM), although GBM models showed excellent predictive performance for some DW predictive models. Furthermore, the RF and GBM predictive models performed significantly differently across different types of DW. Certain RF and GBM models showed relatively low predictive performance, but the remaining models all performed well, with R² values > 0.6 and R values > 0.8. Such differences arise mainly from the characteristics of the features used in model development; we expect the application of additional features to improve the performance of the predictive models. The 11 DW predictive models developed in this study will be useful for establishing detailed DW management strategies.
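The leave-one-out cross-validation scheme used above can be sketched as a loop in which each sample serves exactly once as the test set while the model trains on the rest. The "model" here is a trivial 1-nearest-neighbour stand-in on 1-D data, chosen only to make the evaluation loop self-contained; it is not the RF/GBM setup of the study:

```python
# Sketch of leave-one-out cross-validation (LOOCV).
# Dataset entries are (feature, label) pairs; the predictor is a toy
# 1-nearest-neighbour stand-in, not the study's actual models.

def nearest_neighbour_predict(train, x):
    """Predict the label of the closest training sample."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def loocv_accuracy(dataset):
    """Hold out each sample once; train on the rest; report accuracy."""
    correct = 0
    for i, (x, y) in enumerate(dataset):
        train = dataset[:i] + dataset[i + 1:]
        if nearest_neighbour_predict(train, x) == y:
            correct += 1
    return correct / len(dataset)

data = [(1.0, "A"), (1.2, "A"), (5.0, "B"), (5.3, "B")]
print(loocv_accuracy(data))  # -> 1.0
```

LOOCV is attractive for small datasets like the 690-building one above because every sample contributes to both training and evaluation.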
Phishing websites are characterized by distinctive visual, address, domain, and embedded features, which help identify and defend against such threats. Yet phishing website detection is challenged by the overlap between these features and those of legitimate websites. As the inter-class variance between legitimate and phishing websites becomes low, commonly used machine learning algorithms perform poorly in such overlapping-feature cases. Ensemble learning, which combines multiple predictions to address low inter-class variation in the classified data, improves performance in these cases; it uses multiple classifiers of similar or different types trained on multiple variations of the training data. This paper develops a framework based on random forest ensemble techniques. A limitation of the random forest is its inability to capture high correlation between features and their joint dependency on the label, so the random forest is combined with k-means clustering to capture feature correlation. The framework is evaluated for phishing detection on a dataset of 5000 samples. The results showed that the proposed framework outperformed the random forest classifier, all other ensemble classifiers, and the conventional classification algorithms, achieving an accuracy of 98.64%, precision of 0.986, recall of 0.987, and F-measure of 0.986.
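One way to read the RF + k-means combination above is as feature augmentation: cluster the samples and feed the cluster id to the classifier as an extra feature. This sketch uses a tiny 1-D k-means on a hypothetical "URL length" feature; the data, feature, and two-cluster setup are illustrative assumptions, not the paper's pipeline:

```python
# Sketch of cluster-based feature augmentation with a tiny 1-D k-means.
# The "URL length" values and the choice of two clusters are illustrative.

def kmeans_1d(values, centers, iters=10):
    """Lloyd's algorithm on scalars: assign to nearest center, recompute means."""
    for _ in range(iters):
        groups = {i: [] for i in range(len(centers))}
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in groups.items()]
    return centers

def assign(v, centers):
    """Index of the nearest cluster center."""
    return min(range(len(centers)), key=lambda i: abs(v - centers[i]))

url_lengths = [12, 14, 13, 80, 85, 90]        # hypothetical feature values
centers = kmeans_1d(url_lengths, [12, 80])
# Augment each sample with its cluster id before classification
augmented = [(v, assign(v, centers)) for v in url_lengths]
print(augmented)  # -> [(12, 0), (14, 0), (13, 0), (80, 1), (85, 1), (90, 1)]
```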
Determining the right selling price for a car can be a challenge for car sales companies. The selling price of a car is strongly influenced by characteristics such as brand, type, year of production, fuel type, and mileage. This research therefore aims to develop a more accurate car price prediction model using a stacking ensemble technique that combines Random Forest and an ANN. Random Forest is effective at handling outliers and reducing the risk of overfitting, while an ANN has the advantage of capturing complex nonlinear patterns. The results show that the stacking ensemble model combining the ANN and Random Forest can predict car sales prices with an R² value of 0.97. These results can help distributors make the right decisions about car sales prices. To improve the generalization of the model, future research is recommended to try combinations of different ensemble methods and larger, more diverse datasets.
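The stacking idea above can be sketched as follows: base models each produce a price estimate, and a meta-model combines them. The base models and the linear meta-model weights here are toy stand-ins (the study's actual RF, ANN, and meta-learner are not specified in this abstract):

```python
# Sketch of stacking: base predictions are combined by a meta-model.
# Both base models and the linear blend are hypothetical stand-ins.

def base_rf(features):
    """Stand-in for the Random Forest price estimate."""
    return 10_000 + 0.5 * features["mileage_adjustment"]

def base_ann(features):
    """Stand-in for the ANN price estimate."""
    return 9_500 + 0.6 * features["mileage_adjustment"]

def stacked_predict(features, w_rf=0.5, w_ann=0.5, bias=0.0):
    """Meta-learner: a (hypothetical) linear blend of the base predictions."""
    return w_rf * base_rf(features) + w_ann * base_ann(features) + bias

car = {"mileage_adjustment": 1_000}
print(stacked_predict(car))  # -> 10300.0
```

In a real stacking setup the meta-model's weights are themselves learned, on out-of-fold predictions of the base models, rather than fixed as here.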
Abstract The purpose of this work is to detect lying using different ensemble machine learning algorithms and to identify a better classification model through comparison. Random Forest (RF) handles both classification and regression problems efficiently; in this paper, we propose Random Forest-based ensemble learning, which combines RF with SVM, GLM, KNNs, and GBM to improve model performance. The dataset fitted to the machine learning models is the Miami University Deception Detection Database (MU3D), a free resource containing 320 videos of Black and White targets, female and male, telling truths and lies. We fit the MU3D video-level dataset to Random Forest-based ensemble learning models, including RF + SVM.Linear, RF + SVM.Poly, RF + GLM, RF + KNNs, RF + GBM (Stochastic Gradient Boosting), and RF + WSRF (Weighted Subspace Random Forest). Based on a comprehensive comparison of model performance, we conclude that our new combinations of algorithms perform better than traditional machine learning models. Our contribution is a robust classification method that improves predictive performance while avoiding model overfitting.
In this study, a weighted ensemble method for numerical weather prediction by ensemble models is applied to the PyeongChang area. The post-processing method combines and calibrates forecasts from different numerical models, assigning greater weight to the ensemble models that exhibit better performance. Three numerical models, the European Centre Medium-Range Weather Forecast, the Ensemble Prediction System for Global, and the Limited Area Ensemble Prediction System, were used in the post-processing. We compared the outputs from the weighted combination of ensembles with those from the Ensemble Model Output Statistics (EMOS) model for each raw ensemble model. The results showed that the weighted ensemble method can significantly improve post-processing performance compared with the raw ensemble method of the numerical models.
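The weighting scheme above, giving better-performing models more influence, can be sketched by weighting each model inversely to its historical error. The inverse-MAE rule and all numbers are illustrative assumptions; the paper's actual weighting procedure is not specified in this abstract:

```python
# Sketch of skill-weighted forecast combination.
# Weights are proportional to the inverse of each model's past error;
# the error values and forecasts below are illustrative.

def skill_weights(errors):
    """Normalize inverse historical errors into weights summing to 1."""
    inv = [1.0 / e for e in errors]
    total = sum(inv)
    return [w / total for w in inv]

def weighted_forecast(forecasts, weights):
    """Weighted average of the individual model forecasts."""
    return sum(f * w for f, w in zip(forecasts, weights))

past_mae = [1.0, 2.0, 4.0]            # hypothetical historical errors
weights = skill_weights(past_mae)     # -> [4/7, 2/7, 1/7]
temps = [-3.0, -2.0, -1.0]            # each model's temperature forecast
print(round(weighted_forecast(temps, weights), 3))  # -> -2.429
```

The most accurate model (lowest past error) pulls the combined forecast toward its own value, which is the behaviour the post-processing method relies on.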
A classification ensemble is a learning method that aggregates different classifiers to obtain more accurate class predictions. Among the many methods developed, Random Forest is known as one of the most accurate ensemble methods; it combines many randomized decision trees using a simple majority voting scheme. Wave, a weighted voting algorithm, has been shown to outperform simple majority voting when combined with bagging. In this paper, we investigated whether random forest using the wave voting scheme can further improve classification accuracy. Experiments show that as the ensemble size grows, it becomes more accurate than other methods, including a single tree, bagging, AdaBoost, and random forest with simple majority voting. The results also show that random forest with wave is more accurate than bagging with wave when the ensemble size is large enough.
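The contrast between simple majority voting and a weighted scheme like the one described above can be sketched generically: each base classifier's vote counts in proportion to a weight (for example, its training accuracy). This is a generic weighted-voting sketch with illustrative weights, not the Wave algorithm's actual weight computation:

```python
from collections import defaultdict

# Generic weighted-voting sketch. Weights are illustrative; the Wave
# algorithm derives its weights differently.

def weighted_vote(votes_with_weights):
    """votes_with_weights: list of (class_label, weight) pairs."""
    totals = defaultdict(float)
    for label, weight in votes_with_weights:
        totals[label] += weight
    return max(totals, key=totals.get)

# Two weaker trees vote "A", one stronger tree votes "B".
votes = [("A", 0.55), ("A", 0.55), ("B", 0.90)]
print(weighted_vote(votes))  # -> A  (0.55 + 0.55 = 1.10 > 0.90)
# With weights (0.51, 0.51, 1.20) the weighted scheme flips to "B",
# while simple majority voting would still pick "A".
```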
Ensemble learning is a popular and intensively studied field in machine learning and pattern recognition for increasing classification performance. Random Forest is important because it gives fast and effective results, while Rotation Forest can achieve better performance than Random Forest. In this study, we present a meta-ensemble classifier, called Random Rotation Forest, that combines the advantages of the two classifiers (Rotation Forest and Random Forest). In the experimental studies, we use three base learners (J48, REPTree, and Random Forest) and two meta-learners (Bagging and Rotation Forest) for ensemble classification on five datasets from the UCI Machine Learning Repository. The experimental results indicate that Random Rotation Forest gives promising results relative to the base learners and bagging ensemble approaches in terms of accuracy, AUC, precision, and recall. Our method can be used for image/pattern recognition and machine learning problems.
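The rotation idea behind Rotation Forest is that each base learner is trained on a rotated copy of the feature space, which diversifies the ensemble. This sketch shows only the rotation step for a 2-D feature vector; Rotation Forest actually derives its rotations from PCA on feature subsets, whereas the random angles here are an illustrative simplification:

```python
import math
import random

# Sketch of feature-space rotation for ensemble diversity.
# Real Rotation Forest uses PCA-based rotations; random angles are a
# simplification for illustration. Tree training is omitted.

def rotate_2d(point, angle):
    """Rotate a 2-D feature vector by `angle` radians."""
    x, y = point
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))

random.seed(0)
sample = (1.0, 0.0)
# One rotated view of the sample per base learner
rotated_views = [rotate_2d(sample, random.uniform(0, math.pi / 2))
                 for _ in range(3)]
print(rotated_views)
```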