    Wind Speed Prediction using Extra Tree Classifier
    Abstract:
    A wind farm is a cluster of wind turbines at the same site that generates power. Turbines perform effectively in strong winds and at optimal wind speed. If the wind direction and speed at a wind farm can be forecast, the turbines can be operated efficiently and the output of the wind generators becomes more effective. Big data and machine learning deal with large collections of datasets that require advanced processing. Wind speed forecasting is one of the most critical tasks in a wind farm, and machine learning approaches are frequently used to forecast the non-linear time-series behavior of wind. In this context, this research provides a wind speed prediction model that relies on the Extra Tree classifier. The proposed model has the benefit of being simple, quick, and well suited to short-term forecasting. Its accuracy is compared with that of the bagging classifier and AdaBoost classifier algorithms in their regression mode, and the project also aims to illustrate how wind direction may affect power generation and why it is vital to anticipate it. A real time-series dataset containing past values of features such as wind speed, temperature, and atmospheric pressure is used to forecast the wind speed. The proposed Extra Tree model is evaluated using mean absolute error and mean squared error, and its performance is compared with that of the bagging and AdaBoost models.
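The two evaluation metrics named in the abstract can be sketched in a few lines. This is a minimal illustration, assuming plain Python lists of observed and forecast wind speeds; the function names and the sample values are hypothetical and do not come from the paper.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all samples."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of (actual - predicted)^2 over all samples."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical wind speeds in m/s (illustrative values, not from the paper).
observed = [5.0, 6.5, 7.0, 4.5]
forecast = [5.5, 6.0, 7.0, 5.5]

mae = mean_absolute_error(observed, forecast)
mse = mean_squared_error(observed, forecast)
print(f"MAE = {mae:.3f} m/s, MSE = {mse:.3f} (m/s)^2")
```

Because the squared error penalizes large misses more heavily than the absolute error, reporting both gives a fuller picture when comparing the Extra Tree, bagging, and AdaBoost models.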
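    The military-specialized high school policy was introduced to secure high-quality military human resources and, being based on school-military cooperation, has the character of military human resource management (Military HRM). This study uses machine learning to analyze empirically the human resource development aspect inherent in the policy, and presents a model for predicting selection as a specialist soldier together with the important variables. To this end, education and career data for about 850 graduates of School A, a military-specialized high school in Korea, were preprocessed, yielding about 50 input variables. 'Selection as a specialist soldier' was chosen as the target variable, the class imbalance of the target variable was resolved through oversampling, and machine learning prediction models were then trained. To establish an optimal model that predicts specialist-soldier selection accurately, five machine learning algorithms (Random Forest, XGBoost, LightGBM, SVM, and logistic regression) were applied both to the original data, in which the target classes are imbalanced, and to the oversampled data, for a total of 10 trained models. Stratified k-fold cross-validation was performed during training to prevent overfitting, and hyperparameters suited to the optimal model were searched for. The model trained with the Random Forest algorithm showed the best predictive performance in every case, on both the original and the oversampled data. In terms of AUC, the Random Forest model trained on the original data (RF) scored approximately 0.76, and the Random Forest model trained on the oversampled data (RF_over) improved to about 0.85. An evaluation of input variable importance showed that, among the roughly 50 input variables, those related to expertise in the major field, such as 'license obtained / not obtained' and 'craftsman certificate in the major', had the greatest influence on whether a graduate was selected as a specialist soldier. In addition, to check the models for bias, the original and oversampled data were randomly sampled and evaluated, and the AUC values of both RF and RF_over converged to 0.5. This can be understood to mean that the trained models achieve a considerable level of performance without depending on a specific variable. The results not only suggest the feasibility of machine learning research on military-specialized high schools but also show that the factors contributing to the effectiveness of the policy can be identified in actual educational settings. They point to the need for human resource management that strengthens major-field expertise and education and training in order to select and supply military-specialized specialist soldiers smoothly. The study is also expected to open the way to gaining insights through machine learning and to data-driven, organization-wide military human resource management.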
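The oversampling step described above can be illustrated with the simplest variant, random duplication of minority-class rows until every class matches the largest one. The abstract does not say which oversampling technique was used, so this is only a sketch under that assumption; `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        # Rows of this class available for duplication.
        pool = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Toy data: 6 graduates not selected (0) and 2 selected (1); values are illustrative.
features = [[1, 0], [0, 0], [1, 1], [0, 1], [0, 0], [1, 0], [1, 1], [0, 1]]
target = [0, 0, 0, 0, 0, 0, 1, 1]

balanced_features, balanced_target = random_oversample(features, target)
print(Counter(balanced_target))
```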
    As regions with high wind energy potential are becoming less common, it is increasingly important to generate wind energy in places where the wind velocity is light to moderate. This study uses the WERA model to estimate and compare the performance of four commercial wind turbines under low power density wind regimes. Wind turbines of 5 kW rated capacity, from four prominent manufacturers, were considered in the study. Each turbine's velocity-power response and the site's Rayleigh probability density of wind velocity were used to model the turbines' performance at four typical sites in Kerala with different average wind speeds, namely Thiruvananthapuram, Kollam, Kottayam, and Pathanamthitta. The turbines' performance is quantified by the energy production and capacity factor at each location. The results show that the turbine's velocity-power response is a crucial factor influencing system performance. Reducing the cut-in and rated wind speeds appears to improve the system's output in areas with low wind velocity.
    Rayleigh distribution
    Small wind turbine
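The capacity-factor calculation described in the abstract above (a power curve weighted by the Rayleigh density of wind speed) can be sketched numerically. This is an assumption-laden illustration, not the WERA model: the cubic power curve in `turbine_power`, the cut-in, rated, and cut-out speeds, and the 4 m/s mean wind speed are all hypothetical stand-ins for the manufacturers' data.

```python
import math


def rayleigh_pdf(v, mean_speed):
    """Rayleigh probability density of wind speed v, parameterized by the mean wind speed."""
    return (math.pi * v / (2.0 * mean_speed ** 2)) * math.exp(
        -math.pi * v ** 2 / (4.0 * mean_speed ** 2)
    )


def turbine_power(v, cut_in, rated_speed, cut_out, rated_power):
    """Idealized power curve (assumption): cubic rise from cut-in to rated speed."""
    if v < cut_in or v > cut_out:
        return 0.0
    if v >= rated_speed:
        return rated_power
    return rated_power * (v ** 3 - cut_in ** 3) / (rated_speed ** 3 - cut_in ** 3)


def capacity_factor(mean_speed, cut_in, rated_speed, cut_out, rated_power, steps=5000):
    """Expected power over rated power, by trapezoidal integration from 0 to cut-out."""
    dv = cut_out / steps
    total = 0.0
    for i in range(steps + 1):
        v = i * dv
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * turbine_power(v, cut_in, rated_speed, cut_out, rated_power) * rayleigh_pdf(v, mean_speed)
    return total * dv / rated_power


# Two hypothetical 5 kW turbines at a site with a 4 m/s mean wind speed.
cf_high_thresholds = capacity_factor(4.0, cut_in=3.0, rated_speed=12.0, cut_out=25.0, rated_power=5000.0)
cf_low_thresholds = capacity_factor(4.0, cut_in=2.0, rated_speed=9.0, cut_out=25.0, rated_power=5000.0)
print(cf_high_thresholds, cf_low_thresholds)
```

Under these assumptions the turbine with the lower cut-in and rated speeds yields the higher capacity factor, which is the qualitative effect the abstract reports for low-wind sites.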
    This research aimed to predict smartphone prices using two supervised machine learning algorithms: Decision Tree and Random Forest Regression. Data was collected from the Indian e-commerce website Flipkart using Python libraries such as Beautiful Soup and Selenium, and was cleaned and pre-processed for analysis. The results showed that the Decision Tree algorithm had an R^2 of 89.3%, while the Random Forest model had an R^2 of 82.8%. The study offers a method for accurately predicting smartphone prices that could be useful to sellers in determining the cost of their products and could ultimately benefit the entire smartphone market. Key Word: Smartphone, Price Prediction, Machine Learning, Decision Tree, Random Forest Regression.
    Python
    Supervised Learning
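The R^2 score reported for the price models above is a regression metric, distinct from classification accuracy. A minimal sketch of how it is computed, assuming plain lists of actual and predicted prices (the function name and sample values are illustrative):

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical prices (illustrative values, not from the study).
actual_prices = [10000.0, 15000.0, 20000.0, 30000.0]
predicted_prices = [11000.0, 14000.0, 21000.0, 29000.0]
print(r_squared(actual_prices, predicted_prices))
```

A model that always predicts the mean of the actual values scores 0, and a perfect model scores 1.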
    Citations (1)
    Random Forest (RF) is a powerful supervised learner and has been widely used in many applications such as bioinformatics. In this work we propose the guided random forest (GRF) for feature selection. Similar to a feature selection method called guided regularized random forest (GRRF), GRF is built using the importance scores from an ordinary RF. However, the trees in GRRF are built sequentially, are highly correlated and do not allow for parallel computing, while the trees in GRF are built independently and can be implemented in parallel. Experiments on 10 high-dimensional gene data sets show that, with a fixed parameter value (without tuning the parameter), RF applied to features selected by GRF outperforms RF applied to all features on 9 data sets, and 7 of them have significant differences at the 0.05 level. Therefore, both accuracy and interpretability are significantly improved. GRF selects more features than GRRF but leads to better classification accuracy. Note that in this work the guided random forest is guided by the importance scores from an ordinary random forest; however, it can also be guided by other methods such as human insights (by specifying $\lambda_i$). GRF can be used in "RRF" v1.4 (and later versions), a package that also includes the regularized random forest methods.
    Interpretability
    Feature (linguistics)
    Random testing
    Citations (60)
    Breast cancer is a cancer that is becoming more prevalent every day, and the situation is made worse by a lack of detection. Quick detection may make it possible to lower the death rate. Based on the Wisconsin Breast Cancer dataset, this study proposes a machine learning-based strategy for identifying breast cancer. Five distinct machine learning algorithms were tested. Logistic Regression gave 94.73% accuracy, Decision Tree 92.98%, Random Forest 98.24%, and Support Vector Machine (SVM) 96.49%. Random Forest gave the highest accuracy, 98.24%.
    The main objective of the study is to classify music genre using features extracted from audio files. The classification is done using Novel Random Forest and Decision Tree, and the corresponding results are compared in terms of accuracy. Materials and Methods: The GTZAN dataset used in this study was obtained from the MARSYAS website, which is used for Music Information Retrieval, and consists of 1000 music files in the .au format. It is regarded as the standard dataset to date. Mel-frequency cepstral coefficients (MFCC), acoustic features of music that create patterns and help to predict the genre, are extracted from the music files. The data analysis, model training, and testing are done entirely on the Jupyter platform. The sample size was 20 per group, proposed (N=20) and comparison (N=20), and the pretest power obtained was 0.08. Results: The experimental results show that Novel Random Forest gives an accuracy of 71.78%, while Decision Tree gives an accuracy of 59.89%. Conclusion: In this study, the Random Forest model outperforms the Decision Tree model in terms of accuracy in predicting the music genre.
    Mel-frequency cepstrum
    Sample (material)
    Tree (set theory)
    The wind speed profile in the atmospheric layer is a critical factor for estimating wind turbine capacity factor. Prior to the installation of a wind farm, it is essential to estimate the expected energy output in order to assess the economic viability of the project. Wind speed measurements are generally carried out at 10 or 30 m, whereas most turbines in commercial use at present have hub heights between 60 and 100 m. Therefore, wind speed measurements are extrapolated to the wind turbine hub height. In this paper, 16 different extrapolation methods were reviewed and compared with respect to their ability to estimate wind speed, power density, and energy generation. Two different error analyses were used to determine the best method. The wind data used were gathered in Turkey at heights of 10, 30 and 50 m.
    Capacity factor
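One widely used way to extrapolate a measured wind speed to hub height is the power law, sketched below. The abstract above does not list its 16 methods, so treating the power law as one of them is an assumption; the function names are hypothetical, and the default exponent of 1/7 is a common rule of thumb for flat terrain rather than a value from the paper.

```python
import math


def extrapolate_power_law(speed_ref, height_ref, height_hub, alpha=1.0 / 7.0):
    """Power-law profile: v_hub = v_ref * (h_hub / h_ref) ** alpha."""
    if speed_ref < 0 or height_ref <= 0 or height_hub <= 0:
        raise ValueError("speed must be non-negative and heights positive")
    return speed_ref * (height_hub / height_ref) ** alpha


def shear_exponent(speed_low, height_low, speed_high, height_high):
    """Estimate alpha from simultaneous measurements at two heights."""
    return math.log(speed_high / speed_low) / math.log(height_high / height_low)


# Illustrative values: 5 m/s measured at 10 m, extrapolated to an 80 m hub.
hub_speed = extrapolate_power_law(5.0, 10.0, 80.0)

# With measurements at two mast heights, alpha can be fitted instead of assumed.
alpha_fit = shear_exponent(5.0, 10.0, 6.2, 30.0)
hub_speed_fitted = extrapolate_power_law(6.2, 30.0, 80.0, alpha=alpha_fit)
print(hub_speed, alpha_fit, hub_speed_fitted)
```

Fitting the exponent from two measurement heights, where available, avoids relying on a single assumed value for the whole site.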
    Heipang in Plateau State is classified under the moderate wind speed regime in Nigeria and thus has high potential for wind electricity generation. Because of the high cost, it is difficult to design a wind turbine for a particular site; the designer of a wind energy project therefore has to choose from the options available on the market, which come in different sizes and speed characteristics. This paper aims to evaluate the performance of selected wind turbines in the Heipang wind speed regime. The method is based on wind speed analysis and on computing the capacity factors of the wind turbines, expressed as a product of the turbines' power output models and the probability distribution of the Heipang wind speed regime, together with the total annual energy generation of the turbines, using the Wind Energy Resources Analysis (WERA) software. Results showed that Heipang has an annual mean wind speed of 6.3 m/s and that its wind speed regime is best fitted by the Weibull probability distribution function, with an average shape parameter of 3.05 and an average scale parameter of 7.03 m/s at 10 m height. For the small (<10 kW), medium (10 kW-250 kW) and large (>250 kW) wind turbine classes, WT4, WT14 and WT25 have the highest capacity factors of 0.61, 0.7 and 0.53 respectively, and WT2, WT22 and WT27 have the highest total annual energy generation of 0.0054, 1.12 and 4.03 GWh/year respectively. In conclusion, Heipang has high wind speed potential for wind power technology, and wind turbines with higher annual energy generation are the better choices for wind power generation applications.
    Citations (0)
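The Weibull parameters quoted in the Heipang abstract can be checked against the reported mean wind speed, because the mean of a Weibull distribution with shape k and scale c is c * Gamma(1 + 1/k). The sketch below assumes the two-parameter Weibull form; the function names are illustrative.

```python
import math


def weibull_pdf(v, shape, scale):
    """Two-parameter Weibull probability density of wind speed v (v >= 0)."""
    if v < 0:
        return 0.0
    return (shape / scale) * (v / scale) ** (shape - 1) * math.exp(-((v / scale) ** shape))


def weibull_mean(shape, scale):
    """Mean of a Weibull distribution: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


# Shape and scale as reported in the abstract (10 m height).
mean_speed = weibull_mean(3.05, 7.03)
print(f"Mean wind speed implied by the Weibull fit: {mean_speed:.2f} m/s")
```

The implied mean comes out close to the 6.3 m/s annual mean reported in the abstract, which suggests the quoted parameters and the mean are mutually consistent.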
    The use of credit cards is increasing in today's digital era. This increase has resulted in many cases of fraud, which have had a negative impact on credit card owners. To overcome this, many financial institutions have developed credit card fraud detection systems that can identify suspicious transactions. This study uses two classification methods, random forest and decision tree, to identify illegal credit card transactions, compares their results, and attempts to create a more accurate and effective model for detecting credit card fraud. The Decision Tree classifier achieved an accuracy of 0.98 (98%), while the Random Forest classifier achieved an accuracy of 0.975 (97.5%). The study concludes that the Decision Tree has a slightly higher accuracy than the Random Forest classification algorithm.
    Credit card fraud
    Random tree
    Statistical classification
    A decision support system (DSS) that classifies various cancers helps clinicians and researchers make better decisions that can aid early cancer diagnosis, thereby reducing the chance of incorrect diagnosis. This work therefore aimed to design a classification model that can accurately predict 5 different cancer types, comprising 20 cancer exomes, using the mutations identified from whole exome cancer analysis. Initially, a basic model was designed using supervised machine learning classification algorithms such as K-nearest neighbor (KNN), support vector machine (SVM), decision tree, naïve Bayes and random forest (RF), among which decision tree and random forest performed better in terms of preliminary model accuracy. However, output predictions were incorrect because of low training scores. Thus, 16 essential features were selected for model improvement using 2 approaches. All imbalanced datasets were balanced using SMOTE. In the first approach, all features from the 20 cancer exome datasets were used for training, and models were designed using decision tree and random forest. On the balanced datasets, the decision tree model showed an accuracy of 77%, while the RF model improved the accuracy to 82%, with all 5 cancer types predicted correctly. The area under the curve for the RF model was closer to 1 than that of the decision tree model. In the second approach, 15 datasets were used for training and 5 for testing; however, only 2 cancer types were predicted correctly. To cross-validate the RF model, the Matthews correlation coefficient (MCC) was computed. For the first approach, the MCC test and MCC cross-validation values were 0.7796 and 0.9356 respectively. Likewise, for the second approach, the MCC was 0.9365, corroborating the accuracy of the designed model. The model was successfully deployed as a web application using Streamlit for easy use. This study presents insights that make cancer classification easier.
    Tree (set theory)
    Citations (1)
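The Matthews correlation coefficient mentioned in the cancer-classification abstract can be illustrated in its binary form, computed from the four cells of a confusion matrix. The study itself classifies 5 cancer types, which calls for the multi-class generalization, so the binary version below is a simplification for illustration; the function name and counts are hypothetical.

```python
import math


def matthews_corrcoef_binary(tp, tn, fp, fn):
    """Binary MCC: (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention: return 0 when any row or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Illustrative confusion-matrix counts (not from the study).
mcc = matthews_corrcoef_binary(tp=45, tn=40, fp=10, fn=5)
print(f"MCC = {mcc:.3f}")
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and unlike accuracy it stays informative when the classes are imbalanced.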