    Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds
    Abstract:
    Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140 Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classification, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%) with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights, were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its accuracy of classifying basic behaviors at sampling frequencies as low as 10 Hz, the KNN at sampling frequencies as low as 20 Hz. Classification of accelerometer data collected from free-ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data.
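The KNN step and the effect of a lower sampling frequency described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the sine-wave traces, the two features (mean and standard deviation of a one-second window), the amplitude and frequency settings, and all function names. It is not the authors' pipeline or feature set.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def synthetic_window(amplitude, freq_hz, rate_hz=140, seconds=1.0):
    """Hypothetical heave-axis trace in g: a sine wave around 1 g."""
    n = int(rate_hz * seconds)
    return [1.0 + amplitude * math.sin(2 * math.pi * freq_hz * i / rate_hz)
            for i in range(n)]


def downsample(signal, factor):
    """Keep every `factor`-th sample, e.g. factor=7 turns 140 Hz into 20 Hz."""
    return signal[::factor]


def window_features(signal):
    """Summarize one window as (mean, standard deviation)."""
    return (mean(signal), pstdev(signal))


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Made-up (amplitude, frequency) settings standing in for video-labelled windows.
LABELLED = {
    "flapping": [(0.8, 3.0), (0.9, 3.0), (1.0, 3.0)],
    "soaring": [(0.05, 0.5), (0.10, 0.5), (0.15, 0.5)],
    "sitting": [(0.0, 0.5), (0.01, 0.5), (0.02, 0.5)],
}


def build_training(factor=1):
    """Feature/label pairs, optionally computed on downsampled windows."""
    return [(window_features(downsample(synthetic_window(a, f), factor)), label)
            for label, settings in LABELLED.items() for a, f in settings]


def classify(amplitude, freq_hz, factor=1):
    """Classify one synthetic window, with training data at the same sampling rate."""
    query = window_features(downsample(synthetic_window(amplitude, freq_hz), factor))
    return knn_predict(build_training(factor), query)


print(classify(0.85, 3.0))            # full 140 Hz window
print(classify(0.85, 3.0, factor=7))  # same window decimated to 20 Hz
```

Calling `classify` with `factor=7` reruns the same comparison on data decimated from 140 Hz to 20 Hz, which is one simple way to probe how far the sampling frequency can drop before predictions change.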
    Keywords:
    Statistical classification
    Machine learning algorithms give a machine the ability to learn on its own without being explicitly programmed. They help automate tasks such as classification and clustering that previously required human intervention. Machine learning-based classification algorithms are now widely used for the diagnosis of diseases such as breast cancer. Breast cancer has been one of the primary causes of death among women worldwide in recent years. In this paper, Random Forest, Logistic Regression, Decision Tree, Naive Bayes and SVM classification algorithms have been implemented. The experiment was conducted on the widely used Wisconsin Diagnosis Breast Cancer Dataset (WDBC) [1]. The main objective of the experiment is to analyze how correctly and accurately the classification algorithms distinguish cancerous (malignant) tumours from non-cancerous (benign) tumours. The results show that Random Forest gave the highest accuracy among all the classification algorithms.
    Statistical classification
    Malware classification is one of the most important issues in information security because of the huge number of new malware samples. Many classification methods have therefore been proposed. Random forest (RF) is among the methods most widely used across studies and feature extraction methods. It is considered an efficient method of malware classification because of its accurate results. In this paper, a machine learning-based RF classifier is proposed and the performance of the random forest implementation is evaluated. The RF classifier showed high performance as a detector. It is capable of classifying data with a large number of features, including unimportant ones. Both training and classification accuracy increased when the number of training features in the dataset was reduced. The RF classifier achieved an accuracy of 95.3%.
    Statistical classification
    This paper proposes a computer-aided diagnosis method, based on the random forest algorithm, for classifying Alzheimer's disease (AD) from medical image data. Functional magnetic resonance imaging (fMRI) data were collected from 34 AD patients, 35 patients with mild cognitive impairment (MCI) and 35 normal controls (NC). First, the functional connection between different regions of the whole brain is calculated using the Pearson correlation coefficient. Then the importance of each functional connection is measured and the important features are selected using the random forest algorithm. Finally, classification is performed using a support vector machine (SVM) classifier with ten-fold cross-validation. The classification model based on random forest and SVM performs well in recognizing AD, with a classification accuracy of up to 90.68%. Functional connection characteristics can be analyzed effectively by the random forest algorithm, which can distinguish AD, MCI and NC accurately. At the same time, the abnormal brain regions involved in AD pathogenesis can be identified. The experimental results can provide an objective reference for the early clinical diagnosis of AD.
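The first step of the pipeline above, computing functional connections with the Pearson correlation coefficient, can be sketched as follows. The function names and the toy three-region input are assumptions for illustration; the random forest feature selection and SVM stages are not reproduced here.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def connectivity_features(region_series):
    """One coefficient per unordered pair of regions (upper triangle of the matrix)."""
    return [pearson(region_series[i], region_series[j])
            for i, j in combinations(range(len(region_series)), 2)]


# Toy time series for three hypothetical brain regions.
regions = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # rises with region 0
    [4.0, 3.0, 2.0, 1.0],  # falls as region 0 rises
]
features = connectivity_features(regions)
print(features)  # pairs (0,1), (0,2), (1,2)
```

With n regions this yields n(n-1)/2 features per subject, which is the kind of feature vector a downstream selection step would then rank by importance.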
    Statistical classification
    Skin disease is a severe issue in today's world, and classifying skin disorders is crucial for diagnosis. Several new data mining algorithms have been developed to classify and interpret medical images. This article describes the K-Nearest Neighbors (KNN) and Random Forest (RF) algorithms and analyzes their results. Traditional skin disease diagnosis is an expensive and time-consuming procedure; this study demonstrates a high-performing approach that saves both effort and money. The proposed model, based on KNN and the Random Forest algorithm, identifies ten different skin diseases. A patient can use the model for a preliminary classification of a skin disease, and a doctor can use it to confirm a diagnosis. The Random Forest algorithm has a testing accuracy of 94.22 percent, and KNN has a testing accuracy of 95.23 percent. The KNN algorithm has an F1 score of 95.98 percent, whereas the Random Forest algorithm has an F1 score of 95.94 percent. These figures could be improved by expanding the dataset and extracting more features. The approach may benefit individuals with skin illness who want to save money and time and to avoid skin cancer by identifying it at an early stage.
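Accuracy and F1 score, the two metrics quoted above, follow standard definitions that can be sketched in a few lines. The function names and toy labels are illustrative assumptions. The abstract does not say how per-class F1 values were averaged across the ten diseases, so the sketch computes F1 for a single class only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy labels for two hypothetical classes.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
print(accuracy(y_true, y_pred))
print(precision_recall_f1(y_true, y_pred, positive="a"))
```

Because F1 is the harmonic mean of precision and recall, it can differ from accuracy on the same predictions, which is why both are commonly reported.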
    Statistical classification
    Feature (linguistics)
    Establishing which characteristics of a shelter animal determine its outcome is an important task in addressing the problem of homeless and abused animals. The main goal of this research was to identify which machine learning algorithm provides the most accurate prediction of the outcome for an animal, based on its main features. The first step was to transform the data into a form suitable for the algorithms. Several machine learning algorithms were then trained in order to achieve the best possible classification results. The results were compared and the most suitable algorithms were selected based on their performance metrics. This research proposes using a combination of multiple data preprocessing techniques, imbalanced data and machine learning algorithms for predicting the outcome for a shelter animal based on its characteristics. K-Nearest Neighbors and C4.5 algorithms provided the best classification results in this research.
    Statistical classification
    Data pre-processing
    A recent study by the World Health Organization sheds light on the alarming increase in cardiovascular diseases, which contribute to approximately 17.9 million deaths annually. This study examines the effectiveness of the Random Forest algorithm, a robust machine learning approach, in forecasting the likelihood of heart disease from diverse risk factors. Using a dataset of demographic, clinical, and lifestyle attributes, the Random Forest model was trained to categorize individuals into two groups: those with and those without heart disease. Through feature selection and ensemble learning, the algorithm captures intricate relationships among predictors, thereby improving prediction accuracy. Evaluation metrics including accuracy and the AUC-ROC curve were used to determine the model's effectiveness. Our model achieves a prediction accuracy of 97%. Moreover, a comparative analysis with other prominent machine learning models, namely Naive Bayes, Support Vector Machine (SVM), Logistic Regression (LR), XGBoost and Decision Tree, revealed that the Random Forest approach outperforms the others in accuracy and efficiency in prediction tasks. Keywords: Random Forest (RF), Machine Learning (ML), Accuracy, Classification.
    Ensemble Learning
    Predictive modelling
    This paper presents a random forest-based face image classification method. The random forest is an ensemble learning method that grows many classification trees. Each tree gives a classification. The forest selects the classification that has the most votes. Three experiments are performed. The random forest-based method together with several existing approaches are trained and evaluated. The experimental results are presented and discussed.
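The voting rule described above, where each tree gives a classification and the forest selects the one with the most votes, can be sketched as follows. The three "trees" are hand-written stand-ins with made-up thresholds. A real random forest would grow each tree from a bootstrap sample of the training data, which this sketch does not do.

```python
from collections import Counter


def forest_predict(trees, sample):
    """Collect one vote per tree and return the class with the most votes."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical single-split trees; thresholds are invented for illustration.
trees = [
    lambda s: "A" if s["x"] > 0.5 else "B",
    lambda s: "A" if s["y"] > 0.2 else "B",
    lambda s: "A" if s["x"] + s["y"] > 1.5 else "B",
]

print(forest_predict(trees, {"x": 0.7, "y": 0.3}))  # votes: A, A, B
print(forest_predict(trees, {"x": 0.1, "y": 0.3}))  # votes: B, A, B
```

Using an odd number of trees avoids ties in a two-class problem; with more classes a tie-breaking rule would still be needed.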
    Contextual image classification
    Statistical classification
    Ensemble Learning
    Tree (set theory)
    Much research on diabetes classification has focused on a single dataset with limited attributes, applying various methods to optimize the classification process and improve accuracy. This research aims to show that attribute selection plays an important role in the classification process, especially with the Random Forest (RF) method. RF has a bagging feature that is quite reliable when learning from an unbalanced dataset. In this research, two datasets with different types and numbers of attributes were compared. The test results show that the attributes age, bs_fast, bs_pp, plasma_r, plasma_f, hba1c, and type/class play an important role in improving the accuracy of diabetes mellitus classification. With the RF method, data cleaning and attribute selection can produce 100% accuracy on Abel Vika's Diabetes dataset, even with a relatively small number of trees. This result has also been validated by k-fold cross-validation.
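The k-fold cross-validation used for validation above can be sketched in plain Python. The splitter, the `fit`/`predict` callables and the toy threshold classifier are all assumptions for illustration, not the paper's setup. Folds are contiguous and unshuffled here, whereas real data would usually be shuffled or stratified first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validate(fit, predict, X, y, k=5):
    """Mean accuracy over k folds for a user-supplied fit/predict pair."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Toy one-feature data and a fixed-threshold "model" (no real training).
X = list(range(10))
y = [0] * 5 + [1] * 5
score = cross_validate(lambda Xs, ys: 5, lambda thr, x: int(x >= thr), X, y, k=5)
print(score)
```

A perfect score on every fold, as reported in the abstract, is worth checking against the fold construction, since information shared between training and test folds can inflate accuracy.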
    Statistical classification