Genetic Algorithms and Classification Trees in Feature Discovery: Diabetes and the NHANES database

Alejandro Heredia-Langner, Kristin H. Jarman, Brett G. Amidan, Joel G. Pounds

2013

This paper presents a feature selection methodology that can be applied to datasets containing a mixture of continuous and categorical variables. Using a Genetic Algorithm (GA), this method explores a dataset and selects a small set of features relevant for the prediction of a binary (1/0) response. Binary classification trees and an objective function based on conditional probabilities are used to measure the fitness of a given subset of features. The method is applied to health data in order to find factors useful for the prediction of diabetes. Results show that our algorithm is capable of narrowing down the set of predictors to around 8 factors that can be validated using reputable medical and public health resources.
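The general idea of the abstract (a GA searching over feature subsets, with each subset scored by a binary classification tree and a conditional-probability objective) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the fitness formula (the average of P(y=1 | predicted 1) and P(y=0 | predicted 0), minus a small per-feature penalty), the GA operators (truncation selection, one-point crossover, bit-flip mutation), the small greedy Gini tree, and the synthetic data, which stands in for NHANES. Categorical variables are assumed to be numerically coded and are split by threshold like continuous ones.

```python
import random

def gini(rows):
    """Gini impurity of the 0/1 labels stored in the last column of rows."""
    if not rows:
        return 0.0
    p = sum(r[-1] for r in rows) / len(rows)
    return 2 * p * (1 - p)

def build_tree(rows, feats, depth=3):
    """Grow a small binary classification tree using only the columns in feats."""
    majority = int(2 * sum(r[-1] for r in rows) >= len(rows))
    if depth == 0 or gini(rows) == 0.0:
        return majority                      # leaf: predicted class
    best = None
    for f in feats:
        for t in sorted({r[f] for r in rows})[:-1]:   # candidate thresholds
            left = [r for r in rows if r[f] <= t]
            right = [r for r in rows if r[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:                         # no feature can split these rows
        return majority
    _, f, t, left, right = best
    return (f, t, build_tree(left, feats, depth - 1),
            build_tree(right, feats, depth - 1))

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class (0 or 1)."""
    while isinstance(tree, tuple):
        f, t, low, high = tree
        tree = low if row[f] <= t else high
    return tree

def fitness(mask, train, valid, penalty=0.01):
    """Hypothetical objective: mean of P(y=1|pred=1) and P(y=0|pred=0),
    minus a penalty per selected feature (an assumed form, not the paper's)."""
    feats = [i for i, bit in enumerate(mask) if bit]
    if not feats:
        return 0.0
    tree = build_tree(train, feats)
    pairs = [(predict(tree, r), r[-1]) for r in valid]
    def cond(pred):
        labels = [y for p, y in pairs if p == pred]
        return sum(y == pred for y in labels) / len(labels) if labels else 0.0
    return 0.5 * (cond(1) + cond(0)) - penalty * len(feats)

def evolve(train, valid, n_feats, pop_size=20, generations=15, p_mut=0.1, seed=0):
    """Simple elitist GA over 0/1 feature masks (assumed operators)."""
    rng = random.Random(seed)
    cache = {}
    def score(mask):                         # memoize: each mask is scored once
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, train, valid)
        return cache[key]
    pop = [[rng.randint(0, 1) for _ in range(n_feats)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[:pop_size // 2]        # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_feats)  # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([bit ^ (rng.random() < p_mut) for bit in child])
        pop = parents + children
    return max(pop, key=score)

def make_data(n, n_feats, rng):
    """Synthetic stand-in data: the label depends only on feature 0
    (continuous) and feature 1 (binary); the other columns are noise."""
    rows = []
    for _ in range(n):
        x = [rng.random() for _ in range(n_feats)]
        x[1] = rng.randint(0, 1)
        rows.append(x + [int(x[0] > 0.5 and x[1] == 1)])
    return rows

data_rng = random.Random(1)
train, valid = make_data(100, 6, data_rng), make_data(100, 6, data_rng)
best = evolve(train, valid, n_feats=6)
selected = [i for i, bit in enumerate(best) if bit]
print("selected feature indices:", selected)
```

In this sketch the per-feature penalty is what pushes the search toward a small subset. The paper's actual objective function and GA settings may differ.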

Keywords:

Feature selection
Categorical variable
Binary classification
Small set
Genetic algorithm
Machine learning
Conditional probability
Pattern recognition
Computer science
Artificial intelligence
Feature discovery
Data mining
Binary number
