A Resource Aware Parallelized Back Propagation Neural Network in Enabling Efficient Large-Scale Digital Health Data Processing

2019 
With the growth of digital health, efficient machine learning is urgently needed to handle the increasing volume of health data. Among machine learning algorithms, the back propagation neural network (BPNN) has proven effective in both academia and industry. However, the conventional BPNN algorithm is frequently reported to be inefficient when processing large-scale digital health data. This paper therefore presents a Hadoop-based parallelized BPNN algorithm that can process large-scale data efficiently. To compensate for the potential accuracy loss introduced by parallelized data processing, ensemble learning techniques are also employed. Additionally, although Hadoop supplies a number of default schedulers, a heterogeneous distributed computing environment may still degrade the efficiency of the parallelized BPNN. Consequently, this paper also presents a load balancing approach based on gene expression programming (GEP), which makes the parallelized BPNN aware of the available computing resources and enables optimal scheduling. The experiments use classification as the test task. Two types of experiments are carried out: the first evaluates the accuracy of the presented algorithm by classifying a benchmark dataset; the second evaluates its efficiency by classifying a large-scale dataset. The experimental results show the effectiveness of the presented resource-aware parallelized BPNN algorithm.
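The general pattern the abstract describes (train one sub-model per data partition in parallel, then combine their outputs with an ensemble rule) might be sketched as follows. This is only an illustrative sketch under stated assumptions, not the paper's implementation: it runs in a single process rather than on Hadoop, it substitutes a simple nearest-centroid classifier for each sub-BPNN to stay short and deterministic, and it assumes majority voting as the ensemble rule. All function names and the toy data are hypothetical.

```python
from collections import Counter

def split_partitions(samples, n_parts):
    """Round-robin split; a stand-in for Hadoop input splits (assumption)."""
    return [samples[i::n_parts] for i in range(n_parts)]

def train_centroid_model(partition):
    """Map-phase stand-in: fits per-class centroids instead of a real sub-BPNN."""
    sums, counts = {}, Counter()
    for features, label in partition:
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, x in enumerate(features):
            acc[j] += x
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

def ensemble_predict(models, features):
    """Reduce-phase stand-in: majority vote over the sub-models' predictions."""
    votes = Counter(predict(m, features) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated classes in two dimensions.
data = [
    ((0.0, 0.1), "a"), ((0.2, 0.0), "a"), ((0.1, 0.2), "a"), ((0.0, 0.0), "a"),
    ((1.0, 0.9), "b"), ((0.9, 1.1), "b"), ((1.1, 1.0), "b"), ((1.0, 1.0), "b"),
]

# Three partitions give an odd number of voters for this two-class problem.
models = [train_centroid_model(p) for p in split_partitions(data, 3)]
label_low = ensemble_predict(models, (0.05, 0.05))
label_high = ensemble_predict(models, (0.95, 1.0))
```

Each sub-model sees only a fraction of the data, which is where the potential accuracy loss mentioned above comes from; the vote across sub-models is one common way to offset it.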