GSP Distributed Deep Learning Used for the Monitoring System

2021 
Monitoring systems collect massive amounts of data and store them in many distributed data centers. Traditional distributed deep learning first centralizes the data on the parameter server, shuffles it, and then divides it uniformly among the computing nodes before training; when a neural network is trained on monitoring-system data, this incurs huge communication overhead. We propose Grouped-Based-on-Local-Data Synchronous Parallel (GSP) distributed deep learning. Training on local data greatly reduces the amount of communication. It also brings a further advantage: the similarity between data centers can be calculated from the background of the monitoring lenses and the annotations of the data, the computing nodes can then be grouped according to the similarity of the data centers they belong to, and a grouped synchronous communication mechanism can be set up on top of the parameter server architecture. Experimental results show that the iteration time of training with GSP is greatly reduced while the test precision remains consistent.
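The abstract describes two ingredients: grouping computing nodes by the similarity of the data held in their data centers, and running synchronous parameter-server-style aggregation within each group. The sketch below illustrates one plausible reading of that scheme; the similarity measure (cosine distance over label histograms), the greedy grouping rule, the threshold, and all function names are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of similarity-based node grouping and grouped synchronous
# aggregation, assuming label histograms as the similarity signal.
# SIM_THRESHOLD, label_histogram, group_nodes, grouped_sync_step are
# hypothetical names, not from the paper.
import numpy as np

SIM_THRESHOLD = 0.8  # assumed cutoff for placing two data centers in one group

def label_histogram(labels, num_classes):
    """Normalized class-frequency vector of one data center's local annotations."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return counts / counts.sum()

def cosine_similarity(p, q):
    return float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))

def group_nodes(histograms):
    """Greedy grouping: a node joins the first group whose representative
    histogram is similar enough; otherwise it starts a new group."""
    groups, reps = [], []
    for node_id, h in enumerate(histograms):
        for g, rep in zip(groups, reps):
            if cosine_similarity(h, rep) >= SIM_THRESHOLD:
                g.append(node_id)
                break
        else:
            groups.append([node_id])
            reps.append(h)
    return groups

def grouped_sync_step(local_gradients, groups):
    """Synchronous aggregation restricted to each group: every node receives
    the mean gradient of its own group (a group-level parameter server)."""
    aggregated = {}
    for g in groups:
        mean_grad = np.mean([local_gradients[n] for n in g], axis=0)
        for n in g:
            aggregated[n] = mean_grad
    return aggregated

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four data centers: two dominated by classes 0-4, two by classes 5-9.
    labels = [rng.integers(0, 5, 1000), rng.integers(0, 5, 1000),
              rng.integers(5, 10, 1000), rng.integers(5, 10, 1000)]
    hists = [label_histogram(l, 10) for l in labels]
    groups = group_nodes(hists)
    print("node groups:", groups)  # e.g. [[0, 1], [2, 3]]

    grads = [rng.normal(size=4) for _ in range(4)]
    print(grouped_sync_step(grads, groups))
```

Because gradients are exchanged only within a group rather than across all nodes and the full dataset is never shipped to a central server, the per-iteration communication volume is smaller than in the traditional centralized-shuffle setup, which matches the reduction in iteration time reported in the abstract.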