Parallel implementation of artificial neural network training

2010 
In this paper we describe the implementation of a complete ANN training procedure for speech recognition using the block-mode back-propagation learning algorithm. We exploit the high-performance SIMD architecture of GPUs using CUDA and its C-like language interface. We also compare this speed-up with the one obtained by implementing the training procedure using only the multi-threading capabilities of multi-core processors. Our approach has been tested by training acoustic models for large vocabulary speech recognition tasks, showing a six-fold reduction in the time required to train real-world, large networks with respect to an already optimized implementation based on the Intel MKL libraries.
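
The abstract refers to block-mode back-propagation on the GPU: a block of input frames is propagated through the network at once, so the per-frame matrix-vector products collapse into matrix-matrix products that map well onto SIMD hardware. Below is a minimal, illustrative sketch of the forward pass of a single layer in this style, written against cuBLAS; the layer sizes (N_IN, N_OUT), the block size (BLOCK), and the kernel name are assumptions for the example, not details taken from the paper.

```cuda
// Hypothetical sketch of a block-mode forward pass for one ANN layer on the GPU.
// Propagating BLOCK frames together turns per-frame products into one SGEMM,
// which is the main source of the GPU speed-up described in the abstract.
// Sizes below are illustrative, not the paper's actual network dimensions.

#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>
#include <cublas_v2.h>

#define N_IN   429   // input layer size (e.g. stacked acoustic frames) -- assumed
#define N_OUT  1500  // hidden layer size -- assumed
#define BLOCK  1024  // number of frames processed per block-mode step -- assumed

// Element-wise logistic sigmoid applied to the whole activation block.
__global__ void sigmoid_kernel(float *z, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        z[i] = 1.0f / (1.0f + expf(-z[i]));
}

int main(void)
{
    float *d_W, *d_X, *d_Y;  // weights, input block, output block (device memory)
    cudaMalloc(&d_W, sizeof(float) * N_OUT * N_IN);
    cudaMalloc(&d_X, sizeof(float) * N_IN  * BLOCK);
    cudaMalloc(&d_Y, sizeof(float) * N_OUT * BLOCK);
    // In a real trainer, d_W and d_X would be filled with the current weights
    // and a block of acoustic feature frames; bias terms are omitted for brevity.

    cublasHandle_t handle;
    cublasCreate(&handle);

    // Y = W * X for the whole block of frames (column-major storage):
    //   W is N_OUT x N_IN, X is N_IN x BLOCK, Y is N_OUT x BLOCK.
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                N_OUT, BLOCK, N_IN,
                &alpha, d_W, N_OUT, d_X, N_IN,
                &beta,  d_Y, N_OUT);

    // Apply the non-linearity to every activation in the block.
    int n = N_OUT * BLOCK;
    sigmoid_kernel<<<(n + 255) / 256, 256>>>(d_Y, n);
    cudaDeviceSynchronize();

    cublasDestroy(handle);
    cudaFree(d_W);
    cudaFree(d_X);
    cudaFree(d_Y);
    return 0;
}
```

The backward pass follows the same pattern: the error signals for the whole block are back-propagated with further SGEMM calls, and the weight gradients for the block are accumulated in a single matrix product, which is what makes the block formulation attractive on both GPUs and multi-core CPUs with optimized BLAS libraries such as Intel MKL.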