Video Surveillance for Violence Detection Using Deep Learning

2020 
To detect violence in surveillance camera footage, we propose a neural architecture that recognizes violent activity and can serve as a measure to prevent it from escalating. The architecture uses a pre-trained ResNet-50 model to extract features from the video frames and feeds them into a ConvLSTM block. We use the short-term difference between video frames as input to make the model more robust to occlusions and discrepancies. Convolutional neural networks let us extract more concentrated spatio-temporal features from the frames, which suits the sequential nature of video when feeding LSTMs. The model thus consists of a pre-trained convolutional neural network connected to a convolutional LSTM layer. It takes raw videos as input, converts them into frames, and outputs a binary label: violence or non-violence. We pre-process the video frames using cropping, dark-edge removal, and other data augmentation techniques to strip unnecessary detail from the data. To evaluate the proposed method, we used three standard public datasets, with accuracy as the evaluation metric.
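The two pre-processing ideas named in the abstract, dark-edge removal and short-term frame differencing, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `crop_dark_edges` and `short_term_difference`, the `(T, H, W, C)` frame layout, and the brightness threshold of 10 are all assumptions made for illustration.

```python
import numpy as np


def crop_dark_edges(frames: np.ndarray, threshold: float = 10.0) -> np.ndarray:
    """Crop border rows/columns that stay near-black across the whole clip.

    frames: array of shape (T, H, W, C) with values in [0, 255].
    threshold: assumed brightness cutoff (hypothetical, not from the paper).
    """
    # Brightest value seen at each pixel over all frames and channels.
    peak = frames.max(axis=(0, 3))
    rows = np.where(peak.max(axis=1) > threshold)[0]
    cols = np.where(peak.max(axis=0) > threshold)[0]
    if rows.size == 0 or cols.size == 0:
        return frames  # fully dark clip: nothing sensible to crop
    return frames[:, rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1, :]


def short_term_difference(frames: np.ndarray) -> np.ndarray:
    """Return differences of consecutive frames, shape (T-1, H, W, C)."""
    frames = frames.astype(np.float32)  # avoid uint8 wrap-around
    return frames[1:] - frames[:-1]


# Example: a 4-frame clip with a dark border and one changing pixel.
clip = np.zeros((4, 8, 8, 3), dtype=np.uint8)
clip[:, 2:6, 1:7, :] = 50
clip[2, 3, 3, :] = 200
diffs = short_term_difference(crop_dark_edges(clip))
```

In a pipeline like the one described, the difference frames (rather than the raw frames) would then be passed to the pre-trained ResNet-50 feature extractor and on to the ConvLSTM block. Static background cancels out in the subtraction, so only motion remains.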