Breaking down violence: A deep-learning strategy to model and classify violence in videos

2018 
Detecting violence in videos through automatic means is significant for law enforcement and analysis of surveillance cameras with the intent of maintaining public safety. Moreover, it may be a great tool for protecting children from accessing inappropriate content and help parents make a better informed decision about what their kids should watch. However, this is a challenging problem since the very definition of violence is broad and highly subjective. Hence, detecting such nuances from videos with no human supervision is not only technical, but also a conceptual problem. With this in mind, we explore how to better describe the idea of violence for a convolutional neural network by breaking it into more objective and concrete parts. Initially, our method uses independent networks to learn features for more specific concepts related to violence, such as fights, explosions, blood, etc. Then we use these features to classify each concept and later fuse them in a meta-classification to describe violence. We also explore how to represent time-based events in still-images as network inputs; since many violent acts are described in terms of movement. We show that using more specific concepts is an intuitive and effective solution, besides being complementary to form a more robust definition of violence. When compared to other methods for violence detection, this approach holds better classification quality while using only automatic features.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    24
    References
    12
    Citations
    NaN
    KQI
    []