Robust human action recognition via long short-term memory

2013 
The long short-term memory (LSTM) neural network utilizes specialized modulation mechanisms to store information for extended periods of time. It is thus potentially well-suited for complex visual processing, where the current video frame must be considered in the context of past frames. Recent studies have indeed shown that LSTM can effectively recognize and classify human actions (e.g., running, hand waving) in video data; however, these results were achieved under somewhat restricted settings. In this effort, we seek to demonstrate that LSTM's performance remains robust even as experimental conditions deteriorate. Specifically, we show that classification accuracy exhibits graceful degradation when the LSTM network is faced with (a) lower quantities of available training data, (b) tighter deadlines for decision making (i.e., shorter available input data sequences), and (c) poorer video quality (resulting from noise, dropped frames, or reduced resolution). We also clearly demonstrate the benefits of memory for video processing, particularly under high noise or frame-drop rates. Our study is thus an initial step towards demonstrating LSTM's potential for robust action recognition in real-world scenarios.
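The abstract describes the general architecture (an LSTM, whose gates modulate what is stored across frames, feeding an action classifier) but not the authors' implementation. The following is a minimal sketch of that architecture in PyTorch, under illustrative assumptions: per-frame feature vectors of dimension feat_dim, a single-layer LSTM of size hidden_dim, and num_classes action categories are all hypothetical choices, not values from the paper.

```python
# Minimal sketch (not the authors' implementation) of LSTM-based action
# classification: each video clip is a sequence of per-frame feature
# vectors, and the LSTM's final hidden state is mapped to class scores.
import torch
import torch.nn as nn

class ActionLSTM(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=64, num_classes=6):
        super().__init__()
        # The LSTM's input, forget, and output gates modulate what is
        # written to, kept in, and read from the cell state per frame.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, frames):
        # frames: (batch, time, feat_dim); shorter sequences model the
        # "tighter deadline" condition of fewer available frames.
        _, (h_n, _) = self.lstm(frames)
        return self.classifier(h_n[-1])  # logits per action class

# Example: a batch of 4 clips, 30 frames each, 128-dim frame features.
model = ActionLSTM()
logits = model(torch.randn(4, 30, 128))
print(logits.shape)  # torch.Size([4, 6])
```

The degradation conditions studied in the paper map naturally onto this setup: truncating the time axis shortens the decision deadline, and adding noise to or zeroing out frame features simulates reduced video quality or dropped frames.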