This paper addresses the problem of detecting and localizing abnormal activities in crowded scenes. A spatiotemporal Laplacian eigenmap method is proposed to extract different crowd activities from videos. This is achieved by learning the spatial and temporal variations of local motions in an embedded space. We employ representatives of different activities to construct a model that characterizes the regular behavior of a crowd. This model of regular crowd behavior allows the detection of abnormal crowd activities in both local and global contexts and the localization of regions that exhibit abnormal behavior. Experiments on recently published data sets show that the proposed method achieves results comparable to state-of-the-art methods without sacrificing computational simplicity.
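To make the embedding step concrete, the following is a minimal sketch of a Laplacian eigenmap: build a k-nearest-neighbor graph over local-motion feature vectors, form the graph Laplacian, and take its smallest nontrivial eigenvectors as the low-dimensional coordinates. This is an illustration of the generic technique, not the authors' pipeline; the feature vectors, neighborhood size, and weighting are placeholder assumptions.

```python
import numpy as np

def laplacian_eigenmap(features, n_neighbors=5, n_components=2):
    """Embed feature vectors (n x d) via a Laplacian eigenmap:
    symmetric kNN graph, unnormalized Laplacian, smallest eigenvectors."""
    n = features.shape[0]
    # pairwise squared Euclidean distances between feature vectors
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    # weighted adjacency: connect each point to its nearest neighbors
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[1:n_neighbors + 1]  # skip self (distance 0)
        W[i, idx] = np.exp(-d2[i, idx])
    W = np.maximum(W, W.T)          # symmetrize the graph
    L = np.diag(W.sum(1)) - W       # unnormalized graph Laplacian L = D - W
    # eigenvectors of L with the smallest eigenvalues; the first
    # (constant) eigenvector is trivial and is skipped
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, 1:1 + n_components]
```

In the embedded space, points whose local motions vary similarly in space and time land close together, which is what makes representatives of "regular" activity usable as a reference model.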
This paper investigates a new method to simulate pedestrian crowd movement in a large and complex virtual environment, representing a public place such as a shopping mall. To demonstrate pedestrian dynamics, we consider different group sizes with various categories of pedestrians sharing a crowded environment. Each category has its own characteristics, such as gender, age, position, velocity, and energy. The proposed method uses a multi-group microscopic model to generate real-time trajectories for all people moving in the defined virtual environment. Additionally, an agent-based model is introduced for modelling group behaviour. Based on the proposed method, every pedestrian in each group can continuously adjust their attributes and optimize their path towards the desired visiting points, while avoiding obstacles and other pedestrians. The simulation results show that the proposed method produces realistic dynamic behaviour in a given virtual environment.
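A microscopic crowd model of this kind updates each pedestrian individually from local rules. The sketch below shows one generic update step in the social-force style (steer toward a goal at a preferred speed, with repulsion from nearby pedestrians); it is a simplified stand-in for the paper's model, and the speed, repulsion strength, and interaction radius are arbitrary assumptions.

```python
import numpy as np

def step(pos, vel, goals, speed=1.4, dt=0.1, repulse=0.5, radius=1.0):
    """One update of a minimal microscopic crowd model.
    pos, vel, goals: arrays of shape (n_pedestrians, 2)."""
    n = len(pos)
    to_goal = goals - pos
    dist = np.linalg.norm(to_goal, axis=1, keepdims=True) + 1e-9
    desired = speed * to_goal / dist          # desired velocity toward goal
    steer = desired - vel                     # relax toward desired velocity
    # pairwise repulsion: push pedestrian i away from any j within `radius`
    diff = pos[:, None, :] - pos[None, :, :]
    d = np.linalg.norm(diff, axis=2) + np.eye(n) + 1e-9
    close = (d < radius) & ~np.eye(n, dtype=bool)
    push = np.where(close[..., None], repulse * diff / d[..., None], 0.0).sum(1)
    vel = vel + (steer + push) * dt
    return pos + vel * dt, vel
```

Calling `step` in a loop yields real-time trajectories; a multi-group variant would simply give each group its own parameters (preferred speed, goals, cohesion), which is the role the agent-based layer plays in the abstract.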
The importance of video surveillance techniques has considerably increased since the latest terrorist incidents. Safety and security have become critical in many public areas, and there is a specific need to enable human operators to remotely monitor activity across large environments. For these reasons, multicamera systems are needed to provide surveillance coverage across a wide area, ensuring object visibility over a large range of depths. In the development of advanced vision-based surveillance systems, a number of key issues critical to their successful operation must be addressed. This article describes the low-level image and video processing techniques needed to implement a modern surveillance system. In particular, change detection methods for both fixed and mobile cameras (pan and tilt) are introduced, and registration methods for multicamera systems with overlapping and nonoverlapping views are discussed.
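For a fixed camera, the classic low-level change detection technique is background subtraction: maintain a per-pixel background estimate and flag pixels that deviate from it. The sketch below uses a simple running-average background; it illustrates the general idea only, and the learning rate and threshold are placeholder values.

```python
import numpy as np

def detect_changes(frames, alpha=0.05, thresh=30):
    """Running-average background subtraction for a fixed camera.
    Returns a boolean foreground mask for every frame after the first."""
    bg = frames[0].astype(float)   # initialize background from first frame
    masks = []
    for f in frames[1:]:
        f = f.astype(float)
        mask = np.abs(f - bg) > thresh           # changed (foreground) pixels
        # update the background only where no change was detected,
        # so moving objects do not bleed into the background model
        bg = np.where(mask, bg, (1 - alpha) * bg + alpha * f)
        masks.append(mask)
    return masks
```

Mobile (pan-tilt) cameras require an extra registration step to align consecutive frames before such differencing is meaningful, which is why the article treats the two camera types separately.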
In this work we propose a novel method for supervised, keyshot-based video summarization by applying a conceptually simple and computationally efficient soft self-attention mechanism. Current state-of-the-art methods leverage bidirectional recurrent networks such as BiLSTM combined with attention. These networks are complex to implement and computationally demanding compared to fully connected networks. To that end we propose a simple, self-attention-based network for video summarization which performs the entire sequence-to-sequence transformation in a single feed-forward pass and a single backward pass during training. Our method sets new state-of-the-art results on two benchmarks commonly used in this domain, TvSum and SumMe.
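The core operation here, soft self-attention, can be sketched in a few lines: every frame's output is an attention-weighted mixture of all frames, computed in one feed-forward pass with no recurrence. This is the generic scaled dot-product form, not the paper's exact architecture; the projection matrices are placeholders.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head soft self-attention over a sequence of frame
    features X (shape T x d). Returns one output vector per frame."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])        # scaled dot-product scores
    A = np.exp(scores - scores.max(1, keepdims=True))
    A /= A.sum(1, keepdims=True)                  # row-wise softmax weights
    return A @ V                                  # weighted mix of all frames
```

Because each row of attention weights is computed from a single matrix product, the whole sequence is transformed at once, which is what makes this cheaper than stepping a BiLSTM through the video frame by frame.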