This paper describes a method to partition a video sequence into shots and subshots. By a subshot, we mean a segment characterized by one of, or a combination of, the three camera motions: pan, tilt, and zoom. The proposed method detects both hard cuts and gradual transitions in MPEG-compressed video within a single framework. We also present a motion estimation algorithm that computes the dominant motion, represented by an affine model. The motion information is used to refine the location of dissolves as well as to subdivide each shot into subshots, thus providing a characterization of camera motion. We consider the dissimilarity between the I-, P-, and B-frames with respect to the types of macroblocks used for encoding. Unlike previously reported algorithms, our method requires minimal decompression of the video sequence and uses very loose thresholds. The algorithm is evaluated on several types of video sequences to demonstrate its effectiveness.
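To make the affine characterization concrete, the sketch below shows one way dominant-motion parameters of a 6-parameter affine model can be mapped to pan/tilt/zoom labels. This is a minimal illustration and not the paper's estimator: the least-squares fit, function names, and threshold values are all assumptions.

```python
import numpy as np

def fit_affine(points, vectors):
    """Least-squares fit of a 6-parameter affine motion model
    u = a1*x + a2*y + b1,  v = a3*x + a4*y + b2
    to macroblock motion vectors (illustrative; the paper's
    estimator may differ, e.g. a robust variant)."""
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([x, y, np.ones_like(x)])  # shared design matrix
    (a1, a2, b1), *_ = np.linalg.lstsq(A, vectors[:, 0], rcond=None)
    (a3, a4, b2), *_ = np.linalg.lstsq(A, vectors[:, 1], rcond=None)
    return a1, a2, a3, a4, b1, b2

def classify_camera_motion(params, t_pan=1.0, t_zoom=0.01):
    """Map affine parameters to pan/tilt/zoom labels.
    Thresholds t_pan (pixels) and t_zoom are illustrative only."""
    a1, a2, a3, a4, b1, b2 = params
    labels = []
    if abs(b1) > t_pan:
        labels.append("pan")            # horizontal translation
    if abs(b2) > t_pan:
        labels.append("tilt")           # vertical translation
    if abs((a1 + a4) / 2) > t_zoom:     # divergence approximates zoom
        labels.append("zoom")
    return labels or ["static"]
```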
This paper presents a multi-sensor architecture with an adaptive multi-sensor management system suitable for the control and navigation of autonomous maritime vessels in hazy and poor-visibility conditions. The architecture resides on board the autonomous vessel. It augments the data from on-board imaging and weather sensors with AIS data and weather data from sensors on other vessels and from the on-shore vessel traffic surveillance system. The combined data is analyzed using computational intelligence and data analytics to determine a suitable course of action, drawing on historically learned knowledge while also learning live from the current situation. Such a framework is expected to be effective in diverse weather conditions and to provide autonomy to maritime vessels.
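As a rough illustration of the augmentation step, the sketch below fuses an on-board visibility reading with reports received from peer vessels and the shore. All class and field names are hypothetical, and the simple median fusion stands in for the computational-intelligence analysis described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class WeatherReading:
    visibility_m: float   # estimated visibility in metres
    source: str           # "onboard", "ais-peer", or "shore"

@dataclass
class FusedSituation:
    """Hypothetical fused record combining on-board sensing with
    AIS and shore-side data; field names are illustrative only."""
    onboard: Optional[WeatherReading] = None
    peer_reports: List[WeatherReading] = field(default_factory=list)

    def consensus_visibility(self) -> Optional[float]:
        # Median fusion as a placeholder for the analytics layer.
        readings = [r.visibility_m for r in self.peer_reports]
        if self.onboard is not None:
            readings.append(self.onboard.visibility_m)
        if not readings:
            return None
        readings.sort()
        return readings[len(readings) // 2]
```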
We present a new class of human-psychology-inspired descriptors that yield meaningful structural descriptions of an object. Our framework involves (1) detecting salient pairings of line segments extracted from the line edge map of an image and (2) exploiting these pairs of line segments to construct structural descriptors. Specifically, we integrate the spatial qualities of the line segments with the perceptually salient colors of the image to jointly identify the salient pairings of line segments, which we term quadrangles. By harnessing the spatial configurations and geometric relationships between the quadrangles, we design descriptors that characterize the local structures of an object. Promising recognition results on four-legged animals are presented.
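The geometric side of such a descriptor can be illustrated as follows: a minimal sketch computing a few plausible relations (relative orientation, midpoint distance, length ratio) between a pair of line segments. The exact quantities used in the paper are not reproduced here; the function names and feature choices are assumptions.

```python
import math

def segment_angle(seg):
    """Orientation of a segment ((x1, y1), (x2, y2)) in radians."""
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1)

def quadrangle_features(seg_a, seg_b):
    """Illustrative geometric relations between a pair of line
    segments forming a 'quadrangle'."""
    ang = abs(segment_angle(seg_a) - segment_angle(seg_b)) % math.pi
    mid_a = [(seg_a[0][i] + seg_a[1][i]) / 2 for i in (0, 1)]
    mid_b = [(seg_b[0][i] + seg_b[1][i]) / 2 for i in (0, 1)]
    dist = math.hypot(mid_a[0] - mid_b[0], mid_a[1] - mid_b[1])
    len_a = math.dist(seg_a[0], seg_a[1])
    len_b = math.dist(seg_b[0], seg_b[1])
    ratio = min(len_a, len_b) / max(len_a, len_b)
    return {"relative_angle": ang,
            "midpoint_distance": dist,
            "length_ratio": ratio}
```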
Graph-based representations have been widely used to model spatio-temporal relationships in video understanding. Although effective, existing graph-based approaches focus on capturing human-object relationships while ignoring fine-grained semantic properties of the action components. These semantic properties are crucial for understanding the current situation, such as where the action takes place, what tools are used, and the functional properties of the objects. In this work, we propose a graph-based representation called the Situational Scene Graph (SSG) to encode both human-object relationships and the corresponding semantic properties. The semantic details are represented as predefined roles and values, inspired by situation frames, which were originally designed to represent a single action. Based on the proposed representation, we introduce the task of situational scene graph generation and propose a multi-stage pipeline, the Interactive and Complementary Network (InComNet), to address it. Given that existing datasets are not applicable to this task, we further introduce an SSG dataset whose annotations consist of semantic role-value frames for humans, objects, and the verb predicates of human-object relations. Finally, we demonstrate the effectiveness of the proposed SSG representation by testing it on different downstream tasks. Experimental results show that the unified representation not only benefits predicate classification and semantic role-value classification, but also benefits reasoning tasks for human-centric situation understanding. We will release the code and the dataset soon.
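To show what an SSG encodes, here is a minimal sketch of the representation as a data structure: entity nodes and human-object relations, each carrying a role-value frame. The category names, role names, and example values are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SSGNode:
    """An entity (person or object) with semantic role-value pairs,
    e.g. {"function": "cutting"} for a knife (illustrative)."""
    category: str
    roles: Dict[str, str] = field(default_factory=dict)

@dataclass
class SSGRelation:
    """A human-object relation whose verb predicate also carries
    a role-value frame (e.g. {"place": "kitchen"})."""
    subject: int   # index into SituationalSceneGraph.nodes
    obj: int
    predicate: str
    roles: Dict[str, str] = field(default_factory=dict)

@dataclass
class SituationalSceneGraph:
    nodes: List[SSGNode] = field(default_factory=list)
    relations: List[SSGRelation] = field(default_factory=list)

# Example: a person holding a knife in a kitchen.
g = SituationalSceneGraph(
    nodes=[SSGNode("person"),
           SSGNode("knife", {"function": "cutting"})],
    relations=[SSGRelation(0, 1, "holding", {"place": "kitchen"})],
)
```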