In dynamic environments, semantic information can help a SLAM system eliminate interference from dynamic points. However, most three-dimensional semantic segmentation methods are computationally expensive and segment distant and small objects with low accuracy. We propose PMF-SLAM, a method that exploits the interaction between 3D semantic segmentation and SLAM to achieve efficient scene perception. The PMF-SLAM system comprises three parts: the MSF-SegNet model, the Interactive SLAM module, and the Pose-Guiding segmentation module. To improve accuracy on distant and small objects, MSF-SegNet merges point-wise global features and voxel-wise local features from two branches through a purpose-designed symmetrical sparse convolution structure. In the Interactive SLAM module, a coarse-to-fine registration method based on semantic information estimates the pose. To implement the interaction between segmentation and SLAM, the Pose-Guiding segmentation module assists the segmentation thread, improving inference efficiency and ensuring segmentation consistency over time. Extensive experiments, including both local experiments and tests on the nuScenes dataset, validate the performance of the proposed method. Our method achieves better accuracy than multiple segmentation algorithms and significantly improves segmentation performance on distant and small objects, and its trajectory estimation accuracy is better than that of multiple SLAM algorithms. Code is available at https://github.com/haroldgt/MSF-SegNet.
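As a rough illustration of how semantic labels can remove dynamic-point interference before registration, the sketch below drops every point whose predicted class is in a set of movable classes. It is a minimal sketch under stated assumptions, not the PMF-SLAM implementation: the function name `remove_dynamic_points`, the label IDs in `DYNAMIC_CLASSES`, and the array layout are all hypothetical.

```python
import numpy as np

# Hypothetical label IDs for movable classes (e.g. car, pedestrian, cyclist).
# The real IDs depend on the dataset's label map and are an assumption here.
DYNAMIC_CLASSES = (1, 2, 3)


def remove_dynamic_points(points, labels, dynamic_classes=DYNAMIC_CLASSES):
    """Return only the points (and labels) whose class is not dynamic.

    points: (N, 3) array of x, y, z coordinates.
    labels: (N,) array of per-point semantic class IDs.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    if points.shape[0] != labels.shape[0]:
        raise ValueError("points and labels must have the same length")

    # True for points that belong to a static class.
    static_mask = ~np.isin(labels, dynamic_classes)
    return points[static_mask], labels[static_mask]
```

The static points returned here would then be passed to whatever scan registration step the SLAM front end uses; that step is outside the scope of this sketch.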
The rapid development of computer vision technology provides a foundation for the video surveillance on which public security relies. In current video surveillance based on static cameras, accurate and fast extraction of the foreground regions of moving objects enables quicker analysis of the behavior of objects of interest and thus raises the level of intelligent analysis in video surveillance. However, false detections frequently occur during foreground extraction, caused by the shaking of tree branches and leaves in the scene and by the “ghosting” areas that result from delayed updating of the background model. To solve this problem, this paper proposes a method for extracting foreground regions using spatio-temporal information. The method accurately extracts the foreground regions of moving objects by exploiting the differences and complementarity between spatial-domain and temporal-domain methods, combined with image processing techniques. Specifically, the temporal information in the video is first processed morphologically and then combined with the spatial information, and the combined result is processed morphologically again to obtain the foreground regions of moving objects. The experimental results show that the proposed spatio-temporal method reduces false detections caused by the shaking of tree branches and leaves and thus extracts the foreground regions of moving objects effectively.
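The pipeline described above can be sketched as follows. This is a minimal illustration under several assumptions that the abstract does not confirm: the temporal cue is taken to be a two-frame difference, the spatial cue a difference against a background image, the two cues are combined with a logical AND, and the morphological steps are 3x3 dilation and opening. All function names and thresholds are hypothetical.

```python
import numpy as np


def _neighbourhood(mask):
    """Stack the 3x3 neighbourhood of every pixel (border padded with False)."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    h, w = mask.shape
    return np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    )


def dilate(mask):
    """3x3 binary dilation: a pixel is set if any neighbour is set."""
    return _neighbourhood(mask).any(axis=0)


def erode(mask):
    """3x3 binary erosion: a pixel stays set only if all neighbours are set."""
    return _neighbourhood(mask).all(axis=0)


def extract_foreground(prev_frame, frame, background,
                       diff_thresh=25, bg_thresh=25):
    """Combine temporal and spatial cues into one boolean foreground mask.

    All inputs are 2-D grayscale images of the same shape.
    """
    cur = frame.astype(np.int16)  # avoid uint8 wrap-around on subtraction

    # Temporal cue (assumed): pixels that changed since the previous frame.
    temporal = np.abs(cur - prev_frame.astype(np.int16)) > diff_thresh
    # Spatial cue (assumed): pixels that differ from the background image.
    spatial = np.abs(cur - background.astype(np.int16)) > bg_thresh

    # Morphologically process the temporal cue: frame differencing only marks
    # object edges, so dilate twice to cover the object interior.
    temporal = dilate(dilate(temporal))

    # Combine the cues: a stale "ghost" in the background model shows no
    # temporal change, so the AND suppresses it.
    combined = spatial & temporal

    # Opening (erosion then dilation) removes isolated pixels such as
    # flickering leaves while restoring the shape of larger regions.
    return dilate(erode(combined))
```

The AND combination is one plausible way to use the complementarity of the two cues; a weighted or region-based fusion would fit the same description.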