Enhancing Salient Object Segmentation Through Attention
1 Citation · 48 References · 20 Related Papers
Abstract:
Segmenting salient objects in an image is an important vision task with ubiquitous applications. The problem becomes more challenging in the presence of a cluttered and textured background, low-resolution and/or low-contrast images. Even though existing algorithms perform well in segmenting most of the object(s) of interest, they often produce false positives caused by background regions that resemble salient objects. In this work, we tackle this problem by iteratively attending to image patches in a recurrent fashion and subsequently enhancing the predicted segmentation mask. Saliency features are estimated independently for every image patch and are then combined using an aggregation strategy based on a Convolutional Gated Recurrent Unit (ConvGRU) network. The proposed approach works in an end-to-end manner, removing background noise and false positives incrementally. Through extensive evaluation on various benchmark datasets, we show superior performance to existing approaches without any post-processing.
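The core of the aggregation strategy is a gated recurrent update: the hidden state accumulates evidence from each attended patch. A real ConvGRU replaces the scalar products below with convolutions over feature maps; this scalar sketch (with hypothetical weight values) only illustrates the gating logic, not the paper's actual implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(h, x, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU update: the hidden state h aggregates a new input x.
    A ConvGRU replaces these scalar products with convolutions over
    feature maps; this scalar version only shows the gating logic."""
    z = sigmoid(Wz * x + Uz * h)                 # update gate
    r = sigmoid(Wr * x + Ur * h)                 # reset gate
    h_tilde = math.tanh(Wh * x + Uh * (r * h))   # candidate state
    return (1 - z) * h + z * h_tilde

# Aggregate a sequence of per-patch "saliency features" into one state.
h = 0.0
for x in [0.2, 0.9, 0.1]:
    h = gru_step(h, x, 1.0, 0.5, 1.0, 0.5, 1.0, 0.5)
```

The update gate z decides how much of the new patch evidence overwrites the accumulated state, which is how such a recurrence can incrementally suppress earlier false positives.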
Related Papers:
Salient object detection, which aims to identify and locate the most salient pixels or regions in images, has been attracting more and more interest due to its various real-world applications. However, this vision task is quite challenging, especially under complex image scenes. Inspired by the intrinsic reflection of natural images, in this paper we propose a novel feature learning framework for large-scale salient object detection. Specifically, we design a symmetrical fully convolutional network (SFCN) to learn complementary saliency features under the guidance of lossless feature reflection. The location, contextual, and semantic information of salient objects is jointly utilized to supervise the proposed network for more accurate saliency predictions. In addition, to overcome the blurry boundary problem, we propose a new structural loss function to learn clear object boundaries and spatially consistent saliency. The coarse prediction results are effectively refined using this structural information for performance improvements. Extensive experiments on seven saliency detection datasets demonstrate that our approach achieves consistently superior performance and outperforms the very recent state-of-the-art methods.
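The abstract does not spell out the structural loss, but the general idea of combining per-pixel cross-entropy with a spatial-consistency term can be sketched as follows. The neighbourhood-smoothness penalty and the weight `lam` here are hypothetical illustrations, not the paper's formulation.

```python
import math

def structural_loss(pred, label, lam=0.5):
    """Hypothetical sketch: per-pixel binary cross-entropy plus a
    smoothness term penalising prediction differences between
    horizontally adjacent pixels that share the same label.
    pred: 2-D list of probabilities in (0, 1); label: 2-D list of {0, 1}."""
    eps = 1e-7
    bce, smooth, n = 0.0, 0.0, 0
    for i, row in enumerate(pred):
        for j, p in enumerate(row):
            y = label[i][j]
            bce += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
            n += 1
            if j + 1 < len(row) and label[i][j] == label[i][j + 1]:
                smooth += (p - row[j + 1]) ** 2
    return bce / n + lam * smooth / n

pred  = [[0.9, 0.8], [0.2, 0.1]]
label = [[1.0, 1.0], [0.0, 0.0]]
loss = structural_loss(pred, label)
```

The smoothness term only fires inside regions of constant label, so it encourages spatially consistent saliency without blurring true boundaries.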
The performance of salient object segmentation has been significantly advanced by using deep convolutional networks. However, these networks often produce blob-like saliency maps without accurate object boundaries. This is caused by the limited spatial resolution of their feature maps after multiple pooling operations and might hinder downstream applications that require precise object shapes. To address this issue, we propose a novel deep model, the Focal Boundary Guided (Focal-BG) network. Our model is designed to jointly learn to segment salient object masks and detect salient object boundaries. Our key idea is that additional knowledge about object boundaries can help to precisely identify the shape of the object. Moreover, our model incorporates a refinement pathway to refine the mask prediction and makes use of the focal loss to facilitate the learning of the hard boundary pixels. To evaluate our model, we conduct extensive experiments. Our Focal-BG network consistently outperforms the state-of-the-art methods on five major benchmarks. We provide a detailed analysis of these results and demonstrate that our joint modeling of salient object boundary and mask helps to better capture the shape details, especially in the vicinity of object boundaries.
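The focal loss mentioned here is the standard formulation of Lin et al.: it down-weights well-classified pixels so that hard pixels (typically near object boundaries) dominate the gradient. A minimal per-pixel version:

```python
import math

def focal_loss(p, y, gamma=2.0):
    """Focal loss for one pixel: FL(p_t) = -(1 - p_t)^gamma * log(p_t).
    p: predicted foreground probability, y: ground-truth label in {0, 1}.
    gamma controls how strongly easy examples are down-weighted."""
    pt = p if y == 1 else 1.0 - p
    pt = min(max(pt, 1e-7), 1.0 - 1e-7)   # clip for numerical safety
    return -((1.0 - pt) ** gamma) * math.log(pt)

easy = focal_loss(0.95, 1)   # confident, correct pixel -> near-zero loss
hard = focal_loss(0.30, 1)   # hard pixel -> much larger loss
```

With gamma = 0 this reduces to plain cross-entropy; increasing gamma shifts training effort toward the hard boundary pixels the paper targets.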
Foreground and background cues can assist humans in quickly understanding visual scenes. In computer vision, however, it is difficult to detect salient objects when they touch the image boundary. Hence, detecting salient objects robustly under such circumstances without sacrificing precision and recall can be challenging. In this paper, we propose a novel model for salient region detection, namely, the foreground-center-background (FCB) saliency model. Its main highlights are as follows. First, we use regional color volume as the foreground cue, together with perceptually uniform color differences within regions, to detect salient regions. This can highlight salient objects robustly, even when they touch the image boundary, without greatly sacrificing precision and recall. Second, we employ center saliency to detect salient regions together with foreground and background cues, which improves saliency detection performance. Finally, we propose a novel and simple yet efficient method that combines foreground, center, and background saliency. Experimental validation with three well-known benchmark data sets indicates that the FCB model outperforms several state-of-the-art methods in terms of precision, recall, F-measure, and particularly, the mean absolute error. Salient regions detected by the FCB model are also brighter than those produced by some existing state-of-the-art methods.
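A common way to realise a center cue is a Gaussian prior on normalised distance from the image centre. The FCB paper's exact combination rule is not given in the abstract, so the weighted sum below (and the weights) is a hypothetical illustration of how foreground, center, and background cues could be fused.

```python
import math

def center_saliency(i, j, h, w, sigma=0.5):
    """Center prior: pixels near the image centre get higher saliency,
    falling off as a Gaussian of normalised distance to the centre."""
    dy = (i - (h - 1) / 2) / h
    dx = (j - (w - 1) / 2) / w
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))

def combine(fg, center, bg, w=(0.4, 0.2, 0.4)):
    """Hypothetical fusion: the background cue enters as (1 - bg),
    so strong background evidence lowers the combined saliency."""
    return w[0] * fg + w[1] * center + w[2] * (1.0 - bg)

s_center = center_saliency(2, 2, 5, 5)   # exact centre of a 5x5 image
s_edge   = center_saliency(0, 0, 5, 5)   # corner pixel
```

Note the center prior alone would fail exactly in the boundary-touching case the paper targets, which is why it is combined with the foreground and background cues rather than used on its own.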
Automatic estimation of salient objects without any prior knowledge tends to greatly enhance many computer vision tasks. This paper proposes a novel bottom-up framework for salient object detection by first modeling the background and then separating salient objects from it. We model the background distribution based on a feature clustering algorithm, which allows for fully exploiting statistical and structural information of the background. Then a coarse saliency map is generated according to the background distribution. To be more discriminative, the coarse saliency map is enhanced by a two-step refinement composed of edge-preserving element-level filtering and upsampling based on geodesic distance. We provide an extensive evaluation and show that our proposed method performs favorably against other outstanding methods on the two most commonly used datasets. Most importantly, the proposed approach is demonstrated to be more effective in highlighting the salient object uniformly and robust to background noise.
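The background-modeling idea can be sketched as: cluster features drawn from likely-background regions (commonly the image border), then score each pixel by its distance to the nearest background mode; pixels far from every background cluster are likely salient. The feature values and cluster centres below are made-up toy data, and the scoring is a simplification of the paper's method.

```python
def coarse_saliency(feature, bg_centers):
    """Score a pixel/region feature by its Euclidean distance to the
    nearest background cluster centre: far from every background
    mode implies high coarse saliency. Features are plain tuples."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(dist(feature, c) for c in bg_centers)

bg_centers = [(0.1, 0.1), (0.9, 0.9)]               # assumed background modes
s_bg  = coarse_saliency((0.12, 0.08), bg_centers)   # background-like feature
s_obj = coarse_saliency((0.20, 0.90), bg_centers)   # far from both modes
```

The refinement stage described in the abstract would then smooth this coarse map with edge-preserving filtering and geodesic-distance upsampling.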
In this work, we propose an efficient and effective approach for unconstrained salient object detection in images using deep convolutional neural networks. Instead of generating thousands of candidate bounding boxes and refining them, our network directly learns to generate the saliency map containing the exact number of salient objects. During training, we convert the ground-truth rectangular boxes to Gaussian distributions that better capture the ROI of individual salient objects. During inference, the network predicts Gaussian distributions centered at salient objects with an appropriate covariance, from which bounding boxes are easily inferred. Notably, our network performs saliency map prediction without pixel-level annotations, salient object detection without object proposals, and salient object subitizing simultaneously, all in a single pass within a unified framework. Extensive experiments show that our approach outperforms existing methods on various datasets by a large margin, and achieves more than 100 fps with a VGG16 network on a single GPU during inference.
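The box-to-Gaussian encoding can be illustrated with an axis-aligned case: mean at the box centre, standard deviation proportional to the box size. The 1/4 scale factor and the ±2σ decoding below are assumed choices for illustration, not necessarily the paper's.

```python
def box_to_gaussian(x1, y1, x2, y2):
    """Encode a ground-truth box as an axis-aligned Gaussian:
    mean at the box centre, per-axis standard deviation set to a
    quarter of the box extent (an assumed scale factor)."""
    mu = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
    sigma = ((x2 - x1) / 4.0, (y2 - y1) / 4.0)
    return mu, sigma

def gaussian_to_box(mu, sigma):
    """Invert the encoding: recover the box spanning +/- 2 sigma."""
    return (mu[0] - 2 * sigma[0], mu[1] - 2 * sigma[1],
            mu[0] + 2 * sigma[0], mu[1] + 2 * sigma[1])

mu, sigma = box_to_gaussian(10, 20, 50, 60)
box = gaussian_to_box(mu, sigma)
```

Because the encoding is invertible, the network can be supervised with smooth Gaussian targets during training while still yielding crisp boxes at inference, which is what lets it skip per-pixel annotations and object proposals.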