Transformer Region Proposal for Object Detection

2021 
We present a method for detecting objects in images using a Transformer and region proposals. Our method dispenses with the traditional convolutional neural network and obtains the position and category of each detected object by finding the correspondence between the images through the self-attention mechanism. We extract region proposals by setting anchors and combine them with the positional encoding obtained from an image auto-encoder. The resulting features are then fed into the Transformer encoder and decoder network, which outputs the position and category of each target object. Experimental results on a public dataset show that our method reaches a detection speed of 26 frames per second, which meets the needs of real-time detection. In terms of detection accuracy, our method achieves an average precision of 31.4 overall and 55.2 on large objects, which is comparable to current mainstream convolutional neural network detection algorithms.
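
The anchor-based region proposal step described above can be illustrated with a minimal sketch. The abstract does not give the anchor configuration, so the function name `generate_anchors`, the stride, the scales, and the aspect ratios below are all assumptions chosen for illustration, not the paper's actual settings.

```python
import math
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def generate_anchors(
    image_w: int,
    image_h: int,
    stride: int = 32,                   # assumed grid spacing, not from the paper
    scales: Tuple[float, ...] = (64.0, 128.0),      # assumed anchor side lengths
    ratios: Tuple[float, ...] = (0.5, 1.0, 2.0),    # assumed height/width ratios
) -> List[Box]:
    """Place anchors of several scales and ratios on a regular grid.

    Each anchor is centred on a grid cell and clipped to the image bounds.
    """
    anchors: List[Box] = []
    for cy in range(stride // 2, image_h, stride):
        for cx in range(stride // 2, image_w, stride):
            for scale in scales:
                for ratio in ratios:
                    # Keep the area at scale**2 while varying height/width.
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    x0 = max(0.0, cx - w / 2)
                    y0 = max(0.0, cy - h / 2)
                    x1 = min(float(image_w), cx + w / 2)
                    y1 = min(float(image_h), cy + h / 2)
                    anchors.append((x0, y0, x1, y1))
    return anchors


if __name__ == "__main__":
    boxes = generate_anchors(64, 64, stride=32, scales=(32.0,), ratios=(1.0,))
    for box in boxes:
        print(box)
```

In a pipeline like the one the abstract outlines, each such region would presumably be cropped, embedded, and combined with a positional encoding before being passed to the Transformer; those later stages are not sketched here because the abstract does not specify them.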