Exploring the Strengths of Neural Codes for Video Retrieval

2022 
Websites such as YouTube, Facebook, and Twitter receive large volumes of video every day, mostly uploaded from mobile devices and digital cameras. These videos rarely have metadata (semantic tags) attached, without which it is difficult to retrieve similar videos except through content-based search techniques. Two-dimensional convolutional neural networks (2D-CNNs) have recently achieved breakthrough performance over hand-engineered methods on image-related tasks across the computer vision field. Since a video is composed of 2D frames arranged along the time dimension, it can also be processed by a 2D-CNN. In this paper, we investigate the usefulness of CNN layer activations as video representations and analyze their performance on a nearest-neighbor search task, i.e., video retrieval. Three well-known CNN architectures (AlexNet, GoogLeNet, and ResNet18) are used for feature extraction, and the UCF101 dataset is chosen for the experiments. The results show that fusing features from multiple CNN layers can strengthen the video representation.
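The pipeline the abstract describes — extracting activations from several CNN layers, fusing them into one descriptor per frame, pooling frames into a video vector, and retrieving by nearest-neighbor search — can be sketched as follows. This is a minimal, hedged illustration, not the paper's implementation: the function names (`fuse_layer_features`, `video_descriptor`, `retrieve`), the L2-normalize-then-concatenate fusion, and the mean pooling over frames are assumptions for demonstration; the paper does not specify these exact choices.

```python
import numpy as np

def fuse_layer_features(layer_feats):
    """Fuse activations from multiple CNN layers into one frame descriptor.

    Assumed fusion scheme: L2-normalize each layer's activation vector
    (so no single layer dominates), then concatenate them.
    """
    normed = [f / (np.linalg.norm(f) + 1e-8) for f in layer_feats]
    return np.concatenate(normed)

def video_descriptor(frame_descriptors):
    """Pool frame-level fused descriptors into a single video vector.

    Assumed pooling: mean over frames, followed by L2 normalization so
    that a dot product between descriptors equals cosine similarity.
    """
    v = np.mean(np.stack(frame_descriptors), axis=0)
    return v / (np.linalg.norm(v) + 1e-8)

def retrieve(query, database, k=5):
    """Return indices of the k most similar videos by cosine similarity.

    `database` is a (num_videos, dim) matrix of L2-normalized descriptors.
    """
    sims = database @ query          # cosine similarity for unit vectors
    return np.argsort(-sims)[:k]     # highest similarity first
```

A quick usage example with random stand-in activations (in the paper these would come from AlexNet, GoogLeNet, or ResNet18 layers):

```python
rng = np.random.default_rng(0)
# 10 videos, 8 frames each; two hypothetical layers of size 64 and 32
db = np.stack([
    video_descriptor([fuse_layer_features([rng.normal(size=64),
                                           rng.normal(size=32)])
                      for _ in range(8)])
    for _ in range(10)
])
ranked = retrieve(db[3], db, k=3)   # querying with a video already in the database
```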