FPGA-Based Vision Processing System for Automatic Online Player Tracking in Indoor Sports
2018
In recent years, vision-based systems have increasingly been used to track players in team sports in order to evaluate and improve their performance. Player tracking with vision systems is challenging because of the nature of sports games, which involve severe and frequent interactions between players (e.g., occlusions). These systems also have high computational demands, since they must process large amounts of video data from multiple cameras with high resolution and high frame rate. As a result, most existing systems based on general-purpose computers cannot perform online real-time player tracking; instead, they track players offline from pre-recorded video files, which rules out, for example, direct feedback on player performance during the game. In this paper, we present a reconfigurable system that tracks players in indoor sports automatically and without user interaction. The proposed system processes the incoming video streams from the cameras in real time, achieving online player tracking. Teams are identified and player positions are detected based on the colors of the jerseys. FPGA technology handles the compute-intensive vision processing tasks: the video acquisition, video preprocessing, player segmentation, and team identification & player detection modules are implemented in hardware, realizing an online real-time system. While pixel processing is performed in the FPGA, the less compute-intensive player tracking is performed on a general-purpose host PC. The FPGA implementation achieves a maximum frame rate of 96.7 fps on a mature Xilinx Virtex-4 FPGA, which can be increased to 136.4 fps on a Xilinx Virtex-7 device. Player tracking requires an average of 2.5 ms per frame on the host PC. As a result, the proposed reconfigurable system supports a maximum frame rate of 78.9 fps using two cameras with a resolution of 1392 × 1040 pixels each. Our results show that the average precision and recall for player detection reach up to 84.02% and 96.6%, respectively. Including player tracking, the average precision and recall reach up to 94.85% and 94.72%, respectively. The proposed FPGA implementation achieves a speedup by a factor of 15.2 over an OpenCV-based software implementation on a PC equipped with a 2.93 GHz Intel i7-870 CPU.
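To give a sense of the data volume behind these figures, the following minimal sketch computes the aggregate pixel rate and the per-frame time budget implied by the numbers quoted above. The function and constant names are illustrative only and do not come from the paper. The sketch assumes every pixel of every frame from both cameras is processed, and it makes no claim about how the FPGA and host-PC stages overlap in time.

```python
# Figures quoted in the abstract (names are hypothetical, chosen for this sketch).
CAMERAS = 2
WIDTH, HEIGHT = 1392, 1040      # pixels per camera frame
SYSTEM_FPS = 78.9               # maximum frame rate of the complete system
TRACKING_MS = 2.5               # average host-PC tracking time per frame


def pixel_rate(cameras: int, width: int, height: int, fps: float) -> float:
    """Pixels per second, assuming every pixel of every frame is processed."""
    return cameras * width * height * fps


def frame_period_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps


rate = pixel_rate(CAMERAS, WIDTH, HEIGHT, SYSTEM_FPS)
period = frame_period_ms(SYSTEM_FPS)

print(f"Aggregate pixel rate: {rate / 1e6:.1f} Mpixel/s")
print(f"Frame period at {SYSTEM_FPS} fps: {period:.2f} ms")
print(f"Average tracking time as a share of the frame period: {TRACKING_MS / period:.1%}")
```

Under these assumptions, the two cameras together deliver roughly 228 million pixels per second, which illustrates why the pixel-level stages are placed in hardware while the comparatively light tracking step stays on the host PC.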