Dataflow object detection system for FPGA-based smart camera
2016
Embedded computer vision based smart systems raise challenging issues in many research fields, including real-time vision processing, communication protocols and distributed algorithms. The amount of data generated by cameras using high resolution image sensors requires powerful computing systems if it is to be processed at digital video frame rates. Consequently, the design of efficient and flexible smart cameras, with on-board processing capabilities, has become a key issue for the expansion of smart vision systems relying on decentralized processing at the image sensor node level. In this context, FPGA-based platforms, supporting massive data parallelism, offer large opportunities to match real-time processing constraints compared to platforms based on general purpose processors. In this paper, we describe the implementation, on such a platform, of a configurable object detection application, reformulated according to the dataflow model of computation. The application relies on the computation of the histogram of oriented gradients (HOG) and a linear SVM-based classification. It is described using the CAPH programming language, allowing efficient hardware descriptions to be generated automatically from high level dataflow specifications without prior knowledge of hardware description languages such as VHDL or Verilog. Results show that the performance of the generated code does not suffer from a significant overhead compared to handwritten HDL code.

I. INTRODUCTION

Traditional computer vision systems often operate in a centralized manner, even for multi-camera applications, where the sequences of frames output by each camera are sent to a central computing unit. This central unit gathers information from all the available cameras and processes it in order to extract significant features. However, as the number of source nodes increases, such a centralized approach quickly becomes infeasible because the central node becomes a bottleneck.
This is especially true when high resolution cameras with high acquisition rates are deployed, for instance in object detection applications. In this context, and in the current state of network technology, the need to meet real-time processing constraints rules out any kind of centralized approach. As a result, many distributed video systems have been proposed in recent years. They aim at overcoming the above-mentioned bottleneck issue by distributing the computationally intensive tasks across the camera nodes. Such nodes are generally called Smart Cameras (SC). Image processing capability is added by embedding processing units such as general purpose processors (GPP), specialized processors (DSP) or field programmable gate arrays (FPGA). The latter solution has drawn a lot of attention in the past years because it offers large opportunities for exploiting the fine-grained, regular parallelism that most image processing applications exhibit at the lowest levels of processing. However, programming FPGA-based platforms is traditionally done using hardware description languages (HDLs) – see figure 1 – and therefore requires expertise in digital design. This, in practice, hinders the applicability of FPGA-based solutions. As a response, a lot of work has been devoted in the past decade to the design and development of high-level languages and tools, such as Catapult-C [1], Stream-C [2] or Impulse-C [3], aiming at allowing FPGAs to be used by programmers who are not experts in digital design. Most of these tools propose a direct conversion of C or C++ code into HDL (VHDL or Verilog). While attractive, this approach suffers from several drawbacks. First, C programs often rely on features which are difficult, if not impossible, to implement in hardware (dynamic memory allocation for instance). This means that code frequently has to be rewritten to be accepted by the compilers.
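As a hedged illustration of the kind of rewriting mentioned above, the following C sketch contrasts a routine that sizes its scratch buffer at run time with `malloc` against a variant that uses fixed-size storage bounded at compile time. The function names, the `MAX_WIDTH` bound and the 3-tap horizontal smoothing filter are all hypothetical and are not taken from the paper or from any of the cited tools; whether a particular C-to-HDL compiler accepts either form depends on the tool.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_WIDTH 1024  /* hypothetical compile-time bound on the line width */

/* Software-style version: the scratch buffer is allocated at run time.
   Returns 0 on success, -1 if the allocation fails. */
int smooth_line_dynamic(const uint8_t *in, uint8_t *out, size_t width)
{
    if (width == 0) return 0;
    uint16_t *tmp = malloc(width * sizeof *tmp);
    if (tmp == NULL) return -1;
    for (size_t i = 0; i < width; i++) {
        size_t l = (i == 0) ? 0 : i - 1;          /* replicate left border  */
        size_t r = (i + 1 == width) ? i : i + 1;  /* replicate right border */
        tmp[i] = (uint16_t)(in[l] + in[i] + in[r]);
    }
    for (size_t i = 0; i < width; i++)
        out[i] = (uint8_t)(tmp[i] / 3);
    free(tmp);
    return 0;
}

/* Rewritten version: same computation, but the scratch storage has a
   fixed size known at compile time, so no dynamic allocation is needed.
   Returns 0 on success, -1 if width exceeds the assumed bound. */
int smooth_line_static(const uint8_t *in, uint8_t *out, size_t width)
{
    uint16_t tmp[MAX_WIDTH];
    if (width > MAX_WIDTH) return -1;
    for (size_t i = 0; i < width; i++) {
        size_t l = (i == 0) ? 0 : i - 1;
        size_t r = (i + 1 == width) ? i : i + 1;
        tmp[i] = (uint16_t)(in[l] + in[i] + in[r]);
    }
    for (size_t i = 0; i < width; i++)
        out[i] = (uint8_t)(tmp[i] / 3);
    return 0;
}
```

Both functions compute the same result for any line no wider than `MAX_WIDTH`; the second simply trades flexibility for a storage requirement that is fixed before the code runs.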
Practically, this rewriting cannot be carried out without understanding why certain constructs have to be avoided and how to replace them by "hardware-compatible" equivalents. So a minimum knowledge of hardware design principles is actually required. Second, C is intrinsically sequential whereas hardware is truly parallel, so the compiler must first identify parallelism in the sequential code. In the current state of the art, this cannot be done in a fully automatic way and the programmer is required to put annotations (pragmas) in the code to help the compiler, which