OpenCL-Darknet: implementation and optimization of OpenCL-based deep learning object detection framework
19 Citations
34 References
10 Related Papers
Keywords:
Graphics processing unit
Compared with the CPU, which is good at handling logic-heavy tasks, the GPGPU (general-purpose graphics processing unit) is suited to large-scale parallel computing. The emergence of CUDA (Compute Unified Device Architecture) has accelerated the spread of GPGPU applications. We accelerate an implementation of the AES algorithm using GPGPU and CUDA and achieve a total throughput of 6 to 7 Gbit/s. Excluding the time for data loading and storing, a throughput of 20 Gbit/s can be achieved for input sizes over 1 MB.
Speedup
Graphics processing unit
Citations (2)
Quantum walk
Citations (5)
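In this paper, matrix multiplication, an operation with very good data parallelism, is implemented with OpenMP, MPI, and CUDA. By comparing the performance of a conventional supercomputer with that of a GPGPU system using CUDA, we confirm the performance scalability of CUDA systems and the potential for further development of this technology.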
Citations (0)
Graphics processing unit
Code (set theory)
Citations (10)
Focusing on GPU processing, this paper presents an approach to high-performance computing on the GPU, including a detailed description of the CUDA programming model and the principles of optimization. Comparative experiments show that CUDA has strong parallel-processing capability and offers new methods and ideas for GPGPU.
Citations (2)
A spatial-color-based non-parametric background-foreground modeling strategy implemented on a GPGPU using CUDA is proposed. The strategy is suitable for augmented-reality applications, providing real-time, high-quality results in a wide variety of scenarios.
Citations (2)
The basic principles of working with shared and distributed memory in NVidia CUDA technology are examined in detail. Thread interaction patterns and the problems of global synchronization are described. A comparative analysis is made of the main technologies used in the GPGPU approach: Nvidia CUDA, OpenCL, and Direct Compute.
Citations (0)
This paper proposes a new connected component labeling algorithm for GPGPU applications based on NVIDIA's CUDA. Various approaches and algorithms for connected component labeling with minimal execution time have been designed, but most of them focus on optimizing CPU algorithms, so they are hard to apply to GPGPU programming models such as NVIDIA's CUDA. Today, GPGPU (General Purpose Graphic Processing Unit) technologies offer dedicated parallel hardware and a programming model, and many applications are being moved onto the GPGPU. The proposed algorithm is a multi-pass algorithm designed for GPGPU applications, and evaluation results show a maximum speedup of more than 2x compared with conventional CPU algorithms.
Speedup
Graphics processing unit
Component (thermodynamics)
Citations (15)
The structure of HPC is changing with the growing use of GPGPU, and this change points to a new direction for HPC development. CUDA, provided by NVIDIA, is a C-language programming environment for developing parallel computing applications. The efficiency of a computing system can be improved by using CUDA to speed up large-scale, highly parallel computation on specific graphics cards. This essay mainly introduces the state of GPU development and how to use CUDA to develop parallel computing applications.
Speedup
Citations (0)
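Parallel programs consist of a series of code sections with different degrees of thread-level parallelism (TLP). As a result, it is quite common for a thread in a parallel program, such as a GPU kernel in a CUDA program, to still contain both sequential code and parallel loops. To exploit such parallel loops, the recent NVIDIA Kepler architecture introduces dynamic parallelism, which allows a GPU thread to launch another GPU kernel, thereby reducing the overhead of launching kernels from the CPU. With dynamic parallelism, however, a parent thread can communicate with its child threads only through global memory, and the overhead of launching GPU kernels is significant even within the GPU. In this paper, we first study a set of GPGPU benchmarks that contain parallel loops and highlight that these benchmarks do not have a very high loop count or a high degree of TLP. Consequently, the benefit of exploiting such parallel loops with dynamic parallelism is too limited to offset its overhead. We then present our proposed solution for exploiting nested parallelism in CUDA, called CUDA-NP. With CUDA-NP, we initially enable a high number of threads when a GPU program starts and use control flow to activate different numbers of threads for different code sections. We implement the proposed CUDA-NP framework using a directive-based compiler approach. For a GPU kernel, an application developer only needs to add OpenMP-like pragmas to the parallelizable code sections. Our CUDA-NP compiler then automatically generates optimized GPU kernels. It supports the reduction and scan primitives, explores different ways of distributing parallel loop iterations across threads, and efficiently manages on-chip resources. Our experiments show that, for a set of GPGPU benchmarks that have already been optimized and contain nested parallelism, the proposed CUDA-NP framework further improves performance by up to 6.69 times and by 2.01 times on average.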
Citations (0)