Computing on Knights and Kepler Architectures

Journal of Physics Conference Series (2014)

G. Bortolotti Marco Caberletti G. Crimi Andrea Ferraro F. Giacomini M. Manzali G. Maron M. Pivanti Davide Salomoni Sebastiano Fabio Schifano R. Tripiccione Marco Zanella

Citation

Reference

Related Paper

Citation Trend

Abstract:

A recent trend in scientific computing is the increasingly important role of co-processors, originally built to accelerate graphics rendering, and now used for general high-performance computing. The INFN Computing On Knights and Kepler Architectures (COKA) project focuses on assessing the suitability of co-processor boards for scientific computing in a wide range of physics applications, and on studying the best programming methodologies for these systems. Here we present in a comparative way our results in porting a Lattice Boltzmann code on two state-of-the-art accelerators: the NVIDIA K20X, and the Intel Xeon-Phi. We describe our implementations, analyze results and compare with a baseline architecture adopting Intel Sandy Bridge CPUs.

Keywords:

Porting

Xeon Phi

Implementation

Topics:

Lattice Boltzmann Simulation Studies

Advanced Data Storage Technologies

Generative Adversarial Networks and Image Synthesis

10.1088/1742-6596/513/5/052032

Cite

PDF

GEANT4-MT : bringing multi-threading into GEANT4 production

Sunil Ahn J. Apostolakis Makoto Asai D. Brandt Gene Cooperman

G EANT 4-MT is the multi-threaded version of the G EANT 4 particle transport code. (1, 2) The key goals for the design of G EANT 4-MT have been a) the need to reduce the memory footprint of the multi-threaded application compared to the use of separate jobs and processes; b) to create an easy migration of the existing applications; and c) to use efficiently many threads or cores, by scaling up to tens and potentially hundreds of workers. The first public release of a G EANT 4-MT prototype was made in 2011. We report on the revision of G EANT 4-MT for inclusion in the production-level release scheduled for end of 2013. This has involved significant re-engineering of the prototype in order to incorporate it into the main G EANT 4 development line, and the porting of G EANT 4-MT threading code to additional platforms. In order to make the porting of applications as simple as possible, refinements addressed the needs of standalone applications. Further adaptations were created to improve the fit with the frameworks of High Energy Physics (HEP) experiments. We report on performances measurements on Intel Xeon™, AMD Opteron™ the first trials of G EANT 4-MT on the Intel Many Integrated Cores (MIC) architecture, in the form of the Xeon Phi™ co-processor. (3) These indicate near-linear scaling through about 200 threads on 60 cores, when holding fixed the number of events per thread.

Porting

Xeon Phi

Threading (protein sequence)

Xeon

Memory footprint

10.1051/snamc/201404213

Cite

Citations (18)

Experiences Porting Scientific Applications to the Intel (KNL) Xeon Phi Platform

Nicholas Malaya Damon McDougall Craig Michoski Myoungkyu Lee Christopher S. Simmons

This paper presents experiences using Intel's KNL MIC platform on hardware that will be available in the Stampede 2 cluster launching in Summer 2017. We focus on 1) porting of existing scientific software; 2) observing performance of this software. Additionally, we comment on both the ease of use of KNL and observed performance of KNL as compared to previous generation "Knights Ferry" and "Knights Corner" Xeon Phi MICs [32]. Fortran, C, and C++ applications are chosen from a variety of scientific disciplines including computational fluid dynamics, numerical linear algebra, uncertainty quantification, finite element methods, and computational chemistry.

Xeon Phi

Porting

Xeon

Fortran

Linear algebra

Vectorization (mathematics)

10.1145/3093338.3093371

Cite

Citations (4)

Splotch: porting and optimizing for the Xeon Phi

arXiv (Cornell University) (2016)

Timothy Dykes C. Gheller Marzia Rivi Mel Krokos

With the increasing size and complexity of data produced by large scale numerical simulations, it is of primary importance for scientists to be able to exploit all available hardware in heterogenous High Performance Computing environments for increased throughput and efficiency. We focus on the porting and optimization of Splotch, a scalable visualization algorithm, to utilize the Xeon Phi, Intel's coprocessor based upon the new Many Integrated Core architecture. We discuss steps taken to offload data to the coprocessor and algorithmic modifications to aid faster processing on the many-core architecture and make use of the uniquely wide vector capabilities of the device, with accompanying performance results using multiple Xeon Phi. Finally performance is compared against results achieved with the GPU implementation of Splotch.

Porting

Xeon Phi

Coprocessor

Xeon

Multi-core processor

Source

Cite

Citations (0)

Experience of Porting and Optimization of Seismic Modelling on Multi and Many Cores of Hybrid Computing Cluster

Proceedings (2015)

Richa Rastogi Abhishek Srivastava Kirannmayi M. Sirasala Hitesh Chavhan Kiran Khonde

Summary In this paper, we report our experience of porting and optimization of legacy seismic acoustic modelling application on multi and many cores of hybrid architecture of PARAM Yuva II. This application was developed using MPI and used domain decomposition as parallelization approach across parallel processors. The same application has been modified for domain decomposition at node level and the parallel performance was improved using OpenMP within the node. The resultant application was optimized using different optimization techniques for multi core architecture of Intel’s Xeon, which further improved the performance of the application along with efficiency. The optimized application was then ported on many core architecture of Intel’s Xeon Phi in native and symmetric modes. The details of porting, optimizations and execution on Intel’s Xeon and on Xeon Phi in native and symmetric modes are given in the paper. Performance, scalability and efficiency of the application has been studied using multi and many cores and experimental results are presented.

Porting

Xeon Phi

Xeon

Multi-core processor

10.3997/2214-4609.201413106

Cite

Citations (1)

Explorations of the viability of ARM and Xeon Phi for physics processing

Journal of Physics Conference Series (2014)

David Abdurachmanov Kapil Arya J. Bendavid T. Boccali Gene Cooperman

We report on our investigations into the viability of the ARM processor and the Intel Xeon Phi co-processor for scientific computing. We describe our experience porting software to these processors and running benchmarks using real physics applications to explore the potential of these processors for production physics processing.

Porting

Xeon Phi

Xeon

10.1088/1742-6596/513/5/052008

Cite

Citations (8)

Porting FEASTFLOW to the Intel Xeon Phi: Lessons Learned

Ioannis E. Venetis Georgios Goumas Markus Geveler Dirk Ribbrock

In this paper we report our experiences in porting the FEASTFLOW software infrastructure to the Intel Xeon Phi coprocessor. Our efforts involved both the evaluation of programming models including OpenCL, POSIX threads and OpenMP and typical optimization strategies like parallelization and vectorization. Since the straightforward porting process of the already existing OpenCL version of the code encountered performance problems that require further analysis, we focused our efforts on the implementation and optimization of two core building block kernels for FEASTFLOW: an axpy vector operation and a sparse matrix-vector multiplication (spmv). Our experimental results on these building blocks indicate the Xeon Phi can serve as a promising accelerator for our software infrastructure.

Porting

Xeon Phi

Coprocessor

Vectorization (mathematics)

Source

Cite

Citations (12)

Symposium on Experiences of Porting and Optimising Code for Xeon Phi Processors

Advances in parallel computing (2016)

William Henry Jackson Michèle Weiland Mark Parsons Simon McIntosh‐Smith

Porting

Xeon Phi

Code (set theory)

Xeon

10.3233/978-1-61499-621-7-573

Cite

Citations (0)

Toward Parallel Modeling of Solidification Based on the Generalized Finite Difference Method Using Intel Xeon Phi

Lecture notes in computer science (2016)

Łukasz Szustak Kamil Halbiniak Adam Kulawik Joanna Wróbel Paweł Gepner

Xeon Phi

Porting

Coprocessor

Xeon

Code (set theory)

10.1007/978-3-319-32149-3_39

Cite

Citations (8)

Splotch: porting and optimizing for the Xeon Phi

arXiv (Cornell University) (2016)

Timothy Dykes C. Gheller Marzia Rivi Mel Krokos

Porting

Xeon Phi

Coprocessor

Xeon

Multi-core processor

10.48550/arxiv.1606.04427

Cite

Citations (1)

Porting to the Intel Xeon Phi: Opportunities and Challenges

C. Rosales

This work describes the challenges presented by porting code to the Intel Xeon Phi coprocessor, as well as opportunities for optimization and tuning. We use micro-benchmarks, code segments, assembly listings and application level results to illustrate the key issues in porting to the Xeon Phi coprocessor, always keeping in mind both portability and performance. While executing code on the Xeon Phi in native mode is fairly straightforward it can be a challenge to achieve good performance. The complexity of optimization increases as one introduces offload, distributed offload, or symmetric execution modes. We will initially focus on the fundamental issues that can prevent acceptable performance in native execution, and then address the key issues in data transfers due to either offloaded regions or MPI exchanges with the host CPU. Some of the issues are of a generic nature and affect any code using heterogeneous execution - PCIe bandwidth bottleneck -, and others are specific to the Xeon Phi and its software environment - Host/MIC MPI exchanges. We will also make an effort to indicate which issues are specific to this platform and which are of general applicability. In particular we will draw comparisons between the data management models in the Intel Xeon Phi and in the NVIDIA CUDA environment.

Porting

Xeon Phi

Software portability

Coprocessor

Code (set theory)

x86

Xeon

10.1109/xsw.2013.5

Cite

Citations (35)