A Principled Kernel Testbed for Hardware/Software Co-Design Research

2010

Alex Kaiser, Samuel Williams, Kamesh Madduri, Khaled Ibrahim, David Bailey, James Demmel, Erich Strohmaier
Computational Research Division, Lawrence Berkeley National Laboratory

Abstract

Recently, advances in processor architecture have become the driving force for new programming models in the computing industry, as ever newer multicore processor designs with increasing numbers of cores are introduced on schedules regimented by marketing demands. As a result, collaborative parallel (rather than simply concurrent) implementations of important applications, programming languages, models, and even algorithms have been forced to adapt to these architectures to exploit the available raw performance. We believe that this optimization regime is flawed. In this paper, we present an alternate approach that, rather than starting with an existing hardware/software solution laced with hidden assumptions, defines the computational problems of interest and invites architects, researchers, and programmers to implement novel hardware/software co-designed solutions. Our work builds on the previous ideas of computational dwarfs, motifs, and parallel patterns by selecting a representative set of essential problems, for each of which we provide an algorithmic description, a scalable problem definition, illustrative reference implementations, and a verification scheme. This testbed will enable comparative research in areas such as parallel programming models, languages, auto-tuning, and hardware/software co-design. For simplicity, we focus initially on the computational problems of interest to the scientific computing community, but we believe the methodology (and perhaps a subset of the problems) is applicable to other communities. We intend to broaden the coverage of this problem space through stronger community involvement.
Introduction

For decades, computer scientists have sought guidance on how to evolve architectures, languages, and programming models in order to improve application performance, efficiency, and productivity. Unfortunately, without an overarching direction, individual guidance is inferred from the existing software/hardware ecosystem, and each group often conducts its research independently, assuming all other technologies remain fixed. Architects attempt to provide micro-architectural solutions to improve performance on fixed binaries. Researchers tweak compilers to improve code generation for existing architectures and implementations, and they may invent new programming models for fixed processor and memory architectures and computational algorithms. In today's rapidly evolving world of on-chip parallelism, these isolated and iterative improvements to performance may miss superior solutions in the same way gradient descent optimization techniques may get stuck in local minima.

To combat this tunnel vision, previous work set forth a broad categorization of numerical methods of interest to the scientific computing community (the seven Dwarfs) and subsequently for the larger parallel computing community in general (13 motifs), suggesting that these were the problems of interest on which researchers should focus [1, 2, 9]. Unfortunately, such broad brush strokes often miss the nuance seen in individual kernels that may be similarly categorized. For example, the computational requirements of particle methods vary greatly between the naive but more accurate direct calculations and the particle-mesh and particle-tree codes.

In this paper, we present an alternate methodology for testbed creation. For simplicity, we restricted our domain to scientific computing. Superficially, this is reminiscent of the computational kernels in Intel's RMS work [12]. However, we proceed in a more regimented effort.
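The particle-method contrast above can be made concrete with a minimal sketch of the direct approach: every particle interacts with every other particle, giving O(N²) work, which is exactly the cost that particle-mesh and particle-tree codes trade accuracy to avoid. The function name and the choice of a gravitational pairwise potential are illustrative assumptions, not taken from the testbed itself.

```python
import math

def direct_potential(positions, masses):
    """Naive direct particle method: sum the pairwise potential
    over all particle pairs. The doubly nested loop makes the
    O(N^2) cost explicit; tree and mesh codes approximate distant
    interactions to reduce this to O(N log N) or O(N)."""
    n = len(positions)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            dz = positions[i][2] - positions[j][2]
            r = math.sqrt(dx * dx + dy * dy + dz * dz)
            total += -masses[i] * masses[j] / r
    return total

# Two unit masses at unit separation: potential is -1.0
print(direct_potential([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0]))
```

Because every pair is visited, doubling the particle count quadruples the work, which is the nuance a coarse "particle methods" category hides.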
We commence by enumerating the problems, then provide not only reference implementations for each problem but, more importantly, a mathematical definition that allows one to escape iterative approaches to software/hardware optimization. To ensure long-term value, we augment each problem with both a scalable problem generator and a verification scheme. By no means is the
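One way the scalable problem generator and verification scheme might pair up is sketched below for a dense matrix-vector product: the generator builds an instance of any size from a seed along with a reference answer, and the verifier accepts an arbitrary candidate implementation and checks it against that answer within a tolerance. The names `generate_problem` and `verify`, and the choice of kernel, are hypothetical illustrations, not the testbed's actual interface.

```python
import random

def generate_problem(n, seed=0):
    """Scalable problem generator: any size n, reproducible from a
    seed. Returns a dense matrix A, a vector x, and the reference
    result b = A x computed alongside the inputs."""
    rng = random.Random(seed)
    A = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    b = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return A, x, b

def verify(A, x, b, candidate_matvec, tol=1e-9):
    """Verification scheme: run any implementation of y = A x and
    compare its output to the reference within a tolerance, so
    novel hardware/software solutions can be checked without
    prescribing how they compute."""
    y = candidate_matvec(A, x)
    return len(y) == len(b) and all(abs(yi - bi) <= tol
                                    for yi, bi in zip(y, b))
```

Decoupling the problem definition (the generator and tolerance) from any particular implementation is what lets the same instance validate a reference code, an auto-tuned variant, or a co-designed accelerator alike.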