Hybrid-Core Computing for High-Throughput Bioinformatics

2011 
Advanced architectures can deliver dramatically increased throughput for genomics and proteomics applications, reducing time-to-completion in some cases from days to minutes. One such architecture, hybrid-core computing, marries a traditional x86 environment with a reconfigurable coprocessor based on field-programmable gate array (FPGA) technology. Application-specific instructions executed by the coprocessor appear as extensions to the x86 instruction set architecture. This integrated approach gives users familiar C, C++, and Fortran development environments without the complexity of non-standard dialects or programming models, so the performance of application-specific hardware is achievable with the familiar programmability and deployment of a commodity server. Beyond higher throughput, the added performance can fundamentally improve research quality by making more accurate, previously impractical approaches feasible.

This presentation will discuss the suitability of hybrid-core servers' advanced architecture and compiler technology for sequence alignment and assembly applications. For example, the Smith-Waterman alignment algorithm runs 172 times faster on Convey's HC-1 than the best software implementation on a commodity server. Such performance speeds research while reducing energy consumption, floor space, and management effort. Most bioinformatics applications are well suited to this architecture because their low data interdependence allows large gains from hardware parallelism. Furthermore, small data types (four nucleotides can be represented in two bits) use logic gates more efficiently than the data types dictated by conventional programming models. Bioinformatics applications with random access patterns over large memory spaces, such as graph-based algorithms, are limited by memory performance on cache-based x86 servers.
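For readers unfamiliar with the algorithm cited above, the following is a minimal Python sketch of the standard Smith-Waterman local-alignment recurrence with a linear gap penalty. The scoring parameters are illustrative defaults, not Convey's; the HC-1 implementation referenced in the abstract maps this dynamic-programming recurrence onto FPGA logic rather than executing it in software.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Classic Smith-Waterman dynamic program: H[i][j] holds the best score
    of any local alignment ending at a[i-1], b[j-1]; scores never drop
    below zero, which is what makes the alignment local.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # match/mismatch (diagonal)
                          H[i - 1][j] + gap,    # gap in b (vertical)
                          H[i][j - 1] + gap)    # gap in a (horizontal)
            best = max(best, H[i][j])
    return best
```

Because every anti-diagonal of H can be computed in parallel and each cell needs only a few small-integer operations, the recurrence is a natural fit for the wide hardware parallelism described above.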
Convey's highly parallel memory subsystem allows application-specific logic to simultaneously access 8,192 individual words in memory, significantly increasing effective memory bandwidth over cache-based memory systems. Many algorithms, such as Velvet and other de Bruijn-graph-based short-read de novo assemblers, benefit greatly from this type of memory architecture.
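To make the memory-access pattern concrete, here is a toy Python sketch (not Convey's or Velvet's implementation) that combines the two ideas from the abstract: k-mers packed two bits per nucleotide, and a de Bruijn graph whose node lookups are effectively random accesses into a large table, which is exactly the pattern that defeats caches and rewards a highly parallel memory subsystem.

```python
from collections import defaultdict

ENC = {"A": 0, "C": 1, "G": 2, "T": 3}  # two bits per nucleotide

def pack_kmer(kmer):
    """Pack a k-mer into a single integer, two bits per base."""
    code = 0
    for base in kmer:
        code = (code << 2) | ENC[base]
    return code

def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it.

    Each k-mer in a read contributes one edge: its (k-1)-length prefix
    points to its (k-1)-length suffix. Successive lookups land at
    unrelated table entries, i.e. random access over a large memory space.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[pack_kmer(kmer[:-1])].add(pack_kmer(kmer[1:]))
    return graph
```

For a genome-scale read set the graph spans gigabytes, and traversing it touches nodes in an order with no spatial locality; hardware that can issue thousands of independent word accesses per cycle hides that latency where a cache hierarchy cannot.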