Merrimac: Supercomputing with Streams

William J. Dally, Francois Labonte, Abhishek Das, Pat Hanrahan, Jung-Ho Ahn, Jayanth Gummaraju, Mattan Erez, Nuwan Jayasena, Ian Buck, Timothy J. Knight, Ujval J. Kapasi

2003

Merrimac uses a stream architecture and advanced interconnection networks to give an order of magnitude more performance per unit cost than cluster-based scientific computers built from the same technology. Organizing the computation into streams and exploiting the resulting locality with a register hierarchy enables a stream architecture to reduce the memory bandwidth required by representative applications by an order of magnitude or more. Hence a processing node with a fixed memory bandwidth (expensive) can support an order of magnitude more arithmetic units (inexpensive). This in turn allows a given level of performance to be achieved with fewer nodes (a 1-PFLOPS machine, for example, with just 8,192 nodes), resulting in greater reliability and simpler system management. We sketch the design of Merrimac, a streaming scientific computer that can be scaled from a $20K 2 TFLOPS workstation to a $20M 2 PFLOPS supercomputer, and present the results of some initial application experiments on this architecture.
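
The figures quoted in the abstract imply a per-node performance and a cost per unit of performance that can be checked with simple arithmetic. The following is a minimal sketch, not part of the paper: the helper names are hypothetical, and it assumes decimal SI prefixes (1 PFLOPS = 10^15 FLOPS), which the abstract does not state explicitly.

```python
# Back-of-the-envelope check of the abstract's figures.
# Assumption: decimal SI prefixes; the abstract does not define them.
GIGA = 1e9
TERA = 1e12
PETA = 1e15


def gflops_per_node(system_flops, nodes):
    """Average performance each node must deliver, in GFLOPS."""
    return system_flops / nodes / GIGA


def dollars_per_gflops(cost_dollars, system_flops):
    """System cost divided by system performance, in $/GFLOPS."""
    return cost_dollars / (system_flops / GIGA)


# "a 1-PFLOPS machine ... with just 8,192 nodes"
per_node = gflops_per_node(1 * PETA, 8192)

# "a $20K 2 TFLOPS workstation" and "a $20M 2 PFLOPS supercomputer"
workstation = dollars_per_gflops(20_000, 2 * TERA)
supercomputer = dollars_per_gflops(20_000_000, 2 * PETA)

print(f"per-node performance: {per_node:.1f} GFLOPS")
print(f"workstation:   ${workstation:.0f} per GFLOPS")
print(f"supercomputer: ${supercomputer:.0f} per GFLOPS")
```

Under these assumptions each node of the 1-PFLOPS configuration would need to deliver roughly 122 GFLOPS, and both quoted price points work out to the same $10 per GFLOPS, which is consistent with the claim that the design scales from workstation to supercomputer.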
