Merrimac: Supercomputing with Streams
2003
Merrimac uses stream architecture and advanced interconnection networks to give an order of magnitude more performance per unit cost than cluster-based scientific computers built from the same technology. Organizing the computation into streams and exploiting the resulting locality using a register hierarchy enables a stream architecture to reduce the memory bandwidth required by representative applications by an order of magnitude or more. Hence a processing node with a fixed bandwidth (expensive) can support an order of magnitude more arithmetic units (inexpensive). This in turn allows a given level of performance to be achieved with fewer nodes (a 1-PFLOPS machine, for example, with just 8,192 nodes) resulting in greater reliability, and simpler system management. We sketch the design of Merrimac, a streaming scientific computer that can be scaled from a $20K 2 TFLOPS workstation to a $20M 2 PFLOPS supercomputer and present the results of some initial application experiments on this architecture.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
27
References
320
Citations
NaN
KQI