One trend in emerging large-scale multi-core chip design is to employ multithreaded architectures such as the IBM Cyclops-64 (C64) chip, which integrates a large number of hardware thread units, main memory banks, and communication hardware on a single chip. A cellular supercomputer based on a 3D interconnection of C64 chips is being developed. This paper presents our design, implementation, and evaluation of the Cyclops Datagram Protocol (CDP) for the IBM C64 multithreaded architecture and the C64 supercomputer system. CDP is inspired by the TCP/IP protocol suite, and its design is simple and compact. The implementation of CDP leverages the abundant hardware thread-level parallelism provided by the C64 multithreaded architecture. The main contributions of this paper are: (1) We have designed and implemented CDP, which serves as the fundamental communication infrastructure for the C64 supercomputer system. It connects the C64 back-end to the front-end and forms a global uniform namespace for all nodes in the heterogeneous C64 system; (2) On a multithreaded architecture like C64, the CDP design and implementation effectively exploit the massive thread-level parallelism of the hardware, achieving good performance scalability; (3) CDP is efficient: its Pthread version achieves around 90% of the channel capacity of Gigabit Ethernet, even when running at user level on a single-processor machine; (4) CDP has passed extensive application test cases, and no reliability problems have been reported.
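The abstract does not describe CDP's packet format or threading model. As a rough illustration of the general idea of servicing one receive queue with many threads, the following Python sketch starts a pool of worker threads that drain a shared queue of datagrams. The `Datagram` fields, the `serve` function, and the per-node byte count are hypothetical, not CDP's actual design. Python threads are limited by the global interpreter lock, so the sketch shows the structure of the technique, not its speedup.

```python
import queue
import threading
from dataclasses import dataclass

@dataclass
class Datagram:
    # Hypothetical fields for illustration only; not CDP's header layout.
    src: int
    dst: int
    payload: bytes

def serve(datagrams, num_workers=4):
    """Drain a shared receive queue with several worker threads and
    return the total payload bytes delivered to each destination node."""
    rx = queue.Queue()
    delivered = {}
    lock = threading.Lock()

    def worker():
        while True:
            dg = rx.get()
            if dg is None:  # sentinel: this worker has no more work
                return
            with lock:      # protect the shared per-node counters
                delivered[dg.dst] = delivered.get(dg.dst, 0) + len(dg.payload)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for dg in datagrams:    # enqueue all datagrams before any sentinel
        rx.put(dg)
    for _ in threads:       # one sentinel per worker
        rx.put(None)
    for t in threads:
        t.join()
    return delivered
```

For example, `serve([Datagram(0, 1, b"abc"), Datagram(0, 2, b"de"), Datagram(3, 1, b"f")])` returns `{1: 4, 2: 2}`. Because every datagram is enqueued before the sentinels, each one is picked up by some worker before any worker exits.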
A simultaneous multithreading (SMT) processor can fetch and issue instructions from multiple independent hardware threads in every CPU cycle. Hardware resources are therefore shared among the concurrently running threads at a very fine grain, which can increase the utilization of the processor pipeline. However, the concurrently running threads in an SMT processor may interfere with each other and stall the CPU pipeline. We call this kind of pipeline stall an inter-thread stall (ITS), or thread interlock. In this paper, we present our study of the ITS problem on an embedded heterogeneous SMT processor. Our experiments demonstrate that, for some test cases, 50% of all pipeline stalls are caused by ITS. To address this, we have developed a new instruction scheduling algorithm, called be-nice instruction scheduling, based on Open64 Global Code Motion, to reduce conflicts between concurrent threads. The instruction scheduler uses thread interference information (obtained by profiling) as a heuristic to reduce the number of ITS without sacrificing overall CPU performance. The experimental results show that, for our current test cases, the be-nice instruction scheduler reduces inter-thread stall cycles by 15% and increases the IPC of the critical thread by 2% to 3%. The experiments were performed using the Open64 compiler infrastructure.
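The abstract gives only the outline of be-nice scheduling. The following Python sketch shows one plausible way a profiled interference score could enter a greedy list scheduler; it is not the Open64 Global Code Motion implementation. Critical-path height stays the primary priority (so that performance is not traded away), and a hypothetical per-instruction `interference` score, assumed to come from profiling, breaks ties so that less disruptive instructions issue first. Unit latencies and the names `Instr`, `heights`, and `be_nice_schedule` are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Instr:
    name: str
    deps: list = field(default_factory=list)  # names of instructions this one depends on
    interference: float = 0.0  # hypothetical profiled ITS cost (higher = more disruptive)

def heights(instrs):
    """Critical-path length from each instruction to the end of the block,
    assuming unit latency and an acyclic dependence graph."""
    succs = {i.name: [] for i in instrs}
    for i in instrs:
        for d in i.deps:
            succs[d].append(i.name)
    memo = {}

    def h(n):
        if n not in memo:
            memo[n] = 1 + max((h(s) for s in succs[n]), default=0)
        return memo[n]

    return {n: h(n) for n in succs}

def be_nice_schedule(instrs):
    """Greedy list scheduling: among ready instructions, prefer the longest
    critical path, then the lowest profiled interference, then the name."""
    hs = heights(instrs)
    pending = {i.name: set(i.deps) for i in instrs}
    by_name = {i.name: i for i in instrs}
    done, order = set(), []
    while pending:
        # An instruction is ready once all of its predecessors are scheduled.
        ready = [by_name[n] for n, deps in pending.items() if deps <= done]
        best = min(ready, key=lambda i: (-hs[i.name], i.interference, i.name))
        order.append(best.name)
        done.add(best.name)
        del pending[best.name]
    return order
```

For a block `[Instr("ld", interference=0.8), Instr("mul", interference=0.1), Instr("add", ["ld"], 0.0), Instr("st", ["add", "mul"], 0.5)]`, the scheduler returns `["ld", "add", "mul", "st"]`: `ld` goes first because it heads the longest path, and `add` beats `mul` only on the interference tie-breaker.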
In this paper, we study the on-chip network and memory hierarchy design of Godson-T, a homogeneous many-core processor. Godson-T has 64 cores, each with a private L1 cache, and 16 global L2 cache banks. All of these on-chip units are connected by a 2D 8 × 8 mesh network. Our study reveals the following: (a) A global on-chip L2 cache can effectively alleviate the memory pressure caused by the data-hungry on-chip computing engines. However, its potential is still limited by both off-chip and on-chip bandwidth, especially as the number of active threads increases. (b) On-chip traffic congestion is largely caused by the intensive memory access requests issued by the on-chip cores. Therefore, the design of the on-chip network must consider the available performance of the datapath that connects the processor to main memory. (c) In theory, different applications have different communication patterns (the Berkeley view). However, an application's runtime communication pattern is determined only by the design of the underlying memory hierarchy and on-chip interconnect. These conclusions are generally applicable to a wide variety of many-core processors with similar designs.
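To give a rough sense of how far memory requests travel on an 8 × 8 mesh, the following Python sketch computes the network diameter and the average hop count between distinct nodes. It assumes minimal dimension-ordered (XY) routing, so the hop count equals the Manhattan distance, and uniform traffic between all node pairs. The abstract does not state Godson-T's routing algorithm or where the L2 banks are placed, so both assumptions are illustrative only, as are the function names.

```python
from itertools import product

def xy_hops(a, b):
    """Hop count between mesh nodes a = (x, y) and b under minimal
    dimension-ordered (XY) routing, i.e. the Manhattan distance."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def mesh_stats(k):
    """Return (diameter, mean hop count) over all ordered pairs of
    distinct nodes in a k x k mesh."""
    nodes = list(product(range(k), repeat=2))
    dists = [xy_hops(a, b) for a in nodes for b in nodes if a != b]
    return max(dists), sum(dists) / len(dists)

diameter, avg = mesh_stats(8)
print(diameter, round(avg, 3))  # 14 5.333
```

Under these assumptions, a request crosses 16/3 ≈ 5.3 links on average and at most 14.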
This paper presents a structural study of the EVA (ethylene-vinyl acetate)/hemp stem powder compound system. The micromorphology of the EVA/hemp system was examined by SEM (scanning electron microscopy), and its thermal properties were measured by DSC (differential scanning calorimetry). FT-IR (Fourier transform infrared) spectroscopy indicates that the hemp stem core was successfully incorporated into the EVA foaming system.