
Memory access pattern

In computing, a memory access pattern or IO access pattern is the pattern with which a system or program reads and writes memory on secondary storage. These patterns differ in the level of locality of reference and drastically affect cache performance, and also have implications for the approach to parallelism and distribution of workload in shared memory systems. Further, cache coherency issues can affect multiprocessor performance, which means that certain memory access patterns place a ceiling on parallelism (which manycore approaches seek to break).

Computer memory is usually described as 'random access', but traversals by software will still exhibit patterns that can be exploited for efficiency. Various tools exist to help system designers and programmers understand, analyse and improve the memory access pattern, e.g. VTune and Vectorization Advisor, as well as tools that address GPU memory access patterns. Memory access patterns also have implications for security, which motivates some to try to disguise a program's activity for privacy reasons.

The simplest extreme is the sequential access pattern, where data is read, processed, and written out with straightforward incremented/decremented addressing. These access patterns are highly amenable to prefetching.

Strided or simple 2D/3D access patterns (e.g. stepping through multi-dimensional arrays) are similarly easy to predict, and are found in implementations of linear algebra algorithms and image processing. Loop tiling is an effective approach (see the transpose sketch below). Some systems with DMA provided a strided mode for transferring data between subtiles of larger 2D arrays and scratchpad memory.

A linear access pattern is closely related to the strided pattern: a memory address is computed as a linear combination of some index. Stepping through indices sequentially with a linear pattern yields strided access. A linear access pattern for writes (with any access pattern for non-overlapping reads) may guarantee that an algorithm can be parallelised, which is exploited in systems supporting compute kernels (see the kernel sketch below).

Nearest neighbor memory access patterns appear in simulation, and are related to sequential or strided patterns. An algorithm may traverse a data structure using information from the nearest neighbors of a data element (in one or more dimensions) to perform a calculation. These are common in physics simulations operating on grids. Nearest neighbor can also refer to inter-node communication in a cluster; physics simulations which rely on such local access patterns can be parallelised with the data partitioned into cluster nodes, with purely nearest-neighbor communication between them, which may have advantages for latency and communication bandwidth. This use case maps well onto torus network topology.
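As a rough illustration of strided access and loop tiling, the following C sketch transposes a square matrix; the matrix size and tile size are assumed values that would in practice be tuned to the cache.

    #include <stddef.h>

    #define N    1024  /* matrix dimension; assumed for illustration */
    #define TILE 32    /* tile edge; would be tuned to the cache */

    /* Naive transpose: src is read sequentially, but dst is written with
     * a stride of N floats, so each write may touch a new cache line. */
    void transpose_naive(float dst[N][N], const float src[N][N]) {
        for (size_t i = 0; i < N; i++)
            for (size_t j = 0; j < N; j++)
                dst[j][i] = src[i][j];
    }

    /* Tiled transpose: operating on TILE x TILE subtiles keeps both the
     * sequential reads and the strided writes inside a small working set
     * that stays resident in cache. */
    void transpose_tiled(float dst[N][N], const float src[N][N]) {
        for (size_t ii = 0; ii < N; ii += TILE)
            for (size_t jj = 0; jj < N; jj += TILE)
                for (size_t i = ii; i < ii + TILE; i++)
                    for (size_t j = jj; j < jj + TILE; j++)
                        dst[j][i] = src[i][j];
    }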
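To make the parallelisation point concrete, here is a minimal kernel sketch (the function and array names are illustrative, not from any particular library): writes follow a linear pattern, exactly one per output index, while reads never overlap the writes, so every iteration is independent.

    #include <stddef.h>

    /* out[i] is written exactly once per index (a linear write pattern),
     * and all reads are from arrays the loop never writes, so iterations
     * are independent and may run in parallel. */
    void apply_lut(float *out, const float *in,
                   const float *lut, const size_t *idx, size_t n) {
        #pragma omp parallel for  /* safe: disjoint writes, read-only inputs */
        for (size_t i = 0; i < n; i++)
            out[i] = in[i] * lut[idx[i]];
    }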
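A minimal sketch of a nearest-neighbor pattern, assuming a one-dimensional three-point stencil of the kind used in grid-based simulations:

    #include <stddef.h>

    /* One step of a 1D three-point stencil: out[i] depends only on in[i-1],
     * in[i] and in[i+1], so reads sweep memory sequentially with small fixed
     * offsets -- prefetch-friendly, and the grid partitions cleanly across
     * cluster nodes with only boundary (nearest-neighbor) exchange. */
    void stencil_step(float *out, const float *in, size_t n) {
        for (size_t i = 1; i + 1 < n; i++)
            out[i] = 0.25f * in[i - 1] + 0.5f * in[i] + 0.25f * in[i + 1];
    }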
In 3D rendering, access patterns for texture mapping and rasterization of small primitives (with arbitrary distortions of complex surfaces) are far from linear, but can still exhibit spatial locality (e.g. in screen space or texture space). This can be turned into good memory locality via some combination of Morton order and tiling for texture maps and frame buffer data (mapping spatial regions onto cache lines), or by sorting primitives via tile-based deferred rendering. It can also be advantageous to store matrices in Morton order in linear algebra libraries (a bit-interleaving sketch appears at the end of this section).

A scatter memory access pattern combines sequential reads with indexed/random addressing for writes. Compared to gather (the converse pattern, in which reads are indexed and writes sequential), it may place less load on a cache hierarchy, since a processing element may dispatch writes in a 'fire and forget' manner (bypassing a cache altogether) whilst using predictable prefetching (or even DMA) for its source data.
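A minimal C sketch contrasting the two patterns, assuming an index array that maps between the two layouts:

    #include <stddef.h>

    /* Scatter: read src sequentially, write dst at indexed locations. */
    void scatter(float *dst, const float *src, const size_t *idx, size_t n) {
        for (size_t i = 0; i < n; i++)
            dst[idx[i]] = src[i];   /* random writes, sequential reads */
    }

    /* Gather: read src at indexed locations, write dst sequentially. */
    void gather(float *dst, const float *src, const size_t *idx, size_t n) {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[idx[i]];   /* sequential writes, random reads */
    }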
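Returning to Morton (Z-order) indexing mentioned above: it is typically computed by interleaving the bits of the coordinates, so that texels or matrix elements that are close in 2D tend to be close in memory. A minimal sketch for 16-bit coordinates:

    #include <stdint.h>

    /* Spread the low 16 bits of v so they occupy the even bit positions. */
    static uint32_t part1by1(uint32_t v) {
        v &= 0x0000ffff;
        v = (v | (v << 8)) & 0x00ff00ff;
        v = (v | (v << 4)) & 0x0f0f0f0f;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    }

    /* Morton index: interleave the bits of x and y. */
    static uint32_t morton2d(uint32_t x, uint32_t y) {
        return part1by1(x) | (part1by1(y) << 1);
    }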

[ "Computation", "Uniform memory access", "Operating system", "Parallel computing" ]
Parent Topic
Child Topic
    No Parent Topic