Analytical modeling of optimized sparse linear code

Pavel Tvrdik, Ivan Simecek

2004

In this paper, we describe source code transformations based on software pipelining, loop unrolling, and loop fusion for sparse matrix-vector multiplication and for the Conjugate Gradient algorithm. These transformations enable data prefetching, allow load instructions to overlap with FPU arithmetic instructions, and improve temporal cache locality. We develop a probabilistic model for estimating the number of cache misses for three types of data caches: direct-mapped, and s-way set-associative with either random or LRU replacement. Using hardware cache-monitoring tools, we compare the predicted numbers of cache misses with the measured numbers on an Intel x86 architecture with L1 and L2 caches. The accuracy of our analytical model is around 97%; the estimation errors are due to minor simplifying assumptions in the model.
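The abstract does not spell out the probabilistic model itself, so the sketch below uses a different technique for comparison: a small trace-driven cache simulator that counts misses for the same three cache types (direct-mapped, s-way set-associative with random replacement, and s-way set-associative with LRU) on the address stream of a CSR sparse matrix-vector product. It is a sketch under stated assumptions, not the authors' model: the CSR layout, 8-byte values, 4-byte column indices, the base addresses, the single cache level, and the absence of prefetching are all illustrative choices, and the names `Cache` and `spmv_csr_trace` are hypothetical.

```python
import random
from collections import OrderedDict


class Cache:
    """Trace-driven model of one level of a set-associative data cache.

    ways=1 gives a direct-mapped cache; policy is "lru" or "random".
    (Illustrative sketch; not the analytical model from the paper.)
    """

    def __init__(self, size_bytes, line_bytes, ways, policy="lru", seed=0):
        if policy not in ("lru", "random"):
            raise ValueError("policy must be 'lru' or 'random'")
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        if self.num_sets < 1:
            raise ValueError("cache too small for the given line size and ways")
        self.policy = policy
        self.rng = random.Random(seed)
        # One OrderedDict per set; keys are tags, order is recency (oldest first).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.accesses = 0
        self.misses = 0

    def access(self, addr):
        """Simulate one load of byte address addr; return True on a hit."""
        self.accesses += 1
        line = addr // self.line_bytes
        index, tag = line % self.num_sets, line // self.num_sets
        s = self.sets[index]
        if tag in s:
            if self.policy == "lru":
                s.move_to_end(tag)  # mark as most recently used
            return True
        self.misses += 1
        if len(s) >= self.ways:  # set is full: choose a victim
            if self.policy == "lru":
                s.popitem(last=False)  # evict least recently used tag
            else:
                del s[self.rng.choice(list(s))]  # evict a random tag
        s[tag] = True
        return False


def spmv_csr_trace(row_ptr, col_idx, n, bases):
    """Yield the data addresses read/written by y = A*x with A in CSR format.

    bases = (val, col_idx, x, y) base addresses; hypothetical memory layout.
    """
    val_b, col_b, x_b, y_b = bases
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            yield val_b + 8 * k             # val[k], double
            yield col_b + 4 * k             # col_idx[k], int32
            yield x_b + 8 * col_idx[k]      # x[col_idx[k]], irregular access
        yield y_b + 8 * i                   # y[i]


if __name__ == "__main__":
    rng = random.Random(42)
    n, nnz_per_row = 2000, 8
    row_ptr, col_idx = [0], []
    for _ in range(n):
        col_idx.extend(sorted(rng.sample(range(n), nnz_per_row)))
        row_ptr.append(len(col_idx))
    nnz = len(col_idx)
    # Hypothetical, non-overlapping placement of val, col_idx, x, y.
    bases = (0, 8 * nnz, 12 * nnz, 12 * nnz + 8 * n)
    configs = [("direct-mapped", Cache(16384, 64, 1)),
               ("4-way random", Cache(16384, 64, 4, "random")),
               ("4-way LRU", Cache(16384, 64, 4, "lru"))]
    for name, cache in configs:
        for addr in spmv_csr_trace(row_ptr, col_idx, n, bases):
            cache.access(addr)
        print(f"{name:14s} misses={cache.misses} accesses={cache.accesses}")
```

The regular streams (`val`, `col_idx`, `y`) miss roughly once per cache line, while the indirect reads of `x` are where the three cache organizations differ; a simulator like this gives exact counts for one trace, whereas an analytical model trades that exactness for a closed-form estimate.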

Keywords:

Conjugate gradient method
Cache
Sparse matrix
Parallel computing
Computer science
CPU cache
Theoretical computer science
Source code
Loop unrolling
Multiplication
Loop fusion
