Cache optimization is becoming increasingly important for achieving high computing performance, especially on current and future chip-multiprocessor (CMP) systems, which usually show a higher cache miss ratio than uniprocessors. Such optimization requires information about access locality to help the user with data allocation, data transformation, and code transformation, which are often used to improve the utilization of cached data and thus the cache hit rate.
In this paper we demonstrate an analysis tool that detects the spatial and temporal relationships between memory accesses and provides information, such as access pattern and access stride, that is required for applying optimization techniques like address grouping, software prefetching, and code transformation. Based on the memory access trace generated by a code instrumentor, the analysis tool uses appropriate algorithms to detect repeated address sequences and the constant distance between accesses to the different elements of a data structure. This allows users to pack data with spatial locality into the same cache block so that the needed data are loaded into the cache together. In addition, the analysis tool computes the push back distance, which shows how a cache miss can be avoided by reusing the data before they are replaced. This helps to reduce cache misses and thereby increases the temporal reusability of the working set.
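To make the notion of access stride concrete, the following minimal sketch scans a memory access trace for runs of accesses separated by a constant, nonzero distance. It is an illustration only and not the tool's actual algorithm or interface: the function name `detect_strides`, the `min_run` threshold, and the representation of the trace as a flat list of integer addresses are all assumptions made for this example.

```python
from typing import List, Tuple


def detect_strides(addresses: List[int], min_run: int = 3) -> List[Tuple[int, int, int]]:
    """Find runs of consecutive accesses with a constant nonzero stride.

    Returns a list of (start_index, stride, length) tuples, where length is
    the number of accesses in the run. All names here are hypothetical.
    """
    runs = []
    i = 0
    n = len(addresses)
    while i < n - 1:
        # Distance between this access and the next one.
        stride = addresses[i + 1] - addresses[i]
        j = i + 1
        # Extend the run while the distance stays the same.
        while j + 1 < n and addresses[j + 1] - addresses[j] == stride:
            j += 1
        length = j - i + 1
        # Report only runs that are long enough and actually move in memory.
        if stride != 0 and length >= min_run:
            runs.append((i, stride, length))
        # The last access of this run may start the next one.
        i = j
    return runs


if __name__ == "__main__":
    # Hypothetical trace: four accesses 8 bytes apart, then three 64 bytes apart.
    trace = [0x1000, 0x1008, 0x1010, 0x1018, 0x2000, 0x2040, 0x2080, 0x5000]
    for start, stride, length in detect_strides(trace):
        print(f"run at index {start}: stride {stride} bytes, {length} accesses")
```

In this hypothetical trace, a stride of 8 bytes suggests that several elements share one cache block, whereas a stride of 64 bytes (a common block size, assumed here) would touch a new block on every access and is a candidate for regrouping or prefetching.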