Augmenting loop tiling with data alignment for improved cache performance | IEEE Journals & Magazine | IEEE Xplore