The SIMD DSP is highly efficient for embedded applications whose parallel data are aligned. However, typical embedded algorithms such as FFT and FIR contain many unaligned and irregular data accesses. Vectorizing these algorithms on a SIMD architecture with alignment restrictions requires many additional shuffle instructions, which greatly decreases computation efficiency as the SIMD width increases. This paper proposes an efficient vector memory unit (VMU) with 16 memory blocks on a 16-way SIMD DSP, M-DSP. Each memory block contains four groups of a multi-bank memory structure with low-order-bit interleaved addressing and affords double the required bandwidth to reduce parallel vector access conflicts. A high-bandwidth data shuffle unit capable of aligning two vector accesses at once is implemented in the vector access pipeline; it efficiently supports not only unaligned accesses but also the special vector access patterns required by FFT. The experimental results show that the VMU affords conflict-free parallel accesses between DMA and vector load/store operations with no more than 10% area overhead, and that M-DSP achieves a near-ideal speedup for the FFT and FIR algorithms.
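The low-order-bit interleaved bank mapping mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the bank count, the one-bank-per-lane assumption, and all helper names are assumptions introduced here. Under such interleaving, consecutive word addresses land in consecutive banks, so a stride-1 vector access across 16 lanes is conflict-free, while a stride equal to the bank count serializes on one bank.

```python
# Hypothetical sketch of low-order-bit interleaved bank addressing.
# Assumption: 16 banks, one per SIMD lane (not taken from the paper).
NUM_BANKS = 16

def bank_of(word_addr):
    """Bank index: the low-order bits of the word address select the bank."""
    return word_addr % NUM_BANKS

def row_of(word_addr):
    """Row (offset) within the selected bank: the high-order address bits."""
    return word_addr // NUM_BANKS

def has_conflict(base, stride, lanes=16):
    """True if any two lanes of a strided vector access hit the same bank."""
    banks = [bank_of(base + i * stride) for i in range(lanes)]
    return len(set(banks)) < lanes

print(has_conflict(0, 1))   # stride-1: every lane hits a distinct bank -> False
print(has_conflict(0, 16))  # stride-16: all lanes hit bank 0 -> True
```

Power-of-two strides (common in FFT butterflies) are exactly the patterns that conflict under plain interleaving, which motivates the extra bandwidth and the shuffle unit described in the abstract.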
- An Efficient Vector Memory Unit for SIMD DSP
- Springer Berlin Heidelberg