Published in: EURASIP Journal on Wireless Communications and Networking 1/2010

Open Access 01.12.2010 | Research Article

VLSI Implementation of a Fixed-Complexity Soft-Output MIMO Detector for High-Speed Wireless

Authors: Di Wu (EURASIP Member), Johan Eilert, Rizwan Asghar, Dake Liu

Abstract

This paper presents a low-complexity MIMO symbol detector with near-maximum a posteriori (MAP) performance for emerging multi-antenna-enhanced high-speed wireless communications. The VLSI implementation is based on a novel MIMO detection algorithm called Modified Fixed-Complexity Soft-Output (MFCSO) detection, which achieves a good trade-off between performance and implementation cost compared to the referenced prior art. By including a microcode-controlled channel preprocessing unit and a pipelined detection unit, the detector is flexible enough to cover several different standards and transmission schemes. This flexibility allows adaptive detection to minimize power consumption without degrading throughput. The VLSI implementation of the detector is presented to show that real-time MIMO symbol detection for the 20 MHz bandwidth 3GPP LTE and 10 MHz WiMAX downlink physical channels is achievable at reasonable silicon cost.

1. Introduction

Multi-antenna or multiple-input multiple-output (MIMO) technologies have been widely adopted by the latest wireless standards such as 3GPP LTE and WiMAX to enhance spectral efficiency. For MIMO systems, a major challenge is the symbol detection at the receiver. In particular, when channel coding (e.g., Turbo) is used, soft output (the log-likelihood ratio, LLR) must be computed as the input to the channel decoder. Consider a MIMO system with https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq1_HTML.gif transmit antennas and https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq2_HTML.gif receive antennas. Let https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq3_HTML.gif be a transmitted vector of length https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq4_HTML.gif , obtained by mapping a set of information bits onto an https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq5_HTML.gif -QAM constellation https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq6_HTML.gif . Then the received vector of length https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq7_HTML.gif is given by
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ1_HTML.gif
(1)
where https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq8_HTML.gif is an https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq9_HTML.gif complex-valued channel matrix, which is assumed to be known, https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq10_HTML.gif is the transmitted symbol vector, https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq11_HTML.gif is the noise vector, and https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq12_HTML.gif is the received symbol vector. The optimum soft detector is the maximum a posteriori (MAP) detector, which computes
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ2_HTML.gif
(2)
Here " https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq13_HTML.gif " means all https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq14_HTML.gif for which the https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq15_HTML.gif th bit of https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq16_HTML.gif is equal to https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq17_HTML.gif . Computing (2) requires enumeration of the entire set of possible transmitted vectors. The complexity of doing this is usually not affordable in practice.
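To make the cost of the enumeration concrete, the following is a minimal sketch of an exhaustive soft detector. It is not the paper's implementation: the QPSK bit mapping, the Gaussian noise model with variance `sigma2`, equiprobable bits, and the sign convention (positive LLR favors bit 1) are all assumptions, and the function names are hypothetical. The loop visits every one of the |S|^N_T candidate vectors, which is the step that becomes unaffordable for large constellations and antenna counts.

```python
import itertools
import math

# Hypothetical QPSK bit mapping (2 bits per symbol); the paper's mapping is not given.
QPSK = {(0, 0): 1 + 1j, (0, 1): 1 - 1j, (1, 0): -1 + 1j, (1, 1): -1 - 1j}

def matvec(H, x):
    """Matrix-vector product for a channel matrix stored as a list of rows."""
    return [sum(h * xi for h, xi in zip(row, x)) for row in H]

def sq_dist(y, H, x):
    """Squared Euclidean distance ||y - Hx||^2."""
    return sum(abs(yi - zi) ** 2 for yi, zi in zip(y, matvec(H, x)))

def logsumexp(vals):
    """Numerically stable log(sum(exp(v)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def exhaustive_llrs(y, H, sigma2, n_tx):
    """LLR of every transmitted bit by enumerating all candidate vectors."""
    acc = [([], []) for _ in range(2 * n_tx)]  # per bit: (metrics for 0, metrics for 1)
    for combo in itertools.product(QPSK, repeat=n_tx):
        x = [QPSK[c] for c in combo]
        bits = [b for c in combo for b in c]
        metric = -sq_dist(y, H, x) / sigma2
        for i, b in enumerate(bits):
            acc[i][b].append(metric)
    return [logsumexp(ones) - logsumexp(zeros) for zeros, ones in acc]
```

With QPSK and two transmit antennas the loop runs 16 times; with 64-QAM and four antennas it would run 64^4 (about 16.8 million) times per received vector.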
As a trade-off between performance and complexity, various MIMO detection methods such as sphere decoding [1, 2], fixed complexity sphere decoding [3, 4], and MFCSO decoding [5] have been proposed to reach near-MAP performance with lower complexity than MAP. In [6], VLSI implementation of a complexity reduced K-best detector for https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq18_HTML.gif MIMO and 16-QAM is presented for WiMAX/WiFi. In [7], VLSI implementation of a soft-output MIMO detector for https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq19_HTML.gif MIMO in WLAN is presented. Without QR decomposition unit being included, it consumes 135 kGate with a reduced candidate list. In [8], a K-best detector for https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq20_HTML.gif MIMO is implemented in a Xilinx Virtex-5 FPGA. However, the complexity of sphere decoding grows exponentially with the number of transmit antennas and polynomially in the size of the signal constellation. More importantly, the tree search used in sphere decoding is in principle a sequential procedure which is difficult to parallelize. In [3], a fixed-throughput sphere detector is proposed with fixed complexity and parallelism for hard decision. In [5], a low-complexity near-MAP detection method is proposed for high-order modulation (e.g., 64-QAM). The performance loss from MAP due to the suboptimal search introduced in MFCSO is proven by simulation to be small in [5]. However, in [5], the complexity of MFCSO is only presented in number of arithmetic operations without the silicon cost and processing latency being addressed and no comparison with prior art is made. 
Most importantly, none of the proposed methods takes the system-specific features of LTE (e.g., OFDMA and H-ARQ) into consideration, and most are evaluated with very simple channel models (e.g., AWGN). In [9], a limited evaluation of MFCSO is carried out with a focus on the LTE system.
In this paper, several MIMO detection algorithms are first applied to LTE and WiMAX systems, with the aid of more realistic LTE and WiMAX simulation chains and different channel models, and their performance is quantitatively evaluated. Second, although the MFCSO detection algorithm proposed by the authors in [5] has a very low detection complexity, under random AWGN channels it requires relatively strong channel coding to maintain near-MAP performance in frame error ratio [5]. In this paper, its performance with the aid of H-ARQ is investigated. In order to validate MFCSO from a VLSI implementation perspective, both FPGA and ASIC implementations of an MFCSO detector are presented. Note that most commercial terminals are limited by cost and power consumption, especially the power consumption of the analog part of each antenna chain. According to the LTE and WiMAX standards, https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq21_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq22_HTML.gif MIMO schemes are included as a good trade-off between performance gain and complexity (or power consumption). Hence, only these schemes are considered here. The result is compared with a state-of-the-art soft-output sphere decoder (SSD) [1] and the K-best detector presented in [10] in terms of both performance and cost.
The remainder of the paper is organized as follows. In Section 2, the application of MIMO techniques in LTE and WiMAX is presented. Section 3 introduces the linear and MFCSO MIMO detection algorithms. Section 4 addresses the detection flow. The architecture of the detector is addressed in Section 5. The link-level simulation results are presented in Section 6. Section 7 analyzes the implementation complexity, and Section 8 presents the adaptive method used to optimize power efficiency. Section 9 presents both the FPGA- and ASIC-based implementations of the detector. Finally, Section 10 concludes the paper.

2. Multi-Antenna in LTE and WiMAX

Wireless standards such as 3GPP LTE and WiMAX have incorporated MIMO transmission schemes to boost the peak data rate. Meanwhile, software-defined radio (SDR) technologies allow both of them to be supported by the same piece of hardware.
3GPP Long-Term Evolution (LTE) is the next-generation radio access technology, which incorporates Orthogonal Frequency Division Multiple Access (OFDMA) as the multiple access scheme in the downlink. MIMO technologies are also mandatory in LTE to achieve the LTE bit-rate targets (e.g., a 100 Mbit/s peak data rate for the downlink). As part of the receiver chain depicted in Figure 1, MIMO symbol detection is a significant challenge for VLSI implementation.
The input to the MIMO detector presented in this paper includes the estimated channel matrix
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ3_HTML.gif
(3)
the received symbol vector https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq23_HTML.gif , and the estimated noise variance https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq24_HTML.gif . The output of the detector is the LLR values of the demodulated bits.
In both LTE and WiMAX, spatial multiplexing (SM) and transmit diversity have been adopted as the two major MIMO schemes. SM is a MIMO technique aimed at maximizing the data throughput by exploiting the degrees of freedom in MIMO channels. Since the multiplexing gain is only available in the high-SNR region, spatial multiplexing is usually used when high SNR is available. STBC/SFBC [11] assumes that the channel is stationary across adjacent time intervals or subcarriers, so that a single codeword is mapped to these adjacent intervals or subcarriers to benefit from either time or frequency diversity in transmission. The most widely used STBC/SFBC scheme is the Alamouti scheme in the space or frequency domain. Since STBC/SFBC only requires a linear detector to achieve diversity, the detector design is easier. Note that in this paper, only open-loop MIMO without feedback from the terminal is considered.

2.1. Spatial Multiplexing

Spatial multiplexing is a MIMO technique aimed at maximizing the data throughput by exploiting the degrees of freedom in MIMO channels. Since the multiplexing gain is only available in the high-SNR region, spatial multiplexing is usually used when high SNR is available. As depicted in Figure 2(a), spatial multiplexing usually requires both https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq25_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq26_HTML.gif to be large. In general, the degree of freedom (multiplexing gain) is determined by https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq27_HTML.gif , which is the rank of the channel matrix https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq28_HTML.gif . In case https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq29_HTML.gif is badly conditioned (e.g., when line-of-sight occurs, https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq30_HTML.gif becomes a singular matrix), the pseudoinversion of https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq31_HTML.gif in (15) using linear detection becomes very difficult because it requires a very large dynamic range. In other words, the gain of spatial multiplexing heavily depends on the multipath fading. A dual-stream spatial multiplexing scheme is depicted in Figure 2(a).

2.2. Transmit Diversity

Transmit diversity schemes that exploit the diversity gain of multi-antenna transmission have also been adopted by LTE and WiMAX. The Space-Time Block Coding (STBC) in WiMAX and Space-Frequency Block Coding (SFBC) in LTE [11] are both transmit diversity schemes to transmit data for guaranteed diversity while requiring only a low-complexity symbol detector on the receiver side. In both cases, the Alamouti matrix [12] is used because it is the only full-rate linear STBC (or SFBC) code with a diversity gain of 2. In other words, the transmit diversity schemes considered in this paper are Alamouti schemes in the space and frequency domains. This assumes the channels of either adjacent symbol intervals or subcarriers are identical, so that either time or frequency diversity will be achieved when a single codeword is mapped to different antennas within two adjacent time or frequency intervals. The basic https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq34_HTML.gif space-frequency channel matrix is defined as
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ4_HTML.gif
(4)
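As an illustration of why a low-complexity linear detector suffices for this scheme, the following sketch shows the textbook Alamouti code with a single receive antenna. It is not taken from the paper: the antenna-to-slot mapping, the function names, and the noiseless channel are assumptions, and the exact sign and conjugation placement in LTE SFBC or in (4) may differ. The two slots stand for two adjacent symbol intervals (STBC) or subcarriers (SFBC) over which the channel gains `h1` and `h2` are assumed identical.

```python
def alamouti_encode(s1, s2):
    """Symbols sent by (antenna 1, antenna 2) in slot 1 and in slot 2."""
    return [(s1, s2), (-s2.conjugate(), s1.conjugate())]

def alamouti_receive(slots, h1, h2):
    """Noiseless received samples in the two slots for channel gains h1, h2."""
    return [h1 * a + h2 * b for a, b in slots]

def alamouti_combine(r1, r2, h1, h2):
    """Linear combining that decouples s1 and s2 with gain |h1|^2 + |h2|^2."""
    gain = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (h1.conjugate() * r1 + h2 * r2.conjugate()) / gain
    s2_hat = (h2.conjugate() * r1 - h1 * r2.conjugate()) / gain
    return s1_hat, s2_hat
```

Because the combining removes the cross terms exactly, each symbol can be demapped independently, and no joint search over symbol vectors is needed.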

3. Soft-Output MIMO Detection

The optimum soft-output MIMO detector computes the Log-Likelihood Ratio (LLR) in (2). Commonly the sums in (2) are approximated by their largest terms ("log-max") which requires the solution of problems of the type https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq35_HTML.gif , subject to https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq36_HTML.gif . Since MAP provides the best theoretical performance, it is commonly used as a benchmark when comparing other algorithms
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ5_HTML.gif
(5)

3.1. Linear Detection

In linear detection such as Zero-forcing (ZF) and Minimum Mean Squared Error (MMSE), the received symbol vector https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq37_HTML.gif is multiplied by a linear filter:
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ6_HTML.gif
(6)
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ7_HTML.gif
(7)
The correlation between the elements in the noise vector https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq38_HTML.gif is neglected and the symbols in https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq39_HTML.gif are demodulated individually, treating the output of the model (6) as https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq40_HTML.gif independent scalar channels. Although linear detectors incur a severe performance loss in slow fading channels [4], they have a very low implementation cost compared to more advanced MIMO detection algorithms, which makes them suitable for low-cost real-time implementations. As depicted in Figure 3, the linear detection procedure involves two parts: channel preprocessing and symbol demapping. The channel preprocessing procedure mainly consists of matrix multiplication and inversion as shown in (6) and (7).
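The channel preprocessing described above can be sketched for a 2×2 system as follows. This is a sketch under stated assumptions, not the paper's datapath: it uses the standard textbook filters W = (HᴴH)⁻¹Hᴴ for ZF and W = (HᴴH + σ²I)⁻¹Hᴴ for MMSE with unit-energy symbols, which may differ in scaling from (6) and (7), and all function names are hypothetical.

```python
def herm(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(A):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def linear_filter(H, sigma2=0.0):
    """ZF filter when sigma2 == 0, MMSE filter otherwise (2x2 channel only)."""
    Hh = herm(H)
    G = matmul(Hh, H)          # Gram matrix H^H H
    for i in range(2):
        G[i][i] += sigma2      # MMSE regularization of the diagonal
    return matmul(inv2(G), Hh)

def apply_filter(W, y):
    """Symbol-rate step: multiply the received vector by the filter."""
    return [sum(w * yi for w, yi in zip(row, y)) for row in W]
```

The filter depends only on the channel, so it is computed once per channel update, while `apply_filter` runs once per received vector. When H is close to singular, `det` approaches zero and the filter entries grow large, which is the dynamic-range problem noted in Section 2.1.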

3.2. Fixed-Complexity Soft Output (FCSO)

The Layered Orthogonal Lattice Detector (LORD) proposed in [13] and the FCSO MIMO detector presented in [4] are similar and use a suboptimal method to reduce the complexity at the cost of negligible performance loss. A general https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq41_HTML.gif MIMO system using 64-QAM is taken as a case study. Here each complex-valued symbol is considered to be one layer and only the top layer is exactly marginalized with the remaining three layers approximately marginalized. The channel-rate processing of FCSO involves the QRD of https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq42_HTML.gif rank-reduced channel matrices
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ8_HTML.gif
(8)
which generates an upper triangular matrix https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq43_HTML.gif , and a unitary matrix https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq44_HTML.gif so that
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ9_HTML.gif
(9)
Here https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq45_HTML.gif QRD is needed for different https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq46_HTML.gif .
The symbol-rate processing consists of the following steps.
(1) Pick one transmitted symbol https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq47_HTML.gif as the top layer. The entire constellation https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq48_HTML.gif is enumerated in the exact marginalization ( https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq49_HTML.gif in (5)) only for https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq50_HTML.gif . For the https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq51_HTML.gif candidate https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq52_HTML.gif in https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq53_HTML.gif , by canceling its effect on the received symbol vector https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq54_HTML.gif , a new vector
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ10_HTML.gif
(10)
is computed.
(2) By multiplying https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq55_HTML.gif with https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq56_HTML.gif from (9), compute
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ11_HTML.gif
(11)
(3) Based on https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq57_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq58_HTML.gif , using DFE, https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq59_HTML.gif can be estimated using hard decision. From this, compute the Euclidean distance
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ12_HTML.gif
(12)
and eventually the log-likelihood ratio (LLR). Taking a 64-QAM system as an example, as shown in the following:
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ13_HTML.gif
(13)
the LLR of the six bits that constitute the top-layer symbol can be computed using (12). This involves the computation of 64 different https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq60_HTML.gif as shown in (14)
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ14_HTML.gif
(14)
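The three symbol-rate steps above can be sketched for the simplest case of two layers. This is an illustrative simplification, not the authors' implementation: with only one remaining layer, the QRD of the rank-reduced matrix degenerates to a normalization by the column norm, so no general QR routine is needed. QPSK, its bit mapping, the max-log LLR with positive values favoring bit 1, and all function names are assumptions.

```python
# Hypothetical QPSK bit mapping; the paper's case study uses 64-QAM.
QPSK = {(0, 0): 1 + 1j, (0, 1): 1 - 1j, (1, 0): -1 + 1j, (1, 1): -1 - 1j}

def slice_qpsk(z):
    """Hard decision: nearest QPSK point."""
    return complex(1 if z.real >= 0 else -1, 1 if z.imag >= 0 else -1)

def top_layer_distances(y, H, top):
    """Euclidean distance for every candidate of the top-layer symbol."""
    other = 1 - top
    h_top = [row[top] for row in H]
    h_oth = [row[other] for row in H]
    norm2 = sum(abs(h) ** 2 for h in h_oth)
    dists = {}
    for bits, s in QPSK.items():
        # Step 1: cancel the candidate's contribution from the received vector.
        r = [yi - hi * s for yi, hi in zip(y, h_top)]
        # Step 2: project onto the remaining column (one-column QRD).
        z = sum(hi.conjugate() * ri for hi, ri in zip(h_oth, r)) / norm2
        # Step 3: hard decision on the remaining layer, then the distance.
        x_hat = slice_qpsk(z)
        dists[bits] = sum(abs(ri - hi * x_hat) ** 2 for ri, hi in zip(r, h_oth))
    return dists

def max_log_llrs(dists, sigma2):
    """Max-log LLRs of the bits of the top-layer symbol."""
    llrs = []
    for i in range(2):
        d0 = min(d for b, d in dists.items() if b[i] == 0)
        d1 = min(d for b, d in dists.items() if b[i] == 1)
        llrs.append((d0 - d1) / sigma2)
    return llrs
```

Calling `top_layer_distances` once with `top=0` and once with `top=1` gives the LLRs of all bits, with a number of distance computations that is fixed by the constellation size rather than by the channel or noise realization.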

3.3. Modified FCSO (MFCSO)

Although the FCSO detector has substantially reduced the complexity compared to MAP detector, further reduction is still needed for a practical implementation with large signal constellations. In the following, further approximations and improvements to FCSO detection, namely Modified FCSO (MFCSO) detector [5], are elaborated. In [4], the entire constellation https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq61_HTML.gif is enumerated in the exact marginalization ( https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq62_HTML.gif in (5)). In this paper, instead of searching the full constellation https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq63_HTML.gif , we propose to sum over only a subset https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq64_HTML.gif of constellation points around an initial estimate https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq65_HTML.gif . This initial estimate will be obtained by zero-forcing detection. The size of https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq66_HTML.gif , denoted by https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq67_HTML.gif , is chosen to be 16 and 8 in this paper for the complexity and performance comparisons. In effect, the proposed detector is a further approximation of that in [4], which consists of only partially enumerating the symbols selected for exact marginalization (the set https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq68_HTML.gif in (5)).
Similar to FCSO, the channel-rate processing of MFCSO involves computing QRD https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq69_HTML.gif times, as shown in (9) and (8). As an overhead compared to FCSO, the coefficient matrix
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ15_HTML.gif
(15)
is needed to perform the ZF/MMSE-based initial estimate of https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq70_HTML.gif in (16) below. The symbol-rate processing of MFCSO is as follows.
(1) Linear detection (ZF/MMSE) is carried out to estimate the initial symbol vector
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ16_HTML.gif
(16)
Here https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq71_HTML.gif is the transmitted symbol vector, https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq72_HTML.gif is the https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq73_HTML.gif symbol in it.
(2) For each initially estimated symbol https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq74_HTML.gif , a candidate set https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq75_HTML.gif is created. https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq76_HTML.gif contains https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq77_HTML.gif lattice points close to https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq78_HTML.gif .
(3) For each point https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq79_HTML.gif , approximate marginalization is applied to the rest of the layers either via ZF or ZF-DFE. According to (17), a multiplication of https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq80_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq81_HTML.gif is needed for each https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq82_HTML.gif which is updated proportionally to the size of https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq83_HTML.gif and the symbol rate. However, note that
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ17_HTML.gif
(17)
where https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq84_HTML.gif is an https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq85_HTML.gif vector, which can be precalculated at channel rate.
(4) Using back substitution [14], https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq86_HTML.gif can be estimated from
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ18_HTML.gif
(18)
(5) https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq87_HTML.gif together with https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq88_HTML.gif form a complete possible transmitted symbol vector, which has a Euclidean distance
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ19_HTML.gif
(19)
(6) In total, there will be https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq89_HTML.gif different https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq90_HTML.gif values for each layer, and each of the four layers is the top layer once. Therefore, for a https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq91_HTML.gif system, https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq92_HTML.gif different https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq93_HTML.gif values need to be computed. In case https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq94_HTML.gif , there will be 64 different https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq95_HTML.gif values, which is one quarter of the number required by the FCSO proposed in [4].
(7) For the sake of low complexity, instead of MAP detection, the following approximation can be used, so that
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ20_HTML.gif
(20)
As presented in [5], the performance gap between MAP and MFCSO for https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq96_HTML.gif MIMO using 64-QAM and rate-3/4 convolutional coding was shown by simulation to be small when https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq97_HTML.gif (0.5 dB when FER= https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq98_HTML.gif ). The gap increases to 2 dB when https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq99_HTML.gif . On the other hand, the complexity of the detector when https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq100_HTML.gif is already feasible for VLSI implementation.
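The candidate-set construction in step (2) and the count in step (6) can be sketched as follows. The selection rule is an assumption: the text only says that the set contains points close to the initial estimate, so the sketch simply takes the nearest points by Euclidean distance on an unnormalized 64-QAM grid. Function names are hypothetical, and a hardware implementation would likely use a cheaper rule than a full sort.

```python
def qam64():
    """Unnormalized 64-QAM constellation on the odd-integer grid."""
    levels = [-7, -5, -3, -1, 1, 3, 5, 7]
    return [complex(i, q) for i in levels for q in levels]

def candidate_subset(estimate, constellation, size):
    """The `size` constellation points nearest to the initial estimate."""
    return sorted(constellation, key=lambda s: abs(s - estimate))[:size]

def distance_evaluations(n_layers, subset_size):
    """Each layer is the top layer once, with one distance per candidate."""
    return n_layers * subset_size
```

For four layers, a subset of 16 points gives `distance_evaluations(4, 16) == 64`, against `distance_evaluations(4, 64) == 256` when the full 64-QAM constellation is enumerated as in FCSO, which matches the one-quarter ratio stated in step (6).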

3.4. MFCSO in LTE and WiMAX

As a simplification of the general MFCSO algorithm presented in Section 3.3, a https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq101_HTML.gif MFCSO method for SM is elaborated in the following. Considering each complex-valued symbol as one layer, only one of them is exactly marginalized and the other is approximately marginalized (using DFE hard decision). The channel rate processing of MFCSO involves the QR decomposition (QRD) of two https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq102_HTML.gif channel matrices which are https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq103_HTML.gif in (3) and
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ21_HTML.gif
(21)
The QRD generates an upper triangular matrix https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq104_HTML.gif , and a unitary matrix https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq105_HTML.gif according to (9).
The detection procedure for https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq106_HTML.gif SM described in the following text is slightly different from the MFCSO presented in [5].
(1) Linear detection in (16) is carried out to estimate the https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq107_HTML.gif initial symbol vector
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ22_HTML.gif
(22)
Here https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq108_HTML.gif is the transmitted symbol vector, within which, https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq109_HTML.gif is the https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq110_HTML.gif symbol.
(2) For each initially estimated symbol https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq111_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq112_HTML.gif , a candidate set https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq113_HTML.gif is created. https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq114_HTML.gif contains https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq115_HTML.gif constellation points close to https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq116_HTML.gif .
(3) First https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq117_HTML.gif is chosen as the top-layer symbol. In order to perform DFE,
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ23_HTML.gif
(23)
needs to be computed. The same operation is needed once again when https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq118_HTML.gif is chosen as the top layer later.
(4) For the https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq119_HTML.gif constellation point https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq120_HTML.gif , its effect on https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq121_HTML.gif will have to be canceled out.
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ24_HTML.gif
(24)
Based on https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq122_HTML.gif , the partial Euclidean distance
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ25_HTML.gif
(25)
is computed for the top layer.
(5) DFE is applied to detect the other layer. Using back substitution [14], https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq123_HTML.gif can be estimated from
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ26_HTML.gif
(26)
(6) The estimated https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq124_HTML.gif together with https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq125_HTML.gif form a complete possible transmitted symbol vector https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq126_HTML.gif , from which an accumulated full Euclidean distance
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ27_HTML.gif
(27)
can be computed.
(7) In total, there will be https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq127_HTML.gif different https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq128_HTML.gif computed when https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq129_HTML.gif is chosen as the top layer. Then https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq130_HTML.gif is chosen as the top-layer symbol as well. Based on https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq131_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq132_HTML.gif , the same procedure needs to be done once again to compute another https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq133_HTML.gif different https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq134_HTML.gif . Hence, for the https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq135_HTML.gif system, https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq136_HTML.gif different https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq137_HTML.gif values need to be computed. They are used to update the LLR values in the end as described in [5].
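Steps (1) to (7) can be illustrated with a minimal Python sketch. Since equations (21) to (27) are not reproduced in this text, several details are assumptions: the linear detector is taken to be a standard MMSE estimate, the second channel matrix is taken to be the column-swapped one, and the helper names `qam_points` and `mfcso_2x2` are hypothetical. The sketch only produces the candidate vectors and their Euclidean distances; the LLR update described in [5] is not included.

```python
import numpy as np

def qam_points(side):
    """Unnormalized square QAM grid; side=8 gives 64-QAM (illustrative mapping)."""
    levels = np.arange(-(side - 1), side, 2)
    return (levels[:, None] + 1j * levels[None, :]).ravel()

def mfcso_2x2(H, y, const, n_cand, noise_var):
    """Return 2*n_cand (symbol_vector, distance) pairs for a 2x2 SM system."""
    # Step (1): initial estimate by linear MMSE detection (assumed form).
    Hh = H.conj().T
    s_init = np.linalg.solve(Hh @ H + noise_var * np.eye(2), Hh @ y)
    out = []
    for top in (1, 0):                          # each layer is the top layer once
        low = 1 - top
        Q, R = np.linalg.qr(H[:, [low, top]])   # top layer in the last column
        z = Q.conj().T @ y                      # rotated received vector
        # Step (2): the n_cand constellation points nearest the initial estimate.
        nearest = np.argsort(np.abs(const - s_init[top]))[:n_cand]
        for s_top in const[nearest]:
            resid = z[0] - R[0, 1] * s_top                  # step (4): cancel top layer
            d_top = abs(z[1] - R[1, 1] * s_top) ** 2        # partial distance
            # Step (5): DFE hard decision on the other layer.
            s_low = const[np.argmin(np.abs(const - resid / R[0, 0]))]
            d = d_top + abs(resid - R[0, 0] * s_low) ** 2   # step (6): full distance
            s = np.zeros(2, dtype=complex)
            s[top], s[low] = s_top, s_low
            out.append((s, d))
    return out

# Noise-free usage example with an arbitrary channel and 64-QAM symbols.
H = np.array([[1 + 0.5j, 0.3 - 0.2j], [0.2 + 0.1j, 0.9 - 0.4j]])
s_true = np.array([3 - 5j, -1 + 7j])
y = H @ s_true
pairs = mfcso_2x2(H, y, qam_points(8), n_cand=16, noise_var=1e-6)
best = min(pairs, key=lambda p: p[1])[0]
```

With a candidate set size of 16, the sketch evaluates 32 distances per received vector, 16 for each choice of top layer.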

4. Flow Analysis of MIMO Detection

Independent of the detection method, the processing flow of MIMO symbol detection can always be partitioned into two parts, namely channel-rate processing and symbol-rate processing as depicted in Figure 3.

4.1. Channel-Rate Preprocessing

The channel preprocessing precalculates the equalization coefficient matrices from the estimated channel matrix https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq138_HTML.gif . According to (15), the computation involved in linear detection is mainly matrix manipulation, including matrix multiplication and inversion. Here the matrix https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq139_HTML.gif can be a complex-valued matrix of arbitrary size. As mentioned in [15], in practice, the size of https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq140_HTML.gif is typically between https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq141_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq142_HTML.gif . Although larger matrices (e.g., https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq143_HTML.gif ) can still be managed [15], the cost of real-time implementation will be much higher. For MFCSO, aside from computing https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq144_HTML.gif , the QR decomposition in (9) is also needed.
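As an illustration of the matrix multiplication and inversion involved, the following Python sketch computes a coefficient matrix for a two-layer system using direct (adjugate-based) inversion of a small matrix. The MMSE form used here is the textbook expression and is an assumption, since (15) is not reproduced in this text; the function names `inv2x2` and `mmse_coefficients` are hypothetical.

```python
import numpy as np

def inv2x2(a):
    """Direct inversion of a 2x2 matrix via its adjugate and determinant."""
    det = a[0, 0] * a[1, 1] - a[0, 1] * a[1, 0]
    adj = np.array([[a[1, 1], -a[0, 1]],
                    [-a[1, 0], a[0, 0]]])
    return adj / det

def mmse_coefficients(H, noise_var):
    """Assumed MMSE coefficient matrix: (H^H H + noise_var * I)^-1 H^H."""
    Hh = H.conj().T
    return inv2x2(Hh @ H + noise_var * np.eye(2)) @ Hh

# Usage example with an arbitrary 2x2 channel matrix.
H = np.array([[1 + 0.5j, 0.3 - 0.2j], [0.2 + 0.1j, 0.9 - 0.4j]])
W = mmse_coefficients(H, noise_var=0.1)
```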

4.2. Symbol-Rate Processing

The symbol-rate processing in soft-output linear detection [16] is to demap the equalized complex values to soft bits. In case of near-MAP detection methods such as MFCSO, layered processing is involved which requires substantially more computational effort. As described in Section 3.3, the symbol-rate processing in MFCSO involves the multiplication, subtraction, and computing the Euclidean distance based on estimated symbols.

5. Architecture of the MIMO Detector

The block diagram of the MFCSO detector is depicted in Figure 4. The detector contains two major parts, the channel preprocessing unit (ChPU) and the detection unit (DU). As presented in Section 3.3 and [5], it is decided that the candidate set size https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq145_HTML.gif for 64-QAM. It allows real-time detection of both https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq146_HTML.gif STBC/SFBC and SM for LTE and WiMAX. Modulation schemes from QPSK to 64-QAM are supported.

5.1. Channel Preprocessing Unit

The ChPU as depicted in Figure 5 handles channel-rate processing tasks such as the computation of https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq147_HTML.gif in (15) and the QR decomposition in (9). These are performed every time the estimated channel is updated. The computed coefficient matrices https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq148_HTML.gif are stored in the coefficient buffer and fed to the LLR demapper as input. As depicted in Figure 5, the ChPU contains two Complex-valued Multiply-and-ACcumulate (CMAC) units, an inverse-square-root unit, and a 32-bit register file containing 24 registers. The ChPU is a programmable unit controlled by microcode. The operations supported by the ChPU are listed in Table 1. The method presented in [16] is used to compute https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq149_HTML.gif , and the Modified Gram-Schmidt method [14] is used to compute the https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq150_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq151_HTML.gif matrices in (9).
Table 1
Operations supported by ChPU.
| Operation | Description |
| --- | --- |
| Cplx squared abs | https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq152_HTML.gif |
| Sum squared abs | https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq153_HTML.gif |
| Cplx inner product | https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq154_HTML.gif |
| Cplx multiply | https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq155_HTML.gif |
| | https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq156_HTML.gif |
| Cplx multiply-add | https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq157_HTML.gif |
| | https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq158_HTML.gif |
| Real-Cplx multiply | https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq159_HTML.gif |
| Real Inverse-Sqrt | https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq160_HTML.gif |
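To indicate how the operations in Table 1 could compose into the QR decomposition, the following Python sketch implements the textbook Modified Gram-Schmidt method. The mapping of each statement to a ChPU operation (given in the comments) is an assumption made for illustration; the actual microcode is not described in this text, and the function name `mgs_qr` is hypothetical.

```python
import numpy as np

def mgs_qr(H):
    """Modified Gram-Schmidt QR decomposition: returns Q, R with H = Q @ R."""
    Q = np.array(H, dtype=complex)
    n = Q.shape[1]
    R = np.zeros((n, n), dtype=complex)
    for k in range(n):
        norm_sq = np.sum(np.abs(Q[:, k]) ** 2)   # sum squared abs
        inv_norm = 1.0 / np.sqrt(norm_sq)        # real inverse-sqrt
        R[k, k] = norm_sq * inv_norm             # column norm, without a square root unit
        Q[:, k] = Q[:, k] * inv_norm             # real-cplx multiply
        for j in range(k + 1, n):
            R[k, j] = np.vdot(Q[:, k], Q[:, j])  # cplx inner product (conjugates Q[:, k])
            Q[:, j] = Q[:, j] - R[k, j] * Q[:, k]  # cplx multiply-add
    return Q, R

# Usage example with an arbitrary 2x2 channel matrix.
H = np.array([[1 + 0.5j, 0.3 - 0.2j], [0.2 + 0.1j, 0.9 - 0.4j]])
Q, R = mgs_qr(H)
```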

5.2. Detection Unit

The DU computes the LLR values using the method presented in Section 3 and the Log-Max approximation in (20)
https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_Equ28_HTML.gif
(28)
The DU consists of a number of processing elements (PEs), as illustrated in Figure 6, which exploit the parallelism in the MFCSO algorithm. The computed LLR values https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq161_HTML.gif can be either directly passed to the channel decoder or combined with previously stored LLR values in the soft-buffer for H-ARQ. Since the processing in the DU is at symbol rate, which is much higher than the channel rate of the processing in the ChPU, a fully pipelined architecture is used in the DU to allow the computation of 16 different https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq162_HTML.gif in (27) to be finished within 16 clock cycles. The DU is configured by a control register and can bypass the functions defined in Section 3 to enable only MMSE detection with soft output. The MMSE mode can be used for power saving, reducing the power consumption at the cost of detection performance. A 16-bit fixed-point datatype with proper scaling is adopted in the DU, and the output LLR values are quantized to 6-bit signed integers. The number of PEs in the DU is decided at design time according to the processing load and latency analysis. In this paper, it is chosen to be two based on the latency analysis in Section 9.3.
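A max-log style LLR computation with 6-bit output can be sketched in Python as follows. Because (28) is not reproduced in this text, the sign convention (minimum distance over bit 0 minus minimum distance over bit 1), the unit scaling, and the saturation used when a bit hypothesis is absent from the candidate list are all assumptions; the function name `max_log_llr` is hypothetical.

```python
import numpy as np

def max_log_llr(bits, dists, clip=31):
    """bits: (n_candidates, n_bits) array of 0/1 labels; dists: distances per candidate."""
    bits = np.asarray(bits, dtype=bool)
    dists = np.asarray(dists, dtype=float)
    llr = np.empty(bits.shape[1])
    for k in range(bits.shape[1]):
        d1 = dists[bits[:, k]]       # candidates with bit k equal to 1
        d0 = dists[~bits[:, k]]      # candidates with bit k equal to 0
        if d1.size == 0:
            llr[k] = -clip           # no counter-hypothesis: saturate (assumption)
        elif d0.size == 0:
            llr[k] = clip
        else:
            llr[k] = d0.min() - d1.min()
    # Quantize to 6-bit signed integers, range [-32, 31].
    return np.clip(np.rint(llr), -32, 31).astype(int)

# Usage example: three candidates, two bits each.
example = max_log_llr([[0, 1], [1, 1], [0, 0]], [1.0, 4.0, 9.0])
```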

5.3. Memory Subsystem

The MIMO detector itself does not contain memory except the small program memory. In order to store the temporarily computed https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq163_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq164_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq165_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq166_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq167_HTML.gif , which are updated by the channel preprocessor at the channel rate, a coefficient buffer as depicted in Figure 4 is needed. The coefficient memory stores the above values for all data subcarriers (up to 20 MHz bandwidth for LTE and 10 MHz for WiMAX). The FIFO that stores the incoming data to the detector from the channel estimator and the subcarrier demapper is not shown in the figure, and neither is the FIFO that passes the computed LLR values to the channel decoder hardware. Note that in case STBC is used, the amount of data stored in the https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq168_HTML.gif memory can be reduced almost by half owing to the Alamouti features of https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq169_HTML.gif , and no https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq170_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq171_HTML.gif matrices are needed.

6. Performance Evaluation

In order to evaluate the performance of various MIMO detection algorithms, simulation is carried out using link-level 3GPP LTE and WiMAX simulators [17, 18]. The simulators are developed using MATLAB and C.
They include the complete physical layer signal processing such as timing/frequency synchronization, channel estimation, subcarrier demapping, rate-matching, and turbo decoding. H-ARQ based on the CRC of coded blocks is also enabled to support chase combining (CC) with up to three retransmissions. The bandwidth is set to 5 MHz in the simulation, the velocity of the UE is 3 km/h, and the scenario is urban micro [19]. Perfect synchronization and channel estimation are assumed in order to focus the simulation on detection performance. The Turbo decoder runs at most six iterations with early stopping. The WiMAX simulator [17] also works on 5 MHz bandwidth. The two channel coding methods used in the simulation are Reed-Solomon with Convolutional (RS-Conv) and Low-Density Parity-Check (LDPC) coding. Two channel models, namely the 3GPP SCME [19] and ITU Pedestrian B (PedB) [17] channel models, are used in this paper. It is assumed that the channel is quasistatic within one OFDM symbol duration. Note that a 1-TTI latency is introduced for uplink ACK/NACK in the simulation.

6.1. 3GPP LTE

Figure 7 shows the block error rate (BLER) of the LTE system with H-ARQ using different detection methods. The blue curves are the BLER of the first transmission while the red ones represent that of the first retransmission in H-ARQ. The figure shows that the BLER of the retransmission is drastically reduced compared to the first transmission which improves the throughput as shown later.
The results in Figures 8 and 9 show that when 64-QAM and the weakest (rate https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq174_HTML.gif ) channel coding defined in LTE are used for https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq175_HTML.gif SM, the FER performance of MAP is always better than that of MFCSO and https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq176_HTML.gif -best. MFCSO achieves lower FER than the https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq177_HTML.gif -best ( https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq178_HTML.gif ) used in [10] until very high SNR. MMSE has the worst FER performance. Note that in wireless systems, throughput is a more important performance factor than BER or FER because it has a direct effect on the user experience. Figure 9 shows that the gain in throughput brought by MFCSO against MMSE is significant (up to 12.6 Mbits/s, or https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq179_HTML.gif higher than the one achieved by MMSE). In comparison, the throughput degradation caused by the approximation in MFCSO is much smaller (up to 2.5 Mbits/s, or https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq180_HTML.gif lower than that achieved by MAP). The much smaller gap in throughput in comparison to that in FER is mainly due to the H-ARQ retransmission with chase combining.
The result shows that even with a suboptimal detector (with much lower complexity than the optimal detector) and almost no channel coding, a throughput close to the one achievable by MAP detectors can still be reached when H-ARQ is used. The throughput gain of MFCSO over the K-best is as large as 5 Mbits/s ( https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq181_HTML.gif ) when SNR is 26 dB.
Figures 10 and 11 show the BLER and throughput of https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq182_HTML.gif SFBC with two different CQI values (9 and 15). The simulation shows that SFBC reaches https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq183_HTML.gif at much lower SNR than SM as depicted in Table 2, though the throughput is half.
Table 2
Minimum SNR to reach FER = 0.01.
| CQI | SFBC (MMSE) | SM (MFCSO) | SM (MMSE) |
| --- | --- | --- | --- |
| 9 | 10 dB | 17 dB | 24 dB |
| 15 | 24 dB | 36 dB | N/A |
Figure 12 depicts the achievable throughput using two-level adaptive modulation and coding (AMC). The result shows that when SNR is worse than 10 dB, SFBC achieves both higher throughput and lower BLER than SM even if MAP detector is used.

6.2. WiMAX

The results in Figures 13 and 14 show that when mild channel coding (e.g., RS-Conv 3/4) is used without H-ARQ in the WiMAX system, MFCSO still achieves near-MAP performance in FER and MAP performance in throughput. It has a gain of more than 9 dB compared to the MMSE detector. The use of a stronger code (e.g., LDPC) brings a gain of 4 dB in throughput compared to RS-Conv. This shows that MFCSO has a very promising performance/complexity trade-off when the advances in channel coding are taken into consideration. The result also shows that once FER reaches 0.01, any further improvement of FER gives only a negligible increase in throughput.

6.3. Impact of Channel Estimation Error

In most of the literature [1, 3, 5], perfect channel state information (CSI) is assumed, which is never true in reality. In [4], channel estimation error is emulated with a randomly generated error constrained by the value of its average power, and the affected FER is plotted. However, to the best knowledge of the authors, how the channel estimation error affects the link-level performance of MIMO detection in the presence of H-ARQ has not been studied. In this paper, based on least square (LS) channel estimation, the impact of channel estimation error on link-level performance is investigated, which provides a realistic measurement of the achievable performance of the MFCSO detector in a practical system. An LTE system with https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq186_HTML.gif (coding rate 0.8547, 64-QAM) and an open-loop https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq187_HTML.gif MIMO scheme is simulated using the PedB channel. For comparison purposes, the MFCSO detector is benchmarked against the soft-output sphere decoding (SSD) in [1] and the MAP detector. Note, however, that the complexity reduction of SSD used in [1] is not applied in this paper; thus, the SSD performance reaches the upper bound. As depicted in Figures 15 and 16, regardless of the channel estimation error, SSD always achieves the same BLER and throughput performance as MAP detection. In Figure 15, the slope of the BLER curve of MFCSO decreases when SNR reaches 28 dB. From a traditional point of view, the BLER performance of MFCSO is significantly worse than that of SSD and MAP (by more than 2 dB). However, as shown in Figure 16, the throughput performance of MFCSO is only negligibly lower (0.3 dB) than that of SSD and MAP.
This further proves that MFCSO has a better performance/complexity trade-off when taking system-level impact into consideration. Figure 16 also shows the throughput gap between the case assuming perfect CSI and the one with realistic LS estimated CSI is 1.5 dB in the active region for https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq188_HTML.gif . In principle, channel estimation error will only cause the throughput curve to shift right by 1.5 dB.

7. Implementation Considerations

In LTE [11], taking a 5 MHz bandwidth LTE system as an example, up to 7 OFDM symbols need to be processed within one slot (0.5 ms) which contain 1900 data subcarriers. This means that there will be no more than https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq189_HTML.gif to finish the detection of each subcarrier on average. Therefore, proper detection methods have to be chosen in order to maximize the data rate at reasonable implementation cost.
As depicted in (7), for https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq190_HTML.gif SM, the MMSE detector needs to compute the inverse of a https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq191_HTML.gif matrix. It has been presented in [16] that the inversion of small matrices can be done using direct inversion, which supplies sufficient precision for most channels. The FCSO and MFCSO detectors involve the search of a number of trellis nodes as listed in Table 3. The FCSO detector always visits the complete constellation (e.g., 16 for 16-QAM and 64 for 64-QAM), while MFCSO only visits a subset of it (e.g., 9 for 16-QAM and 16 for 64-QAM). Note that MFCSO requires MMSE detection to compute the initial estimate (22), which is an extra cost compared to FCSO. To the knowledge of the authors, SSD with complexity reduction [1] has a complexity similar to that of FCSO; it is not analyzed in this paper due to limited space.
Table 3
Complexity analysis for ASIC implementation (65 nm).
| Metric | Modulation | MMSE | MFCSO | FCSO | MAP |
| --- | --- | --- | --- | --- | --- |
| Num nodes | 16-QAM | 1 | 18 | 32 | 256 |
| Num nodes | 64-QAM | 1 | 32 | 128 | 4096 |
| Logic ( https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq192_HTML.gif ) | 64-QAM | 0.08 | 0.2 | 0.6 | 20 |
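The node counts in Table 3 follow from the per-layer numbers quoted in the text for a two-layer system. The short Python sketch below reproduces them; the function name `visited_nodes` and the counting rule (per-layer count times the number of layers for FCSO and MFCSO, full enumeration for MAP) are an interpretation of the table rather than a formula stated by the authors.

```python
def visited_nodes(constellation_size, candidate_set_size, layers=2):
    """Number of nodes visited per received vector for each detector."""
    return {
        "MMSE": 1,
        "MFCSO": layers * candidate_set_size,   # subset of the constellation per top layer
        "FCSO": layers * constellation_size,    # full constellation per top layer
        "MAP": constellation_size ** layers,    # exhaustive search over all vectors
    }

nodes_16qam = visited_nodes(16, 9)
nodes_64qam = visited_nodes(64, 16)
```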
In practice, the hardware is usually implemented taking both cost and performance into consideration. Based on the complexity analysis in Table 3 and the performance analysis in Section 6, MFCSO is chosen by the authors as the target algorithm for ASIC implementation. Using the ST 65 nm CMOS process, while meeting the https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq193_HTML.gif constraint, the implemented detector supporting both MMSE and MFCSO for https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq194_HTML.gif SM and up to 64-QAM modulation occupies less than https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq195_HTML.gif as shown later.

8. Adaptive Transmission and Detection

As depicted in Table 3, a detector supporting dual-mode MFCSO/MMSE detection consumes 2.5 times the area of one supporting only MMSE. Hence, the former is assumed to target high-end users willing to pay more in area and power for performance (e.g., laptops). The MMSE single-mode detector suits low-end users who need connectivity at minimum cost (e.g., smartphones). Note that the user cares about latency as well as throughput, and latency is partly determined by the number of retransmissions. Hence, it is also important to keep the retransmissions to a minimum (which requires low FER). Figure 12 shows that with AMC, SM using the MFCSO detector always brings higher throughput when SNR is greater than 10 dB. For both types of users, when SNR is worse than 10 dB (as in Figure 12), SFBC is preferred instead of SM. For low-end users, SM can be used when https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq196_HTML.gif while SFBC is still preferable from 10 to 25 dB (due to its low FER and thus fewer retransmissions, resulting in low latency). For high-end users, SM is preferred when SNR is higher than 10 dB. On the other hand, since the MMSE mode consumes substantially less power than the MFCSO mode, high-end users might only want to switch to MFCSO mode when there is enough battery power and high SNR (e.g., https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq197_HTML.gif 25 dB). When SNR is very low, SFBC is also preferred due to its robustness (as depicted in Figure 12). The SNR ranges suggested for the mode switching of the two types of detector hardware are shown in Table 4. The adaptive scheme brings power efficiency and can supply best-effort performance in an economical way.
Table 4
Adaptive transmission and detection.
| SNR range | SFBC | SM |
| --- | --- | --- |
| High-end (MFCSO/MMSE) | https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq198_HTML.gif | https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq199_HTML.gif 10 dB |
| Low-end (MMSE only) | https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq200_HTML.gif | https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq201_HTML.gif 26 dB |
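The transmission-scheme switching suggested in Table 4 can be expressed as a simple threshold rule. In the Python sketch below, the thresholds of 10 dB and 26 dB are taken from the table, while the behaviour exactly at the threshold (SM is selected when SNR is greater than or equal to it) is an assumption, because the comparison symbols are not legible in this text. The names `SM_THRESHOLD_DB` and `select_scheme` are hypothetical, and the choice between MFCSO and MMSE detection within SM is deliberately left out.

```python
# SNR (dB) at or above which spatial multiplexing (SM) is selected, per device class.
SM_THRESHOLD_DB = {"high_end": 10.0, "low_end": 26.0}

def select_scheme(snr_db, high_end):
    """Return 'SM' or 'SFBC' for the given SNR and device class."""
    threshold = SM_THRESHOLD_DB["high_end" if high_end else "low_end"]
    return "SM" if snr_db >= threshold else "SFBC"
```

For example, at 15 dB a high-end terminal would select SM while a low-end terminal would stay with SFBC.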

9. Final VLSI Implementation

The implementation of our design is done in two steps. First, for fast prototyping and for comparison with the prior art in [10], the symbol detector is implemented using a Xilinx FPGA. Second, the ASIC flow including synthesis, floorplanning, placement, and routing is carried out using ST 65 nm process libraries and the Synopsys low-power design flow.

9.1. FPGA Prototype

Xilinx ISE and Core Generator were used to synthesize the design based on the Virtex2 xc2v6000 FPGA. The synthesis result is depicted in Table 5. The proposed implementation supports up to 64-QAM as described in Section 9.3. Table 5 shows that it consumes https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq202_HTML.gif fewer slices and https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq203_HTML.gif fewer embedded multipliers compared to the K-best detector presented in [10]. Note that the K-best FPGA implementation in [10] only supports the real-time detection of https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq204_HTML.gif QPSK spatial multiplexing in LTE. The FPGA-based detector presented in [8] covers a different antenna configuration, and most importantly the Virtex-5 FPGA used has a different architecture from the Virtex-2 FPGAs, which makes it difficult to make an area comparison.
Table 5
FPGA implementation result for real-time processing.
| | This work | Ref [10] |
| --- | --- | --- |
| Algorithm | MFCSO | https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq205_HTML.gif -best LSD |
| Modulation supported | up to 64-QAM | up to 64-QAM |
| FPGA type | Virtex2 | Virtex2 |
| Datatype | fixed-point | fixed-point |
| Wordlength (bits) | 16 | 16 |
| Num of slices | 4381 | 15662 |
| Num of MULT18X18s | 48 | 108 |
| Block RAMs | 3 | 61 |
| Frequency (MHz) | 85 | 70 |
| Throughput for 64-QAM (Mbps) | 67.5 | 6 |
Table 6
ASIC implementation result.
| Parameter | Value |
| --- | --- |
| Area of channel preprocessing unit (kgate) | 35 |
| Area of detection unit (kgate) | 55 |
| Cycles for https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq206_HTML.gif | 3 |
| Working frequency (MHz) | 300 |
| Throughput for 64-QAM (Mbps) | 225 |

9.2. ASIC Implementation

Table 6 lists the gate count and working frequency of the ASIC implementation. In reality, the channel coefficients are updated less frequently than the received symbols; thus, they are saved in the coefficient memory, which is not counted in [10]. In order to compare the area consumed by the memory and the detector itself, a demo chip including a 172800-bit coefficient memory and a 19200-bit data memory for 5 MHz bandwidth is implemented using the Cadence backend flow. As depicted in Figure 17, the total area of the detector is https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq207_HTML.gif with half of it consumed by the actual logic ( https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq208_HTML.gif 0.2 https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq209_HTML.gif ) of the detector and the other half by the memory. Note that the microcode memory is implemented as a piece of logic in the chip. The size of the memories depends on the number of subcarriers (or bandwidth) to be supported. The https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq210_HTML.gif -best detector in [20] supports https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq211_HTML.gif MIMO and 100 Mbps data rate. As mentioned in Section 3.3, the complexity of MFCSO is proportional to https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq212_HTML.gif .
Hence, the area of the detection part for https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq213_HTML.gif will be four times that of the presented https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq214_HTML.gif solution. Compared to the https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq215_HTML.gif figure of a https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq216_HTML.gif -best detector for https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq217_HTML.gif MIMO in 0.13- https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq218_HTML.gif m running at 270 MHz (which is https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq219_HTML.gif without memory in 65-nm according to CMOS scaling), the solution in this paper is https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq220_HTML.gif without memory. Also note that [20] does not include the channel preprocessing part, which is expected to contribute a major share of the area (it already consumes half of the area of this solution).

9.3. Processing Throughput

Following the assumption made in [10], an LTE system with 5 MHz bandwidth has at most 300 data subcarriers to be processed within one OFDM symbol duration, which is https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq221_HTML.gif . This requires the detection of each data subcarrier to be finished within https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq222_HTML.gif . For the FPGA implementation, which has a clock frequency of 90 MHz, this amount of time corresponds to around 25 clock cycles. Note that the detector can process two subcarriers in parallel, which means each subcarrier can be finished within 16 cycles, inside the 25-cycle budget. For https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq223_HTML.gif spatial multiplexing and 64-QAM (12 bits per subcarrier), this corresponds to https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq224_HTML.gif Mbps processing throughput.
The ASIC implementation can easily run at a clock frequency of 300 MHz, which means 1570 data subcarriers can be computed within https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq225_HTML.gif . This corresponds to 225 Mbps processing throughput, which allows real-time detection of the 20 MHz bandwidth LTE downlink (containing up to 1200 data subcarriers). Since WiMAX 2004 [17] uses only 10 MHz bandwidth, it has a lower peak data rate than LTE and can thus be easily supported.
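The ASIC throughput figure can be cross-checked with simple arithmetic. The sketch below assumes that the 16 cycles per subcarrier and the 12 bits per subcarrier quoted for the detector also hold at the 300 MHz ASIC clock. The function and variable names are illustrative and do not come from the paper.

```python
def throughput_mbps(clock_hz: float,
                    cycles_per_subcarrier: float,
                    bits_per_subcarrier: int) -> float:
    """Processing throughput in Mbps for a detector that completes one
    subcarrier every `cycles_per_subcarrier` clock cycles."""
    subcarriers_per_second = clock_hz / cycles_per_subcarrier
    return subcarriers_per_second * bits_per_subcarrier / 1e6


# Assumed inputs, taken from the figures quoted in the text:
# 300 MHz ASIC clock, 16 cycles per subcarrier, 12 bits per subcarrier.
asic_mbps = throughput_mbps(300e6, 16, 12)
print(f"ASIC throughput: {asic_mbps:.1f} Mbps")
```

With these inputs the result is 225 Mbps, in line with the value stated for the ASIC. The same function can be evaluated at other clock frequencies, for example the 90 MHz FPGA clock.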
Note that the MFCSO detector can be switched to MMSE mode by powering down the major part of the DU. Detection in the SFBC/STBC transmission schemes is in fact MMSE detection, which can be handled by the MMSE mode. Since the MMSE mode consumes substantially less power than the MFCSO mode, the detector is switched to MMSE mode when the terminal enters power-saving mode.
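The mode selection described above can be summarised as a small decision rule. This is only a sketch of the policy as stated in the text; the enum and function names are hypothetical and do not reflect the actual control interface of the chip.

```python
from enum import Enum


class Scheme(Enum):
    SPATIAL_MULTIPLEXING = "SM"
    SFBC = "SFBC"
    STBC = "STBC"


class Mode(Enum):
    MFCSO = "MFCSO"  # full detection unit (DU) active
    MMSE = "MMSE"    # major part of the DU powered down


def select_mode(scheme: Scheme, power_saving: bool) -> Mode:
    """Pick the detector mode (hypothetical helper).

    SFBC/STBC detection is handled by the MMSE mode, and the terminal's
    power-saving state also forces MMSE; otherwise MFCSO is used.
    """
    if power_saving or scheme in (Scheme.SFBC, Scheme.STBC):
        return Mode.MMSE
    return Mode.MFCSO


print(select_mode(Scheme.SPATIAL_MULTIPLEXING, power_saving=False).value)
print(select_mode(Scheme.SFBC, power_saving=False).value)
```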

10. Conclusion

In this paper, the VLSI implementation of a fixed-complexity near-MAP MIMO detector ASIC is presented for multistandard wireless terminals. It achieves near-MAP throughput during LTE simulations, even with a relatively weak channel code and with high-order modulation (e.g., https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq226_HTML.gif ). Furthermore, based on the adaptive scheme proposed in Section 8, a good trade-off between performance and power can be achieved. In comparison to prior art such as the https://static-content.springer.com/image/art%3A10.1155%2F2010%2F893184/MediaObjects/13638_2009_Article_2053_IEq227_HTML.gif -best solution in [10], the presented detector achieves better performance at lower silicon cost. The impact of realistic channel estimation on detection performance is also presented.

Acknowledgments

The work of D. Wu, J. Eilert, R. Asghar, and D. Liu is supported by the Multibase Project from the European Commission's 7th Framework in partnership with Ericsson AB, Infineon AG, IMEC, Lund University, and KU-Leuven. The authors would like to thank ST Microelectronics for supplying the 65-nm process, Professor Erik G. Larsson for discussions on MIMO detection, and Christian Mehlführer and the Christian Doppler Laboratory for Design Methodology of Signal Processing Algorithms at Vienna University of Technology for contributions to the LTE simulation chain.
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
References
1. Studer C, Wenk M, Burg A, Bölcskei H: Soft-output sphere decoding: performance and implementation aspects. Proceedings of the 40th Asilomar Conference on Signals, Systems, and Computers (ACSSC '06), November 2006 2071-2076.
2. Li M, Bougard B, Xu W, Novo D, Van Der Perre L, Catthoor F: Optimizing Near-ML MIMO detector for SDR baseband on parallel programmable architectures. Proceedings of the Conference on Design, Automation and Test in Europe (DATE '08), March 2008 444-449.
3. Barbero LG, Thompson JS: Rapid prototyping of a fixed-throughput sphere decoder for MIMO systems. Proceedings of the IEEE International Conference on Communications (ICC '06), June 2006 3082-3087.
4. Larsson EG, Jaldén J: Fixed-complexity soft MIMO detection via partial marginalization. IEEE Transactions on Signal Processing 2008, 56(8):3397-3407.
5. Wu D, Larsson EG, Liu D: Implementation aspects of fixed-complexity soft-output MIMO detection. Proceedings of the 69th IEEE Vehicular Technology Conference (VTC '09), April 2009.
6. Moezzi-Madani N, et al.: A low-area flexible MIMO detector for WiMAX/WiFi standards. Proceedings of the Conference on Design, Automation and Test in Europe (DATE '10), March 2010, Dresden, Germany 1637-1640.
7. Cupaiuolo T, et al.: Low-complexity high throughput VLSI architecture of soft-output ML MIMO detector. Proceedings of the IEEE Design, Test and Automation in Europe, March 2010, Dresden, Germany.
8. Amiri K, Cavallaro JR, Dick C, Rao RM: A high throughput configurable SDR detector for multi-user MIMO wireless systems. Journal of Signal Processing Systems. In press.
9. Wu D, Eilert J, Liu D: Evaluation of MIMO symbol detectors for 3GPP LTE terminals. Proceedings of the 17th European Signal Processing Conference (EUSIPCO '09), 2009, Glasgow, Scotland.
10. Ketonen J, Juntti M: SIC and K-best LSD receiver implementation for a MIMO-OFDM system. Proceedings of the 16th European Signal Processing Conference (EUSIPCO '08), August 2008.
11. 3GPP: Evolved Universal Terrestrial Radio Access (EUTRA): physical channels and modulation. Technical Specification 36.211 V8.4.0, September 2008.
12. Alamouti SM: A simple transmit diversity technique for wireless communications. IEEE Journal on Selected Areas in Communications 1998, 16(8):1451-1458. 10.1109/49.730453
13. Siti M, Fitz MP: A novel soft-output layered orthogonal lattice detector for multiple antenna communications. Proceedings of the IEEE International Conference on Communications (ICC '06), June 2006 1686-1691.
14. Golub GH, Van Loan CF: Matrix Computations. 3rd edition. The Johns Hopkins University Press, Baltimore, Md, USA; 1996.
15. Wu D, Eilert J, Liu D, Wang D, Al-Dhahir N, Minn H: Fast complex valued matrix inversion for multi-user STBC-MIMO decoding. Proceedings of the IEEE Computer Society Annual Symposium on VLSI: Emerging VLSI Technologies and Architectures (ISVLSI '07), March 2007 325-330.
16. Wu D, Eilert J, Liu D: Implementation of a high-speed MIMO soft-output symbol detector for software defined radio. Journal of Signal Processing Systems. In press.
17. Mehlführer C, Caban S, Rupp M: Experimental evaluation of adaptive modulation and coding in MIMO WiMAX with limited feedback. EURASIP Journal on Advances in Signal Processing 2008, 2008.
18. Mehlführer C, Wrulich M, Ikuno JC, Bosanska D, Rupp M: Simulating the long term evolution physical layer. Proceedings of the 17th European Signal Processing Conference (EUSIPCO '09), 2009, Glasgow, Scotland.
19. Baum DS, Salo J, Milojevic M, Kyösti P, Hansen J: MATLAB implementation of the interim channel model for beyond-3G systems (SCME). May 2005.
20. Chen S, Zhang T, Xin Y: Relaxed K-best MIMO signal detector design and VLSI implementation. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2007, 15(3):328-337.
Metadata
Title
VLSI Implementation of a Fixed-Complexity Soft-Output MIMO Detector for High-Speed Wireless
Authors
Di Wu (EURASIP Member)
Johan Eilert
Rizwan Asghar
Dake Liu
Publication date
01.12.2010
Publisher
Springer International Publishing
DOI
https://doi.org/10.1155/2010/893184
