15.1 Introduction
In a relatively unknown paper published in 1974, Dorsch [4] described a decoder for linear binary block \((n,k)\) codes using soft decisions quantised to \(J\) levels. The decoder is applicable to any linear block code and does not rely upon any particular features of the code, such as being a concatenated code or having a sparse parity-check matrix. In the Dorsch decoder, hard decisions are derived from the soft decisions using standard bit-by-bit detection, choosing the binary state closest to the received coordinate. The hard decisions are then ranked in terms of their likelihoods, and candidate codewords are derived from a set of \(k\) independent, most likely bits. This is done by producing a new parity-check matrix \(\mathbf {H_I}\), obtained by reordering the columns of the original \(\mathbf {H}\) matrix according to the likelihood of each coordinate and reducing the resulting matrix to echelon canonical form by elementary row operations. After evaluation of several candidate codewords, the codeword with the minimum soft decision metric is output from the decoder. A decoder using a similar principle, but without soft decision quantisation, has been described by Fossorier [5, 6]. Other approaches, after ranking the reliability of the received bits, adopt various search strategies for finding likely codewords [11] or utilise a hard decision decoder in conjunction with a search for errors in the least likely bit positions [2, 15].
The power of the Dorsch decoder arises from the relatively unknown property that most codes, on average, can correct almost \(n-k\) erasures [17], which is considerably more than the guaranteed number of correctable erasures of \(d_{min}-1\), or the guaranteed number of correctable hard decision errors of \(\frac{d_{min}-1}{2}\), where \(d_{min}\) is the minimum Hamming distance of the code. In its operation, the Dorsch decoder needs to correct any combination of \(n-k\) erasures, which is impossible unless the code is an MDS code [12]. Dorsch did not discuss this problem, or potential solutions, in his original paper [4], although at least one solution is implied by the results he presented.
In this chapter, a solution to the erasure correcting problem of being able to solve \(n-k\) erasures for a non-MDS code is described. It is based on using alternative columns of the parity-check matrix without the need for column permutations. It is also shown that it is not necessary to keep recalculating each candidate codeword and its associated soft decision metric in order to find the most likely codeword. Instead, an incremental correlation approach is adopted which features low information weight codewords and a correlation function involving only a small number of coordinates of the received vector [17]. It is proven that maximum likelihood decoding is realised provided all codewords are evaluated up to a bounded information weight. This means that maximum likelihood decoding may be achieved for a high percentage of received vectors. The decoder lends itself to a low complexity, parallel implementation involving a concatenation of hard and soft decision decoding. It produces near maximum likelihood decoding for codes that can be as long as 1000 bits, provided the code rate is high enough. When implementing the decoder, it is shown that complexity may be traded off against performance in a flexible manner. Decoding results achieved by the decoder are presented for some of the most powerful binary codes known and compared to Shannon’s sphere packing bound [14].

The extension to non-binary codes is straightforward and is described in Sect. 15.5.
15.2 The Incremental Correlation Dorsch Decoder
Codewords with binary coordinates having state 0 or 1 are denoted as:
$$\begin{aligned} \mathbf {x}= (x_0,x_1,x_2,\dots ,x_{n-1}) \end{aligned}$$
For transmission, bipolar mapping is used: coordinates with binary state 0 are mapped to \(+1\) and coordinates with binary state 1 are mapped to \(-1\). Transmitted codewords are denoted as
$$\begin{aligned} \mathbf {c}= (c_0,c_1,c_2,\dots ,c_{n-1}) \end{aligned}$$
The received vector \(\mathbf {r}\) consists of \(n\) coordinates \((r_0,r_1,r_2,\dots ,r_{n-1})\) equal to the transmitted codeword plus Additive White Gaussian Noise with variance \(\sigma ^2\). The received vector processed by the decoder is assumed to have been matched filtered and to be free from distortion, so that \(\frac{1}{\sigma ^2}=\frac{2E_b}{N_o}\), where \(E_b\) is the energy per information bit and \(N_o\) is the single-sided noise power spectral density. Accordingly,
$$\begin{aligned} \sigma ^2=\frac{N_o}{2E_b} \end{aligned}$$
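As an illustration, a minimal simulation sketch of this channel model follows, using the normalisation above; the function name `transmit` and the use of NumPy are choices of this sketch, not of the text:

```python
import numpy as np

# A minimal sketch of the channel model described above: codeword bits
# are mapped 0 -> +1, 1 -> -1 and AWGN with variance N0/(2 Eb) is
# added, following the normalisation 1/sigma^2 = 2 Eb/N0 in the text.
def transmit(x, ebno_db, rng):
    ebno = 10.0 ** (ebno_db / 10.0)              # Eb/N0 as a linear ratio
    sigma = np.sqrt(1.0 / (2.0 * ebno))          # sigma^2 = N0 / (2 Eb)
    c = 1.0 - 2.0 * np.asarray(x, dtype=float)   # bipolar mapping
    return c + rng.normal(0.0, sigma, size=c.shape)

# Example: r = transmit([0, 1, 1, 0], 3.0, np.random.default_rng(1))
```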
The basic principle is that the \(k\) most reliable bits of the received vector are initially taken as correct and the \(n-k\) least reliable bits are treated as erasures. The parity-check equations of the code, as represented by \(\mathbf {H}\), are used to solve for these erased bits and a codeword \({\hat{\mathbf {x}}}\) is obtained. This codeword is either equal to the transmitted codeword or needs only small changes to produce a codeword equal to the transmitted codeword. One difficulty is that, depending on the code, the \(n-k\) least reliable bits usually cannot all be solved as erasures. This depends on the positions of the erased coordinates and the power of the code. Only Maximum Distance Separable (MDS) codes [12] are capable of solving \(n-k\) erasures regardless of the positions of the erasures in the received codeword. Unfortunately, there are no binary MDS codes apart from trivial examples. However, a set of \(n-k\) erasures can always be solved from the \(n-k+s\) least reliable bit positions and, depending on the code, \(s\) is usually a small integer. In order to obtain the best performance it is important that the very least reliable bit positions are solved first, since the corollary of the \(n-k\) least reliable bits usually not all being solvable as erasures is that the \(k\) most reliable bits, used to derive codeword \({\hat{\mathbf {x}}}\), must include a small number of less reliable bits. However, for most received vectors, the difference in reliability between ranked bit \(k\) and ranked bit \(k+s\) is usually small. For any received coordinate, the a priori log likelihood ratio of the bit being correct is proportional to \(\vert r_i\vert \). The received vector \(\mathbf {r}\) with coordinates ranked in order of most likely to be correct is defined as \((r_{\mu _0},r_{\mu _1},r_{\mu _2},\dots ,r_{\mu _{n-1}})\), where \(|r_{\mu _0}|> |r_{\mu _1}|> |r_{\mu _2}|> \dots > |r_{\mu _{n-1}}|\).
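This ranking is straightforward to express in code; the following sketch (illustrative names, NumPy assumed) returns both the bit-by-bit hard decisions and the permutation \(\mu \):

```python
import numpy as np

# A sketch of the reliability ranking: mu orders the coordinate
# indices so that |r[mu[0]]| > |r[mu[1]]| > ... > |r[mu[n-1]]|, and
# the hard decision for each coordinate picks the nearer binary state.
def rank_and_harden(r):
    r = np.asarray(r, dtype=float)
    b = (r < 0).astype(int)          # hard decisions, bit by bit
    mu = np.argsort(-np.abs(r))      # most reliable coordinate first
    return b, mu
```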
The decoder is most straightforward for a binary MDS code. The codeword coordinates \((x_{\mu _0},x_{\mu _1},x_{\mu _2},\dots ,x_{\mu _{k-1}})\) are formed directly from the received vector \(\mathbf {r}\) using the bitwise decision rule \(x_{\mu _i}=1\) if \(r_{\mu _i} < 0\) else \(x_{\mu _i}=0\). The \(n-k\) coordinates \((x_{\mu _k},x_{\mu _{k+1}},x_{\mu _{k+2}},\dots ,x_{\mu _{n-1}})\) are considered to be erased and derived from the k most reliable codeword coordinates \((x_{\mu _0},x_{\mu _1},x_{\mu _2},\dots ,x_{\mu _{k-1}})\) using the parity-check equations.
For a non-MDS code, the \(n-k\) coordinates cannot always be solved from the parity-check equations because the parity-check matrix is not a Cauchy or Vandermonde matrix [12]. To get around this problem, a slightly different ordering, \((x_{\eta _0},x_{\eta _1},x_{\eta _2},\dots ,x_{\eta _{n-1}})\), is defined.
The label of the last coordinate, \(\eta _{n-1}\), is set equal to \(\mu _{n-1}\), and \(x_{\eta _{n-1}}\) is solved first by flagging the first parity-check equation that contains \(x_{\eta _{n-1}}\) and then subtracting this equation from all other parity-check equations containing \(x_{\eta _{n-1}}\). Consequently, \(x_{\eta _{n-1}}\) is now contained in only one equation, the first flagged equation.
The label of the next coordinate, \(\eta _{n-2}\), is set equal to \(\mu _{n-2}\) and an attempt is made to solve \(x_{\eta _{n-2}}\) by finding an unflagged parity-check equation containing \(x_{\eta _{n-2}}\). In the event that there is no unflagged equation containing \(x_{\eta _{n-2}}\), \(\eta _{n-2}\) is set equal to \(\mu _{n-3}\), the label of the next most reliable bit, and the procedure is repeated until an unflagged equation contains \(x_{\eta _{n-2}}\). As before, this equation is flagged to indicate that it will be used to solve for \(x_{\eta _{n-2}}\), and it is subtracted from all other unflagged equations containing \(x_{\eta _{n-2}}\). The procedure continues until all of the \(n-k\) codeword coordinates \(x_{\eta _{n-1}},x_{\eta _{n-2}},x_{\eta _{n-3}},\dots , x_{\eta _{k}}\) have been solved and all \(n-k\) equations have been flagged. In effect, the least reliable coordinates are skipped if they cannot be solved. The remaining \(k\) ranked received coordinates are set equal to \((r_{\eta _0},r_{\eta _1},r_{\eta _2},\dots ,r_{\eta _{k-1}})\) in most reliable order, where \(|r_{\eta _0}|> |r_{\eta _1}|> |r_{\eta _2}|> \dots > |r_{\eta _{k-1}}|\), and \((x_{\eta _0},x_{\eta _1},x_{\eta _2},\dots ,x_{\eta _{k-1}})\) is determined using the bit decision rule \(x_{\eta _i}=1\) if \( r_{\eta _i} < 0\), else \(x_{\eta _i}=0\). The flagged parity-check equations are in upper triangular form and have to be solved in reverse order, starting with the last flagged equation. This equation gives the solution to \(x_{\eta _{k}}\), which is back substituted into the other equations; \(x_{\eta _{k+1}}\) is solved next and back substituted, and so on, with coordinate \(x_{\eta _{n-1}}\) solved last.
This codeword is denoted as \({\hat{\mathbf {x}}}\) and the mapped version of the codeword is denoted as \({\hat{\mathbf {c}}}\).
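The whole procedure, from ranking through flagging to back substitution, can be summarised in a single sketch. It assumes \(\mathbf {H}\) is supplied as a full-rank binary \((n-k)\times n\) NumPy array; all names are illustrative, and later sketches in this section reuse its outputs:

```python
import numpy as np

# A minimal sketch of the procedure just described.  Equations are
# flagged from the least reliable coordinate upwards, coordinates that
# cannot be solved are skipped, and back substitution then yields the
# first candidate codeword x_hat.
def first_candidate(H, r):
    H = H.copy() % 2
    n_k, n = H.shape
    mu = np.argsort(-np.abs(r))                # most reliable first
    flagged, free_rows = [], set(range(n_k))
    for pos in mu[::-1]:                       # least reliable first
        row = next((i for i in sorted(free_rows) if H[i, pos]), None)
        if row is None:
            continue                           # cannot solve here: skip this bit
        free_rows.discard(row)
        for i in free_rows:                    # eliminate pos from unflagged rows
            if H[i, pos]:
                H[i] ^= H[row]
        flagged.append((row, pos))
        if not free_rows:
            break
    solved = {pos for _, pos in flagged}
    eta_info = [p for p in mu if p not in solved]   # the k retained positions
    x = (np.asarray(r) < 0).astype(int)        # hard decisions on retained bits
    for row, pos in reversed(flagged):         # last flagged equation first
        mask = H[row].astype(bool)
        mask[pos] = False
        x[pos] = x[mask].sum() % 2             # satisfy the flagged equation
    return x, H, flagged, eta_info
```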
As is well known [13], the codeword most likely to have been transmitted is the codeword, denoted as \({\breve{\mathbf {x}}}\), which has the smallest squared Euclidean distance, \(D({\breve{\mathbf {x}}})\), between the mapped codeword, \({\breve{\mathbf {c}}}\), and the received vector:
$$\begin{aligned}\nonumber D({\breve{\mathbf {x}}})=\sum _{j=0}^{n-1}(r_j-\breve{c}_{j})^2 \end{aligned}$$
\( D({\breve{\mathbf {x}}})<D(\mathbf {{x}})\text { for all other codewords }\mathbf {{x}}. \)
Equivalently, \({\breve{\mathbf {x}}}\) is the codeword which, after mapping, has the highest cross correlation
$$\begin{aligned} Y({\breve{\mathbf {x}}})=\sum _{j=0}^{n-1}r_j\times \breve{c}_{j} \end{aligned}$$
(15.1)
\( Y({\breve{\mathbf {x}}})>Y(\mathbf {{x}})\text { for all other codewords }\mathbf {{x}}. \)
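The equivalence follows on expanding the squared Euclidean distance; since \(\sum _{j=0}^{n-1}r_j^2\) is fixed by the received vector and \(\breve{c}_{j}^2=1\) for every coordinate of a bipolar mapped codeword,
$$\begin{aligned} D({\breve{\mathbf {x}}})=\sum _{j=0}^{n-1}r_j^2+n-2\sum _{j=0}^{n-1}r_j\times \breve{c}_{j}=\sum _{j=0}^{n-1}r_j^2+n-2Y({\breve{\mathbf {x}}}) \end{aligned}$$
so that minimising \(D({\breve{\mathbf {x}}})\) is the same as maximising \(Y({\breve{\mathbf {x}}})\).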
The decoder may be simplified if the cross correlation function is used to compare candidate codewords. The cross correlation is first determined for the codeword \({\hat{\mathbf {x}}}\):
$$\begin{aligned} Y({\hat{\mathbf {x}}})=\sum _{j=0}^{n-1}r_j\times \hat{c}_{j} \end{aligned}$$
(15.2)
It is interesting to make some observations about \(Y({\hat{\mathbf {x}}})\). Since the summation can be carried out in any order,
$$\begin{aligned} Y({\hat{\mathbf {x}}})=\sum _{j=0}^{n-1}r_{\eta _j}\times \hat{c}_{\eta _j} \end{aligned}$$
(15.3)
and
$$\begin{aligned} Y({\hat{\mathbf {x}}})=\sum _{j=0}^{k-1}r_{\eta _j}\times \hat{c}_{\eta _j} +\sum _{j=k}^{n-1}r_{\eta _j}\times \hat{c}_{\eta _j} \end{aligned}$$
(15.4)
Considering the first term
$$\begin{aligned} \sum _{j=0}^{k-1}r_{\eta _j}\times \hat{c}_{\eta _j}=\sum _{j=0}^{k-1}|r_{\eta _j} \vert \end{aligned}$$
(15.5)
This is because the sign of \(\hat{c}_{\eta _j}\) equals the sign of \(r_{\eta _j}\) for \(j<k\). Thus, this term is independent of the code and Eq. (15.4) becomes
$$\begin{aligned} Y({\hat{\mathbf {x}}})=\sum _{j=0}^{k-1}|r_{\eta _j} \vert +\sum _{j=k}^{n-1}r_{\eta _j}\times \hat{c}_{\eta _j} \end{aligned}$$
(15.6)
Almost all of the \(k\) largest received coordinates (all of the \(k\) largest terms for an MDS code) are contained in the first term of Eq. (15.6), and this ensures that the codeword \({\hat{\mathbf {x}}}\), after mapping, has a high correlation with \(\mathbf {r}\).
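As a small numerical check of Eq. (15.5), reusing `x_hat` and `eta_info` from the `first_candidate` sketch (and assuming no received coordinate is exactly zero):

```python
import numpy as np

# On the retained positions the hard decisions match the signs of r,
# so each product r * c_hat there contributes |r| to the correlation.
def first_term(r, x_hat, eta_info):
    idx = np.asarray(eta_info)
    c_hat = 1.0 - 2.0 * x_hat[idx]             # mapped information bits
    assert np.isclose(np.sum(r[idx] * c_hat), np.sum(np.abs(r[idx])))
    return np.sum(np.abs(r[idx]))
```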
A binary (hard decision) received vector \(\mathbf {b}\) may be derived from the received vector \(\mathbf {r}\) using the bitwise decision rule \(b_j=1\) if \( r_j < 0\), else \(b_j=0\), for \(j=0\) to \(n-1\). It should be noted that, in general, the binary vector \(\mathbf {b}\) is not a codeword.
It is useful to define a binary vector \({\hat{\mathbf {z}}}\) as
$$\begin{aligned} {\hat{\mathbf {z}}}=\mathbf {b} \oplus {\hat{\mathbf {x}}} \end{aligned}$$
(15.7)
The maximum attainable correlation, \(Y_{max}\), is given by
$$\begin{aligned} Y_{max}= \sum _{j=0}^{n-1}|r_{\eta _j} \vert \end{aligned}$$
(15.8)
This correlation value occurs when there are no bit errors in transmission and provides an upper bound to the maximum achievable correlation for \({\breve{\mathbf {x}}}\). The correlation \(Y({\hat{\mathbf {x}}})\) may be expressed in terms of \(Y_{max}\) and \({\hat{\mathbf {z}}}\):
$$\begin{aligned} Y({\hat{\mathbf {x}}})=Y_{max} -2\sum _{j=0}^{n-1}\hat{z}_{\eta _j} \times |r_{\eta _j} \vert \end{aligned}$$
(15.9)
equivalently,
$$\begin{aligned} Y({\hat{\mathbf {x}}})=Y_{max} -Y_{\varDelta }({\hat{\mathbf {x}}}), \end{aligned}$$
(15.10)
where \(Y_{\varDelta }({\hat{\mathbf {x}}})\) is the shortfall from the maximum achievable correlation for the codeword \({\hat{\mathbf {x}}}\) and is evidently
$$\begin{aligned} Y_{\varDelta }({\hat{\mathbf {x}}})=2\sum _{j=0}^{n-1}\hat{z}_{\eta _j} \times |r_{\eta _j} \vert \end{aligned}$$
(15.11)
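Equations (15.7) to (15.11) translate directly into a few lines; the following sketch assumes `r` and `x_hat` are length-\(n\) NumPy arrays:

```python
import numpy as np

# A numerical restatement of Eqs. (15.7)-(15.11): the correlation of
# the mapped codeword with r equals Y_max minus twice the sum of |r|
# over the positions where x_hat disagrees with the hard decisions b.
def correlation_via_shortfall(r, x_hat):
    b = (r < 0).astype(int)                      # hard decisions
    z_hat = b ^ x_hat                            # Eq. (15.7)
    y_max = np.sum(np.abs(r))                    # Eq. (15.8)
    y_delta = 2.0 * np.sum(z_hat * np.abs(r))    # Eq. (15.11)
    return y_max - y_delta                       # Eq. (15.10)

# Direct check against Eq. (15.2):
# np.isclose(np.sum(r * (1 - 2 * x_hat)), correlation_via_shortfall(r, x_hat))
```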
Some observations may be made about the binary vector \({\hat{\mathbf {z}}}\). The coordinates \(\hat{z}_{\eta _j}\) for \(j=0\) to \((k-1)\) are always equal to zero. The maximum possible weight of \({\hat{\mathbf {z}}}\) is thus \(n-k\), and the average weight is \(\frac{n-k}{2}\) at low \(\frac{E_b}{N_o}\) values. At high \(\frac{E_b}{N_o}\) values, the average weight of \({\hat{\mathbf {z}}}\) is small because there is a high chance that \({\hat{\mathbf {x}}}\) is equal to the transmitted codeword. It may be seen from Eq. (15.11) that, in general, the lower the weight of \({\hat{\mathbf {z}}}\), the smaller will be \(Y_{\varDelta }({\hat{\mathbf {x}}})\) and the larger will be the correlation value \(Y({\hat{\mathbf {x}}})\).
Since there is no guarantee that the codeword \({\hat{\mathbf {x}}}\) is the transmitted codeword, the decoder has to evaluate additional codewords, since one or more of these may produce a correlation higher than that of \({\hat{\mathbf {x}}}\). There are \(2^k-1\) other codewords, which may be derived by considering all other \(2^k-1\) sign combinations of \(c_{\eta _j}\) for \(j=0\) to \(k-1\). For any of these codewords, denoted as \(\mathbf {c_i}\), the first term of the correlation given in Eq. (15.6) is bound to be smaller, since
$$\begin{aligned} \sum _{j=0}^{k-1}r_{\eta _j}\times c_{i,\eta _j} <\sum _{j=0}^{k-1}|r_{\eta _j} \vert \end{aligned}$$
(15.12)
This is because there has to be, by definition, at least one sign change of \(c_{i,\eta _j}\) compared to \(\hat{c}_{\eta _j}\) for \(j=0\) to \(k-1\). In order for \(Y(\mathbf {x_i})\) to be larger than \(Y({\hat{\mathbf {x}}})\), the second term of the correlation, \(\sum _{j=k}^{n-1}r_{\eta _j}\times c_{i,\eta _j} \), which uses the bits from the solved parity-check equations, must exceed \(\sum _{j=k}^{n-1}r_{\eta _j}\times \hat{c}_{\eta _j}\) by more than the deficit incurred in the first term.
However, the first term involves higher received magnitudes than the second term because the received coordinates are ordered. It follows that codewords likely to have a higher correlation than \({\hat{\mathbf {x}}}\) will have a small number of differences in the coordinates \(x_{\eta _j}\) for \(j=0\) to \(k-1\). As the code is linear, these differences will themselves correspond to a codeword, and codewords may be generated that have low weight in coordinates \(x_{\eta _j}\) for \(j=0\) to \(k-1\). These codewords are represented as \({\tilde{\mathbf {x}}}_{\mathbf {i}}\) and are referred to as low information weight codewords, since coordinates \(x_{\eta _j}\) for \(j=0\) to \(k-1\) form an information set. Thus, codewords \(\mathbf {{x_i}}\) are given by
$$\begin{aligned} \mathbf {{x_i}}={\hat{\mathbf {x}}} \oplus {\tilde{\mathbf {x}}}_{\mathbf {i}} \end{aligned}$$
(15.13)
and the \({\tilde{\mathbf {x}}}_{\mathbf {i}}\) are codewords chosen to have increasing weight in coordinates \(x_{\eta _j}\) for \(j=0\) to \(k-1\) as \(i\) is incremented. This means that, as \(i\) increases, it becomes less likely that a codeword will be found that has a higher correlation than the correlation of a codeword already found.
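The schedule of candidates can be illustrated with a short sketch: information patterns over the \(k\) ranked positions are generated in increasing weight order, each pattern naming the rows of the solved \(\mathbf {G}\) matrix (described at the end of this section) whose sum forms \({\tilde{\mathbf {x}}}_{\mathbf {i}}\). The function name is illustrative:

```python
from itertools import combinations

# Information patterns over the k ranked positions, generated in order
# of increasing weight, so that the more promising low information
# weight codewords x_tilde_i are evaluated first.
def information_patterns(k, max_weight):
    for w in range(1, max_weight + 1):
        for support in combinations(range(k), w):
            yield support              # positions eta_j set to 1

# Example: list(information_patterns(4, 2))
# -> [(0,), (1,), (2,), (3,), (0, 1), (0, 2), ...]
```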
The difference in the correlation value, \(Y_{\varDelta }(\mathbf {x_i})\), as a function of \({\tilde{\mathbf {x}}}_{\mathbf {i}}\) may be derived. Firstly, the binary vector \(\mathbf {{z_i}}\) is given by
$$\begin{aligned} \mathbf {{z_i}}=\mathbf {b} \oplus {\hat{\mathbf {x}}} \oplus {\tilde{\mathbf {x}}}_{\mathbf {i}} \end{aligned}$$
(15.14)
which may be simplified to
$$\begin{aligned} \mathbf {{z_i}}= {\hat{\mathbf {z}}} \oplus {\tilde{\mathbf {x}}}_{\mathbf {i}} \end{aligned}$$
(15.15)
The cross correlation \(Y(\mathbf {x_i})\) is given by
$$\begin{aligned} Y(\mathbf {x_i})=Y_{max} -2\sum _{j=0}^{n-1}z_{i,\, \eta _j} \times |r_{\eta _j} \vert \end{aligned}$$
(15.16)
equivalently
$$\begin{aligned} Y(\mathbf {x_i})=Y_{max} -Y_{\varDelta }(\mathbf {x_i}) \end{aligned}$$
(15.17)
The shortfall from maximum correlation, \(Y_{\varDelta }(\mathbf {x_i})\), is evidently
$$\begin{aligned} Y_{\varDelta }(\mathbf {x_i})=2\sum _{j=0}^{n-1}z_{i,\,\eta _j} \times |r_{\eta _j} \vert \end{aligned}$$
(15.18)
Substituting for \(\mathbf {{z_i}}\) gives \(Y_{\varDelta }(\mathbf {x_i})\) as a function of \({\tilde{\mathbf {x}}}_{\mathbf {i}}\):
$$\begin{aligned} Y_{\varDelta }(\mathbf {x_i})=2\sum _{j=0}^{n-1}(\hat{z}_{\eta _j} \oplus \tilde{x}_{i,\,\eta _j}) \times |r_{\eta _j} \vert \end{aligned}$$
(15.19)
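In code, the incremental metric amounts to an XOR and a short weighted sum; the following sketch assumes \({\hat{\mathbf {z}}}\), \({\tilde{\mathbf {x}}}_{\mathbf {i}}\) and \(|r|\) are held as length-\(n\) NumPy arrays in natural coordinate order (the summation order is immaterial, so the \(\eta \) permutation can be dropped here):

```python
import numpy as np

# The incremental metric of Eqs. (15.15) and (15.18).
def shortfall(z_hat, x_tilde, abs_r):
    z_i = z_hat ^ x_tilde                 # Eq. (15.15)
    return 2.0 * np.sum(z_i * abs_r)      # Eq. (15.18)
```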
It is apparent that, instead of the decoder determining \(Y(\mathbf {x_i})\) for each codeword \(\mathbf {x_i}\), it is sufficient for the decoder to determine \(Y_{\varDelta }(\mathbf {x_i})\) for each codeword \({\tilde{\mathbf {x}}}_{\mathbf {i}}\) and to compare the value with the smallest value obtained so far, denoted as \(Y_{\varDelta }(\mathbf {x_{min}})\) and starting with \(Y_{\varDelta }({\hat{\mathbf {x}}})\):
$$\begin{aligned} Y_{\varDelta }(\mathbf {x_{min}})=\min \big (Y_{\varDelta }(\mathbf {x})\big ) \end{aligned}$$
(15.20)
Thus, it is more efficient for the decoder to compute the correlation (partial sum) of the \({\tilde{\mathbf {x}}}_{\mathbf {i}}\) instead of deriving \(({\hat{\mathbf {x}}}\oplus {\tilde{\mathbf {x}}}_{\mathbf {i}})\) by solving \(\mathbf {H_I}\) and computing the squared Euclidean distance. Since the codewords \({\tilde{\mathbf {x}}}_{\mathbf {i}}\) produce low weight in \(\mathbf {{z_i}}\), the number of non-zero terms that need to be evaluated in Eq. (15.18) is typically \(\frac{n-k}{2}\), rather than the \(\frac{n}{2}\) terms of Eq. (15.1), which makes for an efficient, fast decoder. Before Eq. (15.19) is evaluated, the Hamming weight of \(\mathbf {{z_i}}\) may be compared to a threshold and the correlation stage bypassed if the Hamming weight of \(\mathbf {{z_i}}\) is high. There is an associated performance loss, and results are presented in Sect. 15.4.
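A sketch of the resulting search loop follows; `candidates` is assumed to supply full low information weight codewords \({\tilde{\mathbf {x}}}_{\mathbf {i}}\) (information pattern plus its parity bits) in increasing weight order, and `threshold` is an illustrative tuning parameter for the weight test just described:

```python
import numpy as np

# Each candidate is scored by its correlation shortfall and only the
# running minimum is retained; an optional weight threshold bypasses
# the weighted sum for heavy z_i.
def search(z_hat, abs_r, candidates, threshold=None):
    best = np.zeros_like(z_hat)                  # x_tilde = 0, i.e. x_hat itself
    best_delta = 2.0 * np.sum(z_hat * abs_r)     # Y_delta(x_hat)
    for x_tilde in candidates:                   # increasing information weight
        z_i = z_hat ^ x_tilde
        if threshold is not None and z_i.sum() > threshold:
            continue                             # bypass the correlation stage
        delta = 2.0 * np.sum(z_i * abs_r)
        if delta < best_delta:
            best_delta, best = delta, x_tilde
    return best, best_delta                      # decoded word is x_hat XOR best
```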
The maximum information weight, \(w_{inf \, max}\), necessary to achieve maximum likelihood decoding may be upper bounded from \(Y_{\varDelta }({\hat{\mathbf {x}}})\) and \(|r_{\eta _{j}} \vert \) initially, and updated by \(Y_{\varDelta }(\mathbf {{x_{min}}})\) as decoding progresses, since
$$\begin{aligned} Y_{\varDelta }(\mathbf {x_i}) \ge \sum _{m=0}^{w_{inf}}|r_{\eta _{k-m-1}} \vert \end{aligned}$$
(15.21)
This bound is reasonably tight, since there is the possibility of at least one codeword with information weight \(w_{inf \, max}\) for which all of the coordinates of the binary vector \(\mathbf {z_i}\) corresponding to the parity bits of \({\tilde{\mathbf {x}}}_{\mathbf {i}}\) are zero. Correspondingly, \(w_{inf \, max}\) is the smallest integer such that
$$\begin{aligned} \sum _{m=0}^{w_{inf \, max}}|r_{\eta _{k-m-1}} \vert \ge Y_{\varDelta }({\hat{\mathbf {x}}}) \end{aligned}$$
(15.22)
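As a sketch, the bound of Eq. (15.22) can be evaluated directly from the ranked magnitudes; `abs_r_eta` is assumed to hold \(|r|\) in the \(\eta \) ordering, most reliable first:

```python
# w_inf_max is the smallest integer for which the accumulated least
# reliable information magnitudes reach the current best shortfall.
def max_information_weight(abs_r_eta, k, y_delta):
    total = 0.0
    for w in range(k):
        total += abs_r_eta[k - w - 1]    # the m = w term of Eq. (15.22)
        if total >= y_delta:
            return w
    return k - 1                         # all k rows may be needed
```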
The codewords \({\tilde{\mathbf {x}}}_{\mathbf {i}}\) may be most efficiently derived from the \(\mathbf {G}\) matrix corresponding to the solved \(\mathbf {H}\) matrix, because the maximum information weight given by Eq. (15.22) turns out to be small. Each row \(i\) of the solved \(\mathbf {G}\) matrix is derived by setting \(x_{\eta _{i}}=1\) and \(x_{\eta _{j}}=0\) for \(j=0\) to \(k-1\), \(j \ne i\), and using the solved parity-check equations to determine \(x_{\eta _{j}}\) for \(j=k\) to \(n-1\). The maximum number of rows of the \(\mathbf {G}\) matrix that need to be combined to produce \({\tilde{\mathbf {x}}}_{\mathbf {i}}\) is \(w_{inf \,max}\).
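A sketch of this derivation, reusing `H_solved`, `flagged` and `eta_info` from the `first_candidate` sketch earlier in the section (all names illustrative):

```python
import numpy as np

# Row i of the solved G matrix sets information coordinate eta_i to 1,
# all other information coordinates to 0, and fills the parity
# coordinates by back substitution through the flagged equations.
def solved_generator(H_solved, flagged, eta_info, n):
    G = np.zeros((len(eta_info), n), dtype=int)
    for i, info_pos in enumerate(eta_info):
        x = np.zeros(n, dtype=int)
        x[info_pos] = 1
        for row, pos in reversed(flagged):       # last flagged equation first
            mask = H_solved[row].astype(bool)
            mask[pos] = False
            x[pos] = x[mask].sum() % 2
        G[i] = x
    return G
```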