Published in:

Open Access 2017 | OriginalPaper | Chapter

18. Password Correction and Confidential Information Access System

Authors : Martin Tomlinson, Cen Jung Tjhai, Marcel A. Ambroze, Mohammed Ahmed, Mubarak Jibril

Published in: Error-Correction Coding and Decoding

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Patentsearch

Off

Abstract

This chapter considers the use of Reed–Solomon codes to correct user mistakes or missing parts of long entered passwords in an encrypted, information retrieval system or a password-based authentication system. Dynamic, user-specific mapping of Galois field elements is used to ensure that passwords, arbitrarily chosen by the user, are valid codewords. A typical system is described based on GF(256) and the ANSI character set with worked examples given for encoding and password correction. Security is also enhanced by the user-specific Galois field symbol mapping because this defeats Rainbow tables when used with long passwords.

18.1 Introduction and Background

Following the trend of an increasing need for security and protection of confidential information, personal access codes and passwords are increasing in length with the result that they are becoming more difficult to remember correctly. The system described in this chapter provides a solution to this problem by correcting small errors in an entered password without compromising the security of the access system. Moreover, additional levels of security are provided by the system by associating passwords with codewords of an error-correcting code and using a dynamic, user-specific, mapping of Galois field symbols. This defeats password attacking systems consisting of Rainbow tables because each user transmits what appears to be a random byte stream as a password. A description of this system was first published by the authors as a UK patent application in 2007 [1].

The system is a method for the encoding and decoding of passwords and the encoding and decoding of confidential information which is accessed by use of these passwords. Passwords may be composed of numbers and alphanumeric characters and easily remembered names, phrases or notable words are the most convenient from the point of view of users of the system.

Passwords are associated with the codewords of an error-correcting code and consequently any small number of errors of an entered password may be automatically corrected. Several additional parity symbols may be calculated by the system to extend the password length prior to hashing so as to overwhelm any attacks based on Rainbow tables. Dynamic mapping of code symbols is used to ensure a password when first registered by the user is a codeword of the error-correcting code. In this process sometimes a password, due to symbol contradictions, cannot be a codeword and an alternative word or phrase, which is a codeword, is offered to the user by the system. Alternatively, the user may elect to register a different password.

Feedback can be provided to users by the system of the number of errors corrected for each user, re-entered password. Valid passwords are associated with a subset of the totality of all the codewords of the error-correcting code and an entered password, may be confirmed to the user, as a valid password, or not.

Confidential information, for example Personal Identification Numbers (PIN)s, bank account numbers, safe combinations, or more general confidential messages, are encrypted at source and stored as a sequence of encrypted messages. Each encrypted message is uniquely associated with a cryptographic hash of a valid codeword. Retrieval of the confidential information is achieved by the user entering a password which is equal to the corresponding valid password or differs from a valid password in a small number of character positions. Any small number of errors are corrected automatically and feedback is provided to the user that a valid password has been decoded. The valid password is mapped to a single codeword from a very large number of codewords that comprise the error-correcting code.

On receiving a valid hash, the cloud sends the stored encrypted message that corresponds to the valid codeword. The encryption key may be derived from the reconstituted password in conjunction of other user entered credentials, such as a fingerprint. The retrieved encrypted message is decrypted and the confidential information displayed to the user.

Security is provided by the system at a number of different levels. Codewords of the error-correcting code are composed of a sequence of symbols with each symbol taken from a set of Galois Field (GF) elements. Any size of Galois Field may be used provided the number of GF elements is greater or equal to the alphabet of the language used to construct passwords. The mapping of alphabet characters to GF elements may be defined uniquely by each user and consequently there are at least q! possible mappings.

The number of possible codewords of the error-correcting code is extremely large and typically there can be $10^{500}$ possible codewords. The number of valid codewords in the subset of codewords is typically less than $10^2$ and so the brute force chance of a randomly selected codeword being a valid codeword is $\frac{1}{10^{500}}$. Even if an attacker, or an eavesdropper enters a valid codeword, the information that is obtained is encrypted and the confidential information cannot be retrieved without the encryption key, which requires the user’s credentials.

One possible application of the system is as an information retrieval app on a smartphone with encrypted information stored in the cloud. For each registered user a cloud-based message server has stored a list of cryptographic hashes of valid codewords of the error-correcting code and an associated list of encrypted messages or files. The mapping of password characters to codeword GF symbols is carried out within the user’s smartphone and is not able to be easily accessed by an eavesdropper or an attacker unless the smartphone is stolen along with user login credentials. Additionally, the decryption of received encrypted messages is also carried out within the user’s mobile phone. To access a long, hard to remember PIN or a long sequence of cryptic characters, the user can enter the password, which is mapped to a GF symbol stream, which is automatically corrected by the smartphone before cryptographic hashing. The hash is encrypted, using a random session key exchanged using public key cryptography, before being transmitted by the smartphone to the cloud. This is to prevent replay attacks. If the codeword hash is correct, the cloud transmits the corresponding encrypted message or file, together with an acknowledgement. The user’s smartphone receives the cipher text and decrypts it into the requested information.

18.2 Details of the Password System

A block diagram of the system showing how user defined passwords are mapped to sequences of GF symbols, encoded into codewords of an error-correcting code and associated with encrypted confidential information is shown in Fig. 18.1.

We consider as an example, a system using passwords consisting of sequences of up to 256 characters long with characters taken from the ANSI (American National Standards Institute) single byte character set and an error-correcting code which is a Reed–Solomon (RS) error-correcting code [2] described in Chaps. 7 and 11. RS codes are MDS codes, constructed from GF(q) field elements. For the finite field case, q is a prime or a power of a prime. In this example $q=2^8$ and RS codewords are constructed as sequences of GF(256) symbols. Codewords can be designed to be any length up to $q+1$ symbols long if the doubly extended version of the RS code is utilised.

In general, any character set may be used in the system and any RS code may be used provided the sequence of characters is less than or equal to the length of the error-correcting code and each symbol of the error-correcting code is from an alphabet size equal or greater than the alphabet size of the character set used to define passwords. For maximum security, the mapping is chosen by a cryptographic random number generator, with a seed provided by the user so that there is a high probability that the resulting mapping table is unique to each user of the information retrieval system.

It is convenient to use a binary base field and the Galois Field [3], GF(256), that is used is an extension field consisting of 8 binary GF(2) field elements, generated by residues of $\alpha ^n,\ n=0\ \text{ to }\;255$ modulo $1+x^2+x^3+x^4+x^8$, where $1+x^2+x^3+x^4+x^8$ is an example of a primitive polynomial, plus the zero symbol GF(0).

As an example, the registered password “silver” is considered, whose corresponding sequence of ANSI numbers is

$$\begin{aligned} 115\quad 105\quad 108\quad 118\quad 101\quad 114 \end{aligned}$$

As shown in Fig. 18.1, a mapping table is used to map these numbers to GF(256) symbols. In this example, the error-correcting code that is used is the (256, 254, 3) extended RS code which is capable of correcting either two erased symbols or one erroneous symbol, and the code has 254 information symbols and 2 parity-check symbols. The first two symbols are chosen as parity-check symbols and denoted as $p_1$ and $p_2$, respectively. Putting the parity symbols first is convenient because short codewords can easily by accommodated by assuming any unused information symbols have value zero and therefore do not affect the parity symbols. A general codeword of this code as an extended GF(256) RS code is

$$\begin{aligned} p_1\quad p_2\quad x_1\quad x_2\quad x_3\quad x_4\quad \ldots \quad x_{254} \end{aligned}$$

The general parity-check matrix of an extended RS code with $n-k$ parity-check symbols is

$$\begin{aligned} \mathbf {H} =&\left[ \begin{array}{ccccccc} &{} 1 &{} 1 &{} 1 &{} \ldots &{} 1 &{}1\\ &{} 1 &{} \alpha ^1 &{} \alpha ^2 &{} \ldots &{} \alpha ^{n-1}&{}0 \\ &{} 1 &{} \alpha ^2 &{} \alpha ^4 &{} \ldots &{} \alpha ^{2(n-1)}&{}0 \\ &{} 1 &{} \alpha ^3 &{} \alpha ^6 &{} \ldots &{} \alpha ^{3(n-1)}&{}0 \\ &{} \ldots &{}\ldots &{}\ldots &{}\ldots &{}\ldots &{}\\ &{} 1 &{} \alpha ^{n-k-1} &{} \alpha ^{2(n-k-1)} &{} \ldots &{} \alpha ^{{n-k-1}(n-1)}&{}0 \\ \end{array} \right] \end{aligned}$$

To provide more flexibility in symbol mapping, as described below, the generalised extended RS code may be used with parity-check matrix $\mathbf {H_{\eta }}$.

$$\begin{aligned} \mathbf {H_{\eta }} =&\left[ \begin{array}{ll@{}lllll} &{} \eta _0 &{} \eta _1 &{} \eta _2 &{} \ldots &{} \eta _{n-1} &{}\eta _{n}\\ &{} \eta _0 &{} \eta _1 \alpha ^1 &{} \eta _2 \alpha ^2 &{} \ldots &{} \eta _{n-1} \alpha ^{n-1}&{}0 \\ &{} \eta _0 &{} \eta _1 \alpha ^2 &{} \eta _2 \alpha ^4 &{} \ldots &{} \eta _{n-1} \alpha ^{2(n-1)}&{}0 \\ &{} \eta _0 &{} \eta _1 \alpha ^3 &{} \eta _2 \alpha ^6 &{} \ldots &{} \eta _{n-1} \alpha ^{3(n-1)}&{}0 \\ &{} \ldots &{}\ldots &{}\ldots &{}\ldots &{} \quad \ldots &{} \ldots \\ &{} \eta _0 &{} \eta _1 \alpha ^{n-k-1} &{} \eta _2 \alpha ^{2(n-k-1)} &{} \ldots &{} \eta _{n-1} \alpha ^{{n-k-1}(n-1)}&{}0 \\ \end{array} \right] \end{aligned}$$

The constants $\eta _1,\ \eta _2, \ \eta _3, \ \ldots \ \eta _n $ may be arbitrarily chosen provided they are non-zero symbols of GF(q).

With two parity-check symbols, only the first two rows of $\mathbf {H}$ are needed and we may conveniently place the last column first to obtain the reduced echelon parity-check matrix $\mathbf {H_2}$

$$\begin{aligned} \mathbf {H_2} =&\left[ \begin{array}{ccccccc} &{}1&{} 1 &{} 1 &{} 1 &{} \ldots &{} 1 \\ &{} 0 &{}1&{} \alpha ^1 &{} \alpha ^2 &{} \ldots &{} \alpha ^{n-1} \\ \end{array} \right] \end{aligned}$$

Any pseudo random, one to one, mapping of ANSI numbers to GF(256) symbols may be used. It is convenient to map always the null character, ANSI number = 32, to the field element GF(0) otherwise each password would consist of 256 characters and 256 password characters would have to be entered for each password. With the null character mapping, a shortened RS codeword is equal to the full length codeword since any of the GF(0) symbols may be deleted without affecting the parity-check symbols. Consequently, short passwords may be accommodated very easily.

It is possible to choose a fixed one to one mapping of ANSI numbers to GF(256) symbols and make this equal for all users but in this case many passwords on first registering would fail as non-valid passwords, unless arbitrarily assigned characters are allowed in the parity symbol positions. However, this is an unnecessary constraint on the system since codewords and passwords of different users are processed independently from each other. Moreover, security is enhanced if each user uses a different mapping.

In the following example, dynamic mapping is used and the mapping chosen is such that the information symbols of the RS codeword corresponding to “silver” are equal to a primitive root $\alpha $ raised to the power corresponding to the ANSI number of the character of the password in the same respective position as the codeword, except for the null character which is set to GF(0). As the codeword has parity symbols in the first two positions, and these symbols are a function of the other symbols in the codeword, (the information symbols), the mapping of the first two characters needs to be different. Accordingly, the codeword is

$$\begin{aligned} p_1\quad p_2\quad \alpha ^{108}\quad \alpha ^{118}\quad \alpha ^{101}\quad \alpha ^{114}\quad 0\quad \ldots \quad 0 \end{aligned}$$

From the parity-check matrix $\mathbf {H_2}$, the parity-check symbols are given by

$$\begin{aligned} p_2=\sum _{i=1}^{254}\alpha ^i x_i \end{aligned}$$

(18.1)

$$\begin{aligned} p_1=\sum _{i=1}^{254}x_i+p_2 \end{aligned}$$

(18.2)

After substituting into Eq. (18.2) and then Eq. (18.1), it is found that $p_1=\alpha ^{220}$ and $p_2=\alpha ^{57}$ and the complete RS codeword corresponding to the defined password “silver” is

$$\begin{aligned} \alpha ^{220}\quad \alpha ^{57}\quad \alpha ^{108}\quad \alpha ^{118}\quad \alpha ^{101}\quad \alpha ^{114}\quad 0\quad \ldots \quad 0 \end{aligned}$$

The RS codeword encoder is shown in Fig. 18.1 and uses the mapping of defined password characters to GF(256) symbols as input and outputs to the mapping table, as shown in Fig. 18.1, the mapping of the parity symbols. Accordingly, the mapping of the first two characters of the password is that ANSI number 115 is mapped to $\alpha ^{220}$ and ANSI number 220 is mapped to $\alpha ^{115}$ and the mapping table is updated accordingly, as indicated in Fig. 18.1 by the two directional vectors. Of course in order for these mappings to be valid, neither ANSI number 115, nor ANSI number 220, nor GF(256) symbols $\alpha ^{220}$, nor $\alpha ^{115}$ must be already mapped otherwise the mapping of prior defined passwords will be affected.

As new passwords are defined and new valid codewords calculated, it is relatively straightforward to amend the list of assigned and unassigned mappings of the mapping table in a dynamic manner. This dynamic mapping assignment is a feature in that it not only increases the range of possible passwords but has a secondary advantage. This secondary advantage arises from the mapping of entered passwords and the subsequent error-correction decoding. Any entered password character not having an assigned ANSI number cannot be part of a valid password and accordingly the corresponding GF(256) symbol is marked as an erased symbol. Since on average, twice as many erased characters can be corrected by an error-correcting code compared to the number of correctable erroneous characters, a distinct advantage arises.

The confidential information corresponding to “silver” is, for example:

“The safe combination is 29 53 77 22” and as shown in Fig. 18.1 confidential information input to the system is encrypted using an encryption key. The encryption key is usually chosen from a master encryption key, unique to each user. Once input, the confidential information is only retained in encrypted form. The encrypted confidential information associated with “silver” forms the encrypted text:

As shown in Fig. 18.2, in order to retrieve this confidential information, the user re-enters their password. However, this entered password is allowed to contain errors. For example the password “solver” may be entered and has the corresponding ANSI number sequence:

$$\begin{aligned} 115 \quad 111 \quad 108\quad 118\quad 101\quad 114\quad \ldots \quad 32\quad 32\quad 32 \end{aligned}$$

Following input of the password, as shown in Fig. 18.2, the mapping table is used to map the entered password into the GF(256) sequence

$$\begin{aligned} \alpha ^{220}\quad \alpha ^{88}\quad \alpha ^{108}\quad \alpha ^{118}\quad \alpha ^{101}\quad \alpha ^{114}\quad 0\ldots \quad 0 \end{aligned}$$

The decoder for the RS code, as shown in Fig. 18.2, decodes the sequence of GF(256) symbols, resulting from the mapping using the mapping table, into a codeword of the error-correcting code. In order to carry this out, two syndromes, $s_1$ and $s_2$, are calculated from the two parity-check equations for the extended (256, 254, 3) RS code:

$$\begin{aligned} s_1=\sum _{i=1}^{254}x_i+p_2+p_1 \end{aligned}$$

(18.3)

$$\begin{aligned} s_2=\sum _{i=1}^{254}\alpha ^i x_i+p_2 \end{aligned}$$

(18.4)

The two syndromes, in this case, are found both to be equal to $\alpha ^{43}$ indicating there is an error in the second information symbol position of the codeword and this error is $\alpha ^{43}$. If the same error had been in the third symbol position, say, the two syndromes would have been equal to $\alpha ^{43}$ and $\alpha ^{44}$.

Subtracting the error, $\alpha ^{43}$, from the entered symbol (after mapping) of $\alpha ^{88}$ produces the correct GF(256) symbol $\alpha ^{57}$. The codeword of the RS code is thus corrected to be

$$\begin{aligned} \alpha ^{220}\quad \alpha ^{88}\quad \alpha ^{108}\quad \alpha ^{118}\quad \alpha ^{101}\quad \alpha ^{114}\quad 0\ldots \quad 0 \end{aligned}$$

Applying the inverse mapping to each GF(256) symbol produces the ANSI number sequence 115 105 108 118 101 114 32 32 32 32...32

$$\begin{aligned} 115 \quad 105 \quad 108 \quad 118 \quad 101 \quad 114 \quad 32 \quad 32 \quad 32 \quad 32\ldots \quad 32 \end{aligned}$$

This corresponds to “silver”, the corrected entered password.

As shown in Fig. 18.2, the decoded codeword is compared to the list of valid codewords of the error-correcting code. The codewords of the error-correcting code are split into two groups, the valid codewords and rest of the codewords, invalid codewords. The codeword

$$\begin{aligned} \alpha ^{220}\quad \alpha ^{88}\quad \alpha ^{108}\quad \alpha ^{118}\quad \alpha ^{101}\quad \alpha ^{114}\quad 0\ldots \quad 0 \end{aligned}$$

is verified as a valid codeword associated with the encrypted confidential information: As shown in Fig. 18.2, this is decrypted, using the encryption key and the confidential information is output: “The safe combination is 29 53 77 22”

In a further extension of the system, as shown in Fig. 18.3, the encoded RS codeword, denoted as $\mathbf {c_x}$ which results from the mapped, defined password is convolved with a fixed RS codeword denoted as

$$\begin{aligned} \mathbf {y_x}=\alpha ^{y_0}+\alpha ^{y_1}x+\alpha ^{y_2}x^2+\alpha ^{y_3}x^3+\cdots \alpha ^{y_{255}}x^{254} \end{aligned}$$

Note that the standard polynomial-based RS codes of length $q-1$ are used in this system variation. The fixed RS codeword is the result of encoding a random set of GF(256) information symbols. The reason for doing this is to ensure that the resulting codeword after convolution, $\mathbf {r_x}$

$$\begin{aligned} \mathbf {r_x}=\mathbf {c_x}.\mathbf {y_x}\text { modulo } 1+x^{255} \end{aligned}$$

(18.5)

does not have a long sequence of GF(0) symbols which may compromise the security of the information retrieval system. Correspondingly, it is the codeword $\mathbf {r_x}$ which is associated with the encrypted message. As shown in Fig. 18.4, retrieval of encrypted information is carried out by entering a password. After the decoding of the sequence of GF(256) symbols resulting from the mapping of the entered password, using the mapping table, into a codeword of the error-correcting code, this codeword is convolved with the fixed codeword as shown in Fig. 18.4. The resulting codeword is compared to the list of valid codewords of the error-correcting code.

One feature, particularly with long passwords hard to remember, is that a partially known password may be entered, deliberately using characters known not to be contained in the mapping table, in order for the system to fill in the missing parts of the password. Characters may be reserved for this purpose. As a simple example, the password may be entered “si**er” where it is known that the character * will not be contained in the mapping table, because the character * had been previously defined as a reserved character. The corresponding codeword is

$$\begin{aligned} \alpha ^{220}\quad \alpha ^{88}\quad erase_1\quad erase_2 \quad \alpha ^{101}\quad \alpha ^{114}\quad 0\ldots \quad 0 \end{aligned}$$

where $erase_1$ and $erase_2$ represent erased (unknown) GF(256) symbols. The decoder for the RS (256,254,3) error-correcting code may be used to solve straightforwardly for these erased symbols. The first step is to produce a reduced echelon parity-check matrix with zeros in the columns corresponding to the positions of the erased symbols, bar one. The procedure is described in detail in Chap. 11.

For two erasures the procedure is trivial and the reduced echelon parity-check matrix $\mathbf {H_e}$ is

$$\begin{aligned} \mathbf {H_e} =&\left[ \begin{array}{ccccccc} &{}1&{} 1 &{} 1 &{} 1 &{} \ldots &{} 1 \\ &{} \alpha ^1 &{}(1+\alpha ^1)&{} 0 &{} (\alpha ^2+\alpha ^1) &{} \ldots &{} (\alpha ^{n-1}+\alpha ^1) \\ \end{array} \right] \end{aligned}$$

Now, the erased symbol $erase_2$ may be solved directly using the second row of $\mathbf {H_e}$

$$\begin{aligned} \alpha ^{220}\alpha ^1+\alpha ^{88}(1+\alpha ^1)+ erase_2(\alpha ^2+\alpha ^1)+ \alpha ^{101}(\alpha ^3+\alpha ^1)+\alpha ^{114}(\alpha ^3+\alpha ^1)=0 \end{aligned}$$

and

$$\begin{aligned} erase_2=(\alpha ^2+\alpha ^1)^{q-2} \alpha ^{221}+\alpha ^{88}+\alpha ^{89}+ \alpha ^{104}+ \alpha ^{102}+\alpha ^{117}+\alpha ^{115}=\alpha ^{118} \end{aligned}$$

Using the first row of $\mathbf {H_e}$, $erase_1$ can now be solved

$$\begin{aligned} \alpha ^{220}+\alpha ^{88}+ \alpha ^{118}+ erase_1+\alpha ^{101}+\alpha ^{114}=0 \end{aligned}$$

and

$$\begin{aligned} erase_1=\alpha ^{220}+\alpha ^{88}+ \alpha ^{118}+ \alpha ^{101}+\alpha ^{114}=\alpha ^{108} \end{aligned}$$

With the reverse mapping the complete password is reconstituted allowing the encrypted information to be retrieved as before.

The advantage of defining erasures is that each unknown symbol may be solved for each parity symbol in the RS codeword. Having a relatively large number of parity symbols allows several parts of the entered password to be filled in automatically. Obviously security is compromised if this procedure is used to extreme.

18.3 Summary

This chapter has described the use of Reed–Solomon codes to correct user mistakes or missing parts of long entered passwords. The system is ideally suited to a smartphone-based encrypted, information retrieval system or a password-based authentication system. Dynamic, user-specific mapping of Galois field elements is used to ensure that passwords, arbitrarily chosen by the user, are valid codewords. A typical system is described based on GF(256) and the ANSI character set with worked examples given. Security is also enhanced by having, user-specific, Galois field symbol mapping because, with long passwords, this defeats Rainbow tables.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the book’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Springer Professional

Abstract

18.1 Introduction and Background

18.2 Details of the Password System

18.3 Summary