Cyclic DNA codes over the ring based on the deletion distance☆
Introduction
DNA plays an important role as the carrier of genetic information in all living species. Blueprints of life in all species are stored in the DNA molecules. DNA sequences consist of four bases (nucleotides)-two purines, adenine (A), and guanine (G), and two pyrimidines, thymine (T) and cytosine (C). A single DNA strand is a quaternary sequence of the letters A, C, G, and T. The ends of a single DNA strand are chemically polar with the and the ends which implies that the strands of DNA are oriented. In nature a DNA molecule is a long string composed of two strands wound around each other (double helix). Each purine base is the Watson–Crick complement (WCC) of a unique pyrimidine base and vice versa. The WCC of A is T and vice versa and the WCC of C is G and vice versa. We describe this as , , and . A single DNA strand can pair with its WCC or its reverse complement strand where is the WCC of for all .
For example, the WCC of is
The chemical ties between the WCC pairs are different. A and T require two hydrogen bonds, while C and G need three hydrogen bonds. When a collection of double strands DNA are mixed and heated, the double strands disassociate into single strands through a process called melting. When a set of complementary strands are cooled, then each strand finds its WCC to form the DNA double helix structure through a process called hybridization.
Biomolecular computation is a newly developed technology that aims to use the computational power of molecules for information processing. Many computational models have been proposed and used to solve some mathematical problems. Adelman [5] described an experiment involving the use of DNA molecules to solve a seven node instance of the directed salesman combinatorial problem. In Adelman's experiment, the computation hinges upon a correct binding between DNA sequences and their WCC. In [6], Boneh et al., and independently Adelman et al. in [4] presented a molecular program for breaking the Data Encryption (DES) cryptographic system. Another important application of biomolecular computing is the design of DNA chips for mutational analysis and for sequences. In [14], it is showed that DNA molecules can be used as a storage media. They showed that the double helix of DNA has a period of 3.4 nm, corresponding to about 10 base pairs or 20 bits per period (each base represent two bits of information). Therefore the linear density of a storage along a DNA molecule is 1 bit per 017 nm, or 149 Mbits/in, which outperforms other electronic, magnetic, and optical technologies.
In all of the above applications the design of DNA strands that is suitable for such applications is an important problem. For example, DNA strands are designed so that each strand uniquely hybridizes with its WCC sequence and not to any other sequence. Another important criterion in designing DNA sequences is that codewords should not form secondary structures that cause these strands to be inactive. A secondary structure is formed when a sequence fold back onto itself by complementary base hybridization. It was reported in [14] that 30% of read-out attempts in a DNA storage system failed due to the formation of special type of secondary structures.
Most of previous work on designing of DNA strands focuses on constructing large sets of DNA codewords with prescribed minimum Hamming distance [2], [10], [9], [12]. Recently D’yachkov et al. in [7], [8] considered another distance measure called the deletion similarity that seems to be more suitable than the Hamming distance measure and slightly different than the Levenshtein metric [13]. For two quaternary n-sequences X and Y the energy of DNA hybridization measured by the longest common subsequence (not necessarily contiguous) of either strand or the reverse complement of the other strand. For that, we define the deletion similarity between X and Y (denoted by ) to be the longest common length of a sequence occurring as a (not necessary contiguous) subsequence of both sequences X and Y. Note that for any strands X and Y of length n we haveThe deletion similarity identifies the number of base pair bonds between X and , i.e., we have Example 1 If and then since is the longest common subsequence for both X and Y. Note that is not unique since is another subsequence of length 3.
In this paper we use the structure of cyclic codes over rings of four elements to design cyclic DNA codes based on the deletion similarity S that are suitable for biomolecular computation. For the last 20 years, cyclic codes over rings played a very important role in the area of error correcting codes [3], [11], [17], [18]. Many good binary codes were constructed and their structures were studied as Gray images of linear codes over rings of four elements [11], [18]. Our objective in this paper is to design cyclic DNA codes over the ring where . Our interest and motivation of designing cyclic DNA codes over the ring R comes from the following facts:
- 1.
The ring R is in a one to one correspondence with the DNA bases A, C, G, and T where we have , , , .
- 2.
Since the field is a subring of the ring , factorization of the polynomial over will be still valid over the ring R. This property does not hold over the ring or over the field . This implies that factoring over the ring R will be less of complexity than factoring over or .
- 3.
As will be seen later in the paper cyclic DNA codes depend on the existence of self-reciprocal polynomials that are divisors of . It is easy to see that any self-reciprocal divisor of over is also a self-reciprocal over R. The converse is not true in general. This implies that the number of cyclic DNA codes over R is at least equal to cyclic DNA codes over .
- 4.
The structure of the ring R among other rings with four elements makes it easier to study some properties of cyclic DNA codes such as the -content of the code. As will be seen in the paper and due to the algebra over the ring R, all possible spectra of the -content of a code C are determined by the Hamming weight enumerator of a subcode of C.
- 5.
The complexity of testing the secondary structure for a cyclic codes is less than other type of codes. In fact it has been shown in [16] that the overall complexity of computing the free-energy table (in the Nussinov–Jacobson algorithm) of a cyclic code is compared with for non-cyclic codes.
- 6.
As in the case of testing for secondary structure, it can be easily shown that the complexity of the dynamic programming algorithm to find the longest common subsequence between any two codewords in a cyclic code will be less than that of any other code.
The rest of the paper is organized as follows. Section 2 includes a description and the definition of cyclic DNA codes over the ring R. In Section 3, we find a unique set of generators for these codes as ideals in the ring . We show that generators of cyclic DNA codes are related to self-reciprocal polynomials that are divisors of . In Section 4, we study the -content and the deletion distance of a cyclic DNA code. Section 5 includes examples of cyclic DNA codes of different lengths with their deletion distance and their -content. Section 6 concludes the paper.
Section snippets
Cyclic DNA codes over the ring R
Let be the binary finite field. We consider the ring where . The finite field is considered as a subring of the ring R. So factorization of over will be still valid over the ring R. This property does not hold for the ring because is not a subring of . The letters A, C, G, and T map to , and , respectively. This mapping gives the following WCC pairing:A cyclic code C of length n over R is a
Generators of cyclic DNA codes
The elements of the ring R can be mapped to the elements of via the map whereLet C be a cyclic code in . We can extend the map to a map satisfies the conditions of a ring homomorphism with Let . It is easy to check that J is an ideal in and hence a cyclic code in . So where
The CG-content and the deletion distance D
Since the chemical bonds between the WCC pairs are different where A and T require two hydrogen bonds, while C and G need three hydrogen bonds, the total energy of the DNA molecule depends on the number of A and T pairs and the number of C and G pairs. For example stable, high energy duplexes hybridize at higher temperature than unstable, low energy duplexes. Because of this, it is always desirable in a DNA code to have all codewords with the same -content, so that they have similar melting
Example
In this section, we construct a concrete example illustrating the findings of the previous sections. Example 14 Let . Let where and be a cyclic code of length 6 and type . This code has 16 codewords. These codewords are given in Table 1. Here, for all . Further, for all and . Since , and , we have . So, this is a (6,1) DNA cyclic code.
Conclusion
In this paper cyclic DNA codes over the ring based on the deletion similarity distance have been studied. Our work of cyclic DNA codes over R was motivated by the simple and less complexity structure of these codes over R. We found a unique set of generators of these codes as ideals in the ring . We also studied the -content of these codes and their deletion distance. We showed that the -content and the deletion distance of a code C are related to a
Acknowledgment
The authors are thankful for reviewer's valuable remarks.
References (17)
- et al.
Construction of cyclic codes over for DNA computing
Journal of The Franklin Institute
(2006) - et al.
Linear construction for DNA codes
Theoretical Computer Science
(2005) - et al.
On the generators of codes of length
IEEE Transactions on Information Theory
(2003) - et al.
Cyclic codes over and
Designs, Codes and Cryptography
(2007) - et al.
On applying molecular computation to the Data Encryption Standard
Journal of Computational Biology
(1999) Molecular computation of solutions to combinatorial problems
Science
(1994)- D. Boneh, C. Dunworth, R. Lipton, Breaking DES using molecular computer, Princeton CS Tech-Report, Number CS-TR-489-95,...
- A. D’yachkov, A. Macula, T. Renz, P. Vilenkin, I. Ismagilov, New results on DNA codes, in: International Symposium on...
Cited by (47)
DNA codes over finite local Frobenius non-chain rings of length 4
2021, Discrete MathematicsReversible complement cyclic codes over Galois rings with application to DNA codes
2020, Discrete Applied MathematicsCitation Excerpt :Prior to our work, complement of an element in a ring has been defined in different ways by various authors. For reference see [1,3–5,7,8,10,11,16,19–22,24,25,30]. The following result is due to Kaur et al. [13].
Construction of cyclic DNA codes over the ring Z <inf>4</inf> [u]/〈u <sup>2</sup> −1〉 based on the deletion distance
2019, Theoretical Computer ScienceCitation Excerpt :[21] We get the following theorem from [21]. In this section, we give some examples of cyclic codes of different lengths over the ring R to illustrate the above results.
Constructing Double Cyclic Codes over F<inf>2</inf> + uF<inf>2</inf> for DNA Codes
2023, Journal of Computational BiologyReversible complement cyclic codes over Z<inf>4</inf>+ u Z<inf>4</inf>+ v Z <inf>4</inf>for DNA computing
2023, Discrete Mathematics, Algorithms and Applications
- ☆
The work of T. Abualrub and A. Ghrayeb was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) while the first author was on sabbatical leave in the Department of Electrical and Computer Engineering, Concordia University, Montreal, Quebec, Canada.