Construction of cyclic codes over GF(4) for DNA computing

https://doi.org/10.1016/j.jfranklin.2006.02.009Get rights and content

Abstract

In this paper, we develop the theory for constructing linear and additive cyclic codes of odd length over GF(4) that are suitable for DNA computing. We call this class of codes reversible complement cyclic codes. We use this theory to study all such codes of lengths 7, 9, 11 and 13. We list the codes that have the largest number of codewords for a given minimum Hamming distance. We show that some of these codes have more codewords than previously known codes with the same minimum Hamming distance.

Introduction

DNA computing started in 1994 when Leonard Adleman solved a hard (NP-complete) computational problem such as the Hamiltonian Path problem by manipulations of DNA molecules in a test tube [1]. Because of the huge numbers of DNA molecules in a typical test tube, any method of computation based on DNA would seem to have potential for a massive parallelism, capacity, and power. This potential, however, is limited by the constraints imposed by the chemical structure of DNA.

A single DNA strand is a sequence of four possible nucleotides: adenine (A), guanine (G), cytosine (C) and thymine (T). The ends of a single DNA strand are chemically polar with the so-called 5 end and the 3 end. Since there are 4n possible single DNA strands of length n, and since DNA strands can be quickly and cheaply synthesized, DNA codewords can be used to store information at the molecular level, thereby providing a basis for bimolecular computation. It is known that 1 g of DNA contains 2.1×109 DNA bases. Since there are four DNA bases, each DNA basis can encode two bits which implies that 1 g of DNA can store about 4.2×1021 bits. In contrast, conventional storage technologies can store at most 109 bits.

Applications of DNA require success in achieving specific hybridization between a DNA codeword and its Watson–Crick complement, while minimizing false-positive and false-negative signals. Each single string can be paired up with a complementary string to form a double helix. The Watson–Crick complement of a DNA strand is the strand obtained by replacing each A by T and vice versa, each C by G and vice versa, and switching the 3 and 5 ends. For example, the Watson–Crick complement of 3-CCATTGA-5 is 5-GGTAACT-3. Specific hybridization is the process whereby a strand and its Watson–Crick complement bond to form a double helix. A false-positive results when a non-specific hybridization occurs, such as the one between a DNA strand and the Watson–Crick complement of a different DNA strand. Non-specific hybridization may also occur between a DNA strand and the reverse of a distinct strand. A false negative occurs when hybridization between a DNA strand and its Watson–Crick complement does not occur.

Several papers have proposed different techniques to construct a set of DNA codewords that are unlikely to form undesirable bonds with each other by hybridization [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]. For example, in [6], [7], [8], four different constraints on DNA codes are considered, which are: the Hamming constraint for distance d, the reverse-complement constraint, the reverse constraint and the fixed GC-content (all codewords in this case have the same number of Gs and Cs) constraint. The purpose of the first three constraints is to make non-desirable hybridizations difficult to occur. The fixed GC-content constraint is used to make sure that similar melting temperatures are obtained [8]. In [9], the authors use stochastic search algorithms to design codewords that are suitable for DNA computing. In [10], [11], the authors develop genetic and evolutionary algorithms which select sets of DNA sequences that are less likely to form undesirable bonds. Also, in [12], the authors propose to use cyclic codes over GF(4) to construct DNA codes. However, they restrict their work to only linear reversible cyclic codes over GF(4).

In this paper, we use the theory of cyclic codes to construct codes that are suitable for DNA computing. In particular, we construct both linear and additive cyclic codes over GF(4) that are suitable for this application. Our interest in additive codes over GF(4) has been motivated by the correspondence between these codes and quantum codes, as described in [13]. The main advantage of additive codes over linear codes is that, for the same code length, there are more additive codes than linear codes. Therefore, it is more likely to find an additive code that has more codewords than a linear code with the same minimum Hamming distance (for both codes).

We first develop the theory for constructing linear and additive codes of odd length over GF(4) that satisfy the first three constraints mentioned above. Once such codes are constructed, the GC-content constraint can be easily incorporated by removing the codewords that violate this constraint. We use this theory to construct all linear and additive codes of length 7, 9, 11, and 13. We show that some of these codes have more codewords than the best previously known codes with the same minimum Hamming distance. The consequence of having more codewords is an increase in the number of valid DNA sequences, and hence more storage.

The rest of the paper is organized as follows. In Section 1, we give some background and recall some basic results for DNA codes and cyclic codes over GF(4). In Section 2, we develop the theory for constructing linear and additive reversible complement cyclic codes over GF(4). In Section 3, we list the best reversible complement cyclic codes of lengths 7, 9, 11, and 13. Finally, Section 4 concludes the paper.

Section snippets

Background

DNA occurs in sequences, represented by sequences of letters from the alphabet {A,C,G,T}. We define a DNA code of length n to be a set of codewords (u0,,un-1) where ui{A,C,G,T}. These codewords must satisfy the four constraints mentioned above. In this paper, we shall associate codes over {A,C,G,T} with codes over GF(4)={0,1,ω,ϖ} with ϖ=ω2, and ω2+ω+1=0. The letters A, C, G, and T map to 0, ω, ϖ, and 1, respectively. The Watson–Crick complement is given by Ac=T, Tc=A, Cc=G, and Gc=C. For each

Cyclic reversible complement codes over GF(4)

Lemma 5

Let xn-1=g1(x)g2(x)g3(x) where g1(x), g2(x), and g3(x) are nontrivial polynomials that divide xn-1 in GF(4)[x]/(xn-1). Then

  • (1)

    if g1(x) and g2(x) are self-reciprocal then g1(x)g2(x) is self-reciprocal,

  • (2)

    if either g1(x) or g2(x) (but not both) is not self-reciprocal then g1(x)g2(x) is not self-reciprocal,

  • (3)

    if g1(x) and g2(x) are not self-reciprocal then g1(x)g2(x) might be self-reciprocal.

Proof

Let xn-1=g1(x)g2(x)g3(x) where g1(x), g2(x), and g3(x) are nontrivial polynomials that divide (xn-1) in GF(4)[x]/(xn-

Results

In this section, we use the theory developed above to study linear and additive reversible complement cyclic codes over GF(4) of lengths 7, 9, 11, and 13. For the linear case there are only few codes, which are:

  • Length 7: There is one code given by C=(x6+x5+x4+x3+x2+x+1).

  • Length 9: There are three codes given by C1=(x2+x+1),C2=(x6+x3+1),and C3=(x8+x7+x6+x5+x4+x3+x2+x+1).

  • Length 11: There is one code given by C=(x10+x9+x8+x7+x6+x5+x4+x3+x2+x+1).

  • Length 13: There is one code given by C=(x12+x11+x10+x9

Concluding remarks

In this paper, we have developed the theory for constructing linear and additive cyclic codes of odd length over GF(4) that are suitable for DNA computing. We have studied all such codes of lengths 7, 9, 11, and 13. We have shown that some of these codes have more codewords than the best previously known codes with the same Hamming distance.

References (17)

  • J.L. Massey

    Reversible codes

    Inf. Control

    (1964)
  • L. Adleman

    Molecular computation of solutions to combinatorial problems

    Science

    (1994)
  • R. Deaton, R. Murphy, M. Garzon, D.R. Franceschetti, S.E. Stevens, Good encoding for DNA-based solutions to...
  • M. Garzon, P. Neathery, R. Deaton, M. Garzon, R.C. Murphy, D.R. Franceschetti, S.E. Stevens Jr., A new metric for DNA...
  • M. Garzon, R. Deaton, L.F. Nino, S.E. Stevens Jr., M. Wittner, Genome encoding for DNA computing, Proceedings of the...
  • L. Kari, R. Kitto, G. Thierrin, Codes, Involutions, and DNA Encoding, Lecture Notes in Computer Science, vol. 2300,...
  • O. King

    Bounds for DNA codes with constant GC-content

    J. Combin.

    (2003)
  • A. Marathe et al.

    On combinatorial DNA word design

    J. Comput. Biol.

    (2001)
There are more references available in the full text version of this article.

Cited by (65)

  • Future DNA computing device and accompanied tool stack: Towards high-throughput computation

    2021, Future Generation Computer Systems
    Citation Excerpt :

    The DNA computing technology relies on the DNA reaction mechanism such as DSD [11], DNA walker [10], programmable self-assembly [20], and others [21,22]. Those mechanism reveals the underlying components of digital logic circuits in DNA computing and the hierarchy they scale up to form the integrated circuits [23]. In particular, the lasted developed techniques are mostly based on the DSD mechanism for its robustness in the stochastic nature of molecular reactions [24].

  • Reversible complement cyclic codes over Galois rings with application to DNA codes

    2020, Discrete Applied Mathematics
    Citation Excerpt :

    Prior to our work, complement of an element in a ring has been defined in different ways by various authors. For reference see [1,3–5,7,8,10,11,16,19–22,24,25,30]. The following result is due to Kaur et al. [13].

View all citing articles on Scopus

This work was supported in part by Natural Sciences and Engineering Research Council of Canada (NSERC) Grant N00858, and was performed while the first author was on a short-term appointment in the Electrical and Computer Engineering Department, Concordia University, Montreal, Que., Canada.

View full text