Published in: Neural Computing and Applications 8/2018

22-02-2017 | New Trends in data pre-processing methods for signal and image classification

A novel numerical mapping method based on entropy for digitizing DNA sequences

Authors: Bihter Das, Ibrahim Turkoglu


Abstract

Digital signal processing has recently been widely applied in genomics. One such application is the identification of protein-coding regions. Where is a protein coded? How much is encoded? Where are growth and development regulated? Answering these questions requires classifying the regions of a DNA sequence as exons or introns. Signal processing methods operate on numerical signals, whereas a DNA sequence is symbolic, so the sequence must be converted from symbols to numbers during data preprocessing, prior to analysis. The bases in a DNA sequence are represented by four letters, A, G, C and T, and each letter is assigned a numeric value. Several numerical mapping techniques exist in the literature. This paper proposes a novel numerical mapping approach for converting the symbolic sequence to numerical values, in which each codon is mapped using an improved fractional derivative of the Shannon equation. Three methods are used to predict exon regions: singular value decomposition (SVD), discrete Fourier transform (DFT) and short-time Fourier transform (STFT). The performance of the proposed mapping technique is evaluated with these three methods. With the SVD, DFT and STFT methods, the proposed technique identified protein-coding regions more successfully than the predominant existing mapping techniques.
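To make the pipeline in the abstract concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. The abstract does not give the improved fractional-derivative form of the Shannon equation, so the hypothetical helper `codon_shannon_map` uses only the plain Shannon term -p log2(p), with p assumed to be a codon's relative frequency within the sequence. The second hypothetical helper, `period3_power`, shows a commonly used DFT-based indicator of coding regions: the spectral power at frequency index N/3, computed here on simple binary indicator signals for each base rather than on the paper's mapping.

```python
import cmath
import math
from collections import Counter

BASES = "ACGT"


def codon_shannon_map(seq):
    """Map each non-overlapping codon to its Shannon term -p * log2(p).

    Assumption: p is the codon's relative frequency within `seq`. This is the
    plain Shannon term, not the paper's improved fractional-derivative form.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = len(codons)
    return [-(counts[c] / total) * math.log2(counts[c] / total) for c in codons]


def period3_power(seq):
    """Sum over the four bases of |X_b[N/3]|^2 for binary indicator signals."""
    seq = seq.upper()
    n = len(seq) - len(seq) % 3  # keep N divisible by 3 so k = N/3 is an integer
    if n == 0:
        return 0.0
    k = n // 3
    power = 0.0
    for base in BASES:
        # DFT coefficient at index k of the indicator signal for this base
        coeff = sum(
            cmath.exp(-2j * math.pi * k * pos / n)
            for pos in range(n)
            if seq[pos] == base
        )
        power += abs(coeff) ** 2
    return power


if __name__ == "__main__":
    print(codon_shannon_map("ATGATGCCC"))
    print(period3_power("ATG" * 30), period3_power("A" * 90))
```

In a sliding-window setting, `period3_power` would be evaluated on successive windows of the sequence, which is roughly what an STFT-style analysis does. The window length and step are choices the abstract does not specify.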


Metadata
Title
A novel numerical mapping method based on entropy for digitizing DNA sequences
Authors
Bihter Das
Ibrahim Turkoglu
Publication date
22-02-2017
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 8/2018
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-017-2871-5
