Automated detection of cancerous genomic sequences using genomic signal processing and machine learning
Introduction
Genomic Signal Processing (GSP) is the analysis and processing of genomic signals, i.e. measurable events originating from the genomic sequence, to obtain biological knowledge and then translate that knowledge into systems-based applications for diagnosing and treating genetic diseases [1]. GSP arises from Digital Signal Processing (DSP), a branch of electronics engineering that uses mathematical transform techniques such as the Fast Fourier Transform (FFT) and the Discrete Wavelet Transform (DWT) to analyze signals, here genomic signals.
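To make the transform-based view concrete, here is a minimal sketch that computes the total Fourier power spectrum of a DNA string. It assumes the common binary indicator mapping (one 0/1 sequence per nucleotide) and uses a naive O(N²) DFT in plain Python; an FFT returns the same coefficients faster. The function names are illustrative and are not taken from the article.

```python
import cmath

def indicator_sequences(seq):
    """Binary indicator sequences: one 0/1 list per nucleotide A, C, G, T (assumed mapping)."""
    seq = seq.upper()
    return {b: [1.0 if s == b else 0.0 for s in seq] for b in "ACGT"}

def dft(x):
    """Naive O(N^2) discrete Fourier transform; an FFT computes the same values faster."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def spectral_content(seq):
    """Total power spectrum S[k] = sum over b in {A, C, G, T} of |X_b[k]|^2."""
    spectra = [dft(u) for u in indicator_sequences(seq).values()]
    return [sum(abs(X[k]) ** 2 for X in spectra) for k in range(len(seq))]

S = spectral_content("ATGATGATGATG")
print([round(v, 3) for v in S])
```

For this periodic toy input each indicator sequence repeats every 3 bases, so S[k] is non-zero only at multiples of N/3 (k = 0, 4, 8 for N = 12).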
A Discrete Wavelet Transform (DWT) is a wavelet transform in which the wavelets are discretely sampled. It is widely used in signal coding and compression, where a discrete signal is represented by its wavelet coefficients. The DWT of a signal x is computed by passing it through a series of filters: at each level the samples are passed simultaneously through a low-pass filter and a high-pass filter, giving the approximation and detail coefficients, and each filter output is then downsampled by 2. This decomposition halves the time resolution, since only half of each filter output is needed to characterize the signal. However, each output covers only half the frequency band of the input, so the frequency resolution is doubled. Repeating the decomposition on the low-pass output yields the next level.
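As an illustration of the filter-bank computation just described, the following sketch uses the Haar filter pair, assumed here because it is the simplest orthogonal wavelet; the function names and the odd-length padding rule are hypothetical choices. Each level produces approximation (low-pass) and detail (high-pass) coefficients downsampled by 2, and further levels re-decompose the approximation.

```python
import math

def haar_dwt_level(x):
    """One DWT level with the Haar filter pair: low-pass (approximation) and
    high-pass (detail) filtering, each followed by downsampling by 2."""
    if len(x) % 2:
        x = list(x) + [x[-1]]  # assumption: pad odd-length input by repeating the last sample
    s = 1 / math.sqrt(2)
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail

def haar_dwt(x, levels):
    """Multi-level DWT: re-decompose the low-pass output at each level.
    Returns the final approximation and the detail bands, finest first."""
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, d = haar_dwt_level(approx)
        details.append(d)
    return approx, details

a, ds = haar_dwt([4, 6, 10, 12, 8, 6, 5, 5], levels=2)
print(a, ds)
```

Because the Haar filters are orthonormal, the total energy of all coefficients equals the energy of the input (446 for the example signal), which is a convenient sanity check.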
Bioinformatics is an interdisciplinary field in which techniques from computing, statistics and biology are combined to solve biological problems. Here, we apply a Genomic Signal Processing technique, the Discrete Wavelet Transform (DWT), to identify differences between cancerous and non-cancerous gene sequences using a sequence-based approach.
Numerous methods for gene prediction are available, such as the work of M. Stanke et al. on gene prediction using a Hidden Markov Model [2] and that of G. Dodin et al. on gene pattern prediction using Digital Signal Processing methods [3]. Many computational methods have also been developed to find cancer-causing gene sequences using sequence-based approaches, including the work of Barman et al. on predicting cancer cells using Digital Signal Processing [4]. The use of Genomic Signal Processing (GSP) concepts in bioinformatics was pioneered in the work of P.P. Vaidyanathan et al. on the role of signal-processing concepts in genomics and proteomics [5]. In the study of A. Marhon et al., GSP techniques were applied and compared with a traditional machine learning technique, the Hidden Markov Model, and DSP-based methods were reported to achieve higher accuracy in gene finding than the other methods [6].
Materials and methods
We aim to automate the identification of cancer-associated genes using genomic signal processing, and to differentiate between different types of cancer.
Here, we have processed gene sequences for lung, breast and ovarian cancer. Cancerous and non-cancerous gene sequences are obtained for these cancer types, converted into indicator sequences (numerical representations of the sequence that GSP techniques can process) and processed using GSP techniques like DWT and
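A minimal sketch of the conversion step, assuming that "indicator sequence" refers to the binary (Voss) indicator mapping, one 0/1 sequence per nucleotide, and that sequences arrive as FASTA text. The record name and sequence are made up for illustration, and mapping ambiguous symbols such as N to 0 in all four sequences is one convention among several.

```python
def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text (multi-line records allowed)."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records

def indicator_sequences(seq):
    """Binary indicator sequences u_b[n] = 1 if seq[n] == b else 0, for b in A, C, G, T.
    Ambiguous symbols such as N give 0 in all four sequences (assumed convention)."""
    return {b: [1 if s == b else 0 for s in seq] for b in "ACGT"}

# Hypothetical record, for illustration only.
example = """>hypothetical_gene_fragment
ATGCGTAN
GCTA
"""
recs = parse_fasta(example)
u = indicator_sequences(recs["hypothetical_gene_fragment"])
print(u["A"])
```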
Results and discussions
Sequences for lung, breast and ovarian cancer are classified as cancerous or non-cancerous using the statistical parameters obtained by applying the Discrete Wavelet Transform (DWT). Table 1 lists the statistical values extracted from cancerous and non-cancerous sequences for lung cancer.
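The section does not state which statistical parameters or which classifier were used, so this sketch is hypothetical throughout: it computes the mean, population standard deviation and energy of each Haar DWT band of the four binary indicator sequences, then assigns a label with a nearest-centroid rule. The toy sequences and the labels `class_1` and `class_2` are invented only to exercise the code and are not data from the study.

```python
import math
import statistics

def haar_level(x):
    """One Haar DWT level: scaled pairwise sums (low-pass) and differences (high-pass), downsampled by 2."""
    if len(x) % 2:
        x = list(x) + [x[-1]]  # assumption: pad odd lengths by repeating the last sample
    s = 1 / math.sqrt(2)
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail

def dwt_features(seq, levels=2):
    """Hypothetical feature set (seq must be non-empty): mean, population std and energy
    of each detail band and of the final approximation, for each base's indicator sequence."""
    feats = []
    for base in "ACGT":
        approx = [1.0 if s == base else 0.0 for s in seq.upper()]
        bands = []
        for _ in range(levels):
            approx, detail = haar_level(approx)
            bands.append(detail)
        bands.append(approx)
        for band in bands:
            feats += [statistics.mean(band), statistics.pstdev(band), sum(v * v for v in band)]
    return feats

def train_centroids(samples):
    """samples: list of (feature_vector, label); returns {label: mean feature vector}."""
    grouped = {}
    for vec, label in samples:
        grouped.setdefault(label, []).append(vec)
    return {label: [statistics.mean(col) for col in zip(*vecs)] for label, vecs in grouped.items()}

def predict(centroids, vec):
    """Nearest-centroid rule: return the label whose centroid is closest in Euclidean distance."""
    def dist(label):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, centroids[label])))
    return min(centroids, key=dist)

# Made-up toy sequences, used only to exercise the code (not data from the study).
train = [("ATATATATATATATAT", "class_1"), ("TATATATATATATATA", "class_1"),
         ("GCGCGCGCGCGCGCGC", "class_2"), ("CGCGCGCGCGCGCGCG", "class_2")]
centroids = train_centroids([(dwt_features(s), lab) for s, lab in train])
print(predict(centroids, dwt_features("ATATATATATATATTT")))
```

Any classifier that accepts fixed-length vectors could replace the nearest-centroid rule; it is used here only because it fits in a few lines without external libraries. Since every band contributes the same three statistics, sequences of different lengths still map to vectors of the same length.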
Tables 2 and 3 list the statistical values extracted from cancerous and non-cancerous sequences for breast cancer and ovarian cancer, respectively. The statistical
Conclusion
Genomic signal processing methods efficiently detect the difference between cancerous and non-cancerous gene sequences for lung, breast and ovarian cancer. Classification yielded a model with good accuracy, but an optimal model can be obtained only when the above procedure is applied to all types of cancer.
References (13)
- Fourier and wavelet transform analysis, a tool for visualizing regular patterns in DNA sequences, J. Theoret. Biol. (2000).
- Genomic signal processing (GSP), Bioinform. Trends (2006).
- Gene prediction with a hidden Markov model and a new intron submodel, Bioinformatics (2003).
- S. Barman, Prediction of cancer cell using digital signal processing, IJE (ISSN:...
- P.P. Vaidyanathan, The role of signal-processing concepts in genomics and proteomics,...
- A. Marhon, A brief comparison of DSP and HMM methods for gene finding,...
Cited by (9)
- Clustering and classification of virus sequence through music communication protocol and wavelet transform, Genomics (2021). Citation excerpt: "The k-mers were mapped and transformed into discrete wavelet to get a numeric featured vector for the clustering [35]. A Haar wavelet filtering method was used to decompose the sequences for detecting cancerous genome by Liu et al. [36]. The author extracted statistical data of cancerous and non-cancerous genome and classified via machine learning [36]."
- Classification of ALL and CML malignancies being among the main types of leukaemia with graph neural networks and fuzzy logic algorithm, Journal of the Faculty of Engineering and Architecture of Gazi University (2023).
- A Gene Feature Extraction Method Based on Across-view Similarity Order Preserving, Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology (2023).
- Classification of exon and intron regions obtained using digital signal processing techniques on the DNA genome sequencing with EfficientNetB7 architecture, Journal of the Faculty of Engineering and Architecture of Gazi University (2022).
- A Robust Feature Extraction and Deep Learning Approach for Cancer Gene Prognosis, International Journal of Biology and Biomedical Engineering (2022).
Liu Dongwei received his M.S. degree in Material from Shanghai Institute of Technology, China. His research interest is mainly in the area of Polyurethane elastomer.
Jia Runping received her Ph.D. degree in Material from Tongji University in Shanghai, China. She is currently a professor at Shanghai Institute of Technology. Her research interest is mainly in the area of Polyurethane. She has published several research papers in scholarly journals in this research area and has participated in several conferences.
Wang Caifeng received her M.S. degree in Material from Shanghai Institute of Technology, China. Her research interest is mainly in the area of Polyurethane elastomer.
Arunkumar N completed his B.E., M.E. and Ph.D. in Electronics and Communication Engineering with a specialization in Biomedical Engineering. He has more than 10 years of academic teaching and research experience at SASTRA University, India. He is appreciated for innovative, research-oriented teaching that relates practical life experiences to the principles of engineering. He is active in research and guides researchers across the globe. He has published more than 60 papers in peer-reviewed academic journals with high impact factors. His main areas include machine learning, artificial intelligence and IoT. He serves on the editorial boards of a few journals in his area of research.
K. Narasimhan received the M.Sc. degree with Electronics Specialization from Bharathidasan University, the M.Tech. in Non-Destructive Testing from Regional Engineering College, Trichy, and the Ph.D. degree from SASTRA University in the field of medical image processing. His research interests include Digital Image Processing, Medical Image Analysis, Pattern Recognition and Digital Signal Processing. He has published more than 50 papers in reputed international journals and conferences. He is currently working as Senior Assistant Professor in the Department of ECE, School of EEE, SASTRA Deemed University, Thanjavur. He is a Life Member of the Indian Society of Systems for Science and Engineering (ISSE).
M. Udayakumar graduated with an M.Tech. degree from the Department of Bioinformatics, SASTRA Deemed University, Thanjavur, India. He is an Assistant Professor III in the School of Chemical & Biotechnology, SASTRA Deemed University. His research work is mainly on designing tools, web server applications and databases for bioinformatics. His ongoing research is on structural analysis and crystallography studies of small molecules. He is a Life Member of the Indian Society of Systems for Science and Engineering (ISSE).
V. Elamaran received the B.E. degree in Electronics and Communication Engineering from Madurai Kamaraj University, and the M.E. degree in Systems Engineering and Operations Research from Anna University, India. Currently he is pursuing Ph.D. in the area of Low Power VLSI Design from SASTRA Deemed University, Thanjavur, India. His main research interests are signal, image and video processing, digital VLSI design circuits, design for testability, and FPGA based systems. He has published more than 80 research papers in reputed international journals and conferences. He is currently working as Assistant Professor in the Department of ECE, School of EEE, SASTRA Deemed University, Thanjavur. He is a Life Member of the Indian Society for Technical Education (ISTE) and the Indian Society of Systems for Science and Engineering (ISSE).