Published in: Wireless Personal Communications 2/2019

14-09-2018

Probabilistic Approach Processing Scheme Based on BLAST for Improving Search Speed of Bioinformatics

Authors: Yoon-Su Jeong, Seung-Soo Shin

Abstract

As research on bioinformatics using heuristic algorithms has grown, information management in various bioinformatics fields (new drug development, medical diagnosis, agricultural product improvement, etc.) has been studied mainly with the BLAST algorithm. However, many of the algorithms used on large genome databases rely on a complete sorting procedure, so searching the database for protein or nucleic acid sequences takes a long time, which causes problems when processing large amounts of bio information. We propose a BLAST-based probabilistic access processing method that can manage, analyze, and process large amounts of distributed bio data using information and communication infrastructure and IT. The proposed method aims to improve data accessibility by linking weighted bioinformatics information with probability factors so that large-capacity bio data can be accessed easily. In addition, the proposed scheme classifies the priority information allocated to the bioinformatics information by hierarchical grouping according to the degree of similarity, which ensures high accuracy of the search results; at the same time, it aims for low processing time by classifying information (type, attribute, priority, etc.) into weights by property. Previous researchers have suggested clustering algorithms for the fragmentation of genetic information to solve the haplotype assembly problem in genetics, or have proposed particle swarm optimization methods, similar to existing genetic algorithms, that use a heuristic clustering method based on the minimum error correction (MEC) model. In the performance evaluation, the proposed method improved accuracy by an average of 13.5% and data retrieval efficiency by an average of 19.7% compared with the previous scheme. The overhead of bioinformatics information processing was 8.8% lower, and the processing time was 13.5% lower on average.
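
The abstract does not give the scheme's formulas, so the following is only a minimal sketch of the general idea it describes: group database entries into similarity tiers relative to a query, then visit entries within each tier in order of a weight multiplied by a probability factor. Every name here (Entry, weight, probability, the tier thresholds) is a hypothetical illustration, and the k-mer Jaccard similarity is a simple stand-in, not a BLAST score and not the authors' measure.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    sequence: str
    weight: float        # hypothetical property weight (type, attribute, priority)
    probability: float   # hypothetical access-probability factor


def kmers(seq, k=3):
    """Return the set of length-k words contained in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of two k-mer sets (a stand-in, not a BLAST score)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def tier(sim, thresholds=(0.5, 0.2)):
    """Map a similarity value to a tier index; tier 0 is the most similar."""
    for i, threshold in enumerate(thresholds):
        if sim >= threshold:
            return i
    return len(thresholds)


def search_order(query, entries):
    """Order entries by similarity tier, then by weight * probability (descending)."""
    def key(entry):
        return (tier(similarity(query, entry.sequence)),
                -entry.weight * entry.probability)
    return sorted(entries, key=key)


entries = [
    Entry("a", "ACGTACGT", weight=1.0, probability=0.5),
    Entry("b", "ACGTACGA", weight=1.0, probability=0.9),
    Entry("c", "TTTTTTTT", weight=5.0, probability=1.0),
]
ordered = [e.name for e in search_order("ACGTACGT", entries)]
```

In this toy data, entry c has the largest weight but shares no k-mers with the query, so it falls into the last tier and is visited after a and b. That is the intended effect of grouping by similarity before applying the weights.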

Metadata
Title
Probabilistic Approach Processing Scheme Based on BLAST for Improving Search Speed of Bioinformatics
Authors
Yoon-Su Jeong
Seung-Soo Shin
Publication date
14-09-2018
Publisher
Springer US
Published in
Wireless Personal Communications / Issue 2/2019
Print ISSN: 0929-6212
Electronic ISSN: 1572-834X
DOI
https://doi.org/10.1007/s11277-018-5955-3
