Skip to main content
Top
Published in: International Journal of Machine Learning and Cybernetics 2/2018

06-06-2015 | Original Article

A new approach for the deep order preserving submatrix problem based on sequential pattern mining

Authors: Yun Xue, Tiechen Li, Zhiwen Liu, Chaoyi Pang, Meihang Li, Zhengling Liao, Xiaohui Hu

Published in: International Journal of Machine Learning and Cybernetics | Issue 2/2018

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

As an effective biclustering model, order-preserving submatrix (OPSM) has been widely applied to biological gene expression data mining, which can capture the general tendency of the gene expression under some experimental conditions. Recently, biologists hope to find deep OPSMs with long patterns and comparatively fewer support rows, which are important for the interpretation of gene regulatory networks. However, the traditional exact mining algorithms based on Apriori principle could not deal with the deep OPSM problem, which often take a large minimum support threshold for pattern pruning, and inevitably miss some significant deep OPSMs. In this paper, a new exact algorithm is proposed for mining deep OPSMs. Firstly all the common subsequences shared by every two rows are found out, then the row sets corresponding to the same common subsequences are formed. Finally all the deep OPSMs with support beyond the given threshold will be obtained. Experiments have been done in both real and synthetic data sets, and the results show that this new algorithm is capable of mining all the deep OPSMs over a small support. Under different thresholds of minimum support, this algorithm reveals better performance than the traditional sequential pattern mining algorithms.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Show more products
Literature
1.
go back to reference Madeira SC, Oliveira AL (2004) Biclustering algorithms for biological data analysis: a survey. IEEE Trans Comput Biol Bioinform 1(1):24–45CrossRef Madeira SC, Oliveira AL (2004) Biclustering algorithms for biological data analysis: a survey. IEEE Trans Comput Biol Bioinform 1(1):24–45CrossRef
2.
go back to reference Jiang D, Tang C, Zhang A (2004) Cluster analysis for gene expression data: a survey. IEEE Trans Knowl Data Eng 16(11):1370–1386CrossRef Jiang D, Tang C, Zhang A (2004) Cluster analysis for gene expression data: a survey. IEEE Trans Knowl Data Eng 16(11):1370–1386CrossRef
3.
go back to reference Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) “Automatic subspace clustering of high dimensional data for data mining applications”. In Proceedings of the 24th ACM SIGMOD International Conference on Management of Data, Seattle, Washington, USA, vol. 27, no. 2, pp. 94–105 Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) “Automatic subspace clustering of high dimensional data for data mining applications”. In Proceedings of the 24th ACM SIGMOD International Conference on Management of Data, Seattle, Washington, USA, vol. 27, no. 2, pp. 94–105
4.
go back to reference Aggarwal CC, Wolf JL, Yu PS, Procopiuc CM, Park JS (1999) Fast algorithms for projected clustering. In Proceedings of the 25th ACM SIGMOD International Conference on Management of Data, Philadelphia, Pennsylvania, USA, vol. 22, no. 2, pp. 61–72 Aggarwal CC, Wolf JL, Yu PS, Procopiuc CM, Park JS (1999) Fast algorithms for projected clustering. In Proceedings of the 25th ACM SIGMOD International Conference on Management of Data, Philadelphia, Pennsylvania, USA, vol. 22, no. 2, pp. 61–72
5.
go back to reference Aggarwal CC, Yu PS (2000) Finding generalized projected clusters in high dimensional spaces. In: Proceedings of the 26th ACM SIGMOD International Conference on Management of Data, Dallas, Texas, USA, vol. 29, no. 2, pp. 70–81 Aggarwal CC, Yu PS (2000) Finding generalized projected clusters in high dimensional spaces. In: Proceedings of the 26th ACM SIGMOD International Conference on Management of Data, Dallas, Texas, USA, vol. 29, no. 2, pp. 70–81
6.
go back to reference Jagadish HV, Madar J, Ng RT (1999) Semantic compression and pattern extraction with fascicles. In: Proceedings of the 25th International Conference on Very Large Data Bases, San Francisco, CA, USA, pp. 186–198 Jagadish HV, Madar J, Ng RT (1999) Semantic compression and pattern extraction with fascicles. In: Proceedings of the 25th International Conference on Very Large Data Bases, San Francisco, CA, USA, pp. 186–198
7.
go back to reference Cheng Y, Church GM (2000) Biclustering of expression data. In: Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, San Diego, La Jolla, California, USA, pp. 93–103 Cheng Y, Church GM (2000) Biclustering of expression data. In: Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, San Diego, La Jolla, California, USA, pp. 93–103
8.
9.
go back to reference Liu J, Wang W (2003) OP-Cluster: clustering by tendency in high dimensional space. In: Proceedings of the 3rd IEEE International Conference on Data Mining, Melbourne, Florida, USA, pp. 187–194 Liu J, Wang W (2003) OP-Cluster: clustering by tendency in high dimensional space. In: Proceedings of the 3rd IEEE International Conference on Data Mining, Melbourne, Florida, USA, pp. 187–194
10.
go back to reference Cheung L, Kevin YY, Cheung DW, Kao B, Michael KN (2007) On mining micro-array data by order-preserving submatrix. Int J Bioinform Res Appl 3(1):42–64CrossRef Cheung L, Kevin YY, Cheung DW, Kao B, Michael KN (2007) On mining micro-array data by order-preserving submatrix. Int J Bioinform Res Appl 3(1):42–64CrossRef
11.
go back to reference Gao BJ, Griffith OL, Ester M et al (2012) On the deep order-preserving submatrix problem: a best effort approach. J IEEE Trans Knowled Data Eng 24(2):309–325CrossRef Gao BJ, Griffith OL, Ester M et al (2012) On the deep order-preserving submatrix problem: a best effort approach. J IEEE Trans Knowled Data Eng 24(2):309–325CrossRef
12.
go back to reference Trapp AC, Prokopyev OA (2010) Solving the order-preserving submatrix problem via integer programming. J INFORMS J Comp 22(3):387–400CrossRefMATH Trapp AC, Prokopyev OA (2010) Solving the order-preserving submatrix problem via integer programming. J INFORMS J Comp 22(3):387–400CrossRefMATH
13.
go back to reference Das C, Maji P (2013) Possibilistic biclustering algorithm for discovering value-coherent overlapping δ-biclusters. Int J Mach Learn Cybernet 1–13 Das C, Maji P (2013) Possibilistic biclustering algorithm for discovering value-coherent overlapping δ-biclusters. Int J Mach Learn Cybernet 1–13
14.
go back to reference Xu X (2013) Enhancing gene expression clustering analysis using tangent transformation. Int J Mach Learn Cybernet 4(1):31–40CrossRef Xu X (2013) Enhancing gene expression clustering analysis using tangent transformation. Int J Mach Learn Cybernet 4(1):31–40CrossRef
15.
go back to reference Liu N, Chen F, Lu M (2013) Spectral co-clustering documents and words using fuzzy K-harmonic means. Int J Mach Learn Cybernet 4(1):75–83CrossRef Liu N, Chen F, Lu M (2013) Spectral co-clustering documents and words using fuzzy K-harmonic means. Int J Mach Learn Cybernet 4(1):75–83CrossRef
16.
go back to reference Ben-Dor A, Chor B, Karp R, Yakhini Z (2002) Discovering local structure in gene expression data: the order-preserving submatrix problem. In: Proceedings of the 6th Annual International Conference on Computational Biology, Washington, DC, USA, vol. 10, no. 3–4, pp. 49–57 Ben-Dor A, Chor B, Karp R, Yakhini Z (2002) Discovering local structure in gene expression data: the order-preserving submatrix problem. In: Proceedings of the 6th Annual International Conference on Computational Biology, Washington, DC, USA, vol. 10, no. 3–4, pp. 49–57
17.
go back to reference Barrett T, Troup DB, Wilhite SE et al (2009) NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res 37:D885–D890CrossRef Barrett T, Troup DB, Wilhite SE et al (2009) NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res 37:D885–D890CrossRef
18.
go back to reference Hubble J, Demeter J, Jin H et al (2009) Implementation of gene pattern within the Stanford microarray database. Nucleic Acids Res 37:D898–D901CrossRef Hubble J, Demeter J, Jin H et al (2009) Implementation of gene pattern within the Stanford microarray database. Nucleic Acids Res 37:D898–D901CrossRef
19.
go back to reference Albert R (2005) Scale-Free networks in cell biology. J Cell Sci 118(21):4947–4957CrossRef Albert R (2005) Scale-Free networks in cell biology. J Cell Sci 118(21):4947–4957CrossRef
20.
go back to reference Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology, Avignon, France, vol. 1057, pp. 3–17 Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology, Avignon, France, vol. 1057, pp. 3–17
21.
go back to reference Zaki MJ, Parthasarathy S, Ogihara M, Li W (1997) Parallel algorithms for discovery of association rules. Data Min Knowl Disc 1(4):343–373CrossRef Zaki MJ, Parthasarathy S, Ogihara M, Li W (1997) Parallel algorithms for discovery of association rules. Data Min Knowl Disc 1(4):343–373CrossRef
22.
go back to reference Pei J, Han J, Mortazavi-asl B et al. (2001) PrefixSpan: mining sequential patterns efficiently by prefix projected pattern growth. In: Proceedings of the 17th International Conference on Data Engineering, Heidelberg, Germany, pp. 215–226 Pei J, Han J, Mortazavi-asl B et al. (2001) PrefixSpan: mining sequential patterns efficiently by prefix projected pattern growth. In: Proceedings of the 17th International Conference on Data Engineering, Heidelberg, Germany, pp. 215–226
23.
go back to reference Ayres J, Flannick J, Gehrke J, Yiu T (2002) Sequential PAttern mining using a bitmap representation. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, pp. 429–435 Ayres J, Flannick J, Gehrke J, Yiu T (2002) Sequential PAttern mining using a bitmap representation. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, pp. 429–435
24.
go back to reference Wang H, Lin Z (2007) A novel algorithm for counting all common subsequences. In: Proceedings of IEEE International Conference on Granular Computing, pp. 635–640 Wang H, Lin Z (2007) A novel algorithm for counting all common subsequences. In: Proceedings of IEEE International Conference on Granular Computing, pp. 635–640
27.
go back to reference Tavazoie S, Hughes JD, Campbel MJ, Cho RJ, Church GM (1999) Systematic determination of genetic network architecture. Nat Genet 22(3):281–285CrossRef Tavazoie S, Hughes JD, Campbel MJ, Cho RJ, Church GM (1999) Systematic determination of genetic network architecture. Nat Genet 22(3):281–285CrossRef
28.
go back to reference Ideker T, Thorsson V, Ranish J et al (2001) Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292(5518):929–934CrossRef Ideker T, Thorsson V, Ranish J et al (2001) Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292(5518):929–934CrossRef
29.
go back to reference Xiao R, Badger TM, Simmen FA (2005) Dietary exposure to soy or whey proteins alters colonic global gene expression profiles during rat colon tumorigenesis. Mol Cancer 4(1):1CrossRef Xiao R, Badger TM, Simmen FA (2005) Dietary exposure to soy or whey proteins alters colonic global gene expression profiles during rat colon tumorigenesis. Mol Cancer 4(1):1CrossRef
30.
go back to reference Martin D, Brun C, Remy E et al (2004) GOToolBox: functional analysis of gene datasets based on Gene Ontology. Genome Biol 5(12):R101CrossRef Martin D, Brun C, Remy E et al (2004) GOToolBox: functional analysis of gene datasets based on Gene Ontology. Genome Biol 5(12):R101CrossRef
31.
go back to reference McLachlan GJ, Do K-A, Ambroise C (2005) Analyzing Microarray Gene Expression Data. John Wiley and Sons, HobokenMATH McLachlan GJ, Do K-A, Ambroise C (2005) Analyzing Microarray Gene Expression Data. John Wiley and Sons, HobokenMATH
33.
go back to reference Prelic A, Bleuler S, Zimmermann P et al (2006) A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22(9):1122–1129CrossRef Prelic A, Bleuler S, Zimmermann P et al (2006) A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22(9):1122–1129CrossRef
34.
go back to reference Halkidi M, Batistakis Y, Vazirgiannis M (2001) On clustering validation techniques. J Int Inform Syst 17(2–3):107–145CrossRefMATH Halkidi M, Batistakis Y, Vazirgiannis M (2001) On clustering validation techniques. J Int Inform Syst 17(2–3):107–145CrossRefMATH
35.
go back to reference Ihmels J, Bergmann S, Barkai N (2004) Defining transcription modules using large-scale gene expression data. Bioinformatics 20(13):1993–2003CrossRef Ihmels J, Bergmann S, Barkai N (2004) Defining transcription modules using large-scale gene expression data. Bioinformatics 20(13):1993–2003CrossRef
36.
go back to reference Ihmels J, Friedlander G, Bergmann S et al (2002) Revealing modular organization in the yeast transcriptional network. Nat Genet 31(4):370–377 Ihmels J, Friedlander G, Bergmann S et al (2002) Revealing modular organization in the yeast transcriptional network. Nat Genet 31(4):370–377
37.
go back to reference Murali TM, Kasif S (2003) Extracting conserved gene expression Motifs from gene expression data. In Pacific Symposium on Biocomputing, Kauai, Hawaii, pp. 77–88 Murali TM, Kasif S (2003) Extracting conserved gene expression Motifs from gene expression data. In Pacific Symposium on Biocomputing, Kauai, Hawaii, pp. 77–88
38.
go back to reference Voorhees EM (1986) Implementing agglomerative hierarchic clustering algorithms for use in document retrieval. Inf Process Manage 22(6):465–476CrossRef Voorhees EM (1986) Implementing agglomerative hierarchic clustering algorithms for use in document retrieval. Inf Process Manage 22(6):465–476CrossRef
39.
Metadata
Title
A new approach for the deep order preserving submatrix problem based on sequential pattern mining
Authors
Yun Xue
Tiechen Li
Zhiwen Liu
Chaoyi Pang
Meihang Li
Zhengling Liao
Xiaohui Hu
Publication date
06-06-2015
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 2/2018
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-015-0384-z

Other articles of this Issue 2/2018

International Journal of Machine Learning and Cybernetics 2/2018 Go to the issue