Skip to main content
Erschienen in: Data Mining and Knowledge Discovery 2-3/2006

01.05.2006

Sequential Pattern Mining in Multi-Databases via Multiple Alignment

verfasst von: Hye-Chung Kum, Joong Hyuk Chang, Wei Wang

Erschienen in: Data Mining and Knowledge Discovery | Ausgabe 2-3/2006

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

To efficiently find global patterns from a multi-database, information in each local database must first be mined and summarized at the local level. Then only the summarized information is forwarded to the global mining process. However, conventional sequential pattern mining methods based on support cannot summarize the local information and is ineffective for global pattern mining from multiple data sources. In this paper, we present an alternative local mining approach for finding sequential patterns in the local databases of a multi-database. We propose the theme of approximate sequential pattern mining roughly defined as identifying patterns approximately shared by many sequences. Approximate sequential patterns can effectively summerize and represent the local databases by identifying the underlying trends in the data. We present a novel algorithm, ApproxMAP, to mine approximate sequential patterns, called consensus patterns, from large sequence databases in two steps. First, sequences are clustered by similarity. Then, consensus patterns are mined directly from each cluster through multiple alignment. We conduct an extensive and systematic performance study over synthetic and real data. The results demonstrate that ApproxMAP is effective and scalable in mining large sequences databases with long patterns. Hence, ApproxMAP can efficiently summarize a local database and reduce the cost for global mining. Furthremore, we present an elegant and uniform model to identify both high vote sequential patterns and exceptional sequential patterns from the collection of these consensus patterns from each local databases.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Agrawal, R. and Srikant, R. 1995. Mining sequential patterns. In Proc. of International Conference on Data Engineering (ICDE), Taipei, Taiwan, pp. 3–14. Agrawal, R. and Srikant, R. 1995. Mining sequential patterns. In Proc. of International Conference on Data Engineering (ICDE), Taipei, Taiwan, pp. 3–14.
Zurück zum Zitat Ayres, J., Flannick, J., Gehrke, J., and Yiu, T. 2002. Sequential pattern mining using a bitmap representation. In Proceedings of the ACM International Conference on Knowledge discovery and data mining (SIGKDD), pp. 429–435. Ayres, J., Flannick, J., Gehrke, J., and Yiu, T. 2002. Sequential pattern mining using a bitmap representation. In Proceedings of the ACM International Conference on Knowledge discovery and data mining (SIGKDD), pp. 429–435.
Zurück zum Zitat Ertoz, L., Steinbach, M., and Kumar, V. 2003. Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In Third SIAM International Conference on Data Mining(SDM), San Fransico. CA, pp. 47–58. Ertoz, L., Steinbach, M., and Kumar, V. 2003. Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In Third SIAM International Conference on Data Mining(SDM), San Fransico. CA, pp. 47–58.
Zurück zum Zitat Fukunaga, K.K. and Narendra, P.M. 1975. A branch and bound algorithm for computing k-nearest neighbours. IEEE Transactions on Computers, 24: 750–753. Fukunaga, K.K. and Narendra, P.M. 1975. A branch and bound algorithm for computing k-nearest neighbours. IEEE Transactions on Computers, 24: 750–753.
Zurück zum Zitat Gotoh, O. 1999. Multiple sequence alignment: Algorithms and applications. Adv. Biophys., 36: 159–206. Gotoh, O. 1999. Multiple sequence alignment: Algorithms and applications. Adv. Biophys., 36: 159–206.
Zurück zum Zitat Gusfield, D. 1997. Algorithms on strings, trees, and sequences: Computer science and computational biology. Cambridge Univ. Press, Cambridge, England. Gusfield, D. 1997. Algorithms on strings, trees, and sequences: Computer science and computational biology. Cambridge Univ. Press, Cambridge, England.
Zurück zum Zitat Jain, A.K., Murty, M.N., and Flynn, P.J. 1999. Data clustering: A review. ACM Computing Surveys, 31(3): 264–323. Jain, A.K., Murty, M.N., and Flynn, P.J. 1999. Data clustering: A review. ACM Computing Surveys, 31(3): 264–323.
Zurück zum Zitat Kohavi, R., Brodley, C., Frasca, B., Mason, L., and Zheng, Z. 2000. KDD-CUP 2000 Organizers’ Report: Peeling the Onion. In Proc. SIGKDD Explorations, 2: 86–98. Kohavi, R., Brodley, C., Frasca, B., Mason, L., and Zheng, Z. 2000. KDD-CUP 2000 Organizers’ Report: Peeling the Onion. In Proc. SIGKDD Explorations, 2: 86–98.
Zurück zum Zitat Kum, H.C., Pei, J., Wang, W., and Duncan, D. 2003. ApproxMAP : Approximate mining of consensus sequential patterns. In Third SIAM International Conference on Data Mining(SDM), San Fransico, CA, pp. 311–315. Kum, H.C., Pei, J., Wang, W., and Duncan, D. 2003. ApproxMAP : Approximate mining of consensus sequential patterns. In Third SIAM International Conference on Data Mining(SDM), San Fransico, CA, pp. 311–315.
Zurück zum Zitat Kum, H.C., Paulsen, S., and Wang, W. 2005. Comparitive Study of Sequential Pattern Mining Models. Studies in Computational Intelligence: Foundations of Data Mining and Knowledge Discovery, Vol. 6, Springer, pp. 45–71. Kum, H.C., Paulsen, S., and Wang, W. 2005. Comparitive Study of Sequential Pattern Mining Models. Studies in Computational Intelligence: Foundations of Data Mining and Knowledge Discovery, Vol. 6, Springer, pp. 45–71.
Zurück zum Zitat Kum, H.C., Paulsen, S., and Wang, W. 2005. Comparitive Study of Sequential Pattern Mining Models. Studies in Computational Intelligence: Foundations of Data Mining and Knowledge Discovery, Vol. 6, Springer, pp. 45–71. Kum, H.C., Paulsen, S., and Wang, W. 2005. Comparitive Study of Sequential Pattern Mining Models. Studies in Computational Intelligence: Foundations of Data Mining and Knowledge Discovery, Vol. 6, Springer, pp. 45–71.
Zurück zum Zitat McPherson, G.R. and DeStefano, S. 2002. Applied Ecology and Natural Resource Management. Cambridge Univ. Press, Cambridge, England. McPherson, G.R. and DeStefano, S. 2002. Applied Ecology and Natural Resource Management. Cambridge Univ. Press, Cambridge, England.
Zurück zum Zitat Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., and Hsu, M.C. 2001. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proc. of International Conference on Data Engineering (ICDE), pp. 215–224. Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., and Hsu, M.C. 2001. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proc. of International Conference on Data Engineering (ICDE), pp. 215–224.
Zurück zum Zitat Sander, J., Ester, M., Kriegel, H.P., and Xu, X. 1998. Density based clustering in spatial databases: The algorithm gdbscan and its applications. Data Mining and Knowledge Discovery, 2(2): 169–194. Sander, J., Ester, M., Kriegel, H.P., and Xu, X. 1998. Density based clustering in spatial databases: The algorithm gdbscan and its applications. Data Mining and Knowledge Discovery, 2(2): 169–194.
Zurück zum Zitat Sas Institute. 2000. Proc Modeclust. In SAS/STAT User Guide: Sas online Document. Sas Institute. 2000. Proc Modeclust. In SAS/STAT User Guide: Sas online Document.
Zurück zum Zitat Spiliopoulou, M. 1999. Managing interesting rules in sequence mining. In Proc. European Conf. on Principles and Practice of Knowledge Discovery in Databases, pp. 554–560. Spiliopoulou, M. 1999. Managing interesting rules in sequence mining. In Proc. European Conf. on Principles and Practice of Knowledge Discovery in Databases, pp. 554–560.
Zurück zum Zitat Srikant, R. and Agrawal, R. 1996. Mining sequential patterns: Generalizations and performance improvements. In Proc. 6th Intl. Conf Extending Database Technology (EDBT), pp. 3–17. Srikant, R. and Agrawal, R. 1996. Mining sequential patterns: Generalizations and performance improvements. In Proc. 6th Intl. Conf Extending Database Technology (EDBT), pp. 3–17.
Zurück zum Zitat Thompson, J., Plewniak, F., and Poch, O. 1999. A comprehensive comparison of multiple sequence alignment programs. Oxford Univ. Press. Nucleic Acids Research, 27(13): 2682–2690. Thompson, J., Plewniak, F., and Poch, O. 1999. A comprehensive comparison of multiple sequence alignment programs. Oxford Univ. Press. Nucleic Acids Research, 27(13): 2682–2690.
Zurück zum Zitat Wong, M.A. and Lane, T. 1983. A kth nearest neighbor clustering procedure. Journal of the Royal Statistical Society, Series B, 45: 362–368. Wong, M.A. and Lane, T. 1983. A kth nearest neighbor clustering procedure. Journal of the Royal Statistical Society, Series B, 45: 362–368.
Zurück zum Zitat Wu, X. and Zhang, S. 2003. Synthesizing high-frequency rules from different data sources. IEEE Trans. Knowledge Data Engineering, 15(2): 353–367. Wu, X. and Zhang, S. 2003. Synthesizing high-frequency rules from different data sources. IEEE Trans. Knowledge Data Engineering, 15(2): 353–367.
Zurück zum Zitat Wu, X., Zhang, C., and Zhang, S. 2005. Database classification for multi-database mining. Information System, 30(1): 71–88. Wu, X., Zhang, C., and Zhang, S. 2005. Database classification for multi-database mining. Information System, 30(1): 71–88.
Zurück zum Zitat Yan, X., Han, J., and Afshar, R. 2003. CloSpan: Mining closed sequential patterns in larege datasets. In Third SIAM International Conference on Data Mining (SDM), San Fransico, CA, pp. 166–177. Yan, X., Han, J., and Afshar, R. 2003. CloSpan: Mining closed sequential patterns in larege datasets. In Third SIAM International Conference on Data Mining (SDM), San Fransico, CA, pp. 166–177.
Zurück zum Zitat Yang, J., Yu, P.S., Wang, W., and Han, J. 2002. Mining long sequential patterns in a noisy environment. In Proc. of ACM Int’l Conf. On Management of Data (SIGMOD), Madison, WI, pp. 406–417. Yang, J., Yu, P.S., Wang, W., and Han, J. 2002. Mining long sequential patterns in a noisy environment. In Proc. of ACM Int’l Conf. On Management of Data (SIGMOD), Madison, WI, pp. 406–417.
Zurück zum Zitat Zaki, M.J. 1998. Efficient enumeration of frequent sequences. In 7th International Conference Information and Knowledge Management, pp. 68–75. Zaki, M.J. 1998. Efficient enumeration of frequent sequences. In 7th International Conference Information and Knowledge Management, pp. 68–75.
Zurück zum Zitat Zhang, C., Liu, M., Nie,W., and Zhang, S. 2004a. Identifying global exceptional patterns in multi-database mining. IEEE Computational Intelligence Bulletin, 3(1): 19–24. Zhang, C., Liu, M., Nie,W., and Zhang, S. 2004a. Identifying global exceptional patterns in multi-database mining. IEEE Computational Intelligence Bulletin, 3(1): 19–24.
Zurück zum Zitat Zhang, S., Wu, X., and Zhang, C. 2003. Multi-Database Mining. IEEE Computational Intelligence Bulletin, 2(1): 5–13. Zhang, S., Wu, X., and Zhang, C. 2003. Multi-Database Mining. IEEE Computational Intelligence Bulletin, 2(1): 5–13.
Zurück zum Zitat Zhang, S., Zhang, C., and Yu, J.X. 2004b. An efficient strategy for mining exceptions in multi-databases. Information System, 165(1–2): 1–20. Zhang, S., Zhang, C., and Yu, J.X. 2004b. An efficient strategy for mining exceptions in multi-databases. Information System, 165(1–2): 1–20.
Zurück zum Zitat Zhong, N., Yao, Y., and Ohsuga, S. 1999. Peculiarity oriented multi-database mining. In Proceedings of PKDD, pp. 136–146. Zhong, N., Yao, Y., and Ohsuga, S. 1999. Peculiarity oriented multi-database mining. In Proceedings of PKDD, pp. 136–146.
Metadaten
Titel
Sequential Pattern Mining in Multi-Databases via Multiple Alignment
verfasst von
Hye-Chung Kum
Joong Hyuk Chang
Wei Wang
Publikationsdatum
01.05.2006
Erschienen in
Data Mining and Knowledge Discovery / Ausgabe 2-3/2006
Print ISSN: 1384-5810
Elektronische ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-005-0017-3

Weitere Artikel der Ausgabe 2-3/2006

Data Mining and Knowledge Discovery 2-3/2006 Zur Ausgabe