Published in: Journal of Intelligent Information Systems 3/2015

1 December 2015

Subspace clustering of data streams: new algorithms and effective evaluation measures

Authors: Marwan Hassani, Yunsu Kim, Seungjin Choi, Thomas Seidl

Abstract

Nowadays, most streaming data sources are becoming high dimensional. Accordingly, subspace stream clustering, which aims at finding evolving clusters within subgroups of dimensions, has gained significant importance. However, in spite of the rich literature on subspace and projected clustering algorithms for static data, only three stream projected algorithms are available. Additionally, existing subspace clustering evaluation measures are mainly designed for static data and cannot reflect the quality of the evolving nature of data streams. On the other hand, available stream clustering evaluation measures consider only the errors of the full-space clustering, not the quality of the subspace clustering. In this article we present a method for designing new stream subspace and projected algorithms. We also propose, to the best of our knowledge, the first subspace clustering measure designed for streaming data, called SubCMM: Subspace Cluster Mapping Measure. SubCMM is an effective evaluation measure for stream subspace clustering that is able to handle errors caused by emerging, moving, or splitting subspace clusters. Additionally, we propose a novel method for using available offline subspace clustering measures for data streams over the suggested new algorithms within the Subspace MOA framework.

Metadata
Title
Subspace clustering of data streams: new algorithms and effective evaluation measures
Authors
Marwan Hassani
Yunsu Kim
Seungjin Choi
Thomas Seidl
Publication date
1 December 2015
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 3/2015
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-014-0319-2
