2017 | Original Paper | Book Chapter

Partition-Based Clustering with Sliding Windows for Data Streams

Authors: Jonghem Youn, Jihun Choi, Junho Shim, Sang-goo Lee

Published in: Database Systems for Advanced Applications

Publisher: Springer International Publishing

Abstract

Data stream clustering with sliding windows generates clusters for every window movement. Because re-clustering every changed window from scratch is highly inefficient in both memory and computation time, a clustering algorithm should consider only the tuples inserted into and deleted from each window. In this paper, we address this problem with a sliding-window aggregation technique and a cluster modification strategy. We propose a novel data structure for constructing and maintaining 2-level synopses. This data structure makes it possible to update synopses efficiently and supports precise sliding-window operations. We also propose a modification strategy that decides whether to append new synopses to pre-existing clusters or to re-cluster all synopses, according to the difference between the probability distributions of the original and updated clusters. Experimental results show that the proposed method outperforms state-of-the-art methods.
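
The abstract does not give the details of the data structure or of the distribution comparison, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes one-dimensional tuples, an additive synopsis of (count, linear sum, squared sum), fixed-size panes as the unit of expiry, and total variation distance as a stand-in for the unspecified "difference between probability distributions". All names (`Synopsis`, `PaneWindow`, `should_recluster`) and the threshold parameter are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Synopsis:
    """Additive summary of a group of 1-D tuples (assumed form, not from the paper)."""
    n: int = 0        # number of tuples
    ls: float = 0.0   # linear sum
    ss: float = 0.0   # sum of squares

    def add(self, x: float) -> None:
        self.n += 1
        self.ls += x
        self.ss += x * x

    def merge(self, other: "Synopsis") -> None:
        self.n += other.n
        self.ls += other.ls
        self.ss += other.ss

    def subtract(self, other: "Synopsis") -> None:
        self.n -= other.n
        self.ls -= other.ls
        self.ss -= other.ss


class PaneWindow:
    """Sliding window built from fixed-size panes.

    Only the pane being filled and the pane being expired are touched on each
    update, so the window-level synopsis never has to be recomputed from scratch.
    """

    def __init__(self, pane_size: int, panes_per_window: int) -> None:
        self.pane_size = pane_size
        self.panes_per_window = panes_per_window
        self.panes = deque()        # closed pane synopses, oldest first
        self.current = Synopsis()   # pane currently being filled
        self.total = Synopsis()     # synopsis over all closed panes in the window

    def insert(self, x: float) -> None:
        self.current.add(x)
        if self.current.n == self.pane_size:
            # Close the pane and fold it into the window-level synopsis.
            self.panes.append(self.current)
            self.total.merge(self.current)
            self.current = Synopsis()
            if len(self.panes) > self.panes_per_window:
                # Expire the oldest pane by subtracting its synopsis.
                self.total.subtract(self.panes.popleft())


def total_variation(p_counts, q_counts) -> float:
    """Total variation distance between two cluster-size distributions."""
    p_total, q_total = sum(p_counts), sum(q_counts)
    return 0.5 * sum(abs(p / p_total - q / q_total)
                     for p, q in zip(p_counts, q_counts))


def should_recluster(old_counts, new_counts, threshold: float) -> bool:
    """Hypothetical rule: re-cluster only when the distribution shifts enough."""
    return total_variation(old_counts, new_counts) > threshold
```

With a pane size of 2 and two panes per window, inserting the values 1 through 6 leaves only the panes holding 3, 4, 5, and 6 in the window; the first pane is removed by a single subtraction rather than by rescanning the remaining tuples. The choice of distance measure and threshold would need to follow the paper itself.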

Metadata
Title
Partition-Based Clustering with Sliding Windows for Data Streams
Authors
Jonghem Youn
Jihun Choi
Junho Shim
Sang-goo Lee
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-55699-4_18