2016 | OriginalPaper | Book chapter

An Effective Cluster Assignment Strategy for Large Time Series Data

Authors: Damir Mirzanurov, Waqas Nawaz, JooYoung Lee, Qiang Qu

Published in: Web-Age Information Management

Publisher: Springer International Publishing

Abstract

Clustering time series data is important for finding groups of similar time series, e.g., identifying people with similar mobility by analyzing their spatio-temporal trajectory data as time series. YADING is one of the most recent and efficient methods for clustering large-scale time series data; it consists mainly of sampling, clustering, and assignment steps. Given a set of processed time series entities, YADING draws a sample and finds clusters in it with a density-based clustering method. The remaining input data are then assigned to clusters by computing their distance (or similarity) to the entities in the sampled data. A Sorted Neighbors Graph (SNG) data structure is used to prune these similarity computations so that not all possible pairs of entities have to be compared. However, the SNG does not guarantee that the sampled time series with lower density is chosen, which degrades accuracy. To resolve this issue, we propose a strategy that orders the SNG keys with respect to the density of clusters. The strategy improves the fast selection of time series entities with lower density. Extensive experiments show that our method achieves higher accuracy, measured by normalized mutual information (NMI), than the baseline YADING algorithm. The results suggest that the order of the SNG keys should match the order used in the clustering phase. The findings also show interesting patterns in identifying density radii for clustering.
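The abstract does not spell out the assignment procedure, so the following is only a minimal Python sketch of the idea, not the authors' implementation. It rests on several assumptions: the L1 distance is used, each sampled entity carries the density radius of its cluster, an unlabeled series joins the cluster of the first key whose radius covers it, and "ordered by density" means ascending radius. All function and parameter names (`build_sng`, `assign`, `order_by_density`) are hypothetical.

```python
def l1_distance(a, b):
    """L1 (Manhattan) distance; it obeys the triangle inequality used for pruning."""
    return sum(abs(x - y) for x, y in zip(a, b))


def build_sng(samples):
    """Map each sample index (key) to its other samples, sorted by distance."""
    sng = {}
    for i, s in enumerate(samples):
        neighbors = [(l1_distance(s, t), j) for j, t in enumerate(samples) if j != i]
        neighbors.sort()
        sng[i] = neighbors
    return sng


def assign(unlabeled, samples, labels, radii, order_by_density=True):
    """Assign each unlabeled series to a cluster label, or -1 if no key covers it.

    labels[i] is the cluster of samples[i]; radii[i] is that cluster's
    density radius (assumed known from the clustering phase).
    """
    sng = build_sng(samples)
    keys = list(range(len(samples)))
    if order_by_density:
        # Assumed ordering: keys from the smallest density radius upwards.
        keys.sort(key=lambda i: radii[i])
    min_radius = min(radii)

    result = []
    for ts in unlabeled:
        pruned = set()
        assigned = -1
        for i in keys:
            if i in pruned:
                continue
            d = l1_distance(ts, samples[i])
            if d <= radii[i]:
                assigned = labels[i]
                break
            # Triangle inequality: dist(ts, j) >= d - dist(i, j). If that lower
            # bound exceeds radii[j], sample j cannot cover ts and is skipped.
            for dist_ij, j in sng[i]:
                if dist_ij >= d - min_radius:
                    break  # farther neighbors cannot be pruned either
                if dist_ij < d - radii[j]:
                    pruned.add(j)
        result.append(assigned)
    return result


# Toy data: one sparse cluster (label 1) listed first, one dense cluster (label 0).
samples = [[2.0, 0.0], [0.0, 0.0], [0.1, 0.0]]
labels = [1, 0, 0]
radii = [3.0, 0.5, 0.5]

ordered = assign([[0.2, 0.0]], samples, labels, radii, order_by_density=True)
unordered = assign([[0.2, 0.0]], samples, labels, radii, order_by_density=False)
```

In this toy data the query series lies within the radius of both clusters, so the key order alone decides the outcome: with the assumed density ordering it is assigned to cluster 0, while the unordered traversal meets the wide-radius key first and assigns it to cluster 1.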

Metadata
Title
An Effective Cluster Assignment Strategy for Large Time Series Data
Authors
Damir Mirzanurov
Waqas Nawaz
JooYoung Lee
Qiang Qu
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-39958-4_26