
2016 | OriginalPaper | Chapter

An Effective Cluster Assignment Strategy for Large Time Series Data

Authors: Damir Mirzanurov, Waqas Nawaz, JooYoung Lee, Qiang Qu

Published in: Web-Age Information Management

Publisher: Springer International Publishing


Abstract

Clustering time series data is important for finding groups of similar time series, e.g., identifying people with similar mobility patterns by treating their spatio-temporal trajectories as time series. YADING is one of the most recent and efficient methods for clustering large-scale time series data; it consists mainly of sampling, clustering, and assigning steps. Given a set of processed time series entities, YADING first samples the data, and the clusters in the sampled data are then found by a density-based clustering method. Next, the remaining input data is assigned to clusters by computing the distance (or similarity) to the entities in the sampled data. A Sorted Neighbors Graph (SNG) data structure is used to prune the similarity computations over all possible pairs of entities. However, the SNG does not guarantee that sampled time series with lower density are chosen, which degrades accuracy. To resolve this issue, we propose a strategy that orders the SNG keys with respect to the density of clusters. The strategy enables fast selection of time series entities with lower density. Extensive experiments show that our method achieves higher accuracy in terms of normalized mutual information (NMI) than the baseline YADING algorithm. The results suggest that the order of the SNG keys should be the same as the order used in the clustering phase. Furthermore, the findings reveal interesting patterns in identifying density radii for clustering.
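
To illustrate why the order of the keys can affect the assignment result, the following is a minimal Python sketch. It is not the chapter's implementation. It assumes Euclidean distance, one density radius per cluster, and that a smaller radius means a denser cluster that is tried first. The names `assign`, `density_order`, `samples`, `labels`, and `radius` are hypothetical, and the SNG pruning itself is omitted.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def density_order(labels, radius):
    """Return sample indices sorted by the density radius of their cluster.

    Assumption: a smaller radius means a denser cluster, so denser
    clusters come first in the returned order.
    """
    return sorted(range(len(labels)), key=lambda i: radius[labels[i]])


def assign(point, samples, labels, radius, order):
    """Assign `point` to the cluster of the first sampled entity, in the
    given order, whose cluster radius covers the point.

    Returns None if no sampled entity is close enough.
    """
    for i in order:
        if dist(point, samples[i]) <= radius[labels[i]]:
            return labels[i]
    return None


# Hypothetical toy data: two sampled entities from two clusters.
samples = [(0.0, 0.0), (3.0, 0.0)]
labels = ["sparse", "dense"]
radius = {"sparse": 2.5, "dense": 1.0}

point = (2.2, 0.0)  # lies within the radius of both sampled entities

# Arbitrary key order: the sparse cluster is tried first and wins.
print(assign(point, samples, labels, radius, [0, 1]))  # sparse

# Density-based key order: the dense cluster is tried first and wins.
print(assign(point, samples, labels, radius, density_order(labels, radius)))  # dense
```

In this toy case the same point receives a different label depending only on the order in which the sampled entities are tried. That is the kind of effect a density-aware key order is meant to control.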


Metadata
Title: An Effective Cluster Assignment Strategy for Large Time Series Data
Authors: Damir Mirzanurov, Waqas Nawaz, JooYoung Lee, Qiang Qu
Copyright Year: 2016
DOI: https://doi.org/10.1007/978-3-319-39958-4_26