Skip to main content
Erschienen in: Data Mining and Knowledge Discovery 4/2020

30.05.2020

An ultra-fast time series distance measure to allow data mining in more complex real-world deployments

Erschienen in: Data Mining and Knowledge Discovery | Ausgabe 4/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

At their core, many time series data mining algorithms reduce to reasoning about the shapes of time series subsequences. This requires an effective distance measure, and for last two decades most algorithms use Euclidean distance or DTW as their core subroutine. We argue that these distance measures are not as robust as the community seems to believe. The undue faith in these measures perhaps derives from an overreliance on the benchmark datasets and self-selection bias. The community is simply reluctant to address more difficult domains, for which current distance measures are ill-suited. In this work, we introduce a novel distance measure MPdist. We show that our proposed distance measure is much more robust than current distance measures. For example, it can handle data with missing values or spurious regions. Furthermore, it allows us to successfully mine datasets that would defeat any Euclidean or DTW distance-based algorithm. Additionally, we show that our distance measure can be computed so efficiently as to allow analytics on very fast arriving streams.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
There are several variants of LCSS proposed (under this, and other names). This is the more general explanation of such methods.
 
2
This idea is simply the cross correlation, K-Shape is an algorithm that uses cross correlation (Paparrizos and Gravano 2015). However, we abuse terminology a little here to be consistent with the emerging literature.
 
3
This is for the case where the query length is not known ahead of time. If the query length is known, it may be possible to have a faster index-based search.
 
Literatur
Zurück zum Zitat Abanda A, Mori U, Lozano JA (2019) A review on distance based time series classification. Data Min Knowl Disc 33(2):378–412MathSciNetCrossRef Abanda A, Mori U, Lozano JA (2019) A review on distance based time series classification. Data Min Knowl Disc 33(2):378–412MathSciNetCrossRef
Zurück zum Zitat Aghabozorgi S, Shirkhorshidi AS, Wah TY (2015) Time-series clustering—a decade review. Inf Syst 53:16–38CrossRef Aghabozorgi S, Shirkhorshidi AS, Wah TY (2015) Time-series clustering—a decade review. Inf Syst 53:16–38CrossRef
Zurück zum Zitat Ahmed M, Mahmood AN, Islam MR (2016) A survey of anomaly detection techniques in financial domain. Future Gen Comput Syst 55:278–288CrossRef Ahmed M, Mahmood AN, Islam MR (2016) A survey of anomaly detection techniques in financial domain. Future Gen Comput Syst 55:278–288CrossRef
Zurück zum Zitat Bagnall A, Lines J, Bostrom A, Large J, Keogh E (2017) The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min Knowl Disc 31(3):606–660MathSciNetCrossRef Bagnall A, Lines J, Bostrom A, Large J, Keogh E (2017) The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min Knowl Disc 31(3):606–660MathSciNetCrossRef
Zurück zum Zitat Batista GE, Wang X, Keogh EJ (2011) A complexity-invariant distance measure for time series. In: Proceedings of the 2011 SIAM international conference on data mining, pp 699–710 Batista GE, Wang X, Keogh EJ (2011) A complexity-invariant distance measure for time series. In: Proceedings of the 2011 SIAM international conference on data mining, pp 699–710
Zurück zum Zitat Berndt DJ, Clifford J (1994). Using dynamic time warping to find patterns in time series. In: KDD workshop, vol 10, pp 359–70 Berndt DJ, Clifford J (1994). Using dynamic time warping to find patterns in time series. In: KDD workshop, vol 10, pp 359–70
Zurück zum Zitat Darvishzadeh A, Entezari N, Stahovich T (2018) Finding the answer: techniques for locating students’ answers in handwritten problem solutions. In: 2018 16th International conference on frontiers in handwriting recognition (ICFHR), pp 587–592 Darvishzadeh A, Entezari N, Stahovich T (2018) Finding the answer: techniques for locating students’ answers in handwritten problem solutions. In: 2018 16th International conference on frontiers in handwriting recognition (ICFHR), pp 587–592
Zurück zum Zitat Dau HA, Begum N, Keogh E (2016) Semi-supervision dramatically improves time series clustering under dynamic time warping. In: 25th ACM international on conference on information and knowledge management, pp 999–1008 Dau HA, Begum N, Keogh E (2016) Semi-supervision dramatically improves time series clustering under dynamic time warping. In: 25th ACM international on conference on information and knowledge management, pp 999–1008
Zurück zum Zitat Dau HA, Silva DF, Petitjean F, Forestier G, Bagnall A, Keogh E (2017) Judicious setting of dynamic time warping’s window width allows more accurate classification of time series. In: 2017 IEEE international conference on big data (big data), pp 917–22 Dau HA, Silva DF, Petitjean F, Forestier G, Bagnall A, Keogh E (2017) Judicious setting of dynamic time warping’s window width allows more accurate classification of time series. In: 2017 IEEE international conference on big data (big data), pp 917–22
Zurück zum Zitat Dau HA, Bagnall A, Kamgar K, Yeh C-CM, Zhu Y, Gharghabi S, Ratanamahatana CA, Keogh E (2019) The UCR time series archive. IEEE/CAA J Automat Sin 6(6):1293–1305CrossRef Dau HA, Bagnall A, Kamgar K, Yeh C-CM, Zhu Y, Gharghabi S, Ratanamahatana CA, Keogh E (2019) The UCR time series archive. IEEE/CAA J Automat Sin 6(6):1293–1305CrossRef
Zurück zum Zitat Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30MathSciNetMATH Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30MathSciNetMATH
Zurück zum Zitat Guillame-Bert M, Dubrawski A (2017) Classification of time sequences using graphs of temporal constraints. J Mach Learn Res 18(1):4370–4403MathSciNet Guillame-Bert M, Dubrawski A (2017) Classification of time sequences using graphs of temporal constraints. J Mach Learn Res 18(1):4370–4403MathSciNet
Zurück zum Zitat Haigh, Karen Zita, Wendy Foslien, and Valerie Guralnik. 2004. “Visual Query Language: Finding Patterns in and Relationships among Time Series Data.” in Seventh Workshop on Mining Scientific and Engineering Datasets. Vol. 24 Haigh, Karen Zita, Wendy Foslien, and Valerie Guralnik. 2004. “Visual Query Language: Finding Patterns in and Relationships among Time Series Data.” in Seventh Workshop on Mining Scientific and Engineering Datasets. Vol. 24
Zurück zum Zitat Honaker J, King G (2010) What to do about missing values in time-series cross-section data. Am J Polit Sci 54(2):561–581CrossRef Honaker J, King G (2010) What to do about missing values in time-series cross-section data. Am J Polit Sci 54(2):561–581CrossRef
Zurück zum Zitat Hu B, Chen Y, Zakaria J, Ulanova L, Keogh E (2013) Classification of multi-dimensional streaming time series by weighting each classifier’s track record. In: 2013 IEEE 13th international conference on data mining, pp 281–90 Hu B, Chen Y, Zakaria J, Ulanova L, Keogh E (2013) Classification of multi-dimensional streaming time series by weighting each classifier’s track record. In: 2013 IEEE 13th international conference on data mining, pp 281–90
Zurück zum Zitat Hu B, Chen Y, Keogh E (2016) Classification of streaming time series under more realistic assumptions. Data Min Knowl Disc 30(2):403–437MathSciNetCrossRef Hu B, Chen Y, Keogh E (2016) Classification of streaming time series under more realistic assumptions. Data Min Knowl Disc 30(2):403–437MathSciNetCrossRef
Zurück zum Zitat Imani S, Madrid F, Ding W, Crouter S, Keogh E (2018) Matrix profile XIII: time series snippets: a new primitive for time series data mining. In: 2018 IEEE international conference on big knowledge (ICBK), pp 382–89 Imani S, Madrid F, Ding W, Crouter S, Keogh E (2018) Matrix profile XIII: time series snippets: a new primitive for time series data mining. In: 2018 IEEE international conference on big knowledge (ICBK), pp 382–89
Zurück zum Zitat Jin S, Chen ZM, Backus EA, Sun XL, Xiao B (2012) Characterization of EPG waveforms for the tea green leafhopper, Empoasca Vitis Göthe (Hemiptera: cicadellidae), on tea plants and their correlation with stylet activities. J Insect Physiol 58(9):1235–1244CrossRef Jin S, Chen ZM, Backus EA, Sun XL, Xiao B (2012) Characterization of EPG waveforms for the tea green leafhopper, Empoasca Vitis Göthe (Hemiptera: cicadellidae), on tea plants and their correlation with stylet activities. J Insect Physiol 58(9):1235–1244CrossRef
Zurück zum Zitat Mauck K (2018) Personal communication Mauck K (2018) Personal communication
Zurück zum Zitat Mei J, Liu M, Wang Y-F, Gao H (2015) Learning a mahalanobis distance-based dynamic time warping measure for multivariate time series classification. IEEE Trans Cybern 46(6):1363–1374CrossRef Mei J, Liu M, Wang Y-F, Gao H (2015) Learning a mahalanobis distance-based dynamic time warping measure for multivariate time series classification. IEEE Trans Cybern 46(6):1363–1374CrossRef
Zurück zum Zitat Moskovitch R, Shahar Y (2015) Classification-driven temporal discretization of multivariate time series. Data Min Knowl Disc 29(4):871–913MathSciNetCrossRef Moskovitch R, Shahar Y (2015) Classification-driven temporal discretization of multivariate time series. Data Min Knowl Disc 29(4):871–913MathSciNetCrossRef
Zurück zum Zitat Murray D, Liao J, Stankovic L, Stankovic V, Hauxwell-Baldwin R, Wilson C, Coleman M, Kane T, Firth S (2015) A data management platform for personalised real-time energy feedback. In: The 8th international conference on energy efficiency in domestic appliances and lighting (EEDAL), pp 1293–1307 Murray D, Liao J, Stankovic L, Stankovic V, Hauxwell-Baldwin R, Wilson C, Coleman M, Kane T, Firth S (2015) A data management platform for personalised real-time energy feedback. In: The 8th international conference on energy efficiency in domestic appliances and lighting (EEDAL), pp 1293–1307
Zurück zum Zitat Paparrizos J, Gravano L (2015) K-shape: efficient and accurate clustering of time series. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, pp 1855–1870 Paparrizos J, Gravano L (2015) K-shape: efficient and accurate clustering of time series. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, pp 1855–1870
Zurück zum Zitat Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q, Zakaria J, Keogh E (2012) Searching and mining trillions of time series subsequences under dynamic time warping. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 262–70 Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q, Zakaria J, Keogh E (2012) Searching and mining trillions of time series subsequences under dynamic time warping. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 262–70
Zurück zum Zitat Rodriguez JJ, Kuncheva LI, Alonso CJ (2006) Rotation forest: a new classifier ensemble method. IEEE Trans Pattern Anal Mach Intell 28(10):1619–1630CrossRef Rodriguez JJ, Kuncheva LI, Alonso CJ (2006) Rotation forest: a new classifier ensemble method. IEEE Trans Pattern Anal Mach Intell 28(10):1619–1630CrossRef
Zurück zum Zitat Ruutu JPO, Kilkki MK, Nokia Networks Oy (2000) System and method employing last occurrence and sliding window technique for determining minimum and maximum values. U.S. Patent 6,023,453 Ruutu JPO, Kilkki MK, Nokia Networks Oy (2000) System and method employing last occurrence and sliding window technique for determining minimum and maximum values. U.S. Patent 6,023,453
Zurück zum Zitat Sarangi SR, Murthy K (2010) DUST: a generalized notion of similarity between uncertain time series. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, pp 383–392 Sarangi SR, Murthy K (2010) DUST: a generalized notion of similarity between uncertain time series. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, pp 383–392
Zurück zum Zitat Schäfer P (2015) The BOSS is concerned with time series classification in the presence of noise. Data Min Knowl Disc 29(6):1505–1530MathSciNetCrossRef Schäfer P (2015) The BOSS is concerned with time series classification in the presence of noise. Data Min Knowl Disc 29(6):1505–1530MathSciNetCrossRef
Zurück zum Zitat Sengupta S (2015) Multidimensional time series classification and its application to video activity recognition. Ulster University, Derry Sengupta S (2015) Multidimensional time series classification and its application to video activity recognition. Ulster University, Derry
Zurück zum Zitat Serrà J, Arcos JL (2016) Particle swarm optimization for time series motif discovery. Knowl Based Syst 92:127–137CrossRef Serrà J, Arcos JL (2016) Particle swarm optimization for time series motif discovery. Knowl Based Syst 92:127–137CrossRef
Zurück zum Zitat Weng X, Shen J (2008a) Classification of multivariate time series using locality preserving projections. Knowl Based Syst 21(7):581–587CrossRef Weng X, Shen J (2008a) Classification of multivariate time series using locality preserving projections. Knowl Based Syst 21(7):581–587CrossRef
Zurück zum Zitat Weng X, Shen J (2008b) Classification of multivariate time series using two-dimensional singular value decomposition. Knowl Based Syst 21(7):535–539CrossRef Weng X, Shen J (2008b) Classification of multivariate time series using two-dimensional singular value decomposition. Knowl Based Syst 21(7):535–539CrossRef
Zurück zum Zitat Ye L, Keogh E (2009) Time series shapelets: a new primitive for data mining. In Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 947–956 Ye L, Keogh E (2009) Time series shapelets: a new primitive for data mining. In Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 947–956
Zurück zum Zitat Yeh MY, Wu KL, Yu PS, Chen MS (2009) PROUD: a probabilistic approach to processing similarity queries over uncertain data streams. In: Proceedings of the 12th international conference on extending database technology: advances in database technology, pp 684–95 Yeh MY, Wu KL, Yu PS, Chen MS (2009) PROUD: a probabilistic approach to processing similarity queries over uncertain data streams. In: Proceedings of the 12th international conference on extending database technology: advances in database technology, pp 684–95
Zurück zum Zitat Yeh CCM, Zhu Y, Ulanova L, Begum N, Ding Y, Dau YA, Silva DF, Mueen A, Keogh E (2016) Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In: 2016 IEEE 16th international conference on data mining (ICDM). IEEE, pp 1317–22 Yeh CCM, Zhu Y, Ulanova L, Begum N, Ding Y, Dau YA, Silva DF, Mueen A, Keogh E (2016) Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In: 2016 IEEE 16th international conference on data mining (ICDM). IEEE, pp 1317–22
Zurück zum Zitat Yi X, Zheng Y, Zhang J, Li T (2016) ST-MVL: filling missing values in geo-sensory time series data Yi X, Zheng Y, Zhang J, Li T (2016) ST-MVL: filling missing values in geo-sensory time series data
Zurück zum Zitat Zhang A, Song S, Wang J, Yu PS (2017) Time series data cleaning: from anomaly detection to anomaly repairing. Proc VLDB Endowm 10(10):1046–1057CrossRef Zhang A, Song S, Wang J, Yu PS (2017) Time series data cleaning: from anomaly detection to anomaly repairing. Proc VLDB Endowm 10(10):1046–1057CrossRef
Zurück zum Zitat Zhu Y, Zimmerman Z, Senobari NS, Yeh CCM, Funning G, Mueen A, Brisk P, Keogh E (2016) Matrix profile Ii: exploiting a novel algorithm and Gpus to break the one hundred million barrier for time series motifs and joins. In 2016 IEEE 16th international conference on data mining (ICDM), pp 739–48 Zhu Y, Zimmerman Z, Senobari NS, Yeh CCM, Funning G, Mueen A, Brisk P, Keogh E (2016) Matrix profile Ii: exploiting a novel algorithm and Gpus to break the one hundred million barrier for time series motifs and joins. In 2016 IEEE 16th international conference on data mining (ICDM), pp 739–48
Metadaten
Titel
An ultra-fast time series distance measure to allow data mining in more complex real-world deployments
Publikationsdatum
30.05.2020
Erschienen in
Data Mining and Knowledge Discovery / Ausgabe 4/2020
Print ISSN: 1384-5810
Elektronische ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-020-00695-8

Weitere Artikel der Ausgabe 4/2020

Data Mining and Knowledge Discovery 4/2020 Zur Ausgabe