Published in: Data Mining and Knowledge Discovery 6/2015

1 November 2015

The BOSS is concerned with time series classification in the presence of noise

Author: Patrick Schäfer

Abstract

Similarity search is one of the most important and probably best-studied methods in data mining. In time series analysis, however, it reaches its limits when applied to raw datasets. Raw time series may be recorded at variable lengths, be noisy, or be composed of repetitive substructures. These characteristics form the foundation for state-of-the-art search algorithms. Noise, however, has received surprisingly little attention and is assumed to be filtered out in a preprocessing step carried out by a human. Our Bag-of-SFA-Symbols (BOSS) model combines the extraction of substructures with tolerance to extraneous and erroneous data by using a noise-reducing representation of the time series. We show that our BOSS ensemble classifier improves on the best published classification accuracies in diverse application areas and on the official UCR classification benchmark datasets by a large margin.
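
The abstract only names the ingredients of the model. As a rough illustration of the general idea, and not the paper's actual algorithm, the following Python sketch slides a window over a series, keeps only the first few Fourier coefficients of each window (a low-pass step that discards high-frequency noise), quantises them into letters, and counts the resulting words. The function names, window length, number of coefficients, alphabet, and the fixed breakpoints are all assumptions made for this sketch; the fixed breakpoints in particular stand in for whatever discretisation the paper uses.

```python
import cmath
import math
from collections import Counter


def dft_prefix(window, n_coeffs):
    """Real and imaginary parts of DFT coefficients 1..n_coeffs.

    Coefficient 0 (the window mean) is skipped, and all higher
    frequencies are dropped, which acts as a simple low-pass filter.
    """
    n = len(window)
    feats = []
    for k in range(1, n_coeffs + 1):
        c = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(window)) / n
        feats.extend((c.real, c.imag))
    return feats


def to_word(feats, breakpoints, alphabet="abcd"):
    """Map each value to a letter by counting the breakpoints it exceeds.

    Assumes len(alphabet) == len(breakpoints) + 1 (hypothetical defaults).
    """
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in feats)


def bag_of_words(series, window_len=8, n_coeffs=2,
                 breakpoints=(-0.25, 0.0, 0.25)):
    """Histogram of the words produced by every sliding window."""
    bag = Counter()
    for start in range(len(series) - window_len + 1):
        window = series[start:start + window_len]
        bag[to_word(dft_prefix(window, n_coeffs), breakpoints)] += 1
    return bag


# Illustrative data: a sine wave and a copy with a small alternating perturbation.
clean = [math.sin(2 * math.pi * t / 16) for t in range(64)]
noisy = [x + 0.05 * (-1) ** t for t, x in enumerate(clean)]
print(bag_of_words(clean))
print(bag_of_words(noisy))
```

The perturbation in this example alternates at the highest frequency a window of length 8 can represent, so it barely affects the two low-frequency coefficients that are kept, and the two histograms should come out very similar. A classifier could then compare such histograms instead of the raw series.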

Footnotes
1. The BIDMC congestive heart failure database: http://www.physionet.org/physiobank/database/chfdb/. Accessed 2014.
2. UCR Time Series Classification/Clustering Homepage: http://www.cs.ucr.edu/~eamonn/time_series_data. Accessed 2014.
3. CMU Graphics Lab Motion Capture Database: http://mocap.cs.cmu.edu/. Accessed 2014.

Metadata
Title
The BOSS is concerned with time series classification in the presence of noise
Author
Patrick Schäfer
Publication date
1 November 2015
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery / Issue 6/2015
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-014-0377-7
