Published in: Datenbank-Spektrum 2/2018

25 May 2018 | Focus article

Data Change Exploration Using Time Series Clustering

Authors: Leon Bornemann, Tobias Bleifuß, Dmitri Kalashnikov, Felix Naumann, Divesh Srivastava

Abstract

The analysis of static data is one of the best-studied research areas. However, data changes over time, and these changes may reveal patterns or groups of similar values, properties, and entities. We study changes in large, publicly available data repositories by modelling them as time series and clustering these series by similarity. To perform change exploration on real-world data, we use the publicly available revision data of Wikipedia infoboxes and weekly snapshots of IMDB.
The changes to the data are captured as events, which we call change records. To extract temporal behavior, we count changes per time period and propose a general transformation framework that aggregates groups of changes into numerical time series of different resolutions. We use these time series to study different application scenarios of unsupervised clustering. Our exploratory results show that changes made to collaboratively edited data sources can help to find characteristic behavior, distinguish entities or properties, and provide insight into the respective domains.
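The transformation described above can be illustrated with a minimal sketch. It is not the authors' framework: the record layout (entity, property, day), the sample data, the normalization, and the use of plain k-means with Euclidean distance are all assumptions made for illustration, since the abstract names neither a record schema nor a specific clustering algorithm or distance measure.

```python
from collections import defaultdict
from datetime import date
import math

# Hypothetical change records: (entity, property, day of change).
START = date(2018, 1, 1)
CHANGES = [
    ("A", "population", date(2018, 1, 1)),
    ("A", "mayor", date(2018, 1, 2)),
    ("A", "population", date(2018, 1, 9)),
    ("B", "area", date(2018, 1, 3)),
    ("B", "area", date(2018, 1, 4)),
    ("C", "cast", date(2018, 1, 23)),
    ("C", "rating", date(2018, 1, 24)),
    ("C", "rating", date(2018, 1, 17)),
    ("D", "rating", date(2018, 1, 25)),
]

def to_time_series(changes, start, n_buckets, bucket_days, key=lambda c: c[0]):
    """Count changes per group in consecutive buckets of `bucket_days` days.

    `key` selects the grouping (entity by default); `bucket_days` sets the
    resolution of the resulting series.
    """
    series = defaultdict(lambda: [0] * n_buckets)
    for change in changes:
        idx = (change[2] - start).days // bucket_days
        if 0 <= idx < n_buckets:  # ignore changes outside the observed window
            series[key(change)][idx] += 1
    return dict(series)

def normalize(ts):
    """Scale a series to sum to 1 so that shape matters, not total volume."""
    total = sum(ts)
    return [v / total for v in ts] if total else list(ts)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, iterations=10):
    """Plain k-means over a dict name -> vector, seeded with the first k vectors."""
    names = list(vectors)
    centroids = [list(vectors[n]) for n in names[:k]]
    assignment = {}
    for _ in range(iterations):
        # Assign each series to its nearest centroid.
        assignment = {
            n: min(range(k), key=lambda i: euclidean(vectors[n], centroids[i]))
            for n in names
        }
        # Recompute each centroid as the mean of its members.
        for i in range(k):
            members = [vectors[n] for n in names if assignment[n] == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# The same records at two resolutions: 4 weekly buckets and 28 daily buckets.
weekly = to_time_series(CHANGES, START, n_buckets=4, bucket_days=7)
daily = to_time_series(CHANGES, START, n_buckets=28, bucket_days=1)

# Group by property instead of entity by swapping the key function.
by_property = to_time_series(CHANGES, START, 4, 7, key=lambda c: c[1])

clusters = kmeans({name: normalize(ts) for name, ts in weekly.items()}, k=2)
```

In this toy data, entities A and B change early in the window and C and D change late, so the weekly series separate into two groups. Coarser buckets smooth out day-to-day noise, while finer buckets preserve timing detail at the cost of sparser series.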

Metadata
Title
Data Change Exploration Using Time Series Clustering
Authors
Leon Bornemann
Tobias Bleifuß
Dmitri Kalashnikov
Felix Naumann
Divesh Srivastava
Publication date
25 May 2018
Publisher
Springer Berlin Heidelberg
Published in
Datenbank-Spektrum / Issue 2/2018
Print ISSN: 1618-2162
Electronic ISSN: 1610-1995
DOI
https://doi.org/10.1007/s13222-018-0285-x
