Skip to main content
Erschienen in: Knowledge and Information Systems 1/2013

01.07.2013 | Regular Paper

Out-of-core detection of periodicity from sequence databases

verfasst von: Faraz Rasheed, Muhaimenul Adnan, Reda Alhajj

Erschienen in: Knowledge and Information Systems | Ausgabe 1/2013

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In this paper, we address the scalability problem of periodicity detection for time series and sequence databases. We present time and space efficient periodicity detection method that efficiently uses external memory (disk) when the series cannot be processed inside the available main memory. Our approach uses suffix tree to facilitate periodicity detection. We consider two cases, namely in-core and out of core. First, we optimize storage requirements of the suffix tree to be able to fit larger suffix trees in-core. This guarantees the ability to mine larger sequences when everything can be kept in-core, which is what the current periodicity detection approaches are able to mine. Second, when the data structures go out of core, we extend the suffix tree construction part to use external memory. We are able to achieve this while maintaining linear time complexity. As a result, when we go out of core, we can mine databases that require considerably larger space to keep the data structures compared to the available main memory. For the out-of-core periodicity detection part, the proposed method allows the required data structures to be kept on external memory partially when a memory overflow situation occurs. Various pruning strategies are also proposed to allow the proposed approach to process large sequences within reasonable amount of time. Additionally, we introduced the notion of “emulated tree traversal” for fast suffix tree traversal. Due to all these special considerations, we are able to mine much larger sequences compared to other existing periodicity detection algorithms. To demonstrate the applicability, power, and effectiveness of the proposed framework, we present results of periodicity detection up to 500 MB of time sequence data, which (to the best of our knowledge) is the largest reported sequence mined for periodicity detection ever.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Ahdesmaki M, Lahdesmaki H, Pearson R, Huttunen H, Yli-Harja O (2005) Robust detection of periodic time series measured from biological systems. BMC Bioinform 6:117CrossRef Ahdesmaki M, Lahdesmaki H, Pearson R, Huttunen H, Yli-Harja O (2005) Robust detection of periodic time series measured from biological systems. BMC Bioinform 6:117CrossRef
2.
Zurück zum Zitat Al-Rawi A, Lansari A, Bouslama F (2003) A new non-recursive algorithm for binary search tree traversal. In: Proceedings of 10th IEEE international conference on electronics, circuits and systems ICECS, vol 2. UAE, pp 770–773 (Dec 2003) Al-Rawi A, Lansari A, Bouslama F (2003) A new non-recursive algorithm for binary search tree traversal. In: Proceedings of 10th IEEE international conference on electronics, circuits and systems ICECS, vol 2. UAE, pp 770–773 (Dec 2003)
3.
Zurück zum Zitat Barsky M, Stege U, Thomo A, Upton C (2009) Suffix trees for very large genomic sequences. In: Proceeding of the 18th ACM conference on information and knowledge management (CIKM ’09). New York, NY, USA, pp 1417–1420 Barsky M, Stege U, Thomo A, Upton C (2009) Suffix trees for very large genomic sequences. In: Proceeding of the 18th ACM conference on information and knowledge management (CIKM ’09). New York, NY, USA, pp 1417–1420
4.
Zurück zum Zitat Bedathur SJ, Haritsa JR (2004) Engineering a fast online persistent suffix tree construction. In: Proceedings of the 20th international conference on data, engineering Bedathur SJ, Haritsa JR (2004) Engineering a fast online persistent suffix tree construction. In: Proceedings of the 20th international conference on data, engineering
5.
Zurück zum Zitat Bentley J (1999) Programming pearls, 2nd edn. Addison-Wesley Professional, Reading Bentley J (1999) Programming pearls, 2nd edn. Addison-Wesley Professional, Reading
6.
Zurück zum Zitat Berberidis C, Aref W, Atallah M, Vlahavas I, Elmagarmid A (2002) Multiple and partial periodicity mining in time series databases. In: Proceedings of European conference on artificial intelligence (July 2002) Berberidis C, Aref W, Atallah M, Vlahavas I, Elmagarmid A (2002) Multiple and partial periodicity mining in time series databases. In: Proceedings of European conference on artificial intelligence (July 2002)
7.
Zurück zum Zitat Branch JW, Giannella C, Szymanski B, Wolff R, Kargupta H (2012) In-network outlier detection in wireless sensor networks. Knowl Inform Syst 18. doi:10.1007/s10115-011-0474-5 (January 2012, Online First) Branch JW, Giannella C, Szymanski B, Wolff R, Kargupta H (2012) In-network outlier detection in wireless sensor networks. Knowl Inform Syst 18. doi:10.​1007/​s10115-011-0474-5 (January 2012, Online First)
8.
Zurück zum Zitat Cheung C-F, Yu JX, Lu H (2005) Constructing suffix tree for Gigabyte sequences with Megabyte memory. IEEE Trans Knowl Data Eng 17(1):90–105CrossRef Cheung C-F, Yu JX, Lu H (2005) Constructing suffix tree for Gigabyte sequences with Megabyte memory. IEEE Trans Knowl Data Eng 17(1):90–105CrossRef
9.
Zurück zum Zitat Elfeky MG, Aref WG, Elmagarmid AK (2005) Periodicity detection in time series databases. IEEE Trans Knowl Data Eng 17(7):875–887CrossRef Elfeky MG, Aref WG, Elmagarmid AK (2005) Periodicity detection in time series databases. IEEE Trans Knowl Data Eng 17(7):875–887CrossRef
10.
Zurück zum Zitat Elfeky MG, Aref WG, Elmagarmid AK (2005) WARP: time warping for periodicity detection. In: Proceedings of IEEE international conference of data mining (Nov 2005) Elfeky MG, Aref WG, Elmagarmid AK (2005) WARP: time warping for periodicity detection. In: Proceedings of IEEE international conference of data mining (Nov 2005)
11.
Zurück zum Zitat Fayolle J, Ward MD (2005) Analysis of the average depth in a suffix tree under a Markov model. In: Martnez C (ed) 2005 International conference on analysis of algorithms. Discrete mathematics and theoretical computer ccience proceedings AD, pp 95–104 Fayolle J, Ward MD (2005) Analysis of the average depth in a suffix tree under a Markov model. In: Martnez C (ed) 2005 International conference on analysis of algorithms. Discrete mathematics and theoretical computer ccience proceedings AD, pp 95–104
12.
Zurück zum Zitat Garcia ACB, Bentes C, de Melo RHC, Zadrozny B, Penna TJP (2011) Sensor data analysis for equipment monitoring. Knowl Inform Syst 28(2):333–364CrossRef Garcia ACB, Bentes C, de Melo RHC, Zadrozny B, Penna TJP (2011) Sensor data analysis for equipment monitoring. Knowl Inform Syst 28(2):333–364CrossRef
13.
Zurück zum Zitat Glynn EF, Chen J, Mushegian AR (2006) Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms. Bioinformatics 22(3):310–316CrossRef Glynn EF, Chen J, Mushegian AR (2006) Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms. Bioinformatics 22(3):310–316CrossRef
14.
Zurück zum Zitat Gusfield D (1997) Algorithms on strings, trees, and sequences. Cambridge University Press, CambridgeMATHCrossRef Gusfield D (1997) Algorithms on strings, trees, and sequences. Cambridge University Press, CambridgeMATHCrossRef
15.
Zurück zum Zitat Han J, Gong W, Yin Y (1998) Mining segment-wise periodic patterns in time related databases. In: Proceedings of ACM international conference on knowledge discovery and data mining (Aug 1998) Han J, Gong W, Yin Y (1998) Mining segment-wise periodic patterns in time related databases. In: Proceedings of ACM international conference on knowledge discovery and data mining (Aug 1998)
16.
Zurück zum Zitat Han J, Yin Y, Dong G (1999) Efficient mining of partial periodic patterns in time series database. In: Proceedings of IEEE international conference on data, engineering, p 106 Han J, Yin Y, Dong G (1999) Efficient mining of partial periodic patterns in time series database. In: Proceedings of IEEE international conference on data, engineering, p 106
17.
Zurück zum Zitat Huang K-Y, Chang C-H (June 2005) SMCA: a general model for mining asynchronous periodic patterns in temporal databases. IEEE Trans Knowl Data Eng 17(6):774–785 Huang K-Y, Chang C-H (June 2005) SMCA: a general model for mining asynchronous periodic patterns in temporal databases. IEEE Trans Knowl Data Eng 17(6):774–785
18.
Zurück zum Zitat Indyk P, Koudas N, Muthukrishnan S (2000) Identifying representative trends in massive time series data sets using sketches. In: Proceedings of the international conference on very large databases, VLDB (Sept 2000) Indyk P, Koudas N, Muthukrishnan S (2000) Identifying representative trends in massive time series data sets using sketches. In: Proceedings of the international conference on very large databases, VLDB (Sept 2000)
19.
Zurück zum Zitat Koknar-Tezel S, Latecki LJ (2011) Improving SVM classification on imbalanced time series data sets with ghost points. Knowl Inform Syst 28(1):1–23CrossRef Koknar-Tezel S, Latecki LJ (2011) Improving SVM classification on imbalanced time series data sets with ghost points. Knowl Inform Syst 28(1):1–23CrossRef
20.
Zurück zum Zitat Kurtz S (1999) Reducing the space requirement of suffix trees. Softw Pract Exp 29(13):1149–1171CrossRef Kurtz S (1999) Reducing the space requirement of suffix trees. Softw Pract Exp 29(13):1149–1171CrossRef
21.
Zurück zum Zitat Lahiri M, Berger-Wolf TY (2010) Periodic subgraph mining in dynamic networks. Knowl Inform Syst 24(3):467–497CrossRef Lahiri M, Berger-Wolf TY (2010) Periodic subgraph mining in dynamic networks. Knowl Inform Syst 24(3):467–497CrossRef
22.
Zurück zum Zitat Lee G, Yang W, Lee J-M (2006) A parallel algorithm for mining multiple partial periodic patterns. Elsevier J Inform Sci 176:3591–3609MathSciNetMATHCrossRef Lee G, Yang W, Lee J-M (2006) A parallel algorithm for mining multiple partial periodic patterns. Elsevier J Inform Sci 176:3591–3609MathSciNetMATHCrossRef
23.
Zurück zum Zitat Nelson M (1996) Fast string searching with suffix trees. Dr. Dobb’s Journal Nelson M (1996) Fast string searching with suffix trees. Dr. Dobb’s Journal
24.
Zurück zum Zitat Phoophakdee B, Zaki MJ (2007) Genome-scale disk-based suffix tree indexing. In: Proceedings of the 2007 ACM SIGMOD international conference on management of data (SIGMOD ’07), pp 833–844 Phoophakdee B, Zaki MJ (2007) Genome-scale disk-based suffix tree indexing. In: Proceedings of the 2007 ACM SIGMOD international conference on management of data (SIGMOD ’07), pp 833–844
25.
Zurück zum Zitat Rasheed F, Alshalalfa M, Alhajj R (2011) Efficient periodicity mining in time series databases using suffix tree. IEEE Trans Knowl Data Eng (TKDE) 23(1):79–94CrossRef Rasheed F, Alshalalfa M, Alhajj R (2011) Efficient periodicity mining in time series databases using suffix tree. IEEE Trans Knowl Data Eng (TKDE) 23(1):79–94CrossRef
26.
Zurück zum Zitat Reznik YA (2002) On tries, suffix trees, and universal variable-length-to-block codes. In: Proceedings of IEEE international symposium on information theory, p 123 Reznik YA (2002) On tries, suffix trees, and universal variable-length-to-block codes. In: Proceedings of IEEE international symposium on information theory, p 123
27.
Zurück zum Zitat Sheng C, Hsu W, Lee M-L (2012) Mining dense periodic patterns in time series data. In: Proceedings of IEEE international conference on data, engineering, p 115 Sheng C, Hsu W, Lee M-L (2012) Mining dense periodic patterns in time series data. In: Proceedings of IEEE international conference on data, engineering, p 115
28.
Zurück zum Zitat Tian Y, Tata S, Hankins RA, Patel JM (Sep. 2005) Practical methods for constructing suffix trees. VLDB J 14(3):281–299 Tian Y, Tata S, Hankins RA, Patel JM (Sep. 2005) Practical methods for constructing suffix trees. VLDB J 14(3):281–299
30.
Zurück zum Zitat Valimaki N, Gerlach W, Dixit K, Makinen V (2007) Compressed suffix tree—a basis for genome-scale sequence analysis. Bioinformatics 23:629–630CrossRef Valimaki N, Gerlach W, Dixit K, Makinen V (2007) Compressed suffix tree—a basis for genome-scale sequence analysis. Bioinformatics 23:629–630CrossRef
31.
Zurück zum Zitat Weigend A, Gershenfeld N (1994) Time series prediction: forecasting the future and understanding the past. Addison-Wesley, Reading Weigend A, Gershenfeld N (1994) Time series prediction: forecasting the future and understanding the past. Addison-Wesley, Reading
32.
Zurück zum Zitat Wong S-S, Sung W-K, Wong L (2007) CPS-tree: a compact partitioned suffix tree for diskAbased indexing on large genome sequences. In: International conference on data engineering (ICDE), pp 1350–1354 Wong S-S, Sung W-K, Wong L (2007) CPS-tree: a compact partitioned suffix tree for diskAbased indexing on large genome sequences. In: International conference on data engineering (ICDE), pp 1350–1354
33.
Zurück zum Zitat Yang J, Wang W, Philip SY (2001) Infominer: mining surprising periodic patterns. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, KDD, pp 395–400 Yang J, Wang W, Philip SY (2001) Infominer: mining surprising periodic patterns. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, KDD, pp 395–400
34.
Zurück zum Zitat Yankov D, Keogh E, Rebbapragada U (2008) Disk aware discord discovery: finding unusual time series in terabyte sized datasets. Knowl Inform Syst 17(2):241–262CrossRef Yankov D, Keogh E, Rebbapragada U (2008) Disk aware discord discovery: finding unusual time series in terabyte sized datasets. Knowl Inform Syst 17(2):241–262CrossRef
Metadaten
Titel
Out-of-core detection of periodicity from sequence databases
verfasst von
Faraz Rasheed
Muhaimenul Adnan
Reda Alhajj
Publikationsdatum
01.07.2013
Verlag
Springer-Verlag
Erschienen in
Knowledge and Information Systems / Ausgabe 1/2013
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-012-0546-1

Weitere Artikel der Ausgabe 1/2013

Knowledge and Information Systems 1/2013 Zur Ausgabe

Premium Partner