Skip to main content
Top
Published in: Knowledge and Information Systems 1/2013

01-07-2013 | Regular Paper

Out-of-core detection of periodicity from sequence databases

Authors: Faraz Rasheed, Muhaimenul Adnan, Reda Alhajj

Published in: Knowledge and Information Systems | Issue 1/2013

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

In this paper, we address the scalability problem of periodicity detection for time series and sequence databases. We present time and space efficient periodicity detection method that efficiently uses external memory (disk) when the series cannot be processed inside the available main memory. Our approach uses suffix tree to facilitate periodicity detection. We consider two cases, namely in-core and out of core. First, we optimize storage requirements of the suffix tree to be able to fit larger suffix trees in-core. This guarantees the ability to mine larger sequences when everything can be kept in-core, which is what the current periodicity detection approaches are able to mine. Second, when the data structures go out of core, we extend the suffix tree construction part to use external memory. We are able to achieve this while maintaining linear time complexity. As a result, when we go out of core, we can mine databases that require considerably larger space to keep the data structures compared to the available main memory. For the out-of-core periodicity detection part, the proposed method allows the required data structures to be kept on external memory partially when a memory overflow situation occurs. Various pruning strategies are also proposed to allow the proposed approach to process large sequences within reasonable amount of time. Additionally, we introduced the notion of “emulated tree traversal” for fast suffix tree traversal. Due to all these special considerations, we are able to mine much larger sequences compared to other existing periodicity detection algorithms. To demonstrate the applicability, power, and effectiveness of the proposed framework, we present results of periodicity detection up to 500 MB of time sequence data, which (to the best of our knowledge) is the largest reported sequence mined for periodicity detection ever.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Ahdesmaki M, Lahdesmaki H, Pearson R, Huttunen H, Yli-Harja O (2005) Robust detection of periodic time series measured from biological systems. BMC Bioinform 6:117CrossRef Ahdesmaki M, Lahdesmaki H, Pearson R, Huttunen H, Yli-Harja O (2005) Robust detection of periodic time series measured from biological systems. BMC Bioinform 6:117CrossRef
2.
go back to reference Al-Rawi A, Lansari A, Bouslama F (2003) A new non-recursive algorithm for binary search tree traversal. In: Proceedings of 10th IEEE international conference on electronics, circuits and systems ICECS, vol 2. UAE, pp 770–773 (Dec 2003) Al-Rawi A, Lansari A, Bouslama F (2003) A new non-recursive algorithm for binary search tree traversal. In: Proceedings of 10th IEEE international conference on electronics, circuits and systems ICECS, vol 2. UAE, pp 770–773 (Dec 2003)
3.
go back to reference Barsky M, Stege U, Thomo A, Upton C (2009) Suffix trees for very large genomic sequences. In: Proceeding of the 18th ACM conference on information and knowledge management (CIKM ’09). New York, NY, USA, pp 1417–1420 Barsky M, Stege U, Thomo A, Upton C (2009) Suffix trees for very large genomic sequences. In: Proceeding of the 18th ACM conference on information and knowledge management (CIKM ’09). New York, NY, USA, pp 1417–1420
4.
go back to reference Bedathur SJ, Haritsa JR (2004) Engineering a fast online persistent suffix tree construction. In: Proceedings of the 20th international conference on data, engineering Bedathur SJ, Haritsa JR (2004) Engineering a fast online persistent suffix tree construction. In: Proceedings of the 20th international conference on data, engineering
5.
go back to reference Bentley J (1999) Programming pearls, 2nd edn. Addison-Wesley Professional, Reading Bentley J (1999) Programming pearls, 2nd edn. Addison-Wesley Professional, Reading
6.
go back to reference Berberidis C, Aref W, Atallah M, Vlahavas I, Elmagarmid A (2002) Multiple and partial periodicity mining in time series databases. In: Proceedings of European conference on artificial intelligence (July 2002) Berberidis C, Aref W, Atallah M, Vlahavas I, Elmagarmid A (2002) Multiple and partial periodicity mining in time series databases. In: Proceedings of European conference on artificial intelligence (July 2002)
7.
go back to reference Branch JW, Giannella C, Szymanski B, Wolff R, Kargupta H (2012) In-network outlier detection in wireless sensor networks. Knowl Inform Syst 18. doi:10.1007/s10115-011-0474-5 (January 2012, Online First) Branch JW, Giannella C, Szymanski B, Wolff R, Kargupta H (2012) In-network outlier detection in wireless sensor networks. Knowl Inform Syst 18. doi:10.​1007/​s10115-011-0474-5 (January 2012, Online First)
8.
go back to reference Cheung C-F, Yu JX, Lu H (2005) Constructing suffix tree for Gigabyte sequences with Megabyte memory. IEEE Trans Knowl Data Eng 17(1):90–105CrossRef Cheung C-F, Yu JX, Lu H (2005) Constructing suffix tree for Gigabyte sequences with Megabyte memory. IEEE Trans Knowl Data Eng 17(1):90–105CrossRef
9.
go back to reference Elfeky MG, Aref WG, Elmagarmid AK (2005) Periodicity detection in time series databases. IEEE Trans Knowl Data Eng 17(7):875–887CrossRef Elfeky MG, Aref WG, Elmagarmid AK (2005) Periodicity detection in time series databases. IEEE Trans Knowl Data Eng 17(7):875–887CrossRef
10.
go back to reference Elfeky MG, Aref WG, Elmagarmid AK (2005) WARP: time warping for periodicity detection. In: Proceedings of IEEE international conference of data mining (Nov 2005) Elfeky MG, Aref WG, Elmagarmid AK (2005) WARP: time warping for periodicity detection. In: Proceedings of IEEE international conference of data mining (Nov 2005)
11.
go back to reference Fayolle J, Ward MD (2005) Analysis of the average depth in a suffix tree under a Markov model. In: Martnez C (ed) 2005 International conference on analysis of algorithms. Discrete mathematics and theoretical computer ccience proceedings AD, pp 95–104 Fayolle J, Ward MD (2005) Analysis of the average depth in a suffix tree under a Markov model. In: Martnez C (ed) 2005 International conference on analysis of algorithms. Discrete mathematics and theoretical computer ccience proceedings AD, pp 95–104
12.
go back to reference Garcia ACB, Bentes C, de Melo RHC, Zadrozny B, Penna TJP (2011) Sensor data analysis for equipment monitoring. Knowl Inform Syst 28(2):333–364CrossRef Garcia ACB, Bentes C, de Melo RHC, Zadrozny B, Penna TJP (2011) Sensor data analysis for equipment monitoring. Knowl Inform Syst 28(2):333–364CrossRef
13.
go back to reference Glynn EF, Chen J, Mushegian AR (2006) Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms. Bioinformatics 22(3):310–316CrossRef Glynn EF, Chen J, Mushegian AR (2006) Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms. Bioinformatics 22(3):310–316CrossRef
14.
go back to reference Gusfield D (1997) Algorithms on strings, trees, and sequences. Cambridge University Press, CambridgeMATHCrossRef Gusfield D (1997) Algorithms on strings, trees, and sequences. Cambridge University Press, CambridgeMATHCrossRef
15.
go back to reference Han J, Gong W, Yin Y (1998) Mining segment-wise periodic patterns in time related databases. In: Proceedings of ACM international conference on knowledge discovery and data mining (Aug 1998) Han J, Gong W, Yin Y (1998) Mining segment-wise periodic patterns in time related databases. In: Proceedings of ACM international conference on knowledge discovery and data mining (Aug 1998)
16.
go back to reference Han J, Yin Y, Dong G (1999) Efficient mining of partial periodic patterns in time series database. In: Proceedings of IEEE international conference on data, engineering, p 106 Han J, Yin Y, Dong G (1999) Efficient mining of partial periodic patterns in time series database. In: Proceedings of IEEE international conference on data, engineering, p 106
17.
go back to reference Huang K-Y, Chang C-H (June 2005) SMCA: a general model for mining asynchronous periodic patterns in temporal databases. IEEE Trans Knowl Data Eng 17(6):774–785 Huang K-Y, Chang C-H (June 2005) SMCA: a general model for mining asynchronous periodic patterns in temporal databases. IEEE Trans Knowl Data Eng 17(6):774–785
18.
go back to reference Indyk P, Koudas N, Muthukrishnan S (2000) Identifying representative trends in massive time series data sets using sketches. In: Proceedings of the international conference on very large databases, VLDB (Sept 2000) Indyk P, Koudas N, Muthukrishnan S (2000) Identifying representative trends in massive time series data sets using sketches. In: Proceedings of the international conference on very large databases, VLDB (Sept 2000)
19.
go back to reference Koknar-Tezel S, Latecki LJ (2011) Improving SVM classification on imbalanced time series data sets with ghost points. Knowl Inform Syst 28(1):1–23CrossRef Koknar-Tezel S, Latecki LJ (2011) Improving SVM classification on imbalanced time series data sets with ghost points. Knowl Inform Syst 28(1):1–23CrossRef
20.
go back to reference Kurtz S (1999) Reducing the space requirement of suffix trees. Softw Pract Exp 29(13):1149–1171CrossRef Kurtz S (1999) Reducing the space requirement of suffix trees. Softw Pract Exp 29(13):1149–1171CrossRef
21.
go back to reference Lahiri M, Berger-Wolf TY (2010) Periodic subgraph mining in dynamic networks. Knowl Inform Syst 24(3):467–497CrossRef Lahiri M, Berger-Wolf TY (2010) Periodic subgraph mining in dynamic networks. Knowl Inform Syst 24(3):467–497CrossRef
22.
go back to reference Lee G, Yang W, Lee J-M (2006) A parallel algorithm for mining multiple partial periodic patterns. Elsevier J Inform Sci 176:3591–3609MathSciNetMATHCrossRef Lee G, Yang W, Lee J-M (2006) A parallel algorithm for mining multiple partial periodic patterns. Elsevier J Inform Sci 176:3591–3609MathSciNetMATHCrossRef
23.
go back to reference Nelson M (1996) Fast string searching with suffix trees. Dr. Dobb’s Journal Nelson M (1996) Fast string searching with suffix trees. Dr. Dobb’s Journal
24.
go back to reference Phoophakdee B, Zaki MJ (2007) Genome-scale disk-based suffix tree indexing. In: Proceedings of the 2007 ACM SIGMOD international conference on management of data (SIGMOD ’07), pp 833–844 Phoophakdee B, Zaki MJ (2007) Genome-scale disk-based suffix tree indexing. In: Proceedings of the 2007 ACM SIGMOD international conference on management of data (SIGMOD ’07), pp 833–844
25.
go back to reference Rasheed F, Alshalalfa M, Alhajj R (2011) Efficient periodicity mining in time series databases using suffix tree. IEEE Trans Knowl Data Eng (TKDE) 23(1):79–94CrossRef Rasheed F, Alshalalfa M, Alhajj R (2011) Efficient periodicity mining in time series databases using suffix tree. IEEE Trans Knowl Data Eng (TKDE) 23(1):79–94CrossRef
26.
go back to reference Reznik YA (2002) On tries, suffix trees, and universal variable-length-to-block codes. In: Proceedings of IEEE international symposium on information theory, p 123 Reznik YA (2002) On tries, suffix trees, and universal variable-length-to-block codes. In: Proceedings of IEEE international symposium on information theory, p 123
27.
go back to reference Sheng C, Hsu W, Lee M-L (2012) Mining dense periodic patterns in time series data. In: Proceedings of IEEE international conference on data, engineering, p 115 Sheng C, Hsu W, Lee M-L (2012) Mining dense periodic patterns in time series data. In: Proceedings of IEEE international conference on data, engineering, p 115
28.
go back to reference Tian Y, Tata S, Hankins RA, Patel JM (Sep. 2005) Practical methods for constructing suffix trees. VLDB J 14(3):281–299 Tian Y, Tata S, Hankins RA, Patel JM (Sep. 2005) Practical methods for constructing suffix trees. VLDB J 14(3):281–299
30.
go back to reference Valimaki N, Gerlach W, Dixit K, Makinen V (2007) Compressed suffix tree—a basis for genome-scale sequence analysis. Bioinformatics 23:629–630CrossRef Valimaki N, Gerlach W, Dixit K, Makinen V (2007) Compressed suffix tree—a basis for genome-scale sequence analysis. Bioinformatics 23:629–630CrossRef
31.
go back to reference Weigend A, Gershenfeld N (1994) Time series prediction: forecasting the future and understanding the past. Addison-Wesley, Reading Weigend A, Gershenfeld N (1994) Time series prediction: forecasting the future and understanding the past. Addison-Wesley, Reading
32.
go back to reference Wong S-S, Sung W-K, Wong L (2007) CPS-tree: a compact partitioned suffix tree for diskAbased indexing on large genome sequences. In: International conference on data engineering (ICDE), pp 1350–1354 Wong S-S, Sung W-K, Wong L (2007) CPS-tree: a compact partitioned suffix tree for diskAbased indexing on large genome sequences. In: International conference on data engineering (ICDE), pp 1350–1354
33.
go back to reference Yang J, Wang W, Philip SY (2001) Infominer: mining surprising periodic patterns. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, KDD, pp 395–400 Yang J, Wang W, Philip SY (2001) Infominer: mining surprising periodic patterns. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, KDD, pp 395–400
34.
go back to reference Yankov D, Keogh E, Rebbapragada U (2008) Disk aware discord discovery: finding unusual time series in terabyte sized datasets. Knowl Inform Syst 17(2):241–262CrossRef Yankov D, Keogh E, Rebbapragada U (2008) Disk aware discord discovery: finding unusual time series in terabyte sized datasets. Knowl Inform Syst 17(2):241–262CrossRef
Metadata
Title
Out-of-core detection of periodicity from sequence databases
Authors
Faraz Rasheed
Muhaimenul Adnan
Reda Alhajj
Publication date
01-07-2013
Publisher
Springer-Verlag
Published in
Knowledge and Information Systems / Issue 1/2013
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-012-0546-1

Other articles of this Issue 1/2013

Knowledge and Information Systems 1/2013 Go to the issue

Premium Partner