Skip to main content
Log in

A survey of temporal data mining

  • Published:
Sadhana Aims and scope Submit manuscript

Abstract

Data mining is concerned with analysing large volumes of (often unstructured) data to automatically discover interesting regularities or relationships which in turn lead to better understanding of the underlying processes. The field of temporal data mining is concerned with such analysis in the case of ordered data streams with temporal interdependencies. Over the last decade many interesting techniques of temporal data mining were proposed and shown to be useful in many applications. Since temporal data mining brings together techniques from different fields such as statistics, machine learning and databases, the literature is scattered among many different sources. In this article, we present an overview of techniques of temporal data mining. We mainly concentrate on algorithms for pattern discovery in sequential data streams. We also describe some recent results regarding statistical analysis of pattern discovery methods.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Agrawal R, Srikant R 1994 Fast algorithms for mining association rules in large databases. InProc. 20th Int. Conf. on Very Large Data Bases, pp 487–499

  • Agrawal R, Srikant R 1995 Mining sequential patterns. InProc. 11th Int. Conf. on Data Engineering, (Washington, DC: IEEE Comput. Soc.)

    Google Scholar 

  • Agrawal R, Imielinski T, Swami 1993 A Mining association rules between sets of items in large databases. InProc. ACM SIGMOD Conf. on Management of Data, pp 207–216

  • Agrawal R, Lin K I, Sawhney H S, Shim K 1995a Fast similarity search in the presence of noise, scaling and translation in time series databases. InProc. 21st Int. Conf. on Very Large Data Bases (VLDB95), pp 490–501

  • Agrawal R, Psaila G, Wimmers E L, Zait M 1995b Querying shapes of histories. InProc. 21st Int. Conf. on Very Large Databases, Zurich, Switzerland

  • Alon J, Sclaroff S, Kollios G, Pavlovic V 2003 Discovering clusters in motion time series data. InProc. 2003 IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition, pp I-375–I-381, Madison, Wisconsin

  • Alur R, Dill D L 1994 A theory of timed automata.Theor. Comput. Sci. 126: 183–235

    Article  MATH  MathSciNet  Google Scholar 

  • Atallah M J, Gwadera R, Szpankowski W 2004 Detection of significant sets of episodes in event sequences. InProc. 4th IEEE Int. Conf. on Data Mining (ICDM 2004), pp 3–10, Brighton, UK

  • Baeza-Yates R A 1991 Searching subsequences.Theor. Comput. Sci. 78: 363–376

    Article  MATH  MathSciNet  Google Scholar 

  • Baldi P, Chauvin Y, Hunkapiller T, McClure M 1994 Hidden Markov models of biological primary sequence information.Proc. Nat. Acad. Sci. USA 91: 1059–1063

    Article  Google Scholar 

  • Bender E A, Kochman F 1993 The distribution of subword counts is usually normal.Eur. J. Combinatorics 14: 265–275

    Article  MATH  MathSciNet  Google Scholar 

  • Berberidis C, Vlahavas I P, Aref W G, Atallah M J, Elmagarmid A K 2002 On the discovery of weak periodicities in large time series. InLecture notes in computer science, Proc. 6th Eur. Conf. on Principles of Data Mining and Knowledge Discovery, vol. 2431, pp 51–61

    Google Scholar 

  • Bettini C, Wang X S, Jajodia S, Lin J L 1998 Discovering frequent event patterns with multiple granularities in time sequences.IEEE Trans. Knowledge Data Eng. 10: 222–237

    Article  Google Scholar 

  • Box G E P, Jenkins G M, Reinsel G C 1994Time series analysis: Forecasting and control (Singapore: Pearson Education Inc.)

    MATH  Google Scholar 

  • Cadez I, Heckerman D, Meek C, Smyth P, White S 2000 Model-based clustering and visualisation of navigation patterns on a web site. Technical Report CA 92717-3425, Dept. of Information and Computer Science, University of California, Irvine, CA

    Google Scholar 

  • Cao H, Cheung D W, Mamoulis N 2004 Discovering partial periodic patterns in discrete data sequences. InProc. 8th Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD ’04), Sydney, pp 653–658

  • Casas-Garriga G 2003 Discovering unbounded episodes in sequential data. InProc. 7th Eur. Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD’03), Cavtat-Dubvrovnik, Croatia, pp 83–94

  • Chang S F, Chen W, Men J, Sundaram H, Zhong D 1998 A fully automated content based video search engine supporting spatio-temporal queries.IEEE Trans. Circuits Syst. Video Technol. 8(5): 602–615

    Article  Google Scholar 

  • Chatfield C 1996The analysis of time series 5th edn (New York, NY: Chapman and Hall)

    MATH  Google Scholar 

  • Chudova D, Smyth P 2002 Pattern discovery in sequences under a Markovian assumption. InProc. Eigth ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada

  • Cohen J 2004 Bioinformatics — an introduction for computer scientists.ACM Comput. Surv. 36(2): 122–158

    Article  Google Scholar 

  • Corpet F 1988 Multiple sequence alignment with hierarchical clustering.Nucleic Acids Research, 16: 10881–10890

    Article  Google Scholar 

  • Darrell T, Pentland A 1993 Space-time gestures. InProc. 1993 IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR’93), pp 335–340

  • Dietterich T G, Michalski R S 1985 Discovering patterns in sequences of events.Artif. Intell. 25: 187–232

    Article  Google Scholar 

  • Duda R O, Hart P E, Stork D G 1997Pattern classification and scene analysis (New York: Wiley)

    Google Scholar 

  • Durbin R, Eddy S, Krogh A, Mitchison G 1998Biological sequence analysis (Cambridge: University Press)

    MATH  Google Scholar 

  • Ewens W J, Grant G R 2001Statistical methods in bioinformatics: An introduction (New York: Springer-Verlag)

    MATH  Google Scholar 

  • Fadili M J, Ruan S, Bloyet D, Mazoyer B 2000 A multistep unsupervised fuzzy clustering analysis of fMRI time series.Human Brain Mapping 10: 160–178

    Article  Google Scholar 

  • Flajolet P, Guivarc’h Y, Szpankowski W, Vallee B 2001 Hidden pattern statistics. InLecture notes in computer science; Proc. 28th Int. Colloq. on Automata, Languages and Programming (London: Springer-Verlag) vol. 2076, pp 152–165

    Google Scholar 

  • Frenkel K A 1991 The human genome project and informatics.Commun. ACM 34(11): 40–51

    Article  Google Scholar 

  • Garofalakis M, Rastogi R, Shim K 2002 Mining sequential patterns with regular expression constraints.IEEE Trans. Knowledge Data Eng. 14: 530–552

    Article  Google Scholar 

  • Ghias A, Logan J, Chamberlin D, Smith B C 1995 Query by humming — musical information retrieval in an audio database. InProc. ACM Multimedia 95, San Fransisco, CA

  • Gold B, Morgan N 2000Speech and audio signal processing: Processing and perception of speech and music (New York: John Wiley & Sons)

    Google Scholar 

  • Gray R M, Buzo A, Gray Jr. A H, Matsuyama Y 1980 Distortion measures for speech processing.IEEE Trans. Acoust., Speech Signal Process. 28: 367–376

    Article  MATH  Google Scholar 

  • Gusfield D 1997A lgorithms on strings, trees and subsequences (New York: University of Cambridge Press)

    Google Scholar 

  • Gwadera R, Atallah M J, Szpankowski W 2003 Reliable detection of episodes in event sequences. InProc. 3rd IEEE Int. Conf. on Data Mining (ICDM 2003), pp 67–74

  • Gwadera R, Atallah M J, Szpankowski W 2005 Markov models for identification of significant episodes. InProc. 2005 SIAM Int. Conf. on Data Mining (SDM-05), Newport Beach, California

  • Han J, Kamber M 2001Data mining: Concepts and techniques (San Fransisco, CA: Morgan Kauffmann)

    Google Scholar 

  • Han J, Gong W, Yin Y 1998 Mining segment-wise periodic patterns in time-related databases. InProc. 4th Int. Conf. on Knowledge Discovery and Data Mining (KDD’98), New York, pp 214–218

  • Han J, Dong G, Yin Y 1999 Efficient mining of partial periodic patterns in time series database. InProc. 15th Int. Conf. on Data Engineering, (ICDE’99), Sydney, pp 106–115

  • Hand D, Mannila H, Smyth P 2001Principles of data mining (Cambridge, MA: MIT Press)

    Google Scholar 

  • Haselsteiner E, Pfurtscheller G 2000 Using time-dependent neural networks for EEG classification.IEEE Trans. Rehab. Eng. 8: 457–463

    Article  Google Scholar 

  • Hastie T, Tibshirani R, Friedman J 2001The elements of statistical learning: Data mining, inference and prediction (New York: Springer-Verlag)

    MATH  Google Scholar 

  • Haykin S 1992Neural networks: A comprehensive foundation (New York: Macmillan)

    Google Scholar 

  • Hirao M, Inenaga S, Shinohara A, Takeda M, Arikawa S 2001 A practical algorithm to find the best episode patterns.Lecture notes in computer science; Proc. 4th Int. Conf. on Discovery Science (DS 2001) Washington, DC, vol. 2226, pp 435–441, 25–28

  • Juang B H, Rabiner L 1993Fundamentals of speech recognition. (Englewood Cliffs, NJ: Prentice Hall)

    Google Scholar 

  • Kalpakis K, Puttagunta D G V 2001 Distance measures for effective clustering of ARIMA time series. In2001 IEEE Int. Conf. on Data Mining (ICDM01), San Jose, CA

  • Keogh E J, Pazzani M J 2000 Scaling up dynamic time warping for datamining applications. InProc. 6th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data mining, Boston, MA, pp 285–289, 20–23

  • Koskela T, Lehtokangas M, Saarinen J, Kaski K 1996 Time series prediction with multilayer perceptron, FIR and Elman neural networks. InProc. World Congress on Neural Networks, pp 491–496

  • Kruskal J B 1983 An overview of sequence comparison: Time warps, string edits and macromolecules.SIAM Rev. 21:201–237

    Article  MathSciNet  Google Scholar 

  • Kundu A, He Y, Bahl P 1988 Word recognition and word hypothesis generation for handwritten script: A Hidden Markov Model based approach. InProc. 1988 IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR’88), pp 457–462

  • Law M H, Kwok J T 2000 Rival penalized competitive learning for model-based sequence clustering. InProc. IEEE Int. Conf. on Pattern Recognition (ICPR00), Barcelona, Spain

  • Laxman S, Sastry P S, Unnikrishnan K P 2002 Generalized frequent episodes in event sequences.Temporal Data Mining Workshop Notes, SIGKDD, (eds) K P Unnikrishnan, R Uthurusamy, Edmonton, Alberta, Canada

  • Laxman S, Sastry P S, Unnikrishnan K P 2004a Fast algorithms for frequent episode discovery in event sequences. Technical Report CL-2004-04/MSR, GM R&D Center, Warren

    Google Scholar 

  • Laxman S, Sastry P S, Unnikrishnan K P 2004b Fast algorithms for frequent episode discovery in event sequences. InProc. 3rd Workshop on Mining Temporal and Sequential Data, Seattle, WA

  • Laxman S, Sastry P S, Unnikrishnan K P 2005 Discovering frequent episodes and learning hidden markov models: A formal connection.IEEE Trans. Knowledge Data Eng. 17: 1505–1517

    Article  Google Scholar 

  • Lee C-H, Chen M-S, Lin C-R 2003 Progressive pattern miner: An efficient algorithm for mining general temporal association rules.IEEE Trans. Knowledge Data Eng. 15: 1004–1017

    Article  Google Scholar 

  • Levenshtein VI 1966 Binary codes capable of correcting deletions, insertions and reversals.Sov. Phys. Dokl. 10: 707–710

    MathSciNet  Google Scholar 

  • Lin M-Y, Lee S-Y 2003 Improving the efficiency of interactive sequential pattern mining by incremental pattern discovery. InProc. IEEE 36th Annu. Hawaii Int. Conf. on System Sciences (HICSS03), Big Island, Hawaii

  • Ma S, Hellerstein J L 2001 Mining partially periodic event patterns with unknown periods. InProc. 17th Int. Conf. on Data Eng. (ICDE’01), pp 205–214

  • Mannila H, Rusakov D 2001 Decomposition of event sequences into independent components. InFirst SIAM Int. Conf. on Data Mining, Chicago, IL

  • Mannila H, Toivonen H, Verkamo A I 1997 Discovery of frequent episodes in event sequences.Data Mining Knowledge Discovery 1: 259–289

    Article  Google Scholar 

  • Meger N, Rigotti C 2004 Constraint-based mining of episode rules and optimal window sizes. InProc. 8th Eur. Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD ’04), Pisa, Italy

  • Miller R T, Christoffels A G, Gopalakrishnan C, Burke J, Ptitsyn A A, Broveak T R, Hide W A 1999 A comprehensive approach to clustering of expressed human gene sequence: The sequence tag alignment and consensus knowledge base.Genome Res. 9: 1143–1155

    Article  Google Scholar 

  • Miller W, Schwartz S, Hardison R C 1994 A point of contact between computer science and molecular biology.IEEE Comput. Sci. Eng. 1: 69–78

    Article  Google Scholar 

  • Nag R, Wong K H, Fallside F 1986 Script recognition using Hidden Markov Models. InProc. 1986 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP’86), pp 2071–2074

  • Nalwa V S 1997 Automatic on-line signature verification.Proc. IEEE 85: 215–239

    Article  Google Scholar 

  • Ng R T, Lakshmanan L V S, Han J, Pang A 1998 Exploratory mining and pruning optimizations of constrained associations rules. InProc. 1998 ACM SIGMOD Int. Conf. on Management of Data, (Seattle, Washington) pp 13–24

  • Oates T, Firoiu L, Cohen P R 2001 Using dynamic time warping to bootstrap HMM-based clustering of time series. InLecture notes in computer science; Sequence learning: Paradigms, algorithms, and applications (eds) C L Giles, R Sun (Heidelberg: Springer-Verlag) vol. 1828, p. 35

    Google Scholar 

  • Osata Net al 2002 A computer-based method of selecting clones for a full-length cDNA project: Simulataneous collection of negligibly redundant and variant cDNAs.Genome Res. 12: 1127–1134

    Article  Google Scholar 

  • O’Shaughnessy D 2003Speech communications: Human and machine (Piscataway, NJ: IEEE Press)

    Google Scholar 

  • Ozden B, Ramaswamy S, Silberschatz A 1998 Cyclic association rules. InProc. 14th Int. Conf. on Data Engineering (ICDE’98), Orlando, Florida, pp 412–421

  • Pasquier N, Bastide Y, Taouil R, Lakhal L 1999 Discovering frequent closed itemsets for association rules. InLecture notes in computer science; Proc. 7th Int. Conf. on Database Theory (ICDT99), Jerusalem, Israel, vol. 1540, pp 398–416

    Google Scholar 

  • Perng C-S, Wang H, Zhang S R, Parker D S 2000 Landmarks: A new model for similarity-based pattern querying in time series databases. In16th Int. Conf. on Data Engineering (ICDE00), p. 33, San Diego, CA

  • Pevzner P A, Borodovski M Y, Mironov A A 1989 Linguistic of nucleotide sequences: The significance of deviation from mean statistical characteristics and prediction of the frequencies of occurrence of words.J. Biomol. Struct. Dynamics 6: 1013–1026

    Google Scholar 

  • Rabiner L R 1989 A tutorial on hidden Markov models and selected applications in speech recognition.Proc. IEEE 77: 257–286

    Article  Google Scholar 

  • Regnier M, Szpankowski W 1998 On pattern frequency occurrences in a Markovian sequence.Algorithmica 22: 631–649

    Article  MATH  MathSciNet  Google Scholar 

  • Schreiber T, Schmitz A 1997 Classification of time series data with nonlinear similarity measures.Phys. Rev. Lett. 79: 1475–1478

    Article  Google Scholar 

  • Sclaroff S, Kollios G, Betke M, Rosales R 2001 Motion mining. InLecture notes in computer science;Proc. 2nd Int. Workshop on Multimedia Databases and Image Communication (Heidelberg: Springer-Verlag)

    Google Scholar 

  • Sebastiani P, Ramoni M, Cohen P, Warwick J, Davis J 1999 Discovering dynamics using bayesian clustering. InLecture notes in computer science;Adv. in Intelligent Data Analysis: 3rd Int. Symp., IDA-99 (Heidelberg: Springer-Verlag) vol. 1642, p. 199

    Google Scholar 

  • Shintani T, Kitsuregawa M 1998 Mining algorithms for sequential patterns in parallel: Hash based approach. InProc. 2nd Pacific-Asia Conf. on Knowledge Discovery and Data Mining, pp 283–294

  • Smyth P 1997 Clustering sequences with hidden Markov models.Adv. Neural Inf. Process. 9: 648–655

    MathSciNet  Google Scholar 

  • Smyth P 2001 Data mining at the interface of computer science and statistics. InData mining for scientific and engineering applications. (eds) R L Grossman, C Kamath, P Kegelmeyer, V Kumar, R R Namburu (Dordrecht: Kluwer Academic)

    Google Scholar 

  • Srikanth R, Agrawal R 1996 Mining sequential patterns: Generalizations and performance improvements. InProc. 5th Int. Conf. on Extending Database Technology (EDBT), Avignon, France

  • Starner T E, Pentland A 1995 Visual recognition of American sign language. InProc. 1995 Int. Workshop on Face and Gesture Recognition, Zurich

  • Sutton R S 1988 Learning to predict by method of temporal differences.Machine Learning 3(1): 9–44

    Google Scholar 

  • Sze S H, Gelfand M S, Pevzner P A 2002 Finding weak motifs in DNA sequences. InProc. 2002 Pacific Symposium on Biocomputing, pp 235–246

  • Tappert C C, Suen C Y, Wakahara T 1990 The state of the art in on-line handwriting recognition.IEEE Trans. Pattern Anal. Machine Intell. 12: 787–808

    Article  Google Scholar 

  • Tino P, Schittenkopf C, Dorffner G 2000 Temporal pattern recognition in noisy non-stationary time series based on quantization into symbolic streams: Lessons learned from financial volatility trading (url:citeseer.nj.nec.com/tino00temporal.html)

  • Tronicek Z 2001 Episode matching. InProc. 12th Annu. Symp. on Combinatorial Pattern Matching (CPM 2001), Jerusalem, Israel, vol. 2089, pp 143–146

    Google Scholar 

  • Wan E A 1990 Temporal backpropagation for FIR neural networks. InInt. Joint Conf. on Neural Networks (1990 IJCNN), vol. 1, pp 575–580

    Article  Google Scholar 

  • Wang J T-L, Chirn G-W, Marr T G, Shapiro B, Shasha D, Zhang K 1994 Combinatorial pattern discovery for scientific data: some preliminary results. InProc. 1994 ACM SIGMOD Int. Conf. on Management of Data, Minneapolis, Minnesota, pp 115–125

  • Wang J, Han J 2004 BIDE: Efficient mining of frequent closed sequences. In20th Int. Conf. on Data Engineering, Boston, MA

  • Witten I H, Frank E 2000Data mining: Practical machine learning tools and techniques with JAVA implementations (San Fransisco, CA: Morgan Kaufmann)

    Google Scholar 

  • Wu C, Berry M, Shivakumar S, McLarty J 1995 Neural networks for full-scale protein sequence classification: Sequence encoding with singular value decomposition.Machine Learning, Special issue on applications in molecular biology 21(1–2): 177–193 Wu S, Manber U 1992 Fast text searching allowing errors.Commun. ACM 35(10): 83–91

    Google Scholar 

  • Wu Y-L, Agrawal D, Abbadi A E 2000 A comparison of DFT and DWT based similarity search in time series databases. InProc. Ninth Int. Conf. on Information and Knowledge Management, McLean, VA, pp 488–495

  • Xiong Y, Yeung D Y 2002 Mixtures of ARMA models for model-based time series clustering. In2002 IEEE Int. Conf. on Data Mining, Maebashi City, Japan, pp 717–720

  • Yamato J, Ohya J, Ishii K 1992 Recognizing human action in time-sequential images using Hidden Markov Model. InProc. 1992 IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR’92), Champaign, IL, pp 379–385

  • Yan X, Han J, Afshar R 2003 CloSpan: Mining closed sequential patterns in large datasets. InProc. 2003 Int. SIAM Conf. on Data Mining (SDM03), San Fransisco, CA

  • Yule G 1927 On a method of investigating periodicity in distributed series with special reference to Wolfer’s sunspot numbers.Philos. Trans. R. Soc. London A226

  • Zaki M J 1998 Efficient enumeration of frequent sequences. InProc. ACM 7th Int. Conf. Information and Knowledge Management (CIKM)

Download references

Author information

Authors and Affiliations

Authors

Rights and permissions

Reprints and permissions

About this article

Cite this article

Laxman, S., Sastry, P.S. A survey of temporal data mining. Sadhana 31, 173–198 (2006). https://doi.org/10.1007/BF02719780

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF02719780

Keywords

Navigation