A survey of temporal data mining

Laxman, Srivatsan; Sastry, P. S.

doi:10.1007/BF02719780

A survey of temporal data mining

Published: April 2006

Volume 31, pages 173–198, (2006)
Cite this article

Sadhana Aims and scope Submit manuscript

Srivatsan Laxman¹ &
P. S. Sastry¹

2634 Accesses
176 Citations
6 Altmetric
Explore all metrics

Abstract

Data mining is concerned with analysing large volumes of (often unstructured) data to automatically discover interesting regularities or relationships which in turn lead to better understanding of the underlying processes. The field of temporal data mining is concerned with such analysis in the case of ordered data streams with temporal interdependencies. Over the last decade many interesting techniques of temporal data mining were proposed and shown to be useful in many applications. Since temporal data mining brings together techniques from different fields such as statistics, machine learning and databases, the literature is scattered among many different sources. In this article, we present an overview of techniques of temporal data mining. We mainly concentrate on algorithms for pattern discovery in sequential data streams. We also describe some recent results regarding statistical analysis of pattern discovery methods.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

References

Agrawal R, Srikant R 1994 Fast algorithms for mining association rules in large databases. InProc. 20th Int. Conf. on Very Large Data Bases, pp 487–499
Agrawal R, Srikant R 1995 Mining sequential patterns. InProc. 11th Int. Conf. on Data Engineering, (Washington, DC: IEEE Comput. Soc.)
Google Scholar
Agrawal R, Imielinski T, Swami 1993 A Mining association rules between sets of items in large databases. InProc. ACM SIGMOD Conf. on Management of Data, pp 207–216
Agrawal R, Lin K I, Sawhney H S, Shim K 1995a Fast similarity search in the presence of noise, scaling and translation in time series databases. InProc. 21st Int. Conf. on Very Large Data Bases (VLDB95), pp 490–501
Agrawal R, Psaila G, Wimmers E L, Zait M 1995b Querying shapes of histories. InProc. 21st Int. Conf. on Very Large Databases, Zurich, Switzerland
Alon J, Sclaroff S, Kollios G, Pavlovic V 2003 Discovering clusters in motion time series data. InProc. 2003 IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition, pp I-375–I-381, Madison, Wisconsin
Alur R, Dill D L 1994 A theory of timed automata.Theor. Comput. Sci. 126: 183–235
Article MATH MathSciNet Google Scholar
Atallah M J, Gwadera R, Szpankowski W 2004 Detection of significant sets of episodes in event sequences. InProc. 4th IEEE Int. Conf. on Data Mining (ICDM 2004), pp 3–10, Brighton, UK
Baeza-Yates R A 1991 Searching subsequences.Theor. Comput. Sci. 78: 363–376
Article MATH MathSciNet Google Scholar
Baldi P, Chauvin Y, Hunkapiller T, McClure M 1994 Hidden Markov models of biological primary sequence information.Proc. Nat. Acad. Sci. USA 91: 1059–1063
Article Google Scholar
Bender E A, Kochman F 1993 The distribution of subword counts is usually normal.Eur. J. Combinatorics 14: 265–275
Article MATH MathSciNet Google Scholar
Berberidis C, Vlahavas I P, Aref W G, Atallah M J, Elmagarmid A K 2002 On the discovery of weak periodicities in large time series. InLecture notes in computer science, Proc. 6th Eur. Conf. on Principles of Data Mining and Knowledge Discovery, vol. 2431, pp 51–61
Google Scholar
Bettini C, Wang X S, Jajodia S, Lin J L 1998 Discovering frequent event patterns with multiple granularities in time sequences.IEEE Trans. Knowledge Data Eng. 10: 222–237
Article Google Scholar
Box G E P, Jenkins G M, Reinsel G C 1994Time series analysis: Forecasting and control (Singapore: Pearson Education Inc.)
MATH Google Scholar
Cadez I, Heckerman D, Meek C, Smyth P, White S 2000 Model-based clustering and visualisation of navigation patterns on a web site. Technical Report CA 92717-3425, Dept. of Information and Computer Science, University of California, Irvine, CA
Google Scholar
Cao H, Cheung D W, Mamoulis N 2004 Discovering partial periodic patterns in discrete data sequences. InProc. 8th Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD ’04), Sydney, pp 653–658
Casas-Garriga G 2003 Discovering unbounded episodes in sequential data. InProc. 7th Eur. Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD’03), Cavtat-Dubvrovnik, Croatia, pp 83–94
Chang S F, Chen W, Men J, Sundaram H, Zhong D 1998 A fully automated content based video search engine supporting spatio-temporal queries.IEEE Trans. Circuits Syst. Video Technol. 8(5): 602–615
Article Google Scholar
Chatfield C 1996The analysis of time series 5th edn (New York, NY: Chapman and Hall)
MATH Google Scholar
Chudova D, Smyth P 2002 Pattern discovery in sequences under a Markovian assumption. InProc. Eigth ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada
Cohen J 2004 Bioinformatics — an introduction for computer scientists.ACM Comput. Surv. 36(2): 122–158
Article Google Scholar
Corpet F 1988 Multiple sequence alignment with hierarchical clustering.Nucleic Acids Research, 16: 10881–10890
Article Google Scholar
Darrell T, Pentland A 1993 Space-time gestures. InProc. 1993 IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR’93), pp 335–340
Dietterich T G, Michalski R S 1985 Discovering patterns in sequences of events.Artif. Intell. 25: 187–232
Article Google Scholar
Duda R O, Hart P E, Stork D G 1997Pattern classification and scene analysis (New York: Wiley)
Google Scholar
Durbin R, Eddy S, Krogh A, Mitchison G 1998Biological sequence analysis (Cambridge: University Press)
MATH Google Scholar
Ewens W J, Grant G R 2001Statistical methods in bioinformatics: An introduction (New York: Springer-Verlag)
MATH Google Scholar
Fadili M J, Ruan S, Bloyet D, Mazoyer B 2000 A multistep unsupervised fuzzy clustering analysis of fMRI time series.Human Brain Mapping 10: 160–178
Article Google Scholar
Flajolet P, Guivarc’h Y, Szpankowski W, Vallee B 2001 Hidden pattern statistics. InLecture notes in computer science; Proc. 28th Int. Colloq. on Automata, Languages and Programming (London: Springer-Verlag) vol. 2076, pp 152–165
Google Scholar
Frenkel K A 1991 The human genome project and informatics.Commun. ACM 34(11): 40–51
Article Google Scholar
Garofalakis M, Rastogi R, Shim K 2002 Mining sequential patterns with regular expression constraints.IEEE Trans. Knowledge Data Eng. 14: 530–552
Article Google Scholar
Ghias A, Logan J, Chamberlin D, Smith B C 1995 Query by humming — musical information retrieval in an audio database. InProc. ACM Multimedia 95, San Fransisco, CA
Gold B, Morgan N 2000Speech and audio signal processing: Processing and perception of speech and music (New York: John Wiley & Sons)
Google Scholar
Gray R M, Buzo A, Gray Jr. A H, Matsuyama Y 1980 Distortion measures for speech processing.IEEE Trans. Acoust., Speech Signal Process. 28: 367–376
Article MATH Google Scholar
Gusfield D 1997A lgorithms on strings, trees and subsequences (New York: University of Cambridge Press)
Google Scholar
Gwadera R, Atallah M J, Szpankowski W 2003 Reliable detection of episodes in event sequences. InProc. 3rd IEEE Int. Conf. on Data Mining (ICDM 2003), pp 67–74
Gwadera R, Atallah M J, Szpankowski W 2005 Markov models for identification of significant episodes. InProc. 2005 SIAM Int. Conf. on Data Mining (SDM-05), Newport Beach, California
Han J, Kamber M 2001Data mining: Concepts and techniques (San Fransisco, CA: Morgan Kauffmann)
Google Scholar
Han J, Gong W, Yin Y 1998 Mining segment-wise periodic patterns in time-related databases. InProc. 4th Int. Conf. on Knowledge Discovery and Data Mining (KDD’98), New York, pp 214–218
Han J, Dong G, Yin Y 1999 Efficient mining of partial periodic patterns in time series database. InProc. 15th Int. Conf. on Data Engineering, (ICDE’99), Sydney, pp 106–115
Hand D, Mannila H, Smyth P 2001Principles of data mining (Cambridge, MA: MIT Press)
Google Scholar
Haselsteiner E, Pfurtscheller G 2000 Using time-dependent neural networks for EEG classification.IEEE Trans. Rehab. Eng. 8: 457–463
Article Google Scholar
Hastie T, Tibshirani R, Friedman J 2001The elements of statistical learning: Data mining, inference and prediction (New York: Springer-Verlag)
MATH Google Scholar
Haykin S 1992Neural networks: A comprehensive foundation (New York: Macmillan)
Google Scholar
Hirao M, Inenaga S, Shinohara A, Takeda M, Arikawa S 2001 A practical algorithm to find the best episode patterns.Lecture notes in computer science; Proc. 4th Int. Conf. on Discovery Science (DS 2001) Washington, DC, vol. 2226, pp 435–441, 25–28
Juang B H, Rabiner L 1993Fundamentals of speech recognition. (Englewood Cliffs, NJ: Prentice Hall)
Google Scholar
Kalpakis K, Puttagunta D G V 2001 Distance measures for effective clustering of ARIMA time series. In2001 IEEE Int. Conf. on Data Mining (ICDM01), San Jose, CA
Keogh E J, Pazzani M J 2000 Scaling up dynamic time warping for datamining applications. InProc. 6th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data mining, Boston, MA, pp 285–289, 20–23
Koskela T, Lehtokangas M, Saarinen J, Kaski K 1996 Time series prediction with multilayer perceptron, FIR and Elman neural networks. InProc. World Congress on Neural Networks, pp 491–496
Kruskal J B 1983 An overview of sequence comparison: Time warps, string edits and macromolecules.SIAM Rev. 21:201–237
Article MathSciNet Google Scholar
Kundu A, He Y, Bahl P 1988 Word recognition and word hypothesis generation for handwritten script: A Hidden Markov Model based approach. InProc. 1988 IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR’88), pp 457–462
Law M H, Kwok J T 2000 Rival penalized competitive learning for model-based sequence clustering. InProc. IEEE Int. Conf. on Pattern Recognition (ICPR00), Barcelona, Spain
Laxman S, Sastry P S, Unnikrishnan K P 2002 Generalized frequent episodes in event sequences.Temporal Data Mining Workshop Notes, SIGKDD, (eds) K P Unnikrishnan, R Uthurusamy, Edmonton, Alberta, Canada
Laxman S, Sastry P S, Unnikrishnan K P 2004a Fast algorithms for frequent episode discovery in event sequences. Technical Report CL-2004-04/MSR, GM R&D Center, Warren
Google Scholar
Laxman S, Sastry P S, Unnikrishnan K P 2004b Fast algorithms for frequent episode discovery in event sequences. InProc. 3rd Workshop on Mining Temporal and Sequential Data, Seattle, WA
Laxman S, Sastry P S, Unnikrishnan K P 2005 Discovering frequent episodes and learning hidden markov models: A formal connection.IEEE Trans. Knowledge Data Eng. 17: 1505–1517
Article Google Scholar
Lee C-H, Chen M-S, Lin C-R 2003 Progressive pattern miner: An efficient algorithm for mining general temporal association rules.IEEE Trans. Knowledge Data Eng. 15: 1004–1017
Article Google Scholar
Levenshtein VI 1966 Binary codes capable of correcting deletions, insertions and reversals.Sov. Phys. Dokl. 10: 707–710
MathSciNet Google Scholar
Lin M-Y, Lee S-Y 2003 Improving the efficiency of interactive sequential pattern mining by incremental pattern discovery. InProc. IEEE 36th Annu. Hawaii Int. Conf. on System Sciences (HICSS03), Big Island, Hawaii
Ma S, Hellerstein J L 2001 Mining partially periodic event patterns with unknown periods. InProc. 17th Int. Conf. on Data Eng. (ICDE’01), pp 205–214
Mannila H, Rusakov D 2001 Decomposition of event sequences into independent components. InFirst SIAM Int. Conf. on Data Mining, Chicago, IL
Mannila H, Toivonen H, Verkamo A I 1997 Discovery of frequent episodes in event sequences.Data Mining Knowledge Discovery 1: 259–289
Article Google Scholar
Meger N, Rigotti C 2004 Constraint-based mining of episode rules and optimal window sizes. InProc. 8th Eur. Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD ’04), Pisa, Italy
Miller R T, Christoffels A G, Gopalakrishnan C, Burke J, Ptitsyn A A, Broveak T R, Hide W A 1999 A comprehensive approach to clustering of expressed human gene sequence: The sequence tag alignment and consensus knowledge base.Genome Res. 9: 1143–1155
Article Google Scholar
Miller W, Schwartz S, Hardison R C 1994 A point of contact between computer science and molecular biology.IEEE Comput. Sci. Eng. 1: 69–78
Article Google Scholar
Nag R, Wong K H, Fallside F 1986 Script recognition using Hidden Markov Models. InProc. 1986 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP’86), pp 2071–2074
Nalwa V S 1997 Automatic on-line signature verification.Proc. IEEE 85: 215–239
Article Google Scholar
Ng R T, Lakshmanan L V S, Han J, Pang A 1998 Exploratory mining and pruning optimizations of constrained associations rules. InProc. 1998 ACM SIGMOD Int. Conf. on Management of Data, (Seattle, Washington) pp 13–24
Oates T, Firoiu L, Cohen P R 2001 Using dynamic time warping to bootstrap HMM-based clustering of time series. InLecture notes in computer science; Sequence learning: Paradigms, algorithms, and applications (eds) C L Giles, R Sun (Heidelberg: Springer-Verlag) vol. 1828, p. 35
Google Scholar
Osata Net al 2002 A computer-based method of selecting clones for a full-length cDNA project: Simulataneous collection of negligibly redundant and variant cDNAs.Genome Res. 12: 1127–1134
Article Google Scholar
O’Shaughnessy D 2003Speech communications: Human and machine (Piscataway, NJ: IEEE Press)
Google Scholar
Ozden B, Ramaswamy S, Silberschatz A 1998 Cyclic association rules. InProc. 14th Int. Conf. on Data Engineering (ICDE’98), Orlando, Florida, pp 412–421
Pasquier N, Bastide Y, Taouil R, Lakhal L 1999 Discovering frequent closed itemsets for association rules. InLecture notes in computer science; Proc. 7th Int. Conf. on Database Theory (ICDT99), Jerusalem, Israel, vol. 1540, pp 398–416
Google Scholar
Perng C-S, Wang H, Zhang S R, Parker D S 2000 Landmarks: A new model for similarity-based pattern querying in time series databases. In16th Int. Conf. on Data Engineering (ICDE00), p. 33, San Diego, CA
Pevzner P A, Borodovski M Y, Mironov A A 1989 Linguistic of nucleotide sequences: The significance of deviation from mean statistical characteristics and prediction of the frequencies of occurrence of words.J. Biomol. Struct. Dynamics 6: 1013–1026
Google Scholar
Rabiner L R 1989 A tutorial on hidden Markov models and selected applications in speech recognition.Proc. IEEE 77: 257–286
Article Google Scholar
Regnier M, Szpankowski W 1998 On pattern frequency occurrences in a Markovian sequence.Algorithmica 22: 631–649
Article MATH MathSciNet Google Scholar
Schreiber T, Schmitz A 1997 Classification of time series data with nonlinear similarity measures.Phys. Rev. Lett. 79: 1475–1478
Article Google Scholar
Sclaroff S, Kollios G, Betke M, Rosales R 2001 Motion mining. InLecture notes in computer science;Proc. 2nd Int. Workshop on Multimedia Databases and Image Communication (Heidelberg: Springer-Verlag)
Google Scholar
Sebastiani P, Ramoni M, Cohen P, Warwick J, Davis J 1999 Discovering dynamics using bayesian clustering. InLecture notes in computer science;Adv. in Intelligent Data Analysis: 3rd Int. Symp., IDA-99 (Heidelberg: Springer-Verlag) vol. 1642, p. 199
Google Scholar
Shintani T, Kitsuregawa M 1998 Mining algorithms for sequential patterns in parallel: Hash based approach. InProc. 2nd Pacific-Asia Conf. on Knowledge Discovery and Data Mining, pp 283–294
Smyth P 1997 Clustering sequences with hidden Markov models.Adv. Neural Inf. Process. 9: 648–655
MathSciNet Google Scholar
Smyth P 2001 Data mining at the interface of computer science and statistics. InData mining for scientific and engineering applications. (eds) R L Grossman, C Kamath, P Kegelmeyer, V Kumar, R R Namburu (Dordrecht: Kluwer Academic)
Google Scholar
Srikanth R, Agrawal R 1996 Mining sequential patterns: Generalizations and performance improvements. InProc. 5th Int. Conf. on Extending Database Technology (EDBT), Avignon, France
Starner T E, Pentland A 1995 Visual recognition of American sign language. InProc. 1995 Int. Workshop on Face and Gesture Recognition, Zurich
Sutton R S 1988 Learning to predict by method of temporal differences.Machine Learning 3(1): 9–44
Google Scholar
Sze S H, Gelfand M S, Pevzner P A 2002 Finding weak motifs in DNA sequences. InProc. 2002 Pacific Symposium on Biocomputing, pp 235–246
Tappert C C, Suen C Y, Wakahara T 1990 The state of the art in on-line handwriting recognition.IEEE Trans. Pattern Anal. Machine Intell. 12: 787–808
Article Google Scholar
Tino P, Schittenkopf C, Dorffner G 2000 Temporal pattern recognition in noisy non-stationary time series based on quantization into symbolic streams: Lessons learned from financial volatility trading (url:citeseer.nj.nec.com/tino00temporal.html)
Tronicek Z 2001 Episode matching. InProc. 12th Annu. Symp. on Combinatorial Pattern Matching (CPM 2001), Jerusalem, Israel, vol. 2089, pp 143–146
Google Scholar
Wan E A 1990 Temporal backpropagation for FIR neural networks. InInt. Joint Conf. on Neural Networks (1990 IJCNN), vol. 1, pp 575–580
Article Google Scholar
Wang J T-L, Chirn G-W, Marr T G, Shapiro B, Shasha D, Zhang K 1994 Combinatorial pattern discovery for scientific data: some preliminary results. InProc. 1994 ACM SIGMOD Int. Conf. on Management of Data, Minneapolis, Minnesota, pp 115–125
Wang J, Han J 2004 BIDE: Efficient mining of frequent closed sequences. In20th Int. Conf. on Data Engineering, Boston, MA
Witten I H, Frank E 2000Data mining: Practical machine learning tools and techniques with JAVA implementations (San Fransisco, CA: Morgan Kaufmann)
Google Scholar
Wu C, Berry M, Shivakumar S, McLarty J 1995 Neural networks for full-scale protein sequence classification: Sequence encoding with singular value decomposition.Machine Learning, Special issue on applications in molecular biology 21(1–2): 177–193 Wu S, Manber U 1992 Fast text searching allowing errors.Commun. ACM 35(10): 83–91
Google Scholar
Wu Y-L, Agrawal D, Abbadi A E 2000 A comparison of DFT and DWT based similarity search in time series databases. InProc. Ninth Int. Conf. on Information and Knowledge Management, McLean, VA, pp 488–495
Xiong Y, Yeung D Y 2002 Mixtures of ARMA models for model-based time series clustering. In2002 IEEE Int. Conf. on Data Mining, Maebashi City, Japan, pp 717–720
Yamato J, Ohya J, Ishii K 1992 Recognizing human action in time-sequential images using Hidden Markov Model. InProc. 1992 IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR’92), Champaign, IL, pp 379–385
Yan X, Han J, Afshar R 2003 CloSpan: Mining closed sequential patterns in large datasets. InProc. 2003 Int. SIAM Conf. on Data Mining (SDM03), San Fransisco, CA
Yule G 1927 On a method of investigating periodicity in distributed series with special reference to Wolfer’s sunspot numbers.Philos. Trans. R. Soc. London A226
Zaki M J 1998 Efficient enumeration of frequent sequences. InProc. ACM 7th Int. Conf. Information and Knowledge Management (CIKM)

Download references

Author information

Authors and Affiliations

Department of Electrical Engineering, Indian Institute of Science, 560 012, Bangalore, India
Srivatsan Laxman & P. S. Sastry

Authors

Srivatsan Laxman
View author publications
You can also search for this author in PubMed Google Scholar
P. S. Sastry
View author publications
You can also search for this author in PubMed Google Scholar

Rights and permissions

Reprints and permissions

About this article

Cite this article

Laxman, S., Sastry, P.S. A survey of temporal data mining. Sadhana 31, 173–198 (2006). https://doi.org/10.1007/BF02719780

Download citation

Issue Date: April 2006
DOI: https://doi.org/10.1007/BF02719780

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

A survey of temporal data mining

Abstract

Access this article

Similar content being viewed by others

Frequent Temporal Pattern Mining with Extended Lists

Sequential Pattern Mining

An Empirical Evaluation of Sequential Pattern Mining Algorithms

References

Author information

Authors and Affiliations

Rights and permissions

About this article

Cite this article

Keywords

Navigation

A survey of temporal data mining

Abstract

Access this article

Similar content being viewed by others

Frequent Temporal Pattern Mining with Extended Lists

Sequential Pattern Mining

An Empirical Evaluation of Sequential Pattern Mining Algorithms

References

Author information

Authors and Affiliations

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation