DOI: 10.1145/1081870.1081953
Article

Optimizing time series discretization for knowledge discovery

Published: 21 August 2005

ABSTRACT

Knowledge discovery in time series usually requires symbolic time series. Many discretization methods that convert numeric time series to symbolic time series ignore the temporal order of values. This often leads to symbols that do not correspond to states of the process generating the time series and cannot be interpreted meaningfully. We propose a new method for meaningful unsupervised discretization of numeric time series called Persist. The algorithm is based on the Kullback-Leibler divergence between the marginal and the self-transition probability distributions of the discretization symbols. Its performance is evaluated on both artificial and real-life data in comparison to the most common discretization methods. Persist achieves significantly higher accuracy than existing static methods and is robust against noise. It also outperforms Hidden Markov Models for all but very simple cases.
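
The abstract does not give the exact scoring formula, so the following is only a minimal sketch of one plausible reading of it. For each symbol it compares the probability of staying in that symbol (the self-transition probability) with the symbol's marginal probability, using a symmetric Kullback-Leibler divergence between the two resulting two-outcome distributions. The sign convention, the averaging over symbols, and the names `discretize`, `symmetric_kl`, and `persistence_score` are assumptions made for illustration, not the authors' published definitions.

```python
import math
from bisect import bisect_right
from collections import Counter


def discretize(values, breakpoints):
    """Map each numeric value to a symbol index using sorted breakpoints."""
    return [bisect_right(breakpoints, v) for v in values]


def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between Bernoulli(p) and Bernoulli(q).

    Probabilities are clipped to [eps, 1 - eps] to avoid log(0); the
    clipping constant is an assumption of this sketch.
    """
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    kl_pq = p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
    kl_qp = q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))
    return 0.5 * (kl_pq + kl_qp)


def persistence_score(symbols, k):
    """Average signed divergence between self-transition and marginal
    probabilities over k symbols (hypothetical scoring, see lead-in)."""
    n = len(symbols)
    marginal_counts = Counter(symbols)
    # Transitions leaving each symbol (the last position has no successor).
    outgoing = Counter(symbols[:-1])
    staying = Counter(a for a, b in zip(symbols, symbols[1:]) if a == b)

    score = 0.0
    for s in range(k):
        if outgoing[s] == 0:
            continue  # symbol never left: contributes nothing in this sketch
        marginal = marginal_counts[s] / n
        self_transition = staying[s] / outgoing[s]
        # Positive when the symbol persists more often than chance suggests.
        sign = (self_transition > marginal) - (self_transition < marginal)
        score += sign * symmetric_kl(self_transition, marginal)
    return score / k


# Usage: long runs of the same symbol score positive, rapid flipping negative.
persistent = [0] * 50 + [1] * 50
alternating = [0, 1] * 50
print(persistence_score(persistent, 2) > 0)   # True
print(persistence_score(alternating, 2) < 0)  # True
```

Under these assumptions, candidate sets of breakpoints could be compared by discretizing the series with each set and preferring the one with the higher score, which rewards symbols that behave like lasting states rather than noise around a threshold.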


Published in

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
August 2005
844 pages
ISBN: 159593135X
DOI: 10.1145/1081870

Copyright © 2005 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
