Article

Optimizing time series discretization for knowledge discovery

Authors:
Fabian Mörchen

Philipps-University Marburg, Marburg, Germany

Philipps-University Marburg, Marburg, Germany
View Profile

,
Alfred Ultsch

Philipps-University Marburg, Marburg, Germany

Philipps-University Marburg, Marburg, Germany
View Profile

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data miningAugust 2005Pages 660–665https://doi.org/10.1145/1081870.1081953

Published:21 August 2005Publication History

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining

Pages 660–665

ABSTRACT

Knowledge Discovery in time series usually requires symbolic time series. Many discretization methods that convert numeric time series to symbolic time series ignore the temporal order of values. This often leads to symbols that do not correspond to states of the process generating the time series and cannot be interpreted meaningfully. We propose a new method for meaningful unsupervised discretization of numeric time series called Persist. The algorithm is based on the Kullback-Leibler divergence between the marginal and the self-transition probability distributions of the discretization symbols. Its performance is evaluated on both artificial and real life data in comparison to the most common discretization methods. Persist achieves significantly higher accuracy than existing static methods and is robust against noise. It also outperforms Hidden Markov Models for all but very simple cases.

References

J. Bilmes. A Gentle Tutorial on the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. Technical Report ICSI-TR-97-021, University of Berkeley, 1997.Google Scholar
C. Daw, C. Finney, and E. Tracy. A review of symbolic analysis of experimental data. Review of Scientific Instruments, 74:916--930, 2003.Google ScholarCross Ref
J. Dougherty, R. Kohavi, and M. Sahami. Supervised and unsupervised discretization of continuous features. In Int. Conf. on Machine Learning, pages 194--202, 1995.Google ScholarCross Ref
A. Gionis and H. Mannila. Finding recurrent sources in sequences. In Proc. 7th Int. Conf. on Computational Molecular Biology, pages 123--130, 2003. Google ScholarDigital Library
S. K. Harms and J. Deogun. Sequential association rule mining with time lags. Journal of Intelligent Information Systems (JIIS), 2003. Google ScholarDigital Library
M. L. Hetland and P. Saetrom. Temporal rule discovery using genetic programming and specialized hardware. In A. Lotfi, J. Garibaldi, and R. John, editors, Proc. 4th Int. Conf. on Recent Advances in Soft Computing, pages 182--188, 2002.Google Scholar
M. L. Hetland and P. Saetrom. The role of discretization parameters in sequence rule evolution. In Proc. 7th Int. Conf. on Knowledge-Based Intelligent Information & Engineering Systems, KES. Springer, 2003.Google ScholarCross Ref
M. W. Kadous. Learning comprehensible descriptions of multivariate time series. In Proc. 16th Int. Conf. on Machine Learning, pages 454--463. Morgan Kaufmann, 1999. Google ScholarDigital Library
E. Keogh. UCR Time Series Data Mining Archive, http://www.cs.ucr.edu/~eamonn/tsdma/index.html, 2002.Google Scholar
E. Keogh, S. Chu, D. Hart, and M. Pazzani. Segmenting time series: A survey and novel approach. In Data Mining in Time Series Databases. World Scientific Publishing Company, 2003.Google Scholar
E. Keogh and S. Kasetty. On the need for time series data mining benchmarks: A survey and empirical demonstration. In 8th ACM SIGKDD, Edmonton, Canada, pages 102--111, 2002. Google ScholarDigital Library
E. Keogh, S. Lonardi, and B. Chiu. Finding surprising patterns in a time series database in linear time and space. In Proc. 8th ACM SIGKDD, Edmonton, Canada, July 2002. Google ScholarDigital Library
E. J. Keogh, K. Chakrabarti, M. J. Pazzani, and S. Mehrotra. Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems, 3(3):263--286, 2001.Google ScholarCross Ref
R. Kohavi and M. Sahami. Error-based and entropy-based discretization of continuous features. In Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, pages 114--119, 1996.Google Scholar
S. Kullback and R. A. Leibler. On information and sufficiency. Annals of Mathematical Statistics, 22:79--86, 1951.Google ScholarCross Ref
J. Lin, E. Keogh, S. Lonardi, and B. Chiu. A symbolic representation of time series, with implications for streaming algorithms. In Proc. 8th ACM SIGMOD, DMKD workshop, pages 2--11, 2003. Google ScholarDigital Library
H. Liu, F. Hussain, C. L. Tan, and M. Dash. Discretization: An enabling technique. Data Mining and Knowledge Discovery, (6):393--423, 2002. Google ScholarDigital Library
J. Mäntyjärvi, J. Himberg, P. Kangas, U. Tuomela, and P. Huuskonen. Sensor signal data set for exploring context recognition of mobile devices. In 2nd Int. Conf. on Pervasive Computing, Linz/Vienna, Austria, 2004.Google Scholar
F. Mörchen and A. Ultsch. Discovering temporal knowlegde in multivariate time series. In Proc. GfKl 2004, Dortmund, Germany, 2004.Google Scholar
F. Mörchen, A. Ultsch, and O. Hoos. Extracting interpretable muscle activation patterns with time series knowledge mining. Int. Journal of Knowledge-Based & Intelligent Engineering Systems, 2005. Google ScholarDigital Library
L. R. Rabiner. A tutorial on hidden markov models and selected applications in speech recognition. Proc. of IEEE, 77(2):257--286, 1989.Google ScholarCross Ref
J. J. Rodriguez, C. J. Alonso, and H. Boström. Learning first order logic time series classifiers. In Proc. 10th Int. Conf. on Inductive Logic Programming, pages 260--275, 2000. Google ScholarDigital Library
A. Ultsch. Pareto Density Estimation: Probability Density Estimation for Knowledge Discovery. In Proc. GfKl 2003, Cottbus, Germany, 2003.Google Scholar

Index Terms

Optimizing time series discretization for knowledge discovery
1. Computing methodologies
  1. Machine learning

Recommendations

A Hellinger-based discretization method for numeric attributes in classification learning

Many classification algorithms require that training examples contain only discrete values. In order to use these algorithms when some attributes have continuous numeric values, the numeric attributes must be converted into discrete ones. This paper ...
Read More
A Method of Discretization of Continuous Attributes in Knowledge Discovery
CSA '13: Proceedings of the 2013 International Conference on Computer Sciences and Applications

Discretization of continuous attributes is an important preprocessing step in knowledge discovery. Most consecutive property boundaries are blurred. It is suitable for soft classification. This paper introduced the fuzzy set of continuous attribute ...
Read More
Discretization method of continuous attributes based on decision attributes
AICI'10: Proceedings of the 2010 international conference on Artificial intelligence and computational intelligence: Part II

The attributes in rough set must be discretized, but the general theory on discretization did not think about the decision attribute adequately during discretization of data, as a result, it leads to several redundant rules and lower calculation ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
August 2005
844 pages
ISBN:159593135X
DOI:10.1145/1081870
General Chair:
Robert Grossman
University of Illinois at Chicago & Open Data Partners, USA
,
Program Chairs:
Roberto Bayardo
IBM Almaden Research, USA
,
Kristin Bennett
RPI, USA
Copyright © 2005 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 21 August 2005
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
discretization
persistence
time series
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate1,133of8,635submissions,13%
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 74
  Total Citations
  View Citations
- 1,228
  Total Downloads
- Downloads (Last 12 months)38
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Optimizing time series discretization for knowledge discovery

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

A Hellinger-based discretization method for numeric attributes in classification learning

A Method of Discretization of Continuous Attributes in Knowledge Discovery

Discretization method of continuous attributes based on decision attributes