ABSTRACT
Knowledge Discovery in time series usually requires symbolic time series. Many discretization methods that convert numeric time series to symbolic time series ignore the temporal order of values. This often leads to symbols that do not correspond to states of the process generating the time series and cannot be interpreted meaningfully. We propose a new method for meaningful unsupervised discretization of numeric time series called Persist. The algorithm is based on the Kullback-Leibler divergence between the marginal and the self-transition probability distributions of the discretization symbols. Its performance is evaluated on both artificial and real life data in comparison to the most common discretization methods. Persist achieves significantly higher accuracy than existing static methods and is robust against noise. It also outperforms Hidden Markov Models for all but very simple cases.
- J. Bilmes. A Gentle Tutorial on the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. Technical Report ICSI-TR-97-021, University of Berkeley, 1997.Google Scholar
- C. Daw, C. Finney, and E. Tracy. A review of symbolic analysis of experimental data. Review of Scientific Instruments, 74:916--930, 2003.Google ScholarCross Ref
- J. Dougherty, R. Kohavi, and M. Sahami. Supervised and unsupervised discretization of continuous features. In Int. Conf. on Machine Learning, pages 194--202, 1995.Google ScholarCross Ref
- A. Gionis and H. Mannila. Finding recurrent sources in sequences. In Proc. 7th Int. Conf. on Computational Molecular Biology, pages 123--130, 2003. Google ScholarDigital Library
- S. K. Harms and J. Deogun. Sequential association rule mining with time lags. Journal of Intelligent Information Systems (JIIS), 2003. Google ScholarDigital Library
- M. L. Hetland and P. Saetrom. Temporal rule discovery using genetic programming and specialized hardware. In A. Lotfi, J. Garibaldi, and R. John, editors, Proc. 4th Int. Conf. on Recent Advances in Soft Computing, pages 182--188, 2002.Google Scholar
- M. L. Hetland and P. Saetrom. The role of discretization parameters in sequence rule evolution. In Proc. 7th Int. Conf. on Knowledge-Based Intelligent Information & Engineering Systems, KES. Springer, 2003.Google ScholarCross Ref
- M. W. Kadous. Learning comprehensible descriptions of multivariate time series. In Proc. 16th Int. Conf. on Machine Learning, pages 454--463. Morgan Kaufmann, 1999. Google ScholarDigital Library
- E. Keogh. UCR Time Series Data Mining Archive, http://www.cs.ucr.edu/~eamonn/tsdma/index.html, 2002.Google Scholar
- E. Keogh, S. Chu, D. Hart, and M. Pazzani. Segmenting time series: A survey and novel approach. In Data Mining in Time Series Databases. World Scientific Publishing Company, 2003.Google Scholar
- E. Keogh and S. Kasetty. On the need for time series data mining benchmarks: A survey and empirical demonstration. In 8th ACM SIGKDD, Edmonton, Canada, pages 102--111, 2002. Google ScholarDigital Library
- E. Keogh, S. Lonardi, and B. Chiu. Finding surprising patterns in a time series database in linear time and space. In Proc. 8th ACM SIGKDD, Edmonton, Canada, July 2002. Google ScholarDigital Library
- E. J. Keogh, K. Chakrabarti, M. J. Pazzani, and S. Mehrotra. Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems, 3(3):263--286, 2001.Google ScholarCross Ref
- R. Kohavi and M. Sahami. Error-based and entropy-based discretization of continuous features. In Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, pages 114--119, 1996.Google Scholar
- S. Kullback and R. A. Leibler. On information and sufficiency. Annals of Mathematical Statistics, 22:79--86, 1951.Google ScholarCross Ref
- J. Lin, E. Keogh, S. Lonardi, and B. Chiu. A symbolic representation of time series, with implications for streaming algorithms. In Proc. 8th ACM SIGMOD, DMKD workshop, pages 2--11, 2003. Google ScholarDigital Library
- H. Liu, F. Hussain, C. L. Tan, and M. Dash. Discretization: An enabling technique. Data Mining and Knowledge Discovery, (6):393--423, 2002. Google ScholarDigital Library
- J. Mäntyjärvi, J. Himberg, P. Kangas, U. Tuomela, and P. Huuskonen. Sensor signal data set for exploring context recognition of mobile devices. In 2nd Int. Conf. on Pervasive Computing, Linz/Vienna, Austria, 2004.Google Scholar
- F. Mörchen and A. Ultsch. Discovering temporal knowlegde in multivariate time series. In Proc. GfKl 2004, Dortmund, Germany, 2004.Google Scholar
- F. Mörchen, A. Ultsch, and O. Hoos. Extracting interpretable muscle activation patterns with time series knowledge mining. Int. Journal of Knowledge-Based & Intelligent Engineering Systems, 2005. Google ScholarDigital Library
- L. R. Rabiner. A tutorial on hidden markov models and selected applications in speech recognition. Proc. of IEEE, 77(2):257--286, 1989.Google ScholarCross Ref
- J. J. Rodriguez, C. J. Alonso, and H. Boström. Learning first order logic time series classifiers. In Proc. 10th Int. Conf. on Inductive Logic Programming, pages 260--275, 2000. Google ScholarDigital Library
- A. Ultsch. Pareto Density Estimation: Probability Density Estimation for Knowledge Discovery. In Proc. GfKl 2003, Cottbus, Germany, 2003.Google Scholar
Index Terms
- Optimizing time series discretization for knowledge discovery
Recommendations
A Hellinger-based discretization method for numeric attributes in classification learning
Many classification algorithms require that training examples contain only discrete values. In order to use these algorithms when some attributes have continuous numeric values, the numeric attributes must be converted into discrete ones. This paper ...
A Method of Discretization of Continuous Attributes in Knowledge Discovery
CSA '13: Proceedings of the 2013 International Conference on Computer Sciences and ApplicationsDiscretization of continuous attributes is an important preprocessing step in knowledge discovery. Most consecutive property boundaries are blurred. It is suitable for soft classification. This paper introduced the fuzzy set of continuous attribute ...
Discretization method of continuous attributes based on decision attributes
AICI'10: Proceedings of the 2010 international conference on Artificial intelligence and computational intelligence: Part IIThe attributes in rough set must be discretized, but the general theory on discretization did not think about the decision attribute adequately during discretization of data, as a result, it leads to several redundant rules and lower calculation ...
Comments