Abstract
We propose a novel approach for building finite memory predictive models similar in spirit to variable memory length Markov models (VLMMs). The models are constructed by first transforming the n-block structure of the training sequence into a geometric structure of points in a unit hypercube, such that the longer the common suffix shared by two n-blocks, the closer their point representations lie. Such a transformation embodies a Markov assumption: n-blocks with long common suffixes are likely to produce similar continuations. Prediction contexts are found by detecting clusters in the geometric n-block representation of the training sequence via vector quantization. We compare our model with both the classical (fixed order) and variable memory length Markov models on five data sets with different memory and stochastic components. Fixed order Markov models (MMs) fail on three large data sets on which the advantage of allowing variable memory length can be exploited. On these data sets, our predictive models perform as well as, or better than, VLMMs, yet their construction is fully automatic, which is shown to be problematic in the case of VLMMs. On one data set, VLMMs are outperformed by the classical MMs; on this set, our models perform significantly better than MMs. On the remaining data set, classical MMs outperform the variable context length strategies.
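The geometric n-block representation described above can be sketched as an iterated function system driven by the sequence: each symbol is assigned a corner of the unit hypercube, and an n-block is mapped to the point reached by repeatedly contracting toward the corners of its symbols, left to right. A minimal sketch follows; the function name, the choice of contraction ratio k = 0.5, and the corner assignment are illustrative assumptions, not the paper's exact construction.

```python
import itertools
import numpy as np

def chaos_game_points(sequence, alphabet, n, k=0.5):
    """Map each n-block of `sequence` to a point in the unit hypercube.

    Symbols are assigned to distinct corners of [0, 1]^d, and each
    n-block is driven through the contraction
        x <- k * x + (1 - k) * corner(symbol),
    applied symbol by symbol.  Because later symbols dominate the final
    position, n-blocks sharing a long suffix end up geometrically close.
    """
    # Smallest hypercube dimension with enough corners for the alphabet.
    d = max(1, int(np.ceil(np.log2(len(alphabet)))))
    corners = {s: np.array(c, dtype=float)
               for s, c in zip(alphabet,
                               itertools.product([0.0, 1.0], repeat=d))}
    points = []
    for i in range(len(sequence) - n + 1):
        x = np.full(d, 0.5)              # start at the hypercube centre
        for s in sequence[i:i + n]:      # apply the maps left to right
            x = k * x + (1 - k) * corners[s]
        points.append(x)
    return np.array(points)
```

Clustering the resulting point cloud (e.g. with k-means) then yields the prediction contexts: each cluster collects n-blocks with similar suffix structure, and next-symbol counts are accumulated per cluster.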
Tino, P., Dorffner, G. Predicting the Future of Discrete Sequences from Fractal Representations of the Past. Machine Learning 45, 187–217 (2001). https://doi.org/10.1023/A:1010972803901