ABSTRACT
In this paper, we study the problem of learning phylogenies and hidden Markov models. We call a Markov model nonsingular if all transition matrices have determinants bounded away from 0 (and 1). We highlight the role of the nonsingularity condition for the learning problem. Learning hidden Markov models without the nonsingularity condition is at least as hard as learning parity with noise. On the other hand, we give a polynomial-time algorithm for learning nonsingular phylogenies and hidden Markov models.
- N. Abe and N. K. Warmuth, On the Computational Complexity of Approximating Probability Distributions by Probabilistic Automata, Machine Learning, 9(2--3), 205--260, 1992.]] Google ScholarDigital Library
- L. Addario-Berry, B. Chor, M. Hallett, J. Lagergren, A. Panconesi and Todd Wareham, Ancestral Maximum Likelihood of Evolutionary Trees Is Hard, Journal of Bioinformatics and Computational Biology, 2, 257--271, 2004.]]Google ScholarCross Ref
- A. Ambainis, R. Desper, M. Farach, and S. Kannan, Nearly tight bounds on the learnability of evolution, FOCS 1997.]] Google ScholarDigital Library
- J. Cavender, Taxonomy with confidence, Mathematical Biosciences, 40, 271--280, 1978.]]Google ScholarCross Ref
- J. T. Chang, Full Reconstruction of Markov Models on Evolutionary Trees: Identifiability and Consistency, Mathematical Biosciences, 137, 51--73, 1996.]]Google ScholarCross Ref
- R. G. Cowell, A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter, Probabilistic Networks and Expert Systems, Springer, New York, 1999.]] Google ScholarDigital Library
- M. Cryan, L. A. Goldberg, and P. W. Goldberg, Evolutionary Trees can be Learned in Polynomial Time in the Two-State General Markov Model, SIAM Journal on Computing, 31(2), 375--397, 2002.]] Google ScholarDigital Library
- R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998.]]Google ScholarCross Ref
- R. Durrett, Probability: Theory and Examples, Duxbury, 1996.]]Google Scholar
- P. L. Erdos, M. Steel, L. Szekely, and T. Warnow, A Few Logs Suffice to Build (Almost) all Trees (I), Random Structures and Algorithms, 14, 153--184, 1997.]] Google ScholarDigital Library
- P. L. Erdos, M. A. Steel, L. A. Szekely, and T. J. Warnow, A Few Logs Suffice to Build (Almost) all Trees (II), Theoretical Computer Science, 221(1--2), 77--118, 1999.]] Google ScholarDigital Library
- J. Felsenstein. Inferring Phylogenies. Sinauer, New York, New York, 2004.]]Google Scholar
- M. Farach and S. Kannan, Efficient algorithms for inverting evolution, STOC 1996.]] Google ScholarDigital Library
- J. Feldman and R. O'Donnell and R. Servedio. Learning mixtures of product distributions, preprint, 2004.]]Google Scholar
- J. Felsenstein, Cases in which parsimony or compatibility methods will be positively misleading, Systematic Zoology, 27, 410--410, 1978.]]Google ScholarCross Ref
- G. H. Golub and C. H. Van Loan, Matrix Computations, Johns Hopkins University Press, 1996.]]Google Scholar
- R. L. Graham and L. R. Foulds, Unlikelihood that minimal phylogenies for a realistic biological study can be constructed in reasonable computational time, Mathematical Biosciences, 60, 133--142, 1982.]]Google ScholarCross Ref
- R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, 1985.]] Google ScholarDigital Library
- M. Kearns, Efficient noise-tolerant learning from statistical queries, STOC 1993.]] Google ScholarDigital Library
- M. J. Kearns, Y. Mansour, D. Ron, R. Rubinfeld, R. E. Schapire, and L. Sellie, On the learnability of discrete distributions, STOC 1994.]] Google ScholarDigital Library
- R. B. Lyngsø and C. N. S. Pedersen, Complexity of Comparing Hidden Markov Models, ISAAC 2001.]] Google ScholarDigital Library
- E. Mossel, Distorted metrics on trees and phylogenetic forests, preprint, 2004 (available at http://arxiv.org/math.CO/0403508).]]Google Scholar
- E. Mossel and S.Roch, Learning Nonsingular Phylogenies and Hidden Markov Models, preprint, 2005 (available at http://arxiv.org/cs.LG/0502076).]]Google Scholar
- L. R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, 77(2), 257--286, 1989.]]Google ScholarCross Ref
- K. Rice and T. Warnow, Parsimony is Hard to Beat, Proceedings of the Third Annual International Conference on Computing and Combinatorics, Lecture Notes In Computer Science, Springer, 1276, 124 -- 133, 1997.]] Google ScholarDigital Library
- D. Ron, Y. Singer, and N. Tishby, On the Learnability and Usage of Acyclic Probabilistic Finite Automata, J. Comput. Syst. Sci., 56(2), 133--152, 1998.]] Google ScholarDigital Library
- C. Semple and M. Steel. Phylogenetics, volume 22 of Mathematics and its Applications series. Oxford University Press, 2003.]]Google Scholar
- M. Steel, Recovering a tree from the leaf colourations it generates under a Markov model, Appl. Math. Lett. 7(2), 19--24, 1994.]]Google ScholarCross Ref
- M. A. Steel, L. A. Szekely, and M. D. Hendy, Reconstructing trees when sequence sites evolve at variable rates, J. Comput. Biol. 1(2), 153--63, 1994.]]Google ScholarCross Ref
- L. G. Valiant, A theory of the learnable, Communications of the ACM, 27(11), 1134--1142, 1984.]] Google ScholarDigital Library
Index Terms
- Learning nonsingular phylogenies and hidden Markov models
Recommendations
Coding with partially hidden Markov models
DCC '95: Proceedings of the Conference on Data CompressionPartially hidden Markov models (PHMM) are introduced. They are a variation of the hidden Markov models (HMM) combining the power of explicit conditioning on past observations and the power of using hidden states. (P)HMM may be combined with arithmetic ...
Structural hidden Markov models: An application to handwritten numeral recognition
We introduce in this paper a generalization of the widely used hidden Markov models (HMM's), which we name "structural hidden Markov models" (SHMM). Our approach is motivated by the need of modeling complex structures which are encountered in many ...
Joint semi-supervised learning of Hidden Conditional Random Fields and Hidden Markov Models
Although semi-supervised learning has generated great interest for designing classifiers on static patterns, there has been comparatively fewer works on semi-supervised learning for structured outputs and in particular for sequences. We investigate semi-...
Comments