Skip to main content
Top

2016 | OriginalPaper | Chapter

5. Learning Probability Distributions Generated by Finite-State Machines

Authors : Jorge Castro, Ricard Gavaldà

Published in: Topics in Grammatical Inference

Publisher: Springer Berlin Heidelberg

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

We review methods for inference of probability distributions generated by probabilistic automata and related models for sequence generation. We focus on methods that can be proved to learn in the inference in the limit and PAC formal models. The methods we review are state merging and state splitting methods for probabilistic deterministic automata and the recently developed spectral method for nondeterministic probabilistic automata. In both cases, we derive them from a high-level algorithm described in terms of the Hankel matrix of the distribution to be learned, given as an oracle, and then describe how to adapt that algorithm to account for the error introduced by a finite sample.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Literature
1.
go back to reference Naoki Abe and Manfred K. Warmuth. On the computational complexity of approximating distributions by probabilistic automata. Machine Learning, 9:205–260, 1992.MATH Naoki Abe and Manfred K. Warmuth. On the computational complexity of approximating distributions by probabilistic automata. Machine Learning, 9:205–260, 1992.MATH
2.
go back to reference Animashree Anandkumar, Daniel Hsu, and Sham M. Kakade. A method of moments for mixture models and hidden Markov models. In 25th Annual Conference on Learning Theory (COLT); Journal of Machine Learning Research - Proceedings Track - COLT 2012, volume 23, pages 33.1–33.34, 2012. Animashree Anandkumar, Daniel Hsu, and Sham M. Kakade. A method of moments for mixture models and hidden Markov models. In 25th Annual Conference on Learning Theory (COLT); Journal of Machine Learning Research - Proceedings Track - COLT 2012, volume 23, pages 33.1–33.34, 2012.
3.
go back to reference Dana Angluin. Queries and concept learning. Machine Learning, 2(4):319–342, 1987. Dana Angluin. Queries and concept learning. Machine Learning, 2(4):319–342, 1987.
4.
go back to reference Raphaël Bailly, François Denis, and Liva Ralaivola. Grammatical inference as a principal component analysis problem. In 26th Intl. Conf. on Machine Learning (ICML), page 5. ACM, 2009. Raphaël Bailly, François Denis, and Liva Ralaivola. Grammatical inference as a principal component analysis problem. In 26th Intl. Conf. on Machine Learning (ICML), page 5. ACM, 2009.
5.
go back to reference Borja Balle, Jorge Castro, and Ricard Gavaldà. Learning PDFA with asynchronous transitions. In 10th Intl. Coll. on Grammatical Inference (ICGI), volume 6339 of Lecture Notes in Computer Science, pages 271–275. Springer, 2010. Borja Balle, Jorge Castro, and Ricard Gavaldà. Learning PDFA with asynchronous transitions. In 10th Intl. Coll. on Grammatical Inference (ICGI), volume 6339 of Lecture Notes in Computer Science, pages 271–275. Springer, 2010.
6.
go back to reference Borja Balle, Jorge Castro, and Ricard Gavaldà. A lower bound for learning distributions generated by probabilistic automata. In 21st Intl. Conf. on Algorithmic Learning Theory (ALT), volume 6331 of Lecture Notes in Computer Science, pages 179–193. Springer, 2010. Borja Balle, Jorge Castro, and Ricard Gavaldà. A lower bound for learning distributions generated by probabilistic automata. In 21st Intl. Conf. on Algorithmic Learning Theory (ALT), volume 6331 of Lecture Notes in Computer Science, pages 179–193. Springer, 2010.
7.
go back to reference Borja Balle, Jorge Castro, and Ricard Gavaldà. Bootstrapping and learning PDFA in data streams. In 11th Intl. Conf. on Grammatical Inference (ICGI), volume 21 of JMLR Workshop and Conf. Proceedings, pages 34–48, 2012. Borja Balle, Jorge Castro, and Ricard Gavaldà. Bootstrapping and learning PDFA in data streams. In 11th Intl. Conf. on Grammatical Inference (ICGI), volume 21 of JMLR Workshop and Conf. Proceedings, pages 34–48, 2012.
8.
go back to reference Borja Balle and Mehryar Mohri. Spectral learning of general weighted automata via constrained matrix completion. In Neural Information Processing Systems (NIPS), 2012. Borja Balle and Mehryar Mohri. Spectral learning of general weighted automata via constrained matrix completion. In Neural Information Processing Systems (NIPS), 2012.
9.
go back to reference Borja Balle, Ariadna Quattoni, and Xavier Carreras. Local loss optimization in operator models: A new insight into spectral learning. In 29th Intl. Conf. on Machine Learning (ICML), 2012. Borja Balle, Ariadna Quattoni, and Xavier Carreras. Local loss optimization in operator models: A new insight into spectral learning. In 29th Intl. Conf. on Machine Learning (ICML), 2012.
10.
go back to reference L. E. Baum, T. Petrie, G. Soules, and N. Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Statist., 41(1):164–171, 1970. L. E. Baum, T. Petrie, G. Soules, and N. Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Statist., 41(1):164–171, 1970.
11.
go back to reference Amos Beimel, Francesco Bergadano, Nader H. Bshouty, Eyal Kushilevitz, and Stefano Varricchio. Learning functions represented as multiplicity automata. J. ACM, 47(3):506–530, 2000.MathSciNetCrossRefMATH Amos Beimel, Francesco Bergadano, Nader H. Bshouty, Eyal Kushilevitz, and Stefano Varricchio. Learning functions represented as multiplicity automata. J. ACM, 47(3):506–530, 2000.MathSciNetCrossRefMATH
13.
go back to reference Rafael C. Carrasco and José Oncina. Learning stochastic regular grammars by means of a state merging method. In 2nd Intl. Coll. on Grammatical Inference (ICGI), volume 862 of Lecture Notes in Computer Science, pages 139–152. Springer, 1994. Rafael C. Carrasco and José Oncina. Learning stochastic regular grammars by means of a state merging method. In 2nd Intl. Coll. on Grammatical Inference (ICGI), volume 862 of Lecture Notes in Computer Science, pages 139–152. Springer, 1994.
14.
go back to reference Rafael C. Carrasco and José Oncina. Learning deterministic regular grammars from stochastic samples in polynomial time. Informatique Théorique et Applications, 33(1):1–20, 1999.MathSciNetCrossRefMATH Rafael C. Carrasco and José Oncina. Learning deterministic regular grammars from stochastic samples in polynomial time. Informatique Théorique et Applications, 33(1):1–20, 1999.MathSciNetCrossRefMATH
15.
go back to reference Jorge Castro and Ricard Gavaldà. Towards feasible PAC-learning of probabilistic deterministic finite automata. In 9th Intl. Coll. on Grammatical Inference (ICGI), volume 5278 of Lecture Notes in Computer Science, pages 163–174. Springer, 2008. Jorge Castro and Ricard Gavaldà. Towards feasible PAC-learning of probabilistic deterministic finite automata. In 9th Intl. Coll. on Grammatical Inference (ICGI), volume 5278 of Lecture Notes in Computer Science, pages 163–174. Springer, 2008.
16.
go back to reference Alexander Clark and Franck Thollard. PAC-learnability of probabilistic deterministic finite state automata. Journal of Machine Learning Research, 5:473–497, 2004.MathSciNetMATH Alexander Clark and Franck Thollard. PAC-learnability of probabilistic deterministic finite state automata. Journal of Machine Learning Research, 5:473–497, 2004.MathSciNetMATH
17.
go back to reference Borja de Balle Pigem. Learning Finite-State Machines. Algorithmic and Statistical Aspects. PhD thesis, Universitat Politècnica de Catalunya, 2013. Borja de Balle Pigem. Learning Finite-State Machines. Algorithmic and Statistical Aspects. PhD thesis, Universitat Politècnica de Catalunya, 2013.
18.
go back to reference Colin de la Higuera. Grammatical Inference Learning Automata and Grammars. Cambridge University Press, 2010. Colin de la Higuera. Grammatical Inference Learning Automata and Grammars. Cambridge University Press, 2010.
19.
go back to reference Colin de la Higuera and José Oncina. Learning stochastic finite automata. In 7th Intl. Coll. on Grammatical Inference (ICGI), volume 3264 of Lecture Notes in Computer Science, pages 175–186. Springer, 2004. Colin de la Higuera and José Oncina. Learning stochastic finite automata. In 7th Intl. Coll. on Grammatical Inference (ICGI), volume 3264 of Lecture Notes in Computer Science, pages 175–186. Springer, 2004.
20.
go back to reference Colin de la Higuera and Franck Thollard. Identification in the limit with probability one of stochastic deterministic finite automata. In 5th Intl. Coll. on Grammatical Inference (ICGI), volume 1891 of Lecture Notes in Computer Science, pages 141–156. Springer, 2000. Colin de la Higuera and Franck Thollard. Identification in the limit with probability one of stochastic deterministic finite automata. In 5th Intl. Coll. on Grammatical Inference (ICGI), volume 1891 of Lecture Notes in Computer Science, pages 141–156. Springer, 2000.
21.
go back to reference François Denis and Yann Esposito. Learning classes of probabilistic automata. In 17th Annual Conference on Learning Theory (COLT), volume 3120 of Lecture Notes in Computer Science, pages 124–139. Springer, 2004. François Denis and Yann Esposito. Learning classes of probabilistic automata. In 17th Annual Conference on Learning Theory (COLT), volume 3120 of Lecture Notes in Computer Science, pages 124–139. Springer, 2004.
22.
go back to reference François Denis, Yann Esposito, and Amaury Habrard. Learning rational stochastic languages. In 19th Annual Conference on Learning Theory (COLT), volume 4005 of Lecture Notes in Computer Science, pages 274–288. Springer, 2006. François Denis, Yann Esposito, and Amaury Habrard. Learning rational stochastic languages. In 19th Annual Conference on Learning Theory (COLT), volume 4005 of Lecture Notes in Computer Science, pages 274–288. Springer, 2006.
23.
go back to reference Pierre Dupont and Juan-Carlos Amengual. Smoothing probabilistic automata: An error-correcting approach. In 5th Intl. Coll. on Grammatical Inference (ICGI 2000), volume 1891 of Lecture Notes in Computer Science, pages 51–64. Springer, 2000. Pierre Dupont and Juan-Carlos Amengual. Smoothing probabilistic automata: An error-correcting approach. In 5th Intl. Coll. on Grammatical Inference (ICGI 2000), volume 1891 of Lecture Notes in Computer Science, pages 51–64. Springer, 2000.
24.
go back to reference Pierre Dupont, François Denis, and Yann Esposito. Links between probabilistic automata and hidden Markov models: probability distributions, learning models and induction algorithms. Pattern Recognition, 38(9):1349–1371, 2005.CrossRefMATH Pierre Dupont, François Denis, and Yann Esposito. Links between probabilistic automata and hidden Markov models: probability distributions, learning models and induction algorithms. Pattern Recognition, 38(9):1349–1371, 2005.CrossRefMATH
25.
go back to reference M. Fliess. Matrices de Hankel. J. Math. Pures Appl., 53:197–222, 1974. (Erratum in vol. 54, 1975.). M. Fliess. Matrices de Hankel. J. Math. Pures Appl., 53:197–222, 1974. (Erratum in vol. 54, 1975.).
26.
go back to reference Ricard Gavaldà, Philipp W. Keller, Joelle Pineau, and Doina Precup. PAC-learning of Markov models with hidden state. In 17th European Conf. on Machine Learning (ECML), volume 4212 of Lecture Notes in Computer Science, pages 150–161. Springer, 2006. Ricard Gavaldà, Philipp W. Keller, Joelle Pineau, and Doina Precup. PAC-learning of Markov models with hidden state. In 17th European Conf. on Machine Learning (ECML), volume 4212 of Lecture Notes in Computer Science, pages 150–161. Springer, 2006.
27.
go back to reference E. Mark Gold. Language identification in the limit. Information and Control, 10(5):447–474, 1967.CrossRef E. Mark Gold. Language identification in the limit. Information and Control, 10(5):447–474, 1967.CrossRef
28.
go back to reference Omri Guttman, S. V. N. Vishwanathan, and Robert C. Williamson. Learnability of probabilistic automata via oracles. In 16th Intl. Conf. on Algorithmic Learning Theory (ALT), pages 171–182, 2005. Omri Guttman, S. V. N. Vishwanathan, and Robert C. Williamson. Learnability of probabilistic automata via oracles. In 16th Intl. Conf. on Algorithmic Learning Theory (ALT), pages 171–182, 2005.
29.
go back to reference Daniel Hsu, Sham M. Kakade, and Tong Zhang. A spectral algorithm for learning hidden Markov models. In 22nd Annual Conference on Learning Theory (COLT). ACM, 2009. Daniel Hsu, Sham M. Kakade, and Tong Zhang. A spectral algorithm for learning hidden Markov models. In 22nd Annual Conference on Learning Theory (COLT). ACM, 2009.
30.
go back to reference Michael J. Kearns, Yishay Mansour, Dana Ron, Ronitt Rubinfeld, Robert E. Schapire, and Linda Sellie. On the learnability of discrete distributions. In 26th Annual ACM Symp. on Theory of Computing (STOC), pages 273–282. ACM, 1994. Michael J. Kearns, Yishay Mansour, Dana Ron, Ronitt Rubinfeld, Robert E. Schapire, and Linda Sellie. On the learnability of discrete distributions. In 26th Annual ACM Symp. on Theory of Computing (STOC), pages 273–282. ACM, 1994.
31.
go back to reference Christopher Kermorvant and Pierre Dupont. Stochastic grammatical inference with multinomial tests. In 6th Intl. Coll. on Grammatical Inference (ICGI), volume 2484 of Lecture Notes in Computer Science, pages 149–160. Springer, 2002. Christopher Kermorvant and Pierre Dupont. Stochastic grammatical inference with multinomial tests. In 6th Intl. Coll. on Grammatical Inference (ICGI), volume 2484 of Lecture Notes in Computer Science, pages 149–160. Springer, 2002.
32.
go back to reference Franco M. Luque, Ariadna Quattoni, Borja Balle, and Xavier Carreras. Spectral learning for non-deterministic dependency parsing. In 13th Conf. of the European Chapter of the Association for Computational Linguistics (EACL), pages 409–419. The Association for Computer Linguistics, 2012. Franco M. Luque, Ariadna Quattoni, Borja Balle, and Xavier Carreras. Spectral learning for non-deterministic dependency parsing. In 13th Conf. of the European Chapter of the Association for Computational Linguistics (EACL), pages 409–419. The Association for Computer Linguistics, 2012.
33.
go back to reference Elchanan Mossel and Sébastien Roch. Learning nonsingular phylogenies and hidden Markov models. In 37th Annual ACM Symp. on Theory of Computing (STOC), pages 366–375. ACM, 2005. Elchanan Mossel and Sébastien Roch. Learning nonsingular phylogenies and hidden Markov models. In 37th Annual ACM Symp. on Theory of Computing (STOC), pages 366–375. ACM, 2005.
34.
go back to reference José Oncina and Pedro García. Identifying regular languages in polynomial. In Advances in Structural and Syntactic Pattern Recognition, pages 99–108. World Scientific, 1992. José Oncina and Pedro García. Identifying regular languages in polynomial. In Advances in Structural and Syntactic Pattern Recognition, pages 99–108. World Scientific, 1992.
35.
go back to reference Nick Palmer and Paul W. Goldberg. PAC-learnability of probabilistic deterministic finite state automata in terms of variation distance. Theor. Comput. Sci., 387(1):18–31, 2007.MathSciNetCrossRefMATH Nick Palmer and Paul W. Goldberg. PAC-learnability of probabilistic deterministic finite state automata in terms of variation distance. Theor. Comput. Sci., 387(1):18–31, 2007.MathSciNetCrossRefMATH
36.
go back to reference Dana Ron, Yoram Singer, and Naftali Tishby. The power of amnesia: Learning probabilistic automata with variable memory length. Machine Learning, 25(2-3):117–149, 1996.CrossRefMATH Dana Ron, Yoram Singer, and Naftali Tishby. The power of amnesia: Learning probabilistic automata with variable memory length. Machine Learning, 25(2-3):117–149, 1996.CrossRefMATH
37.
go back to reference Steven Rudich. Inferring the structure of a Markov chain from its output. In 26th Annual Symp. on Foundations of Computer Science (FOCS), pages 321–326, 1985. Steven Rudich. Inferring the structure of a Markov chain from its output. In 26th Annual Symp. on Foundations of Computer Science (FOCS), pages 321–326, 1985.
39.
go back to reference Cosma Rohilla Shalizi and Kristina Lisa Shalizi. Blind construction of optimal nonlinear recursive predictors for discrete sequences. In 20th Conf. on Uncertainty in Artificial Intelligence (UAI), pages 504–511, 2004. Cosma Rohilla Shalizi and Kristina Lisa Shalizi. Blind construction of optimal nonlinear recursive predictors for discrete sequences. In 20th Conf. on Uncertainty in Artificial Intelligence (UAI), pages 504–511, 2004.
40.
go back to reference Gilbert Strang. Introduction to Linear Algebra, 4th edition. Wellesley-Cambridge Press and SIAM, 2009. Gilbert Strang. Introduction to Linear Algebra, 4th edition. Wellesley-Cambridge Press and SIAM, 2009.
41.
go back to reference Sebastiaan Terwijn. On the learnability of hidden Markov models. In 6th Intl. Coll. on Grammatical Inference (ICGI), volume 2484 of Lecture Notes in Computer Science, pages 261–268. Springer, 2002. Sebastiaan Terwijn. On the learnability of hidden Markov models. In 6th Intl. Coll. on Grammatical Inference (ICGI), volume 2484 of Lecture Notes in Computer Science, pages 261–268. Springer, 2002.
42.
go back to reference Franck Thollard. Improving probabilistic grammatical inference core algorithms with post-processing techniques. In 18th Intl. Conf. on Machine Learning (ICML 2001), pages 561–568, 2001. Franck Thollard. Improving probabilistic grammatical inference core algorithms with post-processing techniques. In 18th Intl. Conf. on Machine Learning (ICML 2001), pages 561–568, 2001.
43.
go back to reference Franck Thollard, Pierre Dupont, and Colin de la Higuera. Probabilistic DFA inference using Kullback-Leibler divergence and minimality. In 17th Intl. Conf. on Machine Learning (ICML), pages 975–982. Morgan Kaufmann, 2000. Franck Thollard, Pierre Dupont, and Colin de la Higuera. Probabilistic DFA inference using Kullback-Leibler divergence and minimality. In 17th Intl. Conf. on Machine Learning (ICML), pages 975–982. Morgan Kaufmann, 2000.
44.
45.
go back to reference Enrique Vidal, Franck Thollard, Colin de la Higuera, Francisco Casacuberta, and Rafael C. Carrasco. Probabilistic finite-state machines - part i. IEEE Trans. Pattern Anal. Mach. Intell., 27(7):1013–1025, 2005.CrossRef Enrique Vidal, Franck Thollard, Colin de la Higuera, Francisco Casacuberta, and Rafael C. Carrasco. Probabilistic finite-state machines - part i. IEEE Trans. Pattern Anal. Mach. Intell., 27(7):1013–1025, 2005.CrossRef
46.
go back to reference Enrique Vidal, Franck Thollard, Colin de la Higuera, Francisco Casacuberta, and Rafael C. Carrasco. Probabilistic finite-state machines - part ii. IEEE Trans. Pattern Anal. Mach. Intell., 27(7):1026–1039, 2005.CrossRef Enrique Vidal, Franck Thollard, Colin de la Higuera, Francisco Casacuberta, and Rafael C. Carrasco. Probabilistic finite-state machines - part ii. IEEE Trans. Pattern Anal. Mach. Intell., 27(7):1026–1039, 2005.CrossRef
47.
go back to reference Lloyd R. Welch. Hidden Markov Models and the Baum–Welch Algorithm. IEEE Information Theory Society Newsletter, 53(4), 2003. Lloyd R. Welch. Hidden Markov Models and the Baum–Welch Algorithm. IEEE Information Theory Society Newsletter, 53(4), 2003.
Metadata
Title
Learning Probability Distributions Generated by Finite-State Machines
Authors
Jorge Castro
Ricard Gavaldà
Copyright Year
2016
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-662-48395-4_5

Premium Partner