2016 | OriginalPaper | Book chapter

8. Learning the Language of Biological Sequences

Written by: François Coste

Published in: Topics in Grammatical Inference

Publisher: Springer Berlin Heidelberg

Abstract

The application to biological sequences is an appealing challenge for Grammatical Inference. While some first successes have already been recorded, such as the inference of profile Hidden Markov Models or stochastic Context-Free Grammars, which are now part of the classical Bioinformatics toolbox, it remains a rich and open source of problems and inspiration for our research, with the possibility of applying our ideas to real, fundamental applications. In this chapter, we survey the main specificities of biological sequences and how they are handled in Pattern/Motif Discovery, in order to introduce the important concepts and techniques used, and we present the latest successful Grammatical Inference approaches in that field.

Footnotes
1
This corresponds to the preservation constraint from [104], which forbids merging together the states resulting from the merging of a diagonal, so that identified conserved words are not damaged.
 
References
1.
Beadle, G.W., Beadle, M.: The language of life: an introduction to the science of genetics. American Institute of Biological Sciences (1966)
2.
Clancy, S., Brown, W.: Translation: DNA to mRNA to protein. Nature Education (2008)
3.
Chomsky, N.: Syntactic Structures. Mouton (1957)
4.
Searls, D.B.: The computational linguistics of biological sequences. In Hunter, L., ed.: Artificial Intelligence and Molecular Biology. AAAI Press (1993) 47–120
5.
Searls, D.B.: Linguistic approaches to biological sequences. Computer Applications in the Biosciences 13 (1997) 333–344
6.
7.
Chiang, D., Joshi, A.K., Searls, D.B.: Grammatical representations of macromolecular structure. Journal of Computational Biology 13 (2006) 1077–1100
8.
Searls, D.B.: A primer in macromolecular linguistics. Biopolymers 99 (2013) 203–17
9.
Joshi, A.K., Weir, D.J., Vijay-Shanker, K.: The convergence of mildly context-sensitive grammar formalisms. Technical Report MS-CIS-90-01, University of Pennsylvania (1990)
10.
Dong, S., Searls, D.B.: Gene structure prediction by linguistic methods. Genomics 23 (1994) 540–551
11.
Nicolas, F., Rivals, E.: Hardness results for the center and median string problems under the weighted and unweighted edit distances. J. Discrete Algorithms 3 (2005) 390–415
12.
Dsouza, M., Larsen, N., Overbeek, R.: Searching for patterns in genomic data. Trends in Genetics 13 (1997) 497–498
13.
Pesole, G., Liuni, S., D’Souza, M.: Patsearch: a pattern matcher software that finds functional elements in nucleotide and protein sequences and assesses their statistical significance. Bioinformatics 16 (2000) 439–450
14.
Belleannée, C., Sallou, O., Nicolas, J.: Logol: Expressive Pattern Matching in Sequences. Application to Ribosomal Frameshift Modeling. In Comin, M., Kall, L., Marchiori, E., Ngom, A., Rajapakse, J., eds.: PRIB2014 - Pattern Recognition in Bioinformatics, 9th IAPR International Conference. Volume 8626 of Lecture Notes in Computer Science, Stockholm, Springer (2014) 34–47
15.
Macke, T.J., Ecker, D.J., Gutell, R.R., Gautheret, D., Case, D.A., Sampath, R.: Rnamotif, an RNA secondary structure definition and search algorithm. Nucleic acids research 29 (2001) 4724–4735
16.
Eddy, S.: RNABOB: a program to search for RNA secondary structure motifs in sequence databases (1996)
17.
Graf, S., Strothmann, D., Kurtz, S., Steger, G.: Hypalib: a database of RNAs and RNA structural elements defined by hybrid patterns. Nucleic Acids Res. 29 (2001) 196–198
18.
Strothmann, D., Gräf, S.A., Kurtz, S., Steger, G.: The syntax and semantics of a language for describing complex patterns in biological sequences. Technical report, Universität Bielefeld, Technische Fakultät, Arbeitsgruppe Praktische Informatik (2000)
19.
Billoud, B., Kontic, M., Viari, A.: Palingol: a declarative programming language to describe nucleic acids’ secondary structures and to scan sequence database. Nucleic Acids Res 24 (1996) 395–403
20.
Meyer, F., Kurtz, S., Backofen, R., Will, S., Beckstette, M.: Structator: fast index-based search for RNA sequence-structure patterns. BMC Bioinformatics 12 (2011) 214
21.
Pribnow, D.: Nucleotide sequence of an RNA polymerase binding site at an early T7 promoter. Proceedings of the National Academy of Sciences of the United States of America 72 (1975) 784–8
22.
van Helden, J.: The Analysis of Regulatory Sequences. In: Multiple Aspects of DNA and RNA: from Biophysics to Bioinformatics: Lecture Notes of the Les Houches Summer School 2004. Gulf Professional Publishing (2005)
23.
Parida, L.: Pattern Discovery in Bioinformatics: Theory & Algorithms. Chapman & Hall/CRC (2007)
24.
Stormo, G.D., Schneider, T.D., Gold, L., Ehrenfeucht, A.: Use of the "perceptron" algorithm to distinguish translational initiation sites in E. coli. Nucleic Acids Res 10 (1982) 2997–3011
25.
Schneider, T.D., Stormo, G.D., Gold, L., Ehrenfeucht, A.: Information content of binding sites on nucleotide sequences. Journal of molecular biology 188 (1986) 415–31
26.
Schneider, T.: Information theory primer (1995)
27.
Crooks, G.E., Hon, G., Chandonia, J.M., Brenner, S.E.: Weblogo: a sequence logo generator. Genome Res 14 (2004) 1188–1190
29.
Hertz, G.Z., Hartzell, 3rd, G., Stormo, G.D.: Identification of consensus patterns in unaligned DNA sequences known to be functionally related. Comput Appl Biosci 6 (1990) 81–92
30.
Hertz, G.Z., Stormo, G.D.: Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15 (1999) 563–577
31.
Stormo, G.D., Hartzell, 3rd, G.: Identifying protein-binding sites from unaligned DNA fragments. Proc Natl Acad Sci U S A 86 (1989) 1183–1187
32.
Bailey, T.L., Elkan, C.: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2 (1994) 28–36
33.
Lawrence, C.E., Altschul, S.F., Boguski, M.S., Liu, J.S., Neuwald, A.F., Wootton, J.C.: Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262 (1993) 208–214
34.
Neuwald, A.F., Liu, J.S., Lawrence, C.E.: Gibbs motif sampling: detection of bacterial outer membrane protein repeats. Protein Sci 4 (1995) 1618–1632
35.
Neuwald, A.F., Liu, J.S., Lipman, D.J., Lawrence, C.E.: Extracting protein alignment models from the sequence database. Nucleic Acids Res 25 (1997) 1665–1677
36.
Roth, F.P., Hughes, J.D., Estep, P.W., Church, G.M.: Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat Biotechnol 16 (1998) 939–945
37.
Thijs, G., Lescot, M., Marchal, K., Rombauts, S., De Moor, B., Rouzé, P., Moreau, Y.: A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling. Bioinformatics 17 (2001) 1113–1122
38.
Liu, X., Brutlag, D.L., Liu, J.S.: Bioprospector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac Symp Biocomput (2001) 127–138
39.
Matys, V., Kel-Margoulis, O.V., Fricke, E., Liebich, I., Land, S., Barre-Dirrie, A., Reuter, I., Chekmenev, D., Krull, M., Hornischer, K., Voss, N., Stegmaier, P., Lewicki-Potapov, B., Saxel, H., Kel, A.E., Wingender, E.: TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Research 34 (2006) D108–D110
40.
Sandelin, A., Alkema, W., Engström, P., Wasserman, W.W., Lenhard, B.: Jaspar: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Research 32 (2004) D91–D94
41.
Taylor, W.R.: The classification of amino acid conservation. J Theor Biol 119 (1986) 205–218
42.
Eddy, S.R.: Where did the BLOSUM62 alignment score matrix come from? Nat Biotechnol 22 (2004) 1035–1036
43.
Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48 (1970) 443–453
44.
Smith, T., Waterman, M.: Identification of common molecular subsequences. Journal of Molecular Biology 147 (1981) 195–197
45.
Pearson, W.R., Lipman, D.J.: Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A 85 (1988) 2444–2448
46.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: A basic local alignment search tool. J. Mol. Biol. 215 (1990) 403–410
47.
Thompson, J.D., Higgins, D.G., Gibson, T.J.: Clustal w: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22 (1994) 4673–4680
48.
Notredame, C., Higgins, D.G., Heringa, J.: T-coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol 302 (2000) 205–217
49.
Do, C.B., Mahabhashyam, M.S.P., Brudno, M., Batzoglou, S.: Probcons: Probabilistic consistency-based multiple sequence alignment. Genome Res 15 (2005) 330–340
50.
Edgar, R.C.: Muscle: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32 (2004) 1792–1797
51.
Katoh, K., Misawa, K., Kuma, K.i., Miyata, T.: MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30 (2002) 3059–3066
52.
Morgenstern, B., Frech, K., Dress, A., Werner, T.: Dialign: finding local similarities by multiple sequence alignment. Bioinformatics 14 (1998) 290–294
53.
Morgenstern, B.: Dialign 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15 (1999) 211–218
54.
Eddy, S.R.: Profile hidden markov models. Bioinformatics 14 (1998) 755–763
55.
Gribskov, M., McLachlan, A.D., Eisenberg, D.: Profile analysis: detection of distantly related proteins. Proceedings of the National Academy of Sciences of the United States of America 84 (1987) 4355–8
56.
Krogh, A., Brown, M., Mian, I.S., Sjölander, K., Haussler, D.: Hidden Markov models in computational biology. applications to protein modeling. Journal of molecular biology 235 (1994) 1501–31
57.
Baldi, P., Chauvin, Y., Hunkapiller, T., McClure, M.A.: Hidden Markov models of biological primary sequence information. Proceedings of the National Academy of Sciences of the United States of America 91 (1994) 1059–63
58.
Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. In: Proceedings of the IEEE. (1989) 257–286
59.
Henikoff, J.G., Henikoff, S.: Using substitution probabilities to improve position-specific scoring matrices. Computer applications in the biosciences : CABIOS 12 (1996) 135–43
60.
Claverie, J.M.: Some useful statistical properties of position-weight matrices. Comput Chem 18 (1994) 287–294
61.
Sjölander, K., Karplus, K., Brown, M., Hughey, R., Krogh, A., Mian, I., Haussler, D.: Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology. Computer applications in the biosciences : CABIOS 12 (1996) 327–345
62.
Brown, M., Hughey, R., Krogh, A., Mian, I.S., Sjölander, K., Haussler, D.: Using Dirichlet mixture priors to derive hidden Markov models for protein families. In Hunter, L., Searls, D.B., Shavlik, J.W., eds.: Proceedings of the 1st International Conference on Intelligent Systems for Molecular Biology, Bethesda, MD, USA, July 1993, AAAI (1993) 47–55
63.
Hughey, R., Krogh, A.: Hidden Markov models for sequence analysis: extension and analysis of the basic method. Comput Appl Biosci 12 (1996) 95–107
64.
Sonnhammer, E.L., Eddy, S.R., Durbin, R.: Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins 28 (1997) 405–420
65.
Finn, R.D., Bateman, A., Clements, J., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Heger, A., Hetherington, K., Holm, L., Mistry, J., Sonnhammer, E.L.L., Tate, J., Punta, M.: Pfam: the protein families database. Nucleic Acids Res (2013)
66.
Haft, D.H., Selengut, J.D., Richter, R.A., Harkins, D., Basu, M.K., Beck, E.: TIGRFAMS and genome properties in 2013. Nucleic Acids Res 41 (2013) D387–D395
67.
Moult, J.: A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. Curr Opin Struct Biol 15 (2005) 285–289
68.
Gough, J., Karplus, K., Hughey, R., Chothia, C.: Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J Mol Biol 313 (2001) 903–919
69.
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25 (1997) 3389–3402
70.
UniProt: Update on activities at the universal protein resource (UniProt) in 2013. Nucleic Acids Res 41 (2013) D43–D47
71.
Pruitt, K.D., Tatusova, T., Maglott, D.R.: Ncbi reference sequence (refseq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 33 (2005) D501–D504
72.
Karplus, K.: Hidden Markov models for detecting remote protein homologies. Bioinformatics 14 (1998) 846–865
73.
Karplus, K., Karchin, R., Barrett, C., Tu, S., Cline, M., Diekhans, M., Grate, L., Casper, J., Hughey, R.: What is the value added by human intervention in protein structure prediction? Proteins Suppl 5 (2001) 86–91
74.
Karplus, K., Karchin, R., Draper, J., Casper, J., Mandel-Gutfreund, Y., Diekhans, M., Hughey, R.: Combining local-structure, fold-recognition, and new fold methods for protein structure prediction. Proteins 53 Suppl 6 (2003) 491–496
76.
Söding, J.: Protein homology detection by HMM-HMM comparison. Bioinformatics 21 (2005) 951–960
77.
Remmert, M., Biegert, A., Hauser, A., Söding, J.: HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods 9 (2012) 173–175
78.
Wheeler, T.J., Eddy, S.R.: nhmmer: DNA homology search with profile hmms. Bioinformatics 29 (2013) 2487–2489
79.
Wheeler, T.J., Clements, J., Eddy, S.R., Hubley, R., Jones, T.A., Jurka, J., Smit, A.F.A., Finn, R.D.: Dfam: a database of repetitive DNA based on profile hidden markov models. Nucleic Acids Res 41 (2013) D70–D82
80.
Eddy, S.R.: A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. BMC Bioinformatics 3 (2002) 18
81.
Sakakibara, Y., Brown, M., Hughey, R., Mian, I.S., Sjölander, K., Underwood, R.C., Haussler, D.: Recent methods for RNA modeling using stochastic context-free grammars. In: Proceedings of the Asilomar Conference on Combinatorial Pattern Matching, New York, NY, Springer-Verlag (1994) 289–306
82.
Eddy, S.R., Durbin, R.: RNA sequence analysis using covariance models. Nucleic Acids Res 22 (1994) 2079–2088
83.
Burge, S.W., Daub, J., Eberhardt, R., Tate, J., Barquist, L., Nawrocki, E.P., Eddy, S.R., Gardner, P.P., Bateman, A.: Rfam 11.0: 10 years of RNA families. Nucleic Acids Res 41 (2013) D226–D232
84.
Nawrocki, E.P., Eddy, S.R.: Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29 (2013) 2933–2935
85.
Zurück zum Zitat Uemura, Y., Hasegawa, A., Kobayashi, S., Yokomori, T.: Tree adjoining grammars for RNA structure prediction. Theoretical Computer Science 210 (1999) 277–303 Uemura, Y., Hasegawa, A., Kobayashi, S., Yokomori, T.: Tree adjoining grammars for RNA structure prediction. Theoretical Computer Science 210 (1999) 277–303
86.
Zurück zum Zitat Rivas, E., Eddy, S.: The language of RNA: a formal grammar that includes pseudoknots. Bioinformatics 16 (2000) 334CrossRef Rivas, E., Eddy, S.: The language of RNA: a formal grammar that includes pseudoknots. Bioinformatics 16 (2000) 334CrossRef
87.
Zurück zum Zitat Cai, L., Malmberg, R.L., Wu, Y.: Stochastic modeling of RNA pseudoknotted structures: a grammatical approach. Bioinformatics 19 Suppl 1 (2003) i66–i73CrossRef Cai, L., Malmberg, R.L., Wu, Y.: Stochastic modeling of RNA pseudoknotted structures: a grammatical approach. Bioinformatics 19 Suppl 1 (2003) i66–i73CrossRef
88.
Zurück zum Zitat Matsui, H., Sato, K., Sakakibara, Y.: Pair stochastic tree adjoining grammars for aligning and predicting pseudoknot RNA structures. Proc IEEE Comput Syst Bioinform Conf (2004) 290–299 Matsui, H., Sato, K., Sakakibara, Y.: Pair stochastic tree adjoining grammars for aligning and predicting pseudoknot RNA structures. Proc IEEE Comput Syst Bioinform Conf (2004) 290–299
89.
Zurück zum Zitat Grundy, W.N., Bailey, T.L., Elkan, C.P., Baker, M.E.: Meta-meme: motif-based hidden Markov models of protein families. Comput Appl Biosci 13 (1997) 397–406 Grundy, W.N., Bailey, T.L., Elkan, C.P., Baker, M.E.: Meta-meme: motif-based hidden Markov models of protein families. Comput Appl Biosci 13 (1997) 397–406
90.
Zurück zum Zitat Jonassen, I. Collins, J., Higgins, D.: Finding flexible patterns in unaligned protein sequences. Protein Science 4 (1995) 1587–1595 Jonassen, I. Collins, J., Higgins, D.: Finding flexible patterns in unaligned protein sequences. Protein Science 4 (1995) 1587–1595
91.
Zurück zum Zitat Hulo, N., Bairoch, A., Bulliard, V., Cerutti, L., Cuche, B.A., de Castro, E., Lachaize, C., Langendijk-Genevaux, P.S., Sigrist, C.J.A.: The 20 years of PROSITE. Nucleic Acids Res 36 (2008) D245–D249 Hulo, N., Bairoch, A., Bulliard, V., Cerutti, L., Cuche, B.A., de Castro, E., Lachaize, C., Langendijk-Genevaux, P.S., Sigrist, C.J.A.: The 20 years of PROSITE. Nucleic Acids Res 36 (2008) D245–D249
92.
Zurück zum Zitat Yokomori, T., Ishida, N., Kobayashi, S.: Learning local languages and its application to protein \(\alpha \)-chain identification. In: 27th Annual Hawaii International Conference on System Sciences (HICSS-27), January 4-7, 1994, Maui, Hawaii, USA, IEEE Computer Society (1994) 113–122 Yokomori, T., Ishida, N., Kobayashi, S.: Learning local languages and its application to protein \(\alpha \)-chain identification. In: 27th Annual Hawaii International Conference on System Sciences (HICSS-27), January 4-7, 1994, Maui, Hawaii, USA, IEEE Computer Society (1994) 113–122
93.
Zurück zum Zitat Yokomori, T., Kobayashi, S.: Learning local languages and their application to DNA sequence analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998) 1067–1079CrossRef Yokomori, T., Kobayashi, S.: Learning local languages and their application to DNA sequence analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998) 1067–1079CrossRef
94.
Zurück zum Zitat Garcia, P., Vidal, E., Oncina, J.: Learning locally testable languages in the strict sense. In: Proceedings of the International Conference on Algorithmic Learning Theory. (1990) 325–338 Garcia, P., Vidal, E., Oncina, J.: Learning locally testable languages in the strict sense. In: Proceedings of the International Conference on Algorithmic Learning Theory. (1990) 325–338
95.
Zurück zum Zitat Garcia, P., Vidal, E.: Inference of k-testable languages in the strict sense and application to syntactic pattern recognition. IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990) 920–925CrossRef Garcia, P., Vidal, E.: Inference of k-testable languages in the strict sense and application to syntactic pattern recognition. IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990) 920–925CrossRef
96.
Zurück zum Zitat Peris, P., López, D., Campos, M., Sempere, J.M.: Protein motif prediction by grammatical inference. In Sakakibara, Y., Kobayashi, S., Sato, K., Nishino, T., Tomita, E., eds.: Ig TM. Volume 4201 of Lecture Notes in Computer Science, Springer (2006) 175–187 Peris, P., López, D., Campos, M., Sempere, J.M.: Protein motif prediction by grammatical inference. In Sakakibara, Y., Kobayashi, S., Sato, K., Nishino, T., Tomita, E., eds.: Ig TM. Volume 4201 of Lecture Notes in Computer Science, Springer (2006) 175–187
97.
Zurück zum Zitat Peris, P., López, D., Campos, M.: IGTM: An algorithm to predict transmembrane domains and topology in proteins. BMC Bioinformatics 9 (2008) Peris, P., López, D., Campos, M.: IGTM: An algorithm to predict transmembrane domains and topology in proteins. BMC Bioinformatics 9 (2008)
98.
Zurück zum Zitat Garcia, P., Vidal, E., Casacuberta, F.: Local languages, the succesor method, and a step towards a general methodology for the inference of regular grammars. IEEE Trans. Pattern Anal. Mach. Intell. 9 (1987) 841–845CrossRef Garcia, P., Vidal, E., Casacuberta, F.: Local languages, the succesor method, and a step towards a general methodology for the inference of regular grammars. IEEE Trans. Pattern Anal. Mach. Intell. 9 (1987) 841–845CrossRef
99.
Zurück zum Zitat Oncina, J., Garcia, P.: Inferring regular languages in polynomial update time. In: Pattern Recognition and Image Analysis. (1992) 49–61 Oncina, J., Garcia, P.: Inferring regular languages in polynomial update time. In: Pattern Recognition and Image Analysis. (1992) 49–61
100.
Zurück zum Zitat Lang, K.J. In: Random DFA’s can be approximately learned from sparse uniform examples. Association for Computing Machinery (1992) 45–52 Lang, K.J. In: Random DFA’s can be approximately learned from sparse uniform examples. Association for Computing Machinery (1992) 45–52
101.
Zurück zum Zitat Lang, K.J., Pearlmutter, B.A., Price, R.A.: Results of the Abbadingo One DFA learning competition and a new evidence-driven state merging algorithm. In: Proceedings of the 4th International Colloquium on Grammatical Inference. ICGI ’98, London, UK, Springer-Verlag (1998) 1–12 Lang, K.J., Pearlmutter, B.A., Price, R.A.: Results of the Abbadingo One DFA learning competition and a new evidence-driven state merging algorithm. In: Proceedings of the 4th International Colloquium on Grammatical Inference. ICGI ’98, London, UK, Springer-Verlag (1998) 1–12
102.
Zurück zum Zitat Coste, F., Kerbellec, G., Idmont, B., Fredouille, D., Delamarche, C.: Apprentissage d’automates par fusions de paires de fragments significativement similaires et premières expérimentations sur les protéines MIP. In: JOBIM. (2004) Coste, F., Kerbellec, G., Idmont, B., Fredouille, D., Delamarche, C.: Apprentissage d’automates par fusions de paires de fragments significativement similaires et premières expérimentations sur les protéines MIP. In: JOBIM. (2004)
103.
Zurück zum Zitat Coste, F., Kerbellec, G.: A similar fragments merging approach to learn automata on proteins. In Gama, J., Camacho, R., Brazdil, P., Jorge, A., Torgo, L., eds.: ECML. Volume 3720 of Lecture Notes in Computer Science., Springer (2005) 522–529 Coste, F., Kerbellec, G.: A similar fragments merging approach to learn automata on proteins. In Gama, J., Camacho, R., Brazdil, P., Jorge, A., Torgo, L., eds.: ECML. Volume 3720 of Lecture Notes in Computer Science., Springer (2005) 522–529
104.
Zurück zum Zitat Coste, F., Kerbellec, G.: Learning Automata on Protein Sequences. In Denise, A., Durrens, P., Robin, S., Rocha, E., de Daruvar, A., Groppi, A., eds.: JOBIM, Bordeaux, France (2006) 199–210 Coste, F., Kerbellec, G.: Learning Automata on Protein Sequences. In Denise, A., Durrens, P., Robin, S., Rocha, E., de Daruvar, A., Groppi, A., eds.: JOBIM, Bordeaux, France (2006) 199–210
105.
Zurück zum Zitat Kerbellec, G.: Apprentissage d’automates modélisant des familles de séquences protéiques. PhD thesis, Université de Rennes 1 (2008) Kerbellec, G.: Apprentissage d’automates modélisant des familles de séquences protéiques. PhD thesis, Université de Rennes 1 (2008)
106.
Zurück zum Zitat Bretaudeau, A., Coste, F., Humily, F., Garczarek, L., Corguillé, G.L., Six, C., Ratin, M., Collin, O., Schluchter, W.M., Partensky, F.: Cyanolyase: a database of phycobilin lyase sequences, motifs and functions. Nucleic Acids Research 41 (2013) 396–401CrossRef Bretaudeau, A., Coste, F., Humily, F., Garczarek, L., Corguillé, G.L., Six, C., Ratin, M., Collin, O., Schluchter, W.M., Partensky, F.: Cyanolyase: a database of phycobilin lyase sequences, motifs and functions. Nucleic Acids Research 41 (2013) 396–401CrossRef
107.
Zurück zum Zitat Burgos, A., Coste, F., Kerbellec, G.: Learning automata on protein sequences by partial multiple sequence alignment. (in preparation) Burgos, A., Coste, F., Kerbellec, G.: Learning automata on protein sequences by partial multiple sequence alignment. (in preparation)
108.
Zurück zum Zitat Coste, F., Fredouille, D.: What is the Search Space for the Inference of Non Deterministic, Unambiguous and Deterministic Automata? Rapport de recherche RR-4907, INRIA (2003) Coste, F., Fredouille, D.: What is the Search Space for the Inference of Non Deterministic, Unambiguous and Deterministic Automata? Rapport de recherche RR-4907, INRIA (2003)
109.
Zurück zum Zitat Dyrka, W., Nebel, J.C.: A stochastic context free grammar based framework for analysis of protein sequences. BMC Bioinformatics 10 (2009) 323CrossRef Dyrka, W., Nebel, J.C.: A stochastic context free grammar based framework for analysis of protein sequences. BMC Bioinformatics 10 (2009) 323CrossRef
110.
Zurück zum Zitat Coste, F., Garet, G., Nicolas, J.: Local Substitutability for Sequence Generalization. In Heinz, J., de la Higuera, C., Oates, T., eds.: ICGI 2012. Volume 21 of JMLR Workshop and Conference Proceedings, University of Maryland, MIT Press (2012) 97–111 Coste, F., Garet, G., Nicolas, J.: Local Substitutability for Sequence Generalization. In Heinz, J., de la Higuera, C., Oates, T., eds.: ICGI 2012. Volume 21 of JMLR Workshop and Conference Proceedings, University of Maryland, MIT Press (2012) 97–111
111.
Zurück zum Zitat Clark, A., Eyraud, R.: Identification in the limit of substitutable context free languages. In Jain, S., Simon, H.U., Tomita, E., eds.: Proceedings of the 16th International Conference on Algorithmic Learning Theory, Springer-Verlag (2005) 283–296 Clark, A., Eyraud, R.: Identification in the limit of substitutable context free languages. In Jain, S., Simon, H.U., Tomita, E., eds.: Proceedings of the 16th International Conference on Algorithmic Learning Theory, Springer-Verlag (2005) 283–296
112.
Zurück zum Zitat Clark, A., Eyraud, R.: Polynomial identification in the limit of substitutable context-free languages. Journal of Machine Learning Research 8 (2007) 1725–1745MathSciNetMATH Clark, A., Eyraud, R.: Polynomial identification in the limit of substitutable context-free languages. Journal of Machine Learning Research 8 (2007) 1725–1745MathSciNetMATH
113.
Zurück zum Zitat Yoshinaka, R.: Identification in the limit of k, l-substitutable context-free languages. In Clark, A., Coste, F., Miclet, L., eds.: ICGI. Volume 5278 of Lecture Notes in Computer Science., Springer (2008) 266–279 Yoshinaka, R.: Identification in the limit of k, l-substitutable context-free languages. In Clark, A., Coste, F., Miclet, L., eds.: ICGI. Volume 5278 of Lecture Notes in Computer Science., Springer (2008) 266–279
114.
Zurück zum Zitat Harris, Z.: Distributional structure. Word 10 (1954) 146–162 Harris, Z.: Distributional structure. Word 10 (1954) 146–162
115.
Zurück zum Zitat Coste, F., Garet, G., Nicolas, J.: A bottom-up efficient algorithm learning substitutable languages from positive examples. In Clark, A., Kanazawa, M., Yoshinaka, R., eds.: ICGI 2014. Volume 34 of JMLR Workshop and Conference Proceedings. (2014) 49–63 Coste, F., Garet, G., Nicolas, J.: A bottom-up efficient algorithm learning substitutable languages from positive examples. In Clark, A., Kanazawa, M., Yoshinaka, R., eds.: ICGI 2014. Volume 34 of JMLR Workshop and Conference Proceedings. (2014) 49–63
116.
Zurück zum Zitat Nevill-Manning, C.G., Witten, I.H.: Compression and explanation using hierarchical grammars. The Computer Journal 40 (1997) 103–116CrossRefMATH Nevill-Manning, C.G., Witten, I.H.: Compression and explanation using hierarchical grammars. The Computer Journal 40 (1997) 103–116CrossRefMATH
117.
Zurück zum Zitat Cherniavsky, N., Lander, R.: Grammar-based compression of DNA sequences. In: DIMACS Working Group on the Burrows-Wheeler Transform. (2004) 21 Cherniavsky, N., Lander, R.: Grammar-based compression of DNA sequences. In: DIMACS Working Group on the Burrows-Wheeler Transform. (2004)  21
118.
Zurück zum Zitat Lanctot, J.K., Li, M., Yang, E.H.: Estimating DNA sequence entropy. In: ACM-SIAM Symposium on Discrete Algorithms. (2000) 409–418 Lanctot, J.K., Li, M., Yang, E.H.: Estimating DNA sequence entropy. In: ACM-SIAM Symposium on Discrete Algorithms. (2000) 409–418
119.
Zurück zum Zitat Apostolico, A., Lonardi, S.: Off-line compression by greedy textual substitution. Proceedings of the IEEE 88 (2000) 1733–1744CrossRef Apostolico, A., Lonardi, S.: Off-line compression by greedy textual substitution. Proceedings of the IEEE 88 (2000) 1733–1744CrossRef
120.
Zurück zum Zitat Apostolico, A., Lonardi, S.: Compression of biological sequences by greedy off-line textual substitution. In: Data Compression Conference. (2000) 143–153 Apostolico, A., Lonardi, S.: Compression of biological sequences by greedy off-line textual substitution. In: Data Compression Conference. (2000) 143–153
121.
Zurück zum Zitat Nevill-Manning, C., Witten, I.: On-line and off-line heuristics for inferring hierarchies of repetitions in sequences. In: Data Compression Conference, IEEE (2000) 1745–1755 Nevill-Manning, C., Witten, I.: On-line and off-line heuristics for inferring hierarchies of repetitions in sequences. In: Data Compression Conference, IEEE (2000) 1745–1755
122.
Zurück zum Zitat Carrascosa, R., Coste, F., Gallé, M., López, G.G.I.: The smallest grammar problem as constituents choice and minimal grammar parsing. Algorithms 4 (2011) 262–284MathSciNetCrossRef Carrascosa, R., Coste, F., Gallé, M., López, G.G.I.: The smallest grammar problem as constituents choice and minimal grammar parsing. Algorithms 4 (2011) 262–284MathSciNetCrossRef
123.
Zurück zum Zitat Carrascosa, R., Coste, F., Gallé, M., López, G.G.I.: Searching for smallest grammars on large sequences and application to DNA. J. Discrete Algorithms 11 (2012) 62–72MathSciNetCrossRefMATH Carrascosa, R., Coste, F., Gallé, M., López, G.G.I.: Searching for smallest grammars on large sequences and application to DNA. J. Discrete Algorithms 11 (2012) 62–72MathSciNetCrossRefMATH
124.
Zurück zum Zitat Brejova, B., Vinar, T., Li, M.: Pattern Discovery: Methods and Software. In Krawetz, S.A., Womble, D.D., eds.: Introduction to Bioinformatics. Humana Press (2003) 491–522 Brejova, B., Vinar, T., Li, M.: Pattern Discovery: Methods and Software. In Krawetz, S.A., Womble, D.D., eds.: Introduction to Bioinformatics. Humana Press (2003) 491–522
125.
Zurück zum Zitat Sakakibara, Y.: Grammatical inference in bioinformatics. IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005) 1051–1062CrossRef Sakakibara, Y.: Grammatical inference in bioinformatics. IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005) 1051–1062CrossRef
126.
Zurück zum Zitat Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological Sequence Analysis : Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press (1999) Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological Sequence Analysis : Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press (1999)
127.
Zurück zum Zitat Baldi, P., Brunak, S.: Bioinformatics: The Machine Learning Approach. 2nd edn. Cambridge: MIT Press (2001)MATH Baldi, P., Brunak, S.: Bioinformatics: The Machine Learning Approach. 2nd edn. Cambridge: MIT Press (2001)MATH
128.
Zurück zum Zitat de la Higuera, C.: Grammatical Inference: Learning Automata and Grammars. Cambridge University Press, (2010) de la Higuera, C.: Grammatical Inference: Learning Automata and Grammars. Cambridge University Press, (2010)
Metadata
Title
Learning the Language of Biological Sequences
Author
François Coste
Copyright year
2016
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-662-48395-4_8