Published in: Artificial Intelligence Review 3/2019

22 November 2017

Comparison of algorithms to divide noisy phone sequences into syllables for automatic unconstrained English speaking proficiency scoring

Authors: David O. Johnson, Okim Kang

Abstract

Four algorithms for syllabifying phones are compared in the context of automatically scoring English oral proficiency. The first algorithm groups each consonant with the vowel nearest to it in time, taking the maximal onset principle into account. The second uses a Hidden Markov Model (HMM) to predict syllable boundaries from the sonority values of the phones. The third employs three HMMs, each tuned to a specific category of utterance. The fourth uses a genetic algorithm to identify a set of rules for syllabifying the phones. The algorithms were evaluated by (1) how well they syllabified utterances from the Boston University Radio News Corpus (BURNC) and (2) how well they worked as part of a process to automatically score English speaking proficiency. Syllabification quality was judged with a measure of the temporal alignment of the syllables. Suitability for proficiency scoring was assessed with the Pearson correlation between the computer’s predicted proficiency scores and the scores assigned by human examiners. We found that syllabification-by-genetic-algorithm performed best in syllabifying the BURNC, but that syllabification-by-grouping (i.e., syllables are formed by grouping non-syllabic consonant phones with the vowel or syllabic consonant phone nearest to them in time) performed best in the English oral proficiency rating application.
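
To make the grouping idea concrete, the following is a minimal Python sketch of syllabification-by-grouping as the abstract describes it: each non-syllabic phone is attached to the vowel or syllabic consonant nearest to it in time. It is an illustration under stated assumptions, not the authors' implementation. The `Phone` record, the ARPAbet-style `NUCLEI` set, the `gap` distance, and the rule that ties go to the following nucleus (a crude stand-in for the maximal onset principle) are all assumptions introduced here.

```python
from typing import List, NamedTuple


class Phone(NamedTuple):
    """Hypothetical time-aligned phone record (not from the paper)."""
    label: str    # ARPAbet-style symbol, e.g. "K" or "IH"
    start: float  # start time in seconds
    end: float    # end time in seconds


# Assumed nucleus inventory: ARPAbet vowels plus syllabic consonants.
NUCLEI = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
    "IH", "IY", "OW", "OY", "UH", "UW", "EL", "EM", "EN",
}


def gap(a: Phone, b: Phone) -> float:
    """Time separating two phone intervals (0 if they touch or overlap)."""
    return max(b.start - a.end, a.start - b.end, 0.0)


def syllabify_by_grouping(phones: List[Phone]) -> List[List[Phone]]:
    """Attach every non-nucleus phone to the temporally nearest nucleus.

    Ties go to the later nucleus, so a single consonant between two
    vowels becomes an onset (an assumed simplification of maximal onset).
    """
    nuclei = [i for i, p in enumerate(phones) if p.label in NUCLEI]
    if not nuclei:
        # No vowel or syllabic consonant: return the phones as one group.
        return [list(phones)] if phones else []

    syllables: List[List[Phone]] = [[] for _ in nuclei]
    for i, p in enumerate(phones):
        if p.label in NUCLEI:
            k = nuclei.index(i)
        else:
            # Smallest gap wins; on equal gaps prefer the larger index j.
            k = min(
                range(len(nuclei)),
                key=lambda j: (gap(p, phones[nuclei[j]]), -j),
            )
        syllables[k].append(p)
    return syllables


if __name__ == "__main__":
    # "picnic" with made-up alignment times.
    word = [
        Phone("P", 0.00, 0.05), Phone("IH", 0.05, 0.15),
        Phone("K", 0.15, 0.22), Phone("N", 0.22, 0.28),
        Phone("IH", 0.28, 0.38), Phone("K", 0.38, 0.45),
    ]
    for syl in syllabify_by_grouping(word):
        print(" ".join(p.label for p in syl))
```

Because the distance is measured on time intervals rather than on positions in the phone string, pauses or noisy alignments shift a consonant toward whichever nucleus it is actually closer to, which is the property the abstract highlights for this method.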

Metadata
Title
Comparison of algorithms to divide noisy phone sequences into syllables for automatic unconstrained English speaking proficiency scoring
Authors
David O. Johnson
Okim Kang
Publication date
22 November 2017
Publisher
Springer Netherlands
Published in
Artificial Intelligence Review / Issue 3/2019
Print ISSN: 0269-2821
Electronic ISSN: 1573-7462
DOI
https://doi.org/10.1007/s10462-017-9594-y