Published in: International Journal of Speech Technology 4/2015

10 September 2015

Automatic prominent syllable detection with machine learning classifiers

Authors: David O. Johnson, Okim Kang

Abstract

In this paper, we examine how well five machine learning classifiers automatically detect Brazil’s prominent syllables, using seven feature sets built from three features (pitch, intensity, and duration) taken one at a time, two at a time, and all three together. Prominent syllables are the foundation of Brazil’s prosodic intonation model. We found that using pitch, intensity, and duration together as features produces the best results. Our findings also revealed that, in terms of accuracy, F-measure, and Cohen’s kappa coefficient, bagging an ensemble of decision tree learners performed best (accuracy = 95.9 ± 0.2 %; F-measure = 93.7 ± 0.4; κ = 0.907 ± 0.005). The performance of our current model is significantly better than that of any existing automatic detection software and that of human expert transcribers of prosody.
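The seven feature sets and the three reported metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `feature_sets` and `binary_metrics` are hypothetical, the confusion counts in the usage lines are made up, and the F-measure is assumed to be the harmonic mean of precision and recall for the "prominent" class.

```python
from itertools import combinations

# The three acoustic features named in the abstract.
FEATURES = ("pitch", "intensity", "duration")


def feature_sets(features=FEATURES):
    """Return every non-empty subset: features taken one, two, and three at a time."""
    return [
        subset
        for size in range(1, len(features) + 1)
        for subset in combinations(features, size)
    ]


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, F-measure, and Cohen's kappa from binary confusion counts.

    Assumes the positive class is "prominent" and that no denominator is zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    # Agreement expected by chance, from the marginal label proportions.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return accuracy, f_measure, kappa


# Usage with hypothetical counts (not data from the paper).
print(len(feature_sets()))  # 7
print(binary_metrics(tp=40, fp=10, fn=10, tn=40))
```

Kappa discounts the agreement that would occur by chance, so it is the more conservative figure when prominent and non-prominent syllables are unevenly distributed.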

Metadata
Title
Automatic prominent syllable detection with machine learning classifiers
Authors
David O. Johnson
Okim Kang
Publication date
10 September 2015
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2015
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-015-9299-z
