Published in: Multimedia Systems 4/2020

28.05.2020 | Regular Paper

A survey on speech synthesis techniques in Indian languages

Authors: Soumya Priyadarsini Panda, Ajit Kumar Nayak, Satyananda Champati Rai


Abstract

Text-to-speech (TTS) technology has made significant progress during the past decade and remains an active area of research and development for building human–computer interactive systems. Although a number of speech synthesis models are available for different languages, each focusing on its domain requirements and target applications, no single source of information on current trends in Indian language speech synthesis has been available to date, making it difficult for beginners to initiate research on TTS systems for low-resourced languages. This paper reviews the contributions made by different researchers in the field of Indian language speech synthesis, together with a study of Indian language characteristics and the associated challenges in designing TTS systems. It also discusses applications and tools resulting from projects undertaken by various organizations, along with possible future developments, providing a single reference to an important strand of speech synthesis research for anyone interested in initiating work in this area.


Literatur
1.
Zurück zum Zitat Coelho, L.P., Braga, D., Dias, M.S., Mateo, C.G.: On the development of an automatic voice pleasantness classification and intensity estimation system. Comput. Speech Lang. 27(1), 75–88 (2013)CrossRef Coelho, L.P., Braga, D., Dias, M.S., Mateo, C.G.: On the development of an automatic voice pleasantness classification and intensity estimation system. Comput. Speech Lang. 27(1), 75–88 (2013)CrossRef
2.
Zurück zum Zitat Feng, J., Ramabhadran, B., Hansel, J., Williams, J.D.: Trends in speech and language processing. IEEE Signal Process. Mag. 29(1), 177–179 (2012)CrossRef Feng, J., Ramabhadran, B., Hansel, J., Williams, J.D.: Trends in speech and language processing. IEEE Signal Process. Mag. 29(1), 177–179 (2012)CrossRef
3.
Zurück zum Zitat Alwan, A., Narayanan, S., Strope, B., Shen, A.: A speech production and perception models and their applications to synthesis, recognition, and coding. In Proc: URSI International Symposium on Signals, Systems, and Electronics, pp. 367–372 (19950 Alwan, A., Narayanan, S., Strope, B., Shen, A.: A speech production and perception models and their applications to synthesis, recognition, and coding. In Proc: URSI International Symposium on Signals, Systems, and Electronics, pp. 367–372 (19950
4.
Zurück zum Zitat Ostendorf, M., Bulyko, I.: The impact of speech recognition on speech synthesis. In Proc: IEEE Workshop on Speech Synthesis, pp. 99–106 (2002) Ostendorf, M., Bulyko, I.: The impact of speech recognition on speech synthesis. In Proc: IEEE Workshop on Speech Synthesis, pp. 99–106 (2002)
5.
Zurück zum Zitat Botha, G.R., Barnard, E.: Factors that affect the accuracy of text-based language identification. Comput. Speech Lang. 26(5), 307–320 (2012)CrossRef Botha, G.R., Barnard, E.: Factors that affect the accuracy of text-based language identification. Comput. Speech Lang. 26(5), 307–320 (2012)CrossRef
6.
Zurück zum Zitat Li, Y., Lee, T., Qian, Y.: Analysis and modeling of F0 contours for Cantonese text-to-speech. ACM Trans. Asian Lang. Information Process. (TALIP) 3(3), 169–180 (2004)CrossRef Li, Y., Lee, T., Qian, Y.: Analysis and modeling of F0 contours for Cantonese text-to-speech. ACM Trans. Asian Lang. Information Process. (TALIP) 3(3), 169–180 (2004)CrossRef
7.
Zurück zum Zitat Bali, K., Talukdar, P.P., Krishna, N.S., Ramakrishnan, A.G.: Tools for the development of a Hindi speech synthesis system. In Proc: Fifth ISCA Workshop on Speech Synthesis (2004) Bali, K., Talukdar, P.P., Krishna, N.S., Ramakrishnan, A.G.: Tools for the development of a Hindi speech synthesis system. In Proc: Fifth ISCA Workshop on Speech Synthesis (2004)
8.
Zurück zum Zitat Narasimhan, B., Sproat, R., Kiraz, G.: Schwa-deletion in hindi text-to-speech synthesis. Int. J. Speech Technol. 7(4), 319–333 (2004)CrossRef Narasimhan, B., Sproat, R., Kiraz, G.: Schwa-deletion in hindi text-to-speech synthesis. Int. J. Speech Technol. 7(4), 319–333 (2004)CrossRef
9.
Zurück zum Zitat Rama, J., Ramakrishnan, A.G., Muralishankar, R., Prathibha, R.: A complete text-to-speech synthesis system in tamil. In Proc: WSS, pp. 191–194 (2002) Rama, J., Ramakrishnan, A.G., Muralishankar, R., Prathibha, R.: A complete text-to-speech synthesis system in tamil. In Proc: WSS, pp. 191–194 (2002)
10.
Zurück zum Zitat Talesara, S., Patil, H.A., Patel, T., Sailor, H., Shah, N.A.: Novel Gaussian filter-based automatic labeling of speech data for tts system in gujarati language. In Proc: ICALP, pp 139–142 (2013) Talesara, S., Patil, H.A., Patel, T., Sailor, H., Shah, N.A.: Novel Gaussian filter-based automatic labeling of speech data for tts system in gujarati language. In Proc: ICALP, pp 139–142 (2013)
11.
Zurück zum Zitat Panda, S.P., Nayak, A.K.: Integration of fuzzy if-then rule with waveform concatenation technique for text-to-speech synthesis in Odia. In Proc: 13th IEEE International Conference on Information Technology, pp. 88–93 (2014) Panda, S.P., Nayak, A.K.: Integration of fuzzy if-then rule with waveform concatenation technique for text-to-speech synthesis in Odia. In Proc: 13th IEEE International Conference on Information Technology, pp. 88–93 (2014)
12.
Zurück zum Zitat Christogiannis, C., Varvarigou, T., Zappa, A., Vamvakoulas, Y., Shih, C., and Arvaniti, A.: Construction of the acoustic inventory for a greek text-to-speech concatenative synthesis system. In Proc: IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. II929–II932 (2002) Christogiannis, C., Varvarigou, T., Zappa, A., Vamvakoulas, Y., Shih, C., and Arvaniti, A.: Construction of the acoustic inventory for a greek text-to-speech concatenative synthesis system. In Proc: IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. II929–II932 (2002)
13.
Zurück zum Zitat Maia, R., Akamine, M., Gales, M.J.: Complex cepstrum for statistical parametric speech synthesis. Speech Commun. 55(5), 606–618 (2013)CrossRef Maia, R., Akamine, M., Gales, M.J.: Complex cepstrum for statistical parametric speech synthesis. Speech Commun. 55(5), 606–618 (2013)CrossRef
14.
Zurück zum Zitat Maia, R., Akamine, M.: On the impact of excitation and spectral parameters for expressive statistical parametric speech synthesis. Comput. Speech Lang. 28(5), 1209–1232 (2014)CrossRef Maia, R., Akamine, M.: On the impact of excitation and spectral parameters for expressive statistical parametric speech synthesis. Comput. Speech Lang. 28(5), 1209–1232 (2014)CrossRef
15.
Zurück zum Zitat Panda, S.P., Nayak, A.K.: An efficient model for text-to-speech synthesis in Indian languages. Int. J. Speech Technol. 18(3), 305–315 (2015)CrossRef Panda, S.P., Nayak, A.K.: An efficient model for text-to-speech synthesis in Indian languages. Int. J. Speech Technol. 18(3), 305–315 (2015)CrossRef
16.
Zurück zum Zitat Panda, S.P., Nayak, A.K.: A waveform concatenation technique for text-to-speech synthesis. Int. J. Speech Technol. 20(4), 959–976 (2017)CrossRef Panda, S.P., Nayak, A.K.: A waveform concatenation technique for text-to-speech synthesis. Int. J. Speech Technol. 20(4), 959–976 (2017)CrossRef
17.
Zurück zum Zitat Besacier, L., Barnard, E., Karpov, A., Schultz, T.: Automatic speech recognition for under-resourced languages: a survey. Speech Commun. 56, 85–100 (2014)CrossRef Besacier, L., Barnard, E., Karpov, A., Schultz, T.: Automatic speech recognition for under-resourced languages: a survey. Speech Commun. 56, 85–100 (2014)CrossRef
18.
Zurück zum Zitat Panda, S.P., Nayak, A.K.: A rule-based concatenative approach to speech synthesis in Indian language text-to-speech systems. In Proc: Intelligent Computing, Communication and Devices, pp. 523-531, Springer (2015) Panda, S.P., Nayak, A.K.: A rule-based concatenative approach to speech synthesis in Indian language text-to-speech systems. In Proc: Intelligent Computing, Communication and Devices, pp. 523-531, Springer (2015)
19.
Zurück zum Zitat Handley, Z.: Is text-to-speech synthesis ready for use in computer-assisted language learning. Speech Commun. 51(10), 906–919 (2009)CrossRef Handley, Z.: Is text-to-speech synthesis ready for use in computer-assisted language learning. Speech Commun. 51(10), 906–919 (2009)CrossRef
20.
Zurück zum Zitat McCoy, K.F., Arnott, J.L., Ferres, L., Oken, M.F., Roark, B.: Speech and language processing as assistive technologies. Comput. Speech Lang. 27(6), 1143–1146 (2013)CrossRef McCoy, K.F., Arnott, J.L., Ferres, L., Oken, M.F., Roark, B.: Speech and language processing as assistive technologies. Comput. Speech Lang. 27(6), 1143–1146 (2013)CrossRef
21.
Zurück zum Zitat Bates, M.: The use of syntax in a speech understanding system. IEEE Trans. Acoust. Speech Signal Process. 23(6), 112–117 (1975)CrossRef Bates, M.: The use of syntax in a speech understanding system. IEEE Trans. Acoust. Speech Signal Process. 23(6), 112–117 (1975)CrossRef
22.
Zurück zum Zitat Moller, S., Jekosch, U., Mersdorf, J., Kraft, V.: Auditory assessment of synthesized speech in application scenarios: two case studies. Speech Commun. 34(3), 229–246 (2001)MATHCrossRef Moller, S., Jekosch, U., Mersdorf, J., Kraft, V.: Auditory assessment of synthesized speech in application scenarios: two case studies. Speech Commun. 34(3), 229–246 (2001)MATHCrossRef
23.
Zurück zum Zitat Panda, S.P., Nayak, A.K., Patnaik, S.: Text-to-speech synthesis with an Indian language perspective. Int. J. Grid Util. Comput. 6(3–4), 170–178 (2015)CrossRef Panda, S.P., Nayak, A.K., Patnaik, S.: Text-to-speech synthesis with an Indian language perspective. Int. J. Grid Util. Comput. 6(3–4), 170–178 (2015)CrossRef
24.
Zurück zum Zitat Liang, M.S., Yang, R.C., Chiang, Y.C., Lyu, D.C., Lyu, R. Y.: A Taiwanese text-to-speech system with applications to language learning. In Proc: IEEE International Conference on Advanced Learning Technologies, pp. 91–95 (20010 Liang, M.S., Yang, R.C., Chiang, Y.C., Lyu, D.C., Lyu, R. Y.: A Taiwanese text-to-speech system with applications to language learning. In Proc: IEEE International Conference on Advanced Learning Technologies, pp. 91–95 (20010
25.
Zurück zum Zitat Panda, S.P., Nayak, A.K.: modified rule-based concatenative technique for intelligible speech synthesis In indian languages. Adv. Sci. Lett. 22(2), 557–563 (2016)CrossRef Panda, S.P., Nayak, A.K.: modified rule-based concatenative technique for intelligible speech synthesis In indian languages. Adv. Sci. Lett. 22(2), 557–563 (2016)CrossRef
26.
Zurück zum Zitat Manning, A., Amare, N.: A simpler approach to grammar: (re)engineering parts-of-speech instruction to assist efl/esp students. In Proc: IEEE International Professional Communication Conference, pp. 1–9 (2007) Manning, A., Amare, N.: A simpler approach to grammar: (re)engineering parts-of-speech instruction to assist efl/esp students. In Proc: IEEE International Professional Communication Conference, pp. 1–9 (2007)
27.
Zurück zum Zitat Nebbia, L., Quazza, S., Luigi, P.S.: A specialised speech synthesis technique for application to automatic reverse directory service. In Proc: 4th Workshop on Interactive Voice Technology for Telecommunication, pp. 223–228 (1998) Nebbia, L., Quazza, S., Luigi, P.S.: A specialised speech synthesis technique for application to automatic reverse directory service. In Proc: 4th Workshop on Interactive Voice Technology for Telecommunication, pp. 223–228 (1998)
28.
Zurück zum Zitat Rafieee, M.S., Jafari, S., Ahmadi, H.S., Jafari, M.: Considerations to spoken language recognition for text-to-speech applications. In Proc: 13th ICCMS, pp. 303–309 (2011) Rafieee, M.S., Jafari, S., Ahmadi, H.S., Jafari, M.: Considerations to spoken language recognition for text-to-speech applications. In Proc: 13th ICCMS, pp. 303–309 (2011)
29.
Zurück zum Zitat Sak, H., Saraclar, M., Guungoor, T.: Morphology-based and sub-word language modeling for turkish speech recognition. In Proc: ICASSP, pp. 5402–5405 (2010) Sak, H., Saraclar, M., Guungoor, T.: Morphology-based and sub-word language modeling for turkish speech recognition. In Proc: ICASSP, pp. 5402–5405 (2010)
30.
Zurück zum Zitat Boldt, J., Ellis, D.: A simple correlation-based model of intelligibility for nonlinear speech enhancement and separation. In Proc: EUPSIPCO, pp. 1849–1853 (2009) Boldt, J., Ellis, D.: A simple correlation-based model of intelligibility for nonlinear speech enhancement and separation. In Proc: EUPSIPCO, pp. 1849–1853 (2009)
31.
Zurück zum Zitat Coulston, R., Oviatt, S., Darves, C.: Amplitude convergence in children’s conversational speech with animated personas. In Proc: Seventh International Conference on Spoken Language, pp. 5402–5405 (2002) Coulston, R., Oviatt, S., Darves, C.: Amplitude convergence in children’s conversational speech with animated personas. In Proc: Seventh International Conference on Spoken Language, pp. 5402–5405 (2002)
32.
Zurück zum Zitat Kleinberger, T., Becker, M., Ras, E., Holzinger, A.: Ambient intelligence in assisted living: enable elderly people to handle future interfaces. Lecture Notes Comput. Sci. Springer 4555, 103–112 (2007)CrossRef Kleinberger, T., Becker, M., Ras, E., Holzinger, A.: Ambient intelligence in assisted living: enable elderly people to handle future interfaces. Lecture Notes Comput. Sci. Springer 4555, 103–112 (2007)CrossRef
33.
Zurück zum Zitat Qiu, L., Benbasat, I.: An investigation into the effects of Text-To-Speech voice and 3D avatars on the perception of presence and flow of live help in electronic commerce. ACM Trans. Comput. Hum. Interact. (TOCHI) 12(4), 329–355 (2005)CrossRef Qiu, L., Benbasat, I.: An investigation into the effects of Text-To-Speech voice and 3D avatars on the perception of presence and flow of live help in electronic commerce. ACM Trans. Comput. Hum. Interact. (TOCHI) 12(4), 329–355 (2005)CrossRef
34.
Zurück zum Zitat Lu, H., Brush, A., Priyantha, B., Karlson, A.K., Liu, J.: Speaker- sense: energy efficient unobtrusive speaker identification on mobile phones. In Proc: 9th International Conference on Pervasive Computing, pp. 188–205 (2011) Lu, H., Brush, A., Priyantha, B., Karlson, A.K., Liu, J.: Speaker- sense: energy efficient unobtrusive speaker identification on mobile phones. In Proc: 9th International Conference on Pervasive Computing, pp. 188–205 (2011)
35.
Zurück zum Zitat Tabet, Y., Boughazi, M.: Speech synthesis techniques. a survey. In Proc: 7th IEEE International Workshop on System, Signal processing and their Applications, pp. 67–70 (2011) Tabet, Y., Boughazi, M.: Speech synthesis techniques. a survey. In Proc: 7th IEEE International Workshop on System, Signal processing and their Applications, pp. 67–70 (2011)
36.
Zurück zum Zitat Buza, O., Toderean, G., Nica, A., Caruntu, A.: Voice signal processing for speech synthesis. In Proc: IEEE International Conference on Automation, Quality and Testing, Robotics, pp. 360–364 (2006) Buza, O., Toderean, G., Nica, A., Caruntu, A.: Voice signal processing for speech synthesis. In Proc: IEEE International Conference on Automation, Quality and Testing, Robotics, pp. 360–364 (2006)
37.
Zurück zum Zitat Rojc, M., Kacic, Z.: Time and space-efficient architecture for a corpus-based text-to-speech synthesis system. Speech Commun. 49(3), 230–249 (2007)CrossRef Rojc, M., Kacic, Z.: Time and space-efficient architecture for a corpus-based text-to-speech synthesis system. Speech Commun. 49(3), 230–249 (2007)CrossRef
38.
Zurück zum Zitat Sasirekha, D., Chandra, E.: Text to speech: a simple tutorial. Int. J. Soft Comput. Eng. 2(1), 275–278 (2012) Sasirekha, D., Chandra, E.: Text to speech: a simple tutorial. Int. J. Soft Comput. Eng. 2(1), 275–278 (2012)
39.
Zurück zum Zitat Panda, S.P., Nayak, A.K.: A pronunciation rule-based speech synthesis technique for Odia numerals. In Proc: Computational Intelligence in Data Mining, pp. 483–491, Springer (2016) Panda, S.P., Nayak, A.K.: A pronunciation rule-based speech synthesis technique for Odia numerals. In Proc: Computational Intelligence in Data Mining, pp. 483–491, Springer (2016)
40.
Zurück zum Zitat Panda, S.P., Nayak, A.K.: A Context-based numeral reading technique for text to speech systems. Int. J. Electr. Comput. Eng. 8(6), 4533–4544 (2018) Panda, S.P., Nayak, A.K.: A Context-based numeral reading technique for text to speech systems. Int. J. Electr. Comput. Eng. 8(6), 4533–4544 (2018)
41.
Zurück zum Zitat Raj, A., Sarkar, T., Pammi, S.C, Yuvaraj, S., Bansal, M., Prahallad, K., Black. A.W.: Text processing for text to speech systems in Indian languages. In: Proc: 6th ISCA Speech Synthesis Workshop, pp. 188–193 (2007) Raj, A., Sarkar, T., Pammi, S.C, Yuvaraj, S., Bansal, M., Prahallad, K., Black. A.W.: Text processing for text to speech systems in Indian languages. In: Proc: 6th ISCA Speech Synthesis Workshop, pp. 188–193 (2007)
42.
Zurück zum Zitat Ebden, P., Sproat, R.: The Kestrel TTS text normalization system. Nat. Lang. Eng. 21(3), 333–353 (2015)CrossRef Ebden, P., Sproat, R.: The Kestrel TTS text normalization system. Nat. Lang. Eng. 21(3), 333–353 (2015)CrossRef
43.
Zurück zum Zitat Alias, F., Sevillano, X., Socor, J.C., Gonzalvo, X.: Towards high-quality next-generation text-to-speech synthesis: a multidomain approach by automatic domain classification. IEEE Trans. Audio Speech Lang. Process. 16(7), 1340–1354 (2008)CrossRef Alias, F., Sevillano, X., Socor, J.C., Gonzalvo, X.: Towards high-quality next-generation text-to-speech synthesis: a multidomain approach by automatic domain classification. IEEE Trans. Audio Speech Lang. Process. 16(7), 1340–1354 (2008)CrossRef
44.
Zurück zum Zitat Kim, B., Lee, G.G., Lee, J.H.: Morpheme-based grapheme to phoneme conversion using phonetic patterns and morphophonemic connectivity information. ACM Trans. Asian Lang. Inf. Process. (TALIP) 1(1), 65–82 (2002)CrossRef Kim, B., Lee, G.G., Lee, J.H.: Morpheme-based grapheme to phoneme conversion using phonetic patterns and morphophonemic connectivity information. ACM Trans. Asian Lang. Inf. Process. (TALIP) 1(1), 65–82 (2002)CrossRef
45.
Zurück zum Zitat Ward, N., Nakagawa, S.: Automatic user-adaptive speaking rate selection for information delivery. In Proc: 7th International Conference on Spoken Language Processing, pp. 341–362 (2002) Ward, N., Nakagawa, S.: Automatic user-adaptive speaking rate selection for information delivery. In Proc: 7th International Conference on Spoken Language Processing, pp. 341–362 (2002)
46.
Zurück zum Zitat Prafianto, H., Nose, T., Chiba, Y., Ito, A.: Improving human scoring of prosody using parametric speech synthesis. Speech Commun. 111, 14–21 (2019)CrossRef Prafianto, H., Nose, T., Chiba, Y., Ito, A.: Improving human scoring of prosody using parametric speech synthesis. Speech Commun. 111, 14–21 (2019)CrossRef
47.
Zurück zum Zitat Jia, Y., Huang, D., Liu, W., Dong, Y., Yu, S., Wang, H.: Text normalization in mandarin text-to-speech system. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4693–4696 (2008) Jia, Y., Huang, D., Liu, W., Dong, Y., Yu, S., Wang, H.: Text normalization in mandarin text-to-speech system. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4693–4696 (2008)
48.
Zurück zum Zitat Zhou, J., Su, X., Ylianttila, M., Riekki, J.: Exploring pervasive service computing opportunities for pursuing successful ageing. The Gerontologist, pp 73–82 (2012) Zhou, J., Su, X., Ylianttila, M., Riekki, J.: Exploring pervasive service computing opportunities for pursuing successful ageing. The Gerontologist, pp 73–82 (2012)
49.
Zurück zum Zitat Kujala, J.V.: A probabilistic approach to pronunciation by analogy. Comput. Speech Lang. 27(5), 1049–1067 (2013)CrossRef Kujala, J.V.: A probabilistic approach to pronunciation by analogy. Comput. Speech Lang. 27(5), 1049–1067 (2013)CrossRef
50.
Zurück zum Zitat Delogu, C., Conte, S., Sementina, C.: Cognitive factors in the evaluation of synthetic speech. Speech Commun. 24(2), 153–168 (1998)CrossRef Delogu, C., Conte, S., Sementina, C.: Cognitive factors in the evaluation of synthetic speech. Speech Commun. 24(2), 153–168 (1998)CrossRef
51.
Zurück zum Zitat Mayo, C., Robert, C., Clark, A.J., King, S.: Weighting of acoustic cues to synthetic speech naturalness: a multidimensional scaling analysis. Speech Commun. 53(3), 311–326 (2011)CrossRef Mayo, C., Robert, C., Clark, A.J., King, S.: Weighting of acoustic cues to synthetic speech naturalness: a multidimensional scaling analysis. Speech Commun. 53(3), 311–326 (2011)CrossRef
52.
Zurück zum Zitat Prahallad, K., Kumar, E.N., Keri, V., Rajendran, S., Black, A.W.: The iiit-h indic speech databases. In: Thirteenth Annual Conference of the International Speech Communication Association (2012) Prahallad, K., Kumar, E.N., Keri, V., Rajendran, S., Black, A.W.: The iiit-h indic speech databases. In: Thirteenth Annual Conference of the International Speech Communication Association (2012)
53.
Zurück zum Zitat Viswanathan, M.: Measuring speech quality for text-to-speech systems: development and assessment of a modified mean opinion score (mos) scale. Comput. Speech Lang. 19(1), 55–83 (2005)MathSciNetCrossRef Viswanathan, M.: Measuring speech quality for text-to-speech systems: development and assessment of a modified mean opinion score (mos) scale. Comput. Speech Lang. 19(1), 55–83 (2005)MathSciNetCrossRef
54.
Zurück zum Zitat Madhavi, G., Mini, B., Balakrishnan, N., Raj, R.: Om: one tool for many (indian) languages. J. Zhejiang Univ. Sci. A 6(11), 1348–1353 (2005)CrossRef Madhavi, G., Mini, B., Balakrishnan, N., Raj, R.: Om: one tool for many (indian) languages. J. Zhejiang Univ. Sci. A 6(11), 1348–1353 (2005)CrossRef
55.
Zurück zum Zitat Sarungbam, J.K., Kumar, B., Choudhary, A.: Script identification and language detection of 12 indian languages using dwt and template matching of frequently occurring character (s). In Proc: 5th IEEE International Conference on Confluence The Next Generation Information Technology, pp. 669–674 (2014) Sarungbam, J.K., Kumar, B., Choudhary, A.: Script identification and language detection of 12 indian languages using dwt and template matching of frequently occurring character (s). In Proc: 5th IEEE International Conference on Confluence The Next Generation Information Technology, pp. 669–674 (2014)
56.
Zurück zum Zitat Hangarge, M., Dhandra, B. V.: Shape and morphological transformation based features for language identification in indian document images. In Proc: IEEE First International Conference on Emerging Trends in Engineering and Technology, pp. 1175–1180 (2008) Hangarge, M., Dhandra, B. V.: Shape and morphological transformation based features for language identification in indian document images. In Proc: IEEE First International Conference on Emerging Trends in Engineering and Technology, pp. 1175–1180 (2008)
57.
Zurück zum Zitat Reddy, M.V., Margaret, M.T., Hanumanthappa, M.: Phoneme-to-speech dictionary for indian languages. In Proc: IEEE International Conference on Soft-Computing and Networks Security, pp. 1–4 (2015) Reddy, M.V., Margaret, M.T., Hanumanthappa, M.: Phoneme-to-speech dictionary for indian languages. In Proc: IEEE International Conference on Soft-Computing and Networks Security, pp. 1–4 (2015)
58.
Zurück zum Zitat Reddy, V.R., Maity, S., Rao, K.S.: Identification of indian languages using multi-level spectral and prosodic features. Int. J. Speech Technol. 16(4), 489–511 (2013)CrossRef Reddy, V.R., Maity, S., Rao, K.S.: Identification of indian languages using multi-level spectral and prosodic features. Int. J. Speech Technol. 16(4), 489–511 (2013)CrossRef
59.
Zurück zum Zitat Kishore, S.P., Kumar, R., Sangal, R.: A data driven synthesis approach for indian languages using syllable as basic unit. In Proc: International Conference on Natural Language Processing, pp. 311–316 (2002) Kishore, S.P., Kumar, R., Sangal, R.: A data driven synthesis approach for indian languages using syllable as basic unit. In Proc: International Conference on Natural Language Processing, pp. 311–316 (2002)
60.
Zurück zum Zitat Kanth, B L., Keri, V., Prahallad. K.S.: Durational characteristics of indian phonemes for language discrimination. In Proc: Information Systems for Indian Languages, pp. 130–135 (2011) Kanth, B L., Keri, V., Prahallad. K.S.: Durational characteristics of indian phonemes for language discrimination. In Proc: Information Systems for Indian Languages, pp. 130–135 (2011)
61.
Zurück zum Zitat Lavanya, P., Kishore, P., Madhavi, G.T.: A simple approach for building transliteration editors for indian languages. J. Zhejiang Univ. Sci. A 6(11), 1354–1361 (2005)CrossRef Lavanya, P., Kishore, P., Madhavi, G.T.: A simple approach for building transliteration editors for indian languages. J. Zhejiang Univ. Sci. A 6(11), 1354–1361 (2005)CrossRef
62.
Zurück zum Zitat Patil, H., Patel, T.B., Shah, N.J., Sailor, H.B., Krishnan, R., Kasthuri, G.R., Nagarajan, T., Christina, L., Kumar, N., Raghavendra, V., Kishore, S.P., Prasanna, S. R.M., Adiga, N., Singh, S.R., Anand, K., Kumar, P., Singh, B.C., Binil Kumar, S.L., Bhadran, T.G., Sajini, T., Saha, A., Basu, T., Rao, K.S., Narendra, N.P., Sao, A.K., Kumar, R., Talukdar, P., Chandra, S., Acharyaa, P., Lata, S., Murthy, H. A.: A syllable-based framework for unit selection synthesis in 13 indian languages. In Proc: IEEE International Conference on Asian Spoken Language Research and Evaluation, pp. 1–8 (2013) Patil, H., Patel, T.B., Shah, N.J., Sailor, H.B., Krishnan, R., Kasthuri, G.R., Nagarajan, T., Christina, L., Kumar, N., Raghavendra, V., Kishore, S.P., Prasanna, S. R.M., Adiga, N., Singh, S.R., Anand, K., Kumar, P., Singh, B.C., Binil Kumar, S.L., Bhadran, T.G., Sajini, T., Saha, A., Basu, T., Rao, K.S., Narendra, N.P., Sao, A.K., Kumar, R., Talukdar, P., Chandra, S., Acharyaa, P., Lata, S., Murthy, H. A.: A syllable-based framework for unit selection synthesis in 13 indian languages. In Proc: IEEE International Conference on Asian Spoken Language Research and Evaluation, pp. 1–8 (2013)
63.
Zurück zum Zitat Murthy, H.A., Bellur, A., Viswanath, V., Narayanan, B., Susan, A., Kasthuri, G., Krishnan, R., Rao, K.S., Maity, S., Narendra, N.P., Reddy, R., Ghosh, K., Sulochana, K. G., Abhilash, E. L., Sajini, T., Sasikumar, M., Singh, B.C., Kumar, P., Vijayaditya, P., Raghavendra, E. V., and Prahallad, K.: Building unit selection speech synthesis in indian languages: An initiative by an indian consortium. In Proc: COCOSDA, pp. 1–7 (2010) Murthy, H.A., Bellur, A., Viswanath, V., Narayanan, B., Susan, A., Kasthuri, G., Krishnan, R., Rao, K.S., Maity, S., Narendra, N.P., Reddy, R., Ghosh, K., Sulochana, K. G., Abhilash, E. L., Sajini, T., Sasikumar, M., Singh, B.C., Kumar, P., Vijayaditya, P., Raghavendra, E. V., and Prahallad, K.: Building unit selection speech synthesis in indian languages: An initiative by an indian consortium. In Proc: COCOSDA, pp. 1–7 (2010)
64.
Zurück zum Zitat Bellur, A., Narayan, K.B., Krishnan, K.R., Murthy, H.: A data driven synthesis approach for indian languages using syllable as basic unit. In Proc: IEEE National Conference on Communications, pp. 1–5 (2011) Bellur, A., Narayan, K.B., Krishnan, K.R., Murthy, H.: A data driven synthesis approach for indian languages using syllable as basic unit. In Proc: IEEE National Conference on Communications, pp. 1–5 (2011)
65.
Zurück zum Zitat Christiansen, C., Pedersen, M.S., Dau, T.: Prediction of speech intelligibility based on an auditory preprocessing model. Speech Commun. 52(7), 678–692 (2010)CrossRef Christiansen, C., Pedersen, M.S., Dau, T.: Prediction of speech intelligibility based on an auditory preprocessing model. Speech Commun. 52(7), 678–692 (2010)CrossRef
66.
Zurück zum Zitat Ma, J., Loizou, P.: Snr loss: a new objective measure for predicting the intelligibility of noise-suppressed speech. Speech Commun. 53(3), 340–354 (2011)CrossRef Ma, J., Loizou, P.: Snr loss: a new objective measure for predicting the intelligibility of noise-suppressed speech. Speech Commun. 53(3), 340–354 (2011)CrossRef
67.
Zurück zum Zitat Taal, C., Hendriks, R., Heusdens, R., Jensen, J.: An algorithm for intelligibility prediction of time-frequency weighted noisy speech. IEEE Trans. Audio Speech Lang. Process. 19(7), 2125–2136 (2011)CrossRef Taal, C., Hendriks, R., Heusdens, R., Jensen, J.: An algorithm for intelligibility prediction of time-frequency weighted noisy speech. IEEE Trans. Audio Speech Lang. Process. 19(7), 2125–2136 (2011)CrossRef
68.
Zurück zum Zitat Kates, J.M., Arehart, K.H.: Coherence and the speech intelligibility index. J. Acoust. Soc. Am. 117(4), 2224–2237 (2005)CrossRef Kates, J.M., Arehart, K.H.: Coherence and the speech intelligibility index. J. Acoust. Soc. Am. 117(4), 2224–2237 (2005)CrossRef
69.
Zurück zum Zitat Huang, G., Er, M.J.: An adaptive control scheme for articulatory synthesis of plosive-vowel sequences. In Proc: 38th Annual Conference on IEEE Industrial Electronics Society, pp. 1465–1470 (2012) Huang, G., Er, M.J.: An adaptive control scheme for articulatory synthesis of plosive-vowel sequences. In Proc: 38th Annual Conference on IEEE Industrial Electronics Society, pp. 1465–1470 (2012)
70.
Zurück zum Zitat Qinsheng, D., Jian, Z., Lirong, W., Lijuan, S.: Articulatory speech synthesis: a survey. In Proc: 14th IEEE International Conference on Computational Science and Engineering, pp. 539–542 (2011) Qinsheng, D., Jian, Z., Lirong, W., Lijuan, S.: Articulatory speech synthesis: a survey. In Proc: 14th IEEE International Conference on Computational Science and Engineering, pp. 539–542 (2011)
71.
Zurück zum Zitat Black, A.W., Bunnell, H.T., Dou, Y., Muthukumar, P.K., Metze, F., Perry, D., Polzeh, T., Prahallad, K., Steidl, S., Vaughn, C.: Articulatory features for expressive speech synthesis. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4005–4008 (2012) Black, A.W., Bunnell, H.T., Dou, Y., Muthukumar, P.K., Metze, F., Perry, D., Polzeh, T., Prahallad, K., Steidl, S., Vaughn, C.: Articulatory features for expressive speech synthesis. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4005–4008 (2012)
72.
Yu, B.L., Zeng, S.C.: Acoustic-to-articulatory mapping codebook constraint for determining vocal-tract length for inverse speech problem and articulatory synthesis. In Proc: 5th IEEE International Conference on Signal Processing, pp. 827–830 (2000)
73.
Aryal, S., Gutierrez-Osuna, R.: Accent conversion through cross-speaker articulatory synthesis. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 7694–7698 (2014)
74.
Aryal, S., Gutierrez-Osuna, R.: Articulatory inversion and synthesis: towards articulatory-based modification of speech. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 7952–7956 (2013)
75.
Badin, P., Abry, C.: Articulatory synthesis from X-rays and inversion for an adaptive speech robot. In Proc: Fourth International Conference on Spoken Language Processing, pp. 1125–1128 (1996)
76.
Aryal, S., Gutierrez-Osuna, R.: Data driven articulatory synthesis with deep neural networks. Comput. Speech Lang. 36, 260–273 (2016)
77.
Illa, A., Ghosh, P.K.: The impact of speaking rate on acoustic-to-articulatory inversion. Comput. Speech Lang. 59, 75–90 (2020)
78.
Pape, D., Jesus, L., Birkholz, P.: Intervocalic fricative perception in European Portuguese: an articulatory synthesis study. Speech Commun. 74, 93–103 (2015)
79.
Ngo, T., Akagi, M., Birkholz, P.: Effect of articulatory and acoustic features on the intelligibility of speech in noise: an articulatory synthesis study. Speech Commun. 117, 13–20 (2020)
80.
Birkholz, P., Martin, L., Xu, Y., Scherbaum, S., Neuschaefer-Rube, C.: Manipulation of the prosodic features of vocal tract length, nasality and articulatory precision using articulatory synthesis. Comput. Speech Lang. 41, 116–127 (2017)
81.
Chen, C.P., Huang, Y.C., Wu, C.H., Lee, K.D.: Polyglot speech synthesis based on cross-lingual frame selection using auditory and articulatory features. IEEE/ACM Trans. Audio Speech Lang. Process. 22(10), 1558–1570 (2014)
82.
Stevens, K.N.: Toward formant synthesis with articulatory controls. In Proc: IEEE Workshop on Speech Synthesis, pp. 67–72 (2002)
83.
Ling, Z.H., Richmond, K., Yamagishi, J., Wang, R.H.: Integrating articulatory features into HMM-based parametric speech synthesis. IEEE Trans. Audio Speech Lang. Process. 17(6), 1171–1185 (2009)
84.
Klatt, D.H.: Software for a cascade/parallel formant synthesizer. J. Acoust. Soc. Am. 67(3), 971–995 (1980)
85.
Summerfield, C.D.: A multi-channel formant speech synthesis system. In Proc: Fourth IEEE Region 10 International Conference, pp. 490–493 (1989)
86.
Khorinphan, C., Phansamdaeng, S., Saiyod, S.: Thai speech synthesis with emotional tone: based on formant synthesis for home robot. In Proc: Third IEEE ICT International Student Project Conference, pp. 111–114 (2014)
87.
Sousa, J., Araujo, F., Klautau, A.: Utterance copy for Klatt's speech synthesizer using genetic algorithm. In Proc: IEEE Workshop on Spoken Language Technology, pp. 89–94 (2014)
88.
Trindade, J., Araujo, F., Klautau, A., Batista, P.: A genetic algorithm with look-ahead mechanism to estimate formant synthesizer input parameters. In Proc: IEEE Congress on Evolutionary Computation, pp. 3035–3042 (2013)
89.
Chan, K., Hall, M.: The importance of vowel formant frequencies and proximity in vowel space to the perception of foreign accent. J. Phonet. 77, 100919 (2019)
90.
Pellicani, A., Fontes, A., Santos, F., Pellicani, A., Aguiar-Ricz, L.: Fundamental frequency and formants before and after prolonged voice use in teachers. J. Voice 32(2), 177–184 (2018)
91.
Barkana, B., Patel, A.: Analysis of vowel production in Mandarin/Hindi/American-accented English for accent recognition systems. Appl. Acoust. 162, 107203 (2020)
92.
Akçay, M., Oğuz, K.: Speech emotion recognition: emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers. Speech Commun. 116, 56–76 (2020)
93.
Hansen, J.H., Chappell, D.T.: An auditory-based distortion measure with application to concatenative speech synthesis. IEEE Trans. Speech Audio Process. 6(5), 489–495 (1998)
94.
Panda, S.P., Nayak, A.K.: Vowel onset point based waveform concatenation technique for intelligible speech synthesis. In Proc: International Conference on Computing Methodologies and Communication (ICCMC 2017), IEEE, pp. 622–626 (2018)
95.
Schwarz, D.: Corpus-based concatenative synthesis. IEEE Signal Process. Mag. 24(2), 92–104 (2007)
96.
Conkie, A.: Robust unit selection system for speech synthesis. In Proc: 137th Meeting of the Acoustical Society of America, p. 978 (1999)
97.
Black, A.W., Lenzo, K.A.: Optimal data selection for unit selection synthesis. In Proc: 4th ISCA Tutorial and Research Workshop (ITRW) on Speech Synthesis (2001)
98.
Hunt, A.J., Black, A.W.: Unit selection in a concatenative speech synthesis system using a large speech database. In Proc: International Conference on Acoustics, Speech, and Signal Processing, ICASSP-96, pp. 373–376 (1996)
99.
Sharma, P., Abrol, V., Nivedita, Sao, A.K.: Reducing footprint of unit selection based text-to-speech system using compressed sensing and sparse representation. Comput. Speech Lang. 52, 191–208 (2018)
100.
Nukaga, N., Kamoshida, R., Nagamatsu, K., Kitahara, Y.: Scalable implementation of unit selection based text-to-speech system for embedded solutions. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 849–852 (2006)
101.
Bellegarda, J.R.: Unit-centric feature mapping for inventory pruning in unit selection text-to-speech synthesis. IEEE Trans. Audio Speech Lang. Process. 16(1), 74–82 (2008)
102.
Narendra, N.P., Rao, K.S.: Optimal weight tuning method for unit selection cost functions in syllable based text-to-speech synthesis. Appl. Soft Comput. 13, 773–781 (2013)
103.
Zen, H., Tokuda, K., Black, A.W.: Statistical parametric speech synthesis. Speech Commun. 51(11), 1039–1064 (2009)
104.
Black, A.W., Campbell, N.: Optimising selection of units from speech databases for concatenative synthesis. In Proc: Eurospeech (1995)
105.
Xia, X.J., Ling, Z.H., Yang, C.Y., Dai, L.R.: Improved unit selection speech synthesis method utilizing subjective evaluation results on synthetic speech. In Proc: 8th IEEE International Symposium on Chinese Spoken Language Processing, pp. 160–164 (2012)
106.
Bellegarda, J.R.: Globally optimal training of unit boundaries in unit selection text-to-speech synthesis. IEEE Trans. Audio Speech Lang. Process. 15(3), 957–965 (2008)
107.
Epko, J., Talafova, R., Vrabec, J.: Indexing join costs for faster unit selection synthesis. In Proc: 15th IEEE International Conference on Systems, Signals and Image Processing, pp. 503–506 (2008)
108.
Kishore, S.P., Black, A.W.: Unit size in unit selection speech synthesis. In Proc: INTERSPEECH, pp. 1–7 (2003)
109.
Kishore, S.P., Black, A.W., Kumar, R., Sangal, R.: Experiments with unit selection speech databases for Indian languages. In Proc: National Seminar on Language Technology Tools, pp. 1–7 (2003)
110.
Prahallad, K., Vadapalli, A., Elluru, N., Mantena, G., Pulugundla, B., Bhaskararao, P., Murthy, H.A., King, S., Karaiskos, V., Black, A.W.: The Blizzard Challenge 2013: Indian language task. In Proc: Blizzard Challenge Workshop, pp. 1–7 (2013)
111.
Black, A., Tokuda, K.: The Blizzard Challenge 2005: evaluating corpus-based speech synthesis on common databases. In Proc: Interspeech, pp. 1–7 (2005)
112.
Charpentier, F.J., Stella, M.G.: Diphone synthesis using an overlap-add technique for speech waveforms concatenation. In Proc: IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 2015–2018 (1986)
113.
Justin, T., Struc, V., Dobrisek, S., Vesnicer, B., Ipsic, I., Mihelic, F.: Speaker de-identification using diphone recognition and speech synthesis. In Proc: IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, pp. 1–7 (2015)
114.
Mellahi, T., Hamdi, R.: LPC-based formant enhancement method in Kalman filtering for speech enhancement. AEU-Int. J. Electron. Commun. 69(2), 545–554 (2015)
115.
Valbret, H., Moulines, E., Tubach, J.P.: Voice transformation using PSOLA technique. In Proc: IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 145–148 (1992)
116.
Dutoit, T., Leich, H.: MBR-PSOLA: text-to-speech synthesis based on an MBE re-synthesis of the segments database. Speech Commun. 13(3), 435–440 (1993)
117.
Hamon, C., Moulines, E., Charpentier, F.: A diphone synthesis system based on time-domain prosodic modifications of speech. In Proc: International Conference on Acoustics, Speech, and Signal Processing, pp. 238–241 (1989)
118.
Katae, N., Kimura, S.: Natural prosody generation for domain specific text-to-speech systems. In Proc: Fourth International Conference on Spoken Language Processing, pp. 1852–1855 (1996)
119.
Aust, H., Oerder, M., Seide, F., Steinbiss, V.: A spoken language inquiry system for automatic train timetable information. Philips J. Res. 49(4), 399–418 (1995)
120.
Meng, H.M., Lee, S., Wai, C.: Intelligent speech for information systems: towards biliteracy and trilingualism. Interact. Comput. 14(4), 327–339 (2002)
121.
Fries, G.: Hybrid time- and frequency-domain speech synthesis with extended glottal source generation. In Proc: IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 581–584 (1994)
122.
Phung, T.N., Mai, C.L., Akagi, M.: A concatenative speech synthesis for monosyllabic languages with limited data. In Proc: IEEE Signal and Information Processing Association Annual Summit and Conference, pp. 1–10 (2012)
123.
Narendra, N.P., Rao, K.S.: Syllable specific unit selection cost functions for text-to-speech synthesis. ACM Trans. Speech Lang. Process. (TSLP) 9(3), 5 (2012)
124.
Reddy, V.R., Rao, K.S.: Two-stage intonation modeling using feed forward neural networks for syllable based text-to-speech synthesis. Comput. Speech Lang. 27(5), 1105–1126 (2013)
125.
Xie, Y., Zhang, B., Zhang, J.: The training of the tone of Mandarin two-syllable words based on pitch projection synthesis speech. In Proc: 9th IEEE International Symposium on Chinese Spoken Language Processing, p. 435 (2014)
126.
Narendra, N.P., Rao, K.S., Ghosh, K., Vempada, R.R., Maity, S.: Development of syllable-based text to speech synthesis system in Bengali. Int. J. Speech Technol. 14(1), 167–181 (2011)
127.
Thomas, S., Rao, M.N., Murthy, H., Ramalingam, C.S.: Natural sounding TTS based on syllable-like units. In Proc: 14th IEEE European Signal Processing Conference, pp. 1–5 (2006)
128.
Venugopalakrishna, Y.R., Vinodh, M.V., Murthy, H., Ramalingam, C.S.: Methods for improving the quality of syllable based speech synthesis. In Proc: IEEE Spoken Language Technology Workshop, pp. 29–32 (2008)
129.
Wu, C.H., Huang, Y.C., Lee, C.H., Guo, J.C.: Synthesis of spontaneous speech with syllable contraction using state-based context-dependent voice transformation. IEEE/ACM Trans. Audio Speech Lang. Process. 22(3), 585–595 (2014)
130.
Raghavendra, E.V., Desai, S., Yegnanarayana, B., Black, A.W., Prahallad, K.: Global syllable set for building speech synthesis in Indian languages. In Proc: IEEE Spoken Language Technology Workshop, pp. 49–52 (2008)
131.
Latorre, J., Iwano, K., Furui, S.: Polyglot synthesis using a mixture of monolingual corpora. In Proc: ICASSP, pp. 1–4 (2005)
132.
Black, A.W., Lenzo, K.A.: Multilingual text-to-speech synthesis. Proc. Int. Conf. Acoust. Speech Signal Process. 3, iii-761 (2004)
133.
Ramani, B., Actlin Jeeva, M.P., Vijayalakshmi, P., Nagarajan, T.: Voice conversion-based multilingual to polyglot speech synthesizer for Indian languages. In Proc: IEEE Region 10 Conference, pp. 1–4 (2013)
134.
Latorre, J., Iwano, K., Furui, S.: New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer. Speech Commun. 48, 1227–1242 (2006)
135.
Chen, C.P., Huang, Y.C., Wu, C.H., Lee, K.D.: Cross-lingual frame selection method for polyglot speech synthesis. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4521–4524 (2012)
136.
Solomi, V., Sherlin, M.S., Saranya, G., Anushiya, R., Vijayalakshmi, P., Nagarajan, T.: Performance comparison of KLD and PoG metrics for finding the acoustic similarity between phonemes for the development of a polyglot synthesizer. In Proc: IEEE Region 10 Conference, pp. 1–4 (2014)
137.
Romsdorfer, H., Pfister, B.: Text analysis and language identification for polyglot text-to-speech synthesis. Speech Commun. 49, 697–724 (2007)
138.
Solomi, V.S., Christina, S.L., Rachel, G.A., Ramani, B., Vijayalakshmi, P., Nagarajan, T.: Analysis on acoustic similarities between Tamil and English phonemes using product of likelihood-Gaussians for an HMM-based mixed-language synthesizer. In Proc: IEEE International Conference on Asian Spoken Language Research and Evaluation, pp. 1–5 (2013)
139.
Gibson, M., Byrne, W.: Unsupervised intralingual and cross-lingual speaker adaptation for HMM-based speech synthesis using two-pass decision tree construction. IEEE Trans. Audio Speech Lang. Process. 19(4), 895–904 (2011)
140.
Lorenzo-Trueba, J., Barra-Chicote, R., San-Segundo, R., Ferreiros, J., Yamagishi, J., Montero, J.M.: Emotion transplantation through adaptation in HMM-based speech synthesis. Comput. Speech Lang. 34(1), 292–307 (2015)
141.
Maeno, Y., Nose, T., Kobayashi, T., Koriyama, T., Ijima, Y., Nakajima, H., Mizuno, H., Yoshioka, O.: Prosodic variation enhancement using unsupervised context labeling for HMM-based expressive speech synthesis. Speech Commun. 57, 144–154 (2014)
142.
Nose, T., Kobayashi, T.: An intuitive style control technique in HMM-based expressive speech synthesis using subjective style intensity and multiple regression global variance model. Speech Commun. 55(2), 347–357 (2013)
143.
Ekpenyong, M., Urua, E.A., Watts, O., King, S., Yamagishi, J.: Statistical parametric speech synthesis for Ibibio. Speech Commun. 56, 243–251 (2014)
144.
Romsdorfer, H.: Speech prosody control using weighted neural network ensembles. In Proc: IEEE International Workshop on Machine Learning for Signal Processing, pp. 1–6 (2009)
145.
Koriyama, T., Nose, T., Kobayashi, T.: Statistical parametric speech synthesis based on Gaussian process regression. IEEE J. Select. Topics Signal Process. 8(2), 173–183 (2014)
146.
Ilyes, R., Ayed, Y.B.: Statistical parametric speech synthesis for Arabic language using ANN. In Proc: IEEE International Conference on Advanced Technologies for Signal and Image Processing, pp. 452–457 (2014)
147.
Al-Radhi, M., Abdo, O., Csapó, T., Abdou, S., Fashal, M.: A continuous vocoder for statistical parametric speech synthesis and its evaluation using an audio-visual phonetically annotated Arabic corpus. Comput. Speech Lang. 60, 101025 (2020)
148.
Reddy, M., Rao, K.S.: Excitation modelling using epoch features for statistical parametric speech synthesis. Comput. Speech Lang. 60, 101029 (2020)
149.
Adiga, N., Khonglah, B., Mahadeva Prasanna, S.R.: Improved voicing decision using glottal activity features for statistical parametric speech synthesis. Digit. Signal Process. 71, 131–143 (2017)
150.
Tiomkin, S., Malah, D., Shechtman, S., Kons, Z.: A hybrid text-to-speech system that combines concatenative and statistical synthesis units. IEEE Trans. Audio Speech Lang. Process. 19(5), 1278–1288 (2011)
151.
Tokuda, K., Yoshimura, T., Masuko, T., Kobayashi, T., Kitamura, T.: Speech parameter generation algorithms for HMM-based speech synthesis. In Proc: IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP'00, pp. 1315–1318 (2000)
152.
Tokuda, K., Nankaku, Y., Toda, T., Zen, H., Yamagishi, J., Oura, K.: Speech synthesis based on hidden Markov models. Proc. IEEE 101(5), 1234–1252 (2013)
153.
Zurück zum Zitat Toda, S., Neubig, T., Sakti, G., Nakamura, S.: A postfilter to modify the modulation spectrum in hmm-based speech synthesis. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 290–294 (2014) Toda, S., Neubig, T., Sakti, G., Nakamura, S.: A postfilter to modify the modulation spectrum in hmm-based speech synthesis. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 290–294 (2014)
154.
Zurück zum Zitat Yang, C.Y., Ling, Z.H., Dai, L.R.: Unsupervised prosodic phrase boundary labeling of mandarin speech synthesis database using context-dependent hmm. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6875–6879 (2013) Yang, C.Y., Ling, Z.H., Dai, L.R.: Unsupervised prosodic phrase boundary labeling of mandarin speech synthesis database using context-dependent hmm. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6875–6879 (2013)
155.
Zurück zum Zitat Gu, H.Y., Lai, M.Y., Hong, W.S.: Speech synthesis using articulatory knowledge based hmm structure. In Proc: IEEE International Conference on Machine Learning and Cybernetics, pp. 371–376 (2014) Gu, H.Y., Lai, M.Y., Hong, W.S.: Speech synthesis using articulatory knowledge based hmm structure. In Proc: IEEE International Conference on Machine Learning and Cybernetics, pp. 371–376 (2014)
156.
Zurück zum Zitat Bollepalli, B., Urbain, J., Raitio, T., Gustafson, J., Cakmak, H.: A comparative evaluation of vocoding techniques for hmm-based laughter synthesis. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 255–259 (2014) Bollepalli, B., Urbain, J., Raitio, T., Gustafson, J., Cakmak, H.: A comparative evaluation of vocoding techniques for hmm-based laughter synthesis. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 255–259 (2014)
157.
Zurück zum Zitat Kawahara, H.: Straight-tempo: A universal tool to manipulate linguistic and para-linguistic speech information. In Proc: IEEE International Conference on Systems, Man, and Cybernetics, pp. 1620–1625 (1997) Kawahara, H.: Straight-tempo: A universal tool to manipulate linguistic and para-linguistic speech information. In Proc: IEEE International Conference on Systems, Man, and Cybernetics, pp. 1620–1625 (1997)
158.
Zurück zum Zitat Saoudi, S., Boucher, J.M., Le, A.: Guyader. A new efficient algorithm to compute the lsp parameters for speech coding. Signal Process. 28(2), 201–212 (1992)MATHCrossRef Saoudi, S., Boucher, J.M., Le, A.: Guyader. A new efficient algorithm to compute the lsp parameters for speech coding. Signal Process. 28(2), 201–212 (1992)MATHCrossRef
160.
Zurück zum Zitat Cai, M.Q., Ling, Z.H., Dai, L.R.: Statistical parametric speech synthesis using a hidden trajectory model. Speech Commun. 72, 149–159 (2015)CrossRef Cai, M.Q., Ling, Z.H., Dai, L.R.: Statistical parametric speech synthesis using a hidden trajectory model. Speech Commun. 72, 149–159 (2015)CrossRef
161.
Zurück zum Zitat Kawahara, H., Morise, M., Takahashi, T., Irino, T., Banno, H., Fujimura, O.: Group delay for acoustic event representation and its application for speech aperiodicity analysis. In Proc: EUSIPCO, pp. 2219–2223 (2007) Kawahara, H., Morise, M., Takahashi, T., Irino, T., Banno, H., Fujimura, O.: Group delay for acoustic event representation and its application for speech aperiodicity analysis. In Proc: EUSIPCO, pp. 2219–2223 (2007)
162.
Zurück zum Zitat Ramani, B., Christina, S.L., Rachel, G.A., Solomi, V.S., Nandwana, M.K., Prakash, A., Shanmugam, S.A., Krishnan, R., Kishore, S., Samudravijaya, K. and Vijayalakshmi, P., 2013. A common attribute based unified HTS framework for speech synthesis in indian languages. In Proc: 8th ISCA Workshop on Speech Synthesis, pp. 311-316 Ramani, B., Christina, S.L., Rachel, G.A., Solomi, V.S., Nandwana, M.K., Prakash, A., Shanmugam, S.A., Krishnan, R., Kishore, S., Samudravijaya, K. and Vijayalakshmi, P., 2013. A common attribute based unified HTS framework for speech synthesis in indian languages. In Proc: 8th ISCA Workshop on Speech Synthesis, pp. 311-316
163.
Zurück zum Zitat Kang, S., Qian, X., Meng, H.: Multi-distribution deep belief network for speech synthesis. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8012–8016 (2013) Kang, S., Qian, X., Meng, H.: Multi-distribution deep belief network for speech synthesis. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8012–8016 (2013)
164.
Zurück zum Zitat Ze, H., Andrew, S., Mike, S.: Statistical parametric speech synthesis using deep neural networks. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7962–7966 (2013) Ze, H., Andrew, S., Mike, S.: Statistical parametric speech synthesis using deep neural networks. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7962–7966 (2013)
165.
Zurück zum Zitat Ronanki, S., Reddy, S., Bollepalli, B., King, S.: DNN-based Speech Synthesis for Indian Languages from ASCII text. arXiv preprint arXiv:1608.05374 (2016) Ronanki, S., Reddy, S., Bollepalli, B., King, S.: DNN-based Speech Synthesis for Indian Languages from ASCII text. arXiv preprint arXiv:​1608.​05374 (2016)
166.
Zurück zum Zitat Hayashi, T., Yamamoto, R., Inoue, K., Yoshimura, T., Watanabe, S., Toda, T., Takeda, K., Zhang, Y., Tan, X.: Espnet-Tts: UNIFIED, REPRODUCIBLE, AND INTEGRATABLE OPEN SOURCE END-TO-END TEXT-TO-SPEECH TOOLkit. arXiv preprint arXiv:1910.10909 (2019) Hayashi, T., Yamamoto, R., Inoue, K., Yoshimura, T., Watanabe, S., Toda, T., Takeda, K., Zhang, Y., Tan, X.: Espnet-Tts: UNIFIED, REPRODUCIBLE, AND INTEGRATABLE OPEN SOURCE END-TO-END TEXT-TO-SPEECH TOOLkit. arXiv preprint arXiv:​1910.​10909 (2019)
167.
Zurück zum Zitat Sotelo, J., Mehri, S., Kumar, K., Santosy, J.F., Kastner, K., Courvillez, A., Bengio, Y.: Char2wav: End-to-end speech synthesis. In: ICLR (2017) Sotelo, J., Mehri, S., Kumar, K., Santosy, J.F., Kastner, K., Courvillez, A., Bengio, Y.: Char2wav: End-to-end speech synthesis. In: ICLR (2017)
168.
Zurück zum Zitat Nicolson, A., Paliwal, K.: Deep learning for minimum mean-square error approaches to speech enhancement. Speech Commun. 111, 44–55 (2019)CrossRef Nicolson, A., Paliwal, K.: Deep learning for minimum mean-square error approaches to speech enhancement. Speech Commun. 111, 44–55 (2019)CrossRef
169.
Zurück zum Zitat Chang, Y.: Evaluation of TTS systems in intelligibility and comprehension tasks: a case study of HTS-2008 and multisyn synthesizers. Comput. Linguist. Chin. Lang. Process. 17(3), 109–128 (2012) Chang, Y.: Evaluation of TTS systems in intelligibility and comprehension tasks: a case study of HTS-2008 and multisyn synthesizers. Comput. Linguist. Chin. Lang. Process. 17(3), 109–128 (2012)
170.
Zurück zum Zitat Benoît, C., Grice, M., Hazan, V.: The SUS test: a method for the assessment of text-to-speech synthesis intelligibility using Semantically Unpredictable Sentences. Speech Commun. 18, 381–392 (1996)CrossRef Benoît, C., Grice, M., Hazan, V.: The SUS test: a method for the assessment of text-to-speech synthesis intelligibility using Semantically Unpredictable Sentences. Speech Commun. 18, 381–392 (1996)CrossRef
171.
Zurück zum Zitat Benoit, C.: An intelligibility test using semantically unpredictable sentences: towards the quantification of linguistic complexity. Speech Commun. 9(4), 293–304 (1990)CrossRef Benoit, C.: An intelligibility test using semantically unpredictable sentences: towards the quantification of linguistic complexity. Speech Commun. 9(4), 293–304 (1990)CrossRef
172.
Zurück zum Zitat Bielefeld, N., Schinkel.: Training listeners for multi-channel audio quality evaluation in MUSHRA with a special focus on loop setting. In Proc: Eighth International Conference on Quality of Multimedia Experience (QoMEX), pp. 1–6 (2016) Bielefeld, N., Schinkel.: Training listeners for multi-channel audio quality evaluation in MUSHRA with a special focus on loop setting. In Proc: Eighth International Conference on Quality of Multimedia Experience (QoMEX), pp. 1–6 (2016)
173.
Zurück zum Zitat Kraft, S., Zölzer, U.: BeaqleJS: HTML5 and javascript based framework for the subjective evaluation of audio quality. In Proc: Linux Audio Conference, pp. 1–6 (2014) Kraft, S., Zölzer, U.: BeaqleJS: HTML5 and javascript based framework for the subjective evaluation of audio quality. In Proc: Linux Audio Conference, pp. 1–6 (2014)
174.
Zurück zum Zitat Latorre, J., Iwano, K., Furui, S.: Combining Gaussian mixture model with global variance term to improve the quality of an HMM-based polyglot speech synthesizer. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. IV–1241 (2007) Latorre, J., Iwano, K., Furui, S.: Combining Gaussian mixture model with global variance term to improve the quality of an HMM-based polyglot speech synthesizer. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. IV–1241 (2007)
175.
Zurück zum Zitat Lu, H., Ling, Z.H., Dai, L.R., Wang, R H.: Building hmm based unit selection speech synthesis system using synthetic speech naturalness evaluation score. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5352–5355 (2011) Lu, H., Ling, Z.H., Dai, L.R., Wang, R H.: Building hmm based unit selection speech synthesis system using synthetic speech naturalness evaluation score. In Proc: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5352–5355 (2011)
176.
Zurück zum Zitat Morton, H., Gunson, N., Marshall, D., McInnes, F., Ayres, A., Jack, M.: Usability assessment of text-to-speech synthesis for additional detail in an automated telephone banking system. Comput. Speech Lang. 25(2), 341–362 (1996)CrossRef Morton, H., Gunson, N., Marshall, D., McInnes, F., Ayres, A., Jack, M.: Usability assessment of text-to-speech synthesis for additional detail in an automated telephone banking system. Comput. Speech Lang. 25(2), 341–362 (1996)CrossRef
177.
Zurück zum Zitat Panda, S.P., Nayak, A.K.: Spectral Smoothening based Waveform Concatenation Technique for Speech Quality Enhancement in Text to Speech Systems. In proc: 3rd International Conference on Advanced Computing and Intelligent Engineering, vol 1. Springer, pp. 425–432 (2020) Panda, S.P., Nayak, A.K.: Spectral Smoothening based Waveform Concatenation Technique for Speech Quality Enhancement in Text to Speech Systems. In proc: 3rd International Conference on Advanced Computing and Intelligent Engineering, vol 1. Springer, pp. 425–432 (2020)
178.
Zurück zum Zitat Yang, S., Wu, Z., Xie, L.: On the training of DNN-based average voice model for speech synthesis. In Proc: Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2016 Asia-Pacific, IEEE, pp. 1–6 (2016) Yang, S., Wu, Z., Xie, L.: On the training of DNN-based average voice model for speech synthesis. In Proc: Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2016 Asia-Pacific, IEEE, pp. 1–6 (2016)
Metadata
Title: A survey on speech synthesis techniques in Indian languages
Authors: Soumya Priyadarsini Panda, Ajit Kumar Nayak, Satyananda Champati Rai
Publication date: 28.05.2020
Publisher: Springer Berlin Heidelberg
Published in: Multimedia Systems / Issue 4/2020
Print ISSN: 0942-4962
Electronic ISSN: 1432-1882
DOI: https://doi.org/10.1007/s00530-020-00659-4
