
International Journal of Speech Technology

Published: 5 November 2021

A method for constructing Korean spontaneous spoken language corpus based on an imitation of abbreviated and transformed particles

Authors: Hyok-Chol Ri, Chol Kim, Mok-Ran Jo

Published in: International Journal of Speech Technology | Issue 1/2022


Abstract

In this paper, we propose a method for constructing a language corpus by imitating the abbreviated and transformed particles that are a distinctive feature of spontaneous spoken Korean. Because it is not practical to train a spoken-style model on a large number of spoken transcripts, the proposed approach generates spoken-style text from written-style text such as newspaper articles, based on how the pronunciation of typical particles varies with spoken style. The method rests on a statistical analysis of particles that serve the same function in both written and spoken language. We analyze the grammatical functions and pronunciation features of the particles that distinguish written from spoken language, and generate spoken-style text from written-style text by imitating typical abbreviated and transformed particles that serve the same function. The particles chosen for imitation have pronunciation features that are characteristic of spoken language. We replace particles in the written-style text with their abbreviated and transformed counterparts according to the correspondence between written and spoken particles, which yields spoken-style text. A language model trained on this spoken-style text significantly reduced the word error rate (WER) on spontaneous speech.
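The replacement step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the written text has already been split by a morphological analyzer into (stem, particle) pairs, and the two correspondence tables below are hypothetical examples built from common colloquial Korean variants, not the statistically derived table from the paper. The names `TRANSFORMED`, `ABBREVIATED`, `attach_final`, and `to_spoken` are introduced here for illustration only.

```python
# Hypothetical sketch: turn written-style Korean into spoken-style text by
# swapping particles. Input is assumed to be pre-segmented (stem, particle) pairs.

HANGUL_BASE = 0xAC00   # first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3   # last precomposed Hangul syllable
FINAL_COUNT = 28       # final-consonant slots per syllable (slot 0 = none)
FINAL_INDEX = {"ㄴ": 4, "ㄹ": 8}  # Unicode slot of each final consonant

# Illustrative "transformed" particles: the whole particle is replaced.
TRANSFORMED = {"에게": "한테", "에게서": "한테서"}

# Illustrative "abbreviated" particles: the particle contracts to a final
# consonant on the last syllable of the stem (e.g. 저 + 는 -> 전).
ABBREVIATED = {"는": "ㄴ", "를": "ㄹ"}


def attach_final(stem, jamo):
    """Add jamo as the final consonant of the stem's last syllable.

    Returns None when the last character is not an open Hangul syllable,
    so the caller can keep the written form unchanged.
    """
    if not stem:
        return None
    code = ord(stem[-1])
    if not (HANGUL_BASE <= code <= HANGUL_LAST):
        return None
    if (code - HANGUL_BASE) % FINAL_COUNT != 0:
        return None  # syllable already has a final consonant
    return stem[:-1] + chr(code + FINAL_INDEX[jamo])


def to_spoken(tokens):
    """Map a list of (stem, particle) pairs to a spoken-style sentence."""
    words = []
    for stem, particle in tokens:
        if particle in TRANSFORMED:
            words.append(stem + TRANSFORMED[particle])
        elif particle in ABBREVIATED:
            contracted = attach_final(stem, ABBREVIATED[particle])
            words.append(contracted if contracted is not None else stem + particle)
        else:
            words.append(stem + particle)  # no spoken counterpart in the table
    return " ".join(words)


written = [("저", "는"), ("친구", "에게"), ("책", "을"), ("주었다", "")]
print(to_spoken(written))  # 전 친구한테 책을 주었다
```

Working on pre-segmented pairs rather than raw string suffixes avoids rewriting words that merely end in a particle-like syllable. A corpus builder following the abstract would presumably also weight each replacement by its observed frequency in spoken data, which this sketch leaves out.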



Title: A method for constructing Korean spontaneous spoken language corpus based on an imitation of abbreviated and transformed particles
Authors: Hyok-Chol Ri, Chol Kim, Mok-Ran Jo
Publication date: 5 November 2021
Publisher: Springer US
Published in: International Journal of Speech Technology, Issue 1/2022
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-021-09937-6
