Journal of Intelligent Information Systems

Published: 11 February 2019

Effect of speech segment samples selection in stutter block detection and remediation

Authors: Pierre Arbajian, Ayman Hajja, Zbigniew W. Raś, Alicja A. Wieczorkowska

Published in: Journal of Intelligent Information Systems | Issue 2/2019

Abstract

Speech can be remediated by identifying segments that detract from the substance of the content and that may be deleted without diminishing speech quality; removing them improves the speech. Remediation is particularly important when the speech is disfluent, as in the case of stuttered speech. We describe two approaches to stuttered speech remediation, both based on identifying segments of speech which, when removed, enhance understandability in terms of both speech content and speech flow. In the first approach, we identify and extract speech segments that have weak semantic significance due to their low relative intensity; we subsequently trained several classifiers on a large set of inherent and derived features, which provided a second-layer filtering stage. This approach was effective but required a two-step process. To streamline detection and remediation, we introduced an enhancement that eliminates the need for a domain-dependent pre-qualification stage and thereby extends disfluency detection to a broader range of speech anomalies. The results of the new approach offer improved accuracy with enhanced simplicity, flexibility and extensibility.
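The candidate-extraction idea in the first approach (flagging segments of low relative intensity that could be removed) can be illustrated with a minimal sketch. This is not the authors' implementation: the frame length, the threshold relative to peak RMS, and the function names `frame_rms`, `low_intensity_segments`, and `remove_segments` are all assumptions made for illustration, and the classifier-based second filtering stage described in the abstract is omitted.

```python
import math


def frame_rms(samples, frame_len):
    """RMS intensity of each non-overlapping frame (trailing partial frame ignored)."""
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    return [
        math.sqrt(sum(x * x for x in samples[s:s + frame_len]) / frame_len)
        for s in starts
    ]


def low_intensity_segments(samples, frame_len=4, rel_threshold=0.25):
    """Return (start, end) sample ranges, end exclusive, whose frame RMS is
    below rel_threshold times the peak frame RMS.

    Both parameters are hypothetical defaults chosen for this sketch only.
    """
    rms = frame_rms(samples, frame_len)
    if not rms:
        return []
    peak = max(rms)
    segments = []
    start = None
    for idx, value in enumerate(rms):
        quiet = value < rel_threshold * peak
        if quiet and start is None:
            start = idx  # a run of quiet frames begins
        elif not quiet and start is not None:
            segments.append((start * frame_len, idx * frame_len))
            start = None
    if start is not None:  # quiet run extends to the last full frame
        segments.append((start * frame_len, len(rms) * frame_len))
    return segments


def remove_segments(samples, segments):
    """Return a copy of samples with the given (start, end) ranges cut out."""
    kept = []
    cursor = 0
    for start, end in segments:
        kept.extend(samples[cursor:start])
        cursor = end
    kept.extend(samples[cursor:])
    return kept


# Toy signal: loud frame, two quiet frames, loud frame.
loud = [1.0, -1.0, 1.0, -1.0]
quiet = [0.1, -0.1, 0.1, -0.1]
signal = loud + quiet + quiet + loud

candidates = low_intensity_segments(signal)
remediated = remove_segments(signal, candidates)
```

In the two-step process the abstract describes, segments like `candidates` would only be proposals; a trained classifier would then decide which of them are actually removed.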

Title: Effect of speech segment samples selection in stutter block detection and remediation
Authors: Pierre Arbajian, Ayman Hajja, Zbigniew W. Raś, Alicja A. Wieczorkowska
Publication date: 11 February 2019
Publisher: Springer US
Published in: Journal of Intelligent Information Systems / Issue 2/2019
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI: https://doi.org/10.1007/s10844-019-00546-z
