
International Journal of Speech Technology

Published: 06.06.2018

Determining the optimal conditions for signal reconstruction based on STFT magnitude

Authors: Raja Abdelmalek, Zied Mnasri, Faouzi Benzarti

Published in: International Journal of Speech Technology, Issue 3/2018


Abstract

Signal reconstruction from a given sequence of short-time Fourier transform (STFT) magnitude spectra without phase information has been a challenging problem for many years. The key issue is how to invert a sequence of overlapping magnitude spectra, which carry little or no phase information, into a real-valued signal free of audible artifacts. Practical implementations still cannot do this accurately. Based on an implementation of the classical RTISI (real-time iterative spectrogram inversion) method for a variety of signal types, including monophonic and polyphonic audio such as speech and music, this study aims to determine the optimal conditions for reconstructing a signal from its magnitude spectrum, to understand how much each parameter contributes, and to account for the recording conditions of the original signal. Results show that an optimal choice of frame length, shift rate, and number of iterations improves the quality of the reconstructed signals.
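The abstract does not spell out the algorithm, so the following is only a minimal sketch, assuming the textbook form of RTISI without look-ahead: frames are processed one at a time, each frame's initial phase is taken from the overlap-add of the frames already reconstructed, a few Griffin-Lim-style magnitude-projection iterations refine it, and the frame is then committed and never revisited. It exposes the three parameters the abstract names: frame length (`len(win)`), shift (`hop`) and number of iterations (`iterations`). The function names, the periodic Hann window and the naive pure-Python DFT are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch: function names and parameters are hypothetical, not from the paper.
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(x):
    """Naive O(n^2) DFT, used only to keep the sketch dependency-free."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse DFT; returns the real part, since the spectra come from real frames."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def stft_magnitude(x, win, hop):
    """Magnitude spectra of windowed frames of x (frame length len(win), shift hop)."""
    n = len(win)
    return [[abs(c) for c in dft([x[s + i] * win[i] for i in range(n)])]
            for s in range(0, len(x) - n + 1, hop)]


def overlap_gain(win, hop):
    """Steady-state sum of overlapping squared windows (constant e.g. for Hann, hop = n/4)."""
    return sum(win[i] ** 2 for i in range(0, len(win), hop))


def rtisi(mags, win, hop, iterations):
    """Frame-by-frame phase reconstruction in the spirit of RTISI (no look-ahead)."""
    n = len(win)
    gain = overlap_gain(win, hop)
    length = (len(mags) - 1) * hop + n
    committed = [0.0] * length  # overlap-add of w * frame for frames already fixed
    for m, mag in enumerate(mags):
        start = m * hop
        frame = [0.0] * n  # zero start: first pass takes its phase from earlier frames only
        for _ in range(iterations):
            # Partial reconstruction over this frame: committed frames + current estimate
            seg = [(committed[start + i] + win[i] * frame[i]) / gain for i in range(n)]
            spec = dft([seg[i] * win[i] for i in range(n)])
            # Keep the estimated phase, impose the target magnitude
            spec = [mag[k] * cmath.exp(1j * cmath.phase(spec[k])) for k in range(n)]
            frame = idft(spec)
        for i in range(n):  # commit the frame; it is never revisited
            committed[start + i] += win[i] * frame[i]
    # Normalise by the accumulated squared window (also correct at the signal edges)
    energy = [0.0] * length
    for m in range(len(mags)):
        for i in range(n):
            energy[m * hop + i] += win[i] ** 2
    return [c / e if e > 1e-12 else 0.0 for c, e in zip(committed, energy)]


# Usage: 64-sample sinusoid, 16-sample frames, shift of 4 samples, 8 iterations per frame
x = [math.sin(2 * math.pi * 3 * t / 16) for t in range(64)]
win = hann(16)
mags = stft_magnitude(x, win, hop=4)
y = rtisi(mags, win, hop=4, iterations=8)
```

The naive DFT only keeps the example self-contained; a practical version would use an FFT (for example `numpy.fft.rfft` and `numpy.fft.irfft`). Varying `len(win)`, `hop` and `iterations` in this sketch corresponds to the three parameters whose choice the study investigates.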



Publisher: Springer US
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-018-9522-9
