Published in: International Journal of Speech Technology 3/2018

06.06.2018

Determining the optimal conditions for signal reconstruction based on STFT magnitude

Authors: Raja Abdelmalek, Zied Mnasri, Faouzi Benzarti

Abstract

Signal reconstruction from a given sequence of short-time Fourier transform (STFT) magnitude spectra without phase information has been a challenging problem for many years. The key issue is how to invert a sequence of overlapping magnitude spectra, which carry little or no phase information, into a real-valued signal free of audible artifacts. Practical implementations still cannot do this accurately. Using an implementation of the classical RTISI (real-time iterative spectrogram inversion) method on a variety of signal types, including monophonic and polyphonic audio such as speech and music, this study aims to determine the optimal conditions for reconstructing a signal from its magnitude spectra, to understand how much each parameter contributes, and to account for the recording conditions of the original signal. The results show that an optimal choice of frame length, shift rate and number of iterations improves the quality of the reconstructed signals.
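The reconstruction loop described above can be illustrated with a small example. What follows is a minimal, dependency-free Python sketch of an RTISI-style inversion. It was written from the general description of the method (each new frame takes its initial phase from the overlap-add of the frames already committed, and is then refined for a fixed number of iterations), not from the authors' implementation. The periodic Hann window, the zero initial phase, the naive O(N²) DFT and the helper names `hann`, `stft_mag`, `rtisi` and `spectral_convergence` are all illustrative assumptions. `n_fft`, `hop` and `n_iter` stand in for the frame length, shift and number of iterations that the study tunes.

```python
import cmath
import math

# Illustrative sketch only: helper names, window and initialization are assumptions,
# not the code used in the paper.


def hann(n_fft):
    # Periodic Hann window (assumed; the paper's window is not stated here).
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]


def dft(x):
    # Naive O(N^2) DFT, kept dependency-free for illustration.
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft_real(spec):
    # Inverse DFT, keeping the real part (spec is conjugate-symmetric up to rounding).
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def stft_mag(x, win, hop):
    # Magnitude-only STFT: one list of |X_m(k)| per frame starting at m * hop.
    n_fft = len(win)
    starts = range(0, len(x) - n_fft + 1, hop)
    return [[abs(c) for c in dft([x[s + i] * win[i] for i in range(n_fft)])] for s in starts]


def rtisi(mags, win, hop, n_iter):
    # Simplified RTISI: frames are estimated one at a time, left to right.
    n_fft = len(win)
    length = (len(mags) - 1) * hop + n_fft
    out = [0.0] * length    # sum of w * frame over committed frames
    wsum = [0.0] * length   # sum of w^2 over committed frames
    for m, target in enumerate(mags):
        s = m * hop
        frame = [0.0] * n_fft  # empty current frame -> zero initial phase
        for _ in range(n_iter):
            # Partial reconstruction on this frame's support (committed frames plus the
            # current estimate), normalized as in Griffin-Lim's weighted overlap-add.
            seg = []
            for i in range(n_fft):
                den = wsum[s + i] + win[i] ** 2
                seg.append((out[s + i] + win[i] * frame[i]) / den if den > 1e-12 else 0.0)
            spec = dft([seg[i] * win[i] for i in range(n_fft)])
            # Keep the estimated phase and impose the known target magnitude.
            spec = [target[k] * cmath.exp(1j * cmath.phase(spec[k])) for k in range(n_fft)]
            frame = idft_real(spec)
        # Commit the frame. It is never revisited (no look-ahead in this sketch).
        for i in range(n_fft):
            out[s + i] += win[i] * frame[i]
            wsum[s + i] += win[i] ** 2
    return [o / w if w > 1e-12 else 0.0 for o, w in zip(out, wsum)]


def spectral_convergence(target, estimate):
    # ||target - estimate|| / ||target|| over all frames and bins; 0 is a perfect match.
    num = sum((a - b) ** 2 for ft, fe in zip(target, estimate) for a, b in zip(ft, fe))
    den = sum(a ** 2 for ft in target for a in ft)
    return math.sqrt(num / den) if den > 0 else 0.0


if __name__ == "__main__":
    n_fft, hop = 32, 8  # frame length and shift (75% overlap)
    x = [math.cos(2 * math.pi * 4 * t / n_fft) for t in range(96)]
    w = hann(n_fft)
    target = stft_mag(x, w, hop)
    for n_iter in (1, 8):
        y = rtisi(target, w, hop, n_iter)
        print(n_iter, spectral_convergence(target, stft_mag(y, w, hop)))
```

Running the demo prints the spectral convergence of the reconstruction for 1 and 8 iterations. You can vary `n_fft`, `hop` and `n_iter` to explore the same three parameters the study examines, though the sketch makes no claim about which settings are optimal. Because only magnitudes are used, `x` and `-x` produce identical reconstructions. This is one of the ambiguities inherent to magnitude-only inversion.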

Metadata
Title
Determining the optimal conditions for signal reconstruction based on STFT magnitude
Authors
Raja Abdelmalek
Zied Mnasri
Faouzi Benzarti
Publication date
06.06.2018
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 3/2018
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-9522-9
