
2016 | OriginalPaper | Book chapter

Reversible Speech De-identification Using Parametric Transformations and Watermarking

Authors: Aitor Valdivielso, Daniel Erro, Inma Hernaez

Published in: Advances in Speech and Language Technologies for Iberian Languages

Publisher: Springer International Publishing


Abstract

This paper presents a system that de-identifies speech signals in order to hide and protect the identity of the speaker. Using a flexible wideband harmonic model, it applies a relatively simple yet effective transformation to the pitch and to the frequency axis of the spectral envelope. It also embeds the transformation parameters in the signal by means of watermarking, which enables re-identification. Our experiments show that, for adequate modification factors, the system performs satisfactorily in terms of quality, degree of de-identification and naturalness. The limitations imposed by the signal processing framework are also discussed.
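The abstract combines two ingredients: a parametric transformation (pitch plus frequency-axis scaling of the spectral envelope) and a watermark that carries its parameters so the change can be undone. The Python sketch below illustrates that reversible loop under assumptions that are not taken from the chapter: pitch and frequency-axis changes are modelled as plain linear scaling factors, the envelope is a toy list sampled on a uniform frequency grid (not the wideband harmonic model), each factor is sent as an 8-bit index over a hypothetical range [0.5, 2.0], and the watermark is classic additive spread-spectrum embedding. The helper names `quantize`, `dequantize`, `warp_envelope`, `embed` and `extract` are hypothetical.

```python
import math
import random

# Hypothetical parameter grid: each modification factor is assumed to lie in
# [0.5, 2.0] and is sent as an 8-bit index. These values are illustrative.
FACTOR_MIN, FACTOR_MAX, BITS_PER_FACTOR = 0.5, 2.0, 8
LEVELS = (1 << BITS_PER_FACTOR) - 1


def quantize(factor):
    """Map a factor to BITS_PER_FACTOR bits (MSB first), clamped to the grid."""
    q = round((factor - FACTOR_MIN) / (FACTOR_MAX - FACTOR_MIN) * LEVELS)
    q = max(0, min(LEVELS, q))
    return [(q >> b) & 1 for b in reversed(range(BITS_PER_FACTOR))]


def dequantize(bits):
    """Inverse of quantize: rebuild the factor from its bit pattern."""
    q = 0
    for bit in bits:
        q = (q << 1) | bit
    return FACTOR_MIN + q / LEVELS * (FACTOR_MAX - FACTOR_MIN)


def warp_envelope(envelope, beta):
    """Linearly scale the frequency axis of an envelope on a uniform grid.

    Output bin k takes the interpolated input value at position k / beta,
    so beta > 1 moves spectral features up in frequency. Positions past the
    last bin are clamped to it, so the inverse is exact only where no
    clamping happened.
    """
    last = len(envelope) - 1
    out = []
    for k in range(len(envelope)):
        x = min(k / beta, last)
        i = min(int(x), last - 1)
        frac = x - i
        out.append((1 - frac) * envelope[i] + frac * envelope[i + 1])
    return out


def pn_sequence(key, length):
    """Pseudo-noise +/-1 chips; the key acts as the shared secret."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed(signal, bits, key, chips=512, strength=0.05):
    """Additive spread-spectrum embedding: one block of chips per payload bit."""
    pn = pn_sequence(key, len(bits) * chips)
    if len(signal) < len(pn):
        raise ValueError("signal too short for the payload")
    marked = list(signal)
    for n, chip in enumerate(pn):
        sign = 1.0 if bits[n // chips] else -1.0
        marked[n] += strength * sign * chip
    return marked


def extract(signal, n_bits, key, chips=512):
    """Correlate each block with the same chips; the sign of the sum is the bit."""
    pn = pn_sequence(key, n_bits * chips)
    bits = []
    for j in range(n_bits):
        corr = sum(signal[n] * pn[n] for n in range(j * chips, (j + 1) * chips))
        bits.append(1 if corr > 0 else 0)
    return bits


# Usage: snap the factors to the grid first, so the inverse is exact.
alpha = dequantize(quantize(1.3))  # pitch scaling factor
beta = dequantize(quantize(1.2))   # frequency-axis scaling factor
f0 = 120.0                         # toy F0 of one frame, in Hz
envelope = [float(k) for k in range(257)]  # toy envelope, 257 bins

f0_deid = alpha * f0
env_deid = warp_envelope(envelope, beta)

fs = 16000
host = [0.05 * math.sin(2 * math.pi * 220 * n / fs) for n in range(fs)]
payload = quantize(alpha) + quantize(beta)
marked = embed(host, payload, key=1234)

# Receiver side: recover the factors from the watermark and undo the change.
bits = extract(marked, len(payload), key=1234)
alpha_rx = dequantize(bits[:BITS_PER_FACTOR])
beta_rx = dequantize(bits[BITS_PER_FACTOR:])
f0_restored = f0_deid / alpha_rx
env_restored = warp_envelope(env_deid, 1 / beta_rx)
```

Snapping the factors to the quantization grid before applying them means the receiver decodes exactly the values that were used, so the pitch change can be undone exactly; the envelope warp is only invertible in the band where no bins were clamped. A practical system would also shape the watermark perceptually and protect the payload with error-correction coding, both of which this sketch omits.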


Footnotes
1. Voice conversion can be seen as a particular case of voice transformation in which there is a specific target speaker.
2. PESQ predicts the mean opinion score (MOS) of a distorted signal relative to its original clean version.
Metadata
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-49169-1_26