research-article

Fully perceptual-based 3D spatial sound individualization with an adaptive variational autoencoder

Published: 20 November 2017

Abstract

To realize 3D spatial sound rendering with two-channel headphones, one needs head-related transfer functions (HRTFs) tailored to a specific user. However, measuring HRTFs is a tedious and expensive procedure. To address this, we propose a fully perceptual-based HRTF fitting method for individual users using machine learning techniques. During calibration, the user only needs to answer pairwise comparisons of test signals presented by the system, which reduces the effort required to obtain individualized HRTFs. Technically, we present a novel adaptive variational autoencoder with a convolutional neural network. In training, this autoencoder analyzes a publicly available HRTF dataset and identifies factors that depend on the individuality of users in a nonlinear space. In calibration, the autoencoder generates high-quality HRTFs fitted to a specific user by blending these factors. We validate the feasibility of our method through several quantitative experiments and a user study.
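
As a rough illustration of calibration by pairwise comparison, the sketch below searches over a small vector of blending weights using only "which of these two do you prefer?" answers. It is a minimal sketch under stated assumptions, not the method of the paper: the abstract does not specify the optimizer, so a simple accept-if-preferred random search is swapped in, and the names `calibrate`, `make_simulated_listener`, and `prefers_second` are hypothetical. A simulated listener with a hidden target vector stands in for a real user, and the step that would decode weights into HRTFs and render test signals is omitted.

```python
import random


def calibrate(prefers_second, dim, rounds=200, step=0.3, seed=0):
    """Search for blending weights using only pairwise comparisons.

    prefers_second(a, b) stands in for the listener (an assumption of this
    sketch): it returns True when candidate b is preferred over candidate a.
    """
    rng = random.Random(seed)
    current = [0.0] * dim  # start from neutral weights
    for _ in range(rounds):
        # Propose a random perturbation of the current weights.
        candidate = [w + rng.gauss(0.0, step) for w in current]
        # Keep the candidate only if the listener prefers it.
        if prefers_second(current, candidate):
            current = candidate
    return current


def make_simulated_listener(target):
    """Hypothetical listener who prefers weights closer to a hidden target."""

    def sq_dist(v):
        return sum((a - b) ** 2 for a, b in zip(v, target))

    def prefers_second(a, b):
        return sq_dist(b) < sq_dist(a)

    return prefers_second, sq_dist


target = [0.8, -0.5, 0.2]  # hidden "ideal" weights, illustrative only
listener, sq_dist = make_simulated_listener(target)
weights = calibrate(listener, dim=len(target))
```

Because a candidate is kept only when the simulated listener prefers it, the distance to the hidden target never increases from one round to the next. With a real user, `prefers_second` would instead play two test signals and record the answer.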

Supplemental Material

a212-yamamoto.mp4 (MP4, 61.9 MB)

    • Published in

      ACM Transactions on Graphics, Volume 36, Issue 6
      December 2017
      973 pages
      ISSN: 0730-0301
      EISSN: 1557-7368
      DOI: 10.1145/3130800

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States
