
2010 | OriginalPaper | Chapter

28. Spatial Speaker: Spatial Positioning of Synthesized Speech in Java

Authors: Jaka Sodnik, Sašo Tomažič

Published in: Machine Learning and Systems Engineering

Publisher: Springer Netherlands


Abstract

In this paper, we propose the “Spatial Speaker”, an enhancement of the Java FreeTTS speech synthesizer that adds spatial positioning of both the speaker and the listener. Our module reads arbitrary text from a file or webpage to the user from a fixed or changing position in space through ordinary stereo headphones. The solution combines the following modules: the FreeTTS speech synthesizer, a custom-made speech processing unit, the MIT Media Lab HRTF library, the JOAL positioning library and a Creative X-Fi sound card. The paper gives an overview of the design of the “Spatial Speaker” and proposes three practical applications of such a system for visually impaired and blind computer users. Preliminary results of user studies confirmed the system’s usability and showed its potential in other types of applications and auditory interfaces. The entire system is developed as a single Java class which can be imported and used in any Java application.
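The abstract does not say how the positioning is computed. As general background, headphone spatialization with HRTFs is commonly done by convolving a mono signal with a pair of head-related impulse responses (HRIRs), one per ear. The following is a minimal sketch of that idea in plain Java, not the authors' implementation: the class name `HrirSketch`, its methods, and the two-tap impulse responses are hypothetical toy values, and it uses neither the FreeTTS nor the JOAL API nor the MIT Media Lab measurements.

```java
import java.util.Arrays;

// Hypothetical illustration only: toy HRIR values, not MIT Media Lab data.
class HrirSketch {

    /** Direct-form FIR convolution: out[n + k] accumulates signal[n] * impulseResponse[k]. */
    public static double[] convolve(double[] signal, double[] impulseResponse) {
        if (signal.length == 0 || impulseResponse.length == 0) {
            return new double[0];
        }
        double[] out = new double[signal.length + impulseResponse.length - 1];
        for (int n = 0; n < signal.length; n++) {
            for (int k = 0; k < impulseResponse.length; k++) {
                out[n + k] += signal[n] * impulseResponse[k];
            }
        }
        return out;
    }

    /** Filters one mono signal with a left/right HRIR pair; returns {left, right}. */
    public static double[][] spatialize(double[] mono, double[] hrirLeft, double[] hrirRight) {
        return new double[][] { convolve(mono, hrirLeft), convolve(mono, hrirRight) };
    }

    public static void main(String[] args) {
        // Stand-in for a few samples of synthesized speech.
        double[] mono = {1.0, 0.5, 0.25};
        // Assumed toy HRIR pair for a source on the listener's left:
        // the right ear hears the sound one sample later and at half amplitude.
        double[] hrirLeft = {1.0, 0.0};
        double[] hrirRight = {0.0, 0.5};

        double[][] stereo = spatialize(mono, hrirLeft, hrirRight);
        System.out.println("left  = " + Arrays.toString(stereo[0]));
        System.out.println("right = " + Arrays.toString(stereo[1]));
    }
}
```

In a real system the impulse responses would come from a measured HRTF set selected by azimuth and elevation, and the filtering could equally be delegated to an audio library or sound card rather than done in application code.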


Metadata

Title: Spatial Speaker: Spatial Positioning of Synthesized Speech in Java
Authors: Jaka Sodnik, Sašo Tomažič
Copyright Year: 2010
Publisher: Springer Netherlands
DOI: https://doi.org/10.1007/978-90-481-9419-3_28
