2012 | OriginalPaper | Chapter

1. Historical and Procedural Overview of Forensic Speaker Recognition as a Science

Authors: Kanae Amino, Ph.D., Takashi Osanai, Ph.D., Toshiaki Kamada, B.E., Hisanori Makinae, Ph.D., Takayuki Arai, Ph.D.

Published in: Forensic Speaker Recognition

Publisher: Springer New York

Abstract

Forensic phonetics and acoustics are now widely used in police and legal work involving acoustic samples. Among the many tasks in this area, forensic speaker recognition is considered one of the most complex. Forensic speaker recognition, sometimes called forensic speaker comparison, is the process of judging whether or not two speech samples come from the same speaker. This chapter introduces the historical background of forensic speaker recognition, including the “voiceprint” controversy, human-based visual and auditory forensic speaker recognition, and automatic forensic speaker recognition. Procedural considerations in the forensic speaker recognition process and factors that affect recognition performance are also presented. Finally, we summarize the progress and developments made in forensic automatic speaker recognition.

Title: Historical and Procedural Overview of Forensic Speaker Recognition as a Science
Authors: Kanae Amino, Ph.D., Takashi Osanai, Ph.D., Toshiaki Kamada, B.E., Hisanori Makinae, Ph.D., Takayuki Arai, Ph.D.
Publisher: Springer New York
Book: Forensic Speaker Recognition
Print ISBN: 978-1-4614-0262-6
Electronic ISBN: 978-1-4614-0263-3
Copyright Year: 2012
DOI: https://doi.org/10.1007/978-1-4614-0263-3_1
