Published in: International Journal of Speech Technology 4/2019

29-10-2019

nameGist: a novel phonetic algorithm with bilingual support

Authors: Shahidul Islam Khan, Md. Mahmudul Hasan, Mohammad Imran Hossain, Abu Sayed Md. Latiful Hoque



Abstract

Phonetic algorithms play an essential role in many applications, including name matching, database record linkage, spelling correction, and search recommendations. Since 1918, researchers have proposed many phonetic algorithms. Soundex, Match Rating Codex, NYSIIS, Metaphone, and Double Metaphone are among the most frequently used. These algorithms were primarily developed for English phonetics, and they perform well for their intended purposes. However, they do not support the Bengali language, and they perform poorly on English (romanized) representations of Bengali names. Some phonetic algorithms, e.g., NameSignificance and Modified NameSignificance, have recently been proposed to handle Bengali phonetic names, but their performance on English names is not up to the mark. Moreover, these algorithms do not support names written in the Bengali script, i.e., Bengali Unicode. The Bengali language, also known as Bangla among native speakers, is counted as the seventh most spoken language in the world, with more than 250 million speakers worldwide. The use of Bengali Unicode is growing in Bangladesh and around the globe as computers come into use everywhere. For example, in different healthcare systems, a patient's name can be stored either as an English representation of the Bengali name or in Bengali Unicode. An algorithm that cannot process Bengali Unicode fails to link information about the same patient across multiple databases, which creates a problem for record linkage and entity matching. In this paper, we propose a novel phonetic algorithm, nameGist, which can efficiently encode Bengali phonetic names in English representation, Bengali Unicode names, and English phonetic names. We have tested nameGist on various datasets that contain Bengali phonetic names, Bengali Unicode names, English phonetic (American or British) names, and a mixture of these types. In each case, nameGist performed better than the other algorithms in terms of accuracy and F-measure. nameGist can therefore be used to solve record linkage and entity resolution problems effectively for Bengali, English, and mixed names.
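The abstract does not describe how nameGist encodes names, so the sketch below does not attempt to reproduce it. Instead, it illustrates the baseline problem using classic American Soundex, one of the algorithms named above. This is a minimal Python sketch under stated assumptions. It encodes each name, treats two equal, non-empty codes as a predicted match, and scores labelled name pairs by accuracy and F-measure. The helper names `soundex`, `evaluate`, and `PAIRS` are hypothetical. The name pairs are toy data, not the paper's datasets or results.

```python
"""Baseline sketch: American Soundex plus pairwise accuracy/F-measure.

Illustrative only; this is NOT nameGist. PAIRS is hypothetical toy data.
"""

# Soundex digit groups: consonants with similar sounds share a digit.
_GROUPS = (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
           ("l", "4"), ("mn", "5"), ("r", "6"))
_CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code, or "" if no ASCII letters."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        # Bengali Unicode (or any non-Latin script) cannot be encoded at all.
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h/w do not separate consonants with the same digit
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code  # vowels (code "") allow a repeated digit to be coded again
    return result.ljust(4, "0")


def evaluate(pairs):
    """Treat equal, non-empty codes as a predicted match; score against labels."""
    tp = fp = fn = tn = 0
    for a, b, is_match in pairs:
        code_a, code_b = soundex(a), soundex(b)
        predicted = bool(code_a) and code_a == code_b
        if predicted and is_match:
            tp += 1
        elif predicted:
            fp += 1
        elif is_match:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
    }


# Hypothetical labelled pairs: (name_a, name_b, same_person?)
PAIRS = [
    ("Mohammad", "Muhammad", True),  # romanized variants: M530 == M530
    ("Robert", "Rupert", False),     # different names, same code R163
    ("Rahman", "রহমান", True),        # Bengali Unicode yields "" -> missed
    ("Karim", "Rahim", False),       # K650 vs R500
]

if __name__ == "__main__":
    print(evaluate(PAIRS))
    # {'accuracy': 0.5, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

The toy run exposes two failure modes in this baseline. First, unrelated names can share a code. Second, a name spelled in Bengali Unicode is not encoded at all, so it can never be linked to its romanized counterpart.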


Metadata
Title
nameGist: a novel phonetic algorithm with bilingual support
Authors
Shahidul Islam Khan
Md. Mahmudul Hasan
Mohammad Imran Hossain
Abu Sayed Md. Latiful Hoque
Publication date
29-10-2019
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2019
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-019-09653-2
