
2019 | OriginalPaper | Chapter

ANDMC: An Algorithm for Author Name Disambiguation Based on Molecular Cross Clustering

Authors: Siyang Zhang, Xinhua E, Tao Huang, Fan Yang

Published in: Database Systems for Advanced Applications

Publisher: Springer International Publishing


Abstract

With the rapid development of information technology, name ambiguity has become one of the main problems in information retrieval, data mining and scientific measurement. It inevitably affects the accuracy of information calculations, reduces the credibility of literature retrieval systems, and degrades the quality of information. To deal with this, name disambiguation technology has been proposed, which maps virtual relational networks to real social networks. However, most existing work considers neither name coreference nor the failure to match two identical strings that are written in different formats. This paper proposes an algorithm for Author Name Disambiguation based on Molecular Cross Clustering (ANDMC) that takes name coreference into account. We also explore a string matching algorithm called Improved Levenshtein Distance (ILD), which addresses the matching of identical strings written in different formats. The experimental results show that our algorithm outperforms the baseline methods, with an F1-score 9.48% and 21.45% higher than SC and HAC, respectively.
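The abstract does not describe how ILD works, so the following is only a minimal sketch of the general idea it refers to: the classic Levenshtein distance, combined with a simple name normalization so that the same name written in two formats compares as equal. The `normalize_name` and `name_distance` helpers are hypothetical and are not taken from the paper; the actual ILD algorithm may differ substantially.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalize_name(name: str) -> str:
    """Hypothetical normalization (an assumption, not the paper's ILD):
    lowercase, drop punctuation, and sort the name tokens.
    Only ASCII letters are kept, which is a simplification."""
    tokens = re.findall(r"[a-z]+", name.lower())
    return " ".join(sorted(tokens))


def name_distance(x: str, y: str) -> int:
    """Edit distance between the normalized forms of two names."""
    return levenshtein(normalize_name(x), normalize_name(y))


if __name__ == "__main__":
    # Raw edit distance penalizes the differing format...
    print(levenshtein("Zhang, Siyang", "Siyang Zhang"))
    # ...while the normalized comparison treats both as the same string.
    print(name_distance("Zhang, Siyang", "Siyang Zhang"))
```

Sorting tokens is a deliberately crude choice: it makes "Zhang, Siyang" and "Siyang Zhang" compare as equal, but it cannot handle initials or transliteration variants, which a real disambiguation method would need to address.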


Metadata
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-18590-9_12
