2017 | OriginalPaper | Book chapter

A Novel Approach for Author Name Disambiguation Using Ranking Confidence

Authors: Xueqin Lin, Jia Zhu, Yong Tang, Fen Yang, Bo Peng, Weiling Li

Published in: Database Systems for Advanced Applications

Publisher: Springer International Publishing

Abstract

In digital libraries, author names become ambiguous when multiple authors share the same name or when one person appears under different name variations. In recent years, name disambiguation has become a major challenge when integrating data from multiple sources in bibliographic digital libraries. Most previous work addresses the problem by using many attributes, such as coauthors, titles of articles/publications, topics of articles, and years of publication. In most cases, however, only the coauthor and title attributes are available. In this paper, we propose an approach that is based on Hierarchical Agglomerative Clustering (HAC) and uses only the coauthor and title attributes, yet identifies ambiguous authors more effectively. The algorithm is divided into two stages. In the first stage, we employ a pair-wise grouping algorithm based on coauthors' names to group records into clusters. Then, we merge two clusters if the similarity of their article titles reaches a threshold. We use three measures, Jaccard Similarity, Cosine Similarity and Euclidean Distance, to compare the titles of two clusters. To reduce the risk of relying on a single similarity metric, we introduce the concept of ranking confidence to measure the confidence of the different similarity measurements. The ranking confidence decides which similarity measure to use when merging clusters. In the experiments, we use the PairPrecision, PairRecall and PairF1 scores to evaluate our method and compare it with other methods. Experimental results indicate that our method significantly outperforms the baseline methods HAC, K-means and SACluster when only the coauthor and title attributes are used.
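The three title measures and the pairwise evaluation scores named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The whitespace tokenization, the term-frequency vectors, the function names, and the standard pair-counting definitions of precision, recall and F1 are all assumptions made for illustration. The ranking confidence itself is not defined in the abstract and is deliberately left out.

```python
import math
from collections import Counter
from itertools import combinations


def tokens(title):
    # Assumption: lowercase whitespace tokenization; the paper may preprocess differently.
    return title.lower().split()


def jaccard_similarity(a, b):
    # |A ∩ B| / |A ∪ B| over the sets of title tokens.
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine_similarity(a, b):
    # Cosine of the angle between term-frequency vectors of the two titles.
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean_distance(a, b):
    # Distance between term-frequency vectors; smaller means more similar.
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    return math.sqrt(sum((ca[t] - cb[t]) ** 2 for t in set(ca) | set(cb)))


def pairwise_scores(predicted, truth):
    # predicted, truth: dicts mapping record id -> cluster label.
    # A pair counts as correct when both records share a cluster in
    # the prediction and in the ground truth.
    ids = sorted(predicted)
    pred_pairs = {p for p in combinations(ids, 2) if predicted[p[0]] == predicted[p[1]]}
    true_pairs = {p for p in combinations(ids, 2) if truth[p[0]] == truth[p[1]]}
    correct = len(pred_pairs & true_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Note that Euclidean Distance is a dissimilarity: a threshold-based merge would have to invert or rescale it before comparing it with the two similarity scores. How the paper handles this is not stated in the abstract.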

Metadata
Title
A Novel Approach for Author Name Disambiguation Using Ranking Confidence
Authors
Xueqin Lin
Jia Zhu
Yong Tang
Fen Yang
Bo Peng
Weiling Li
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-55705-2_13