2016 | OriginalPaper | Book chapter

Person Name Disambiguation for Building University Knowledge Base

Written by: Piotr Andruszkiewicz, Szymon Szepietowski

Published in: Intelligent Information and Database Systems

Publisher: Springer Berlin Heidelberg

Abstract

In this paper we propose a new algorithm for person name disambiguation among authors of scientific publications. The algorithm is effective, flexible, and tailored to a scientific knowledge base. Besides the common properties of a publication, namely title, venue, and author and co-author names, it also exploits references. One reason is that we decided to enrich the University Knowledge Base with connections between publications, rather than storing references only as reference strings (i.e. author’s name, title, etc.). Our algorithm uses an unsupervised approach, which does not require creating a training set, a time- and resource-consuming task. However, we want to leverage additional information, available from crowdsourcing or authorised users, that confirms authorship and citation relations between papers. Using this information, the default parameters of the unsupervised algorithm can be optimised for a given case by means of a genetic algorithm in order to increase accuracy. The proposed method can be applied to three tasks: assigning a publication to a specific researcher, indicating that a new author is not yet known to the database, and clustering a set of publications into groups that each contain the papers of one researcher. Validation results confirm the high accuracy of the new algorithm and its usefulness in the process of populating a scientific knowledge base.
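
To make the first two tasks from the abstract concrete (assigning a publication to a known researcher, or flagging its author as new), here is a minimal sketch. It is not the authors' algorithm: the abstract does not give the similarity measure, the parameters, or the decision rule. The Jaccard-based scoring, the `DEFAULT_WEIGHTS` values, the `threshold`, and every name below are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    """Attributes named in the abstract; the field layout is hypothetical."""
    title: str
    venue: str
    coauthors: frozenset
    references: frozenset  # identifiers of cited publications


def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Placeholder weights (assumption). They sum to 1 so the score stays in [0, 1].
DEFAULT_WEIGHTS = {"title": 0.2, "venue": 0.2, "coauthors": 0.3, "references": 0.3}


def similarity(p, q, weights=DEFAULT_WEIGHTS):
    """Weighted sum of per-attribute similarities between two publications."""
    scores = {
        "title": jaccard(p.title.lower().split(), q.title.lower().split()),
        "venue": 1.0 if p.venue == q.venue else 0.0,
        "coauthors": jaccard(p.coauthors, q.coauthors),
        "references": jaccard(p.references, q.references),
    }
    return sum(weights[key] * scores[key] for key in scores)


def assign(pub, profiles, threshold=0.25, weights=DEFAULT_WEIGHTS):
    """Return the id of the best-matching researcher, or None for a new author.

    `profiles` maps a researcher id to that researcher's known publications.
    The best single-publication match is used as the profile score (assumption).
    """
    best_id, best_score = None, 0.0
    for researcher_id, pubs in profiles.items():
        if not pubs:
            continue
        score = max(similarity(pub, known, weights) for known in pubs)
        if score > best_score:
            best_id, best_score = researcher_id, score
    return best_id if best_score >= threshold else None
```

In a setup like this, the weights and the threshold are the kind of default parameters that confirmed authorship data could be used to tune, for example with a genetic algorithm as the abstract describes. The sketch leaves that tuning step out.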

Metadata
Title
Person Name Disambiguation for Building University Knowledge Base
Written by
Piotr Andruszkiewicz
Szymon Szepietowski
Copyright year
2016
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-662-49381-6_26