Published in: Social Network Analysis and Mining 1/2015

1 December 2015 | Original Article

Name disambiguation from link data in a collaboration graph using temporal and topological features

Authors: Tanay Kumar Saha, Baichuan Zhang, Mohammad Al Hasan

Abstract

In a social community, multiple persons may share the same name, phone number, or other identifying attributes. This, along with other phenomena such as name abbreviation, name misspelling, and human error, leads to erroneous aggregation of the records of multiple persons under a single reference. Such mistakes degrade the performance of document retrieval, web search, and database integration and, more importantly, cause improper attribution of credit (or blame). The task of entity disambiguation is to partition the records belonging to multiple persons so that each partition is composed of the records of a unique person. Existing solutions to this task use either biographical attributes or auxiliary features collected from external sources, such as Wikipedia. However, in many scenarios such auxiliary features are unavailable or costly to obtain. Moreover, collecting biographical or external data carries the risk of privacy violation. In this work, we propose a method for solving the entity disambiguation task from timestamped link information obtained from a collaboration network. Our method does not intrude on privacy, as it uses only the graph topology of an anonymized network. Experimental results on two real-life academic collaboration networks show that the proposed method has satisfactory performance.
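
To make the idea of disambiguating from timestamped links alone more concrete, here is a minimal Python sketch. It is not the method proposed in the article, which the abstract does not detail; it only illustrates one plausible kind of topological and temporal signal. The function name `neighbor_components`, the `(u, v, year)` edge format, and the node ids are all assumptions made for this illustration. The sketch removes the ambiguous node from its own neighborhood, groups its collaborators into connected components of the subgraph induced on those collaborators, and records the years in which each group was linked to the ambiguous node.

```python
from collections import defaultdict


def neighbor_components(edges, ego):
    """Group ego's collaborators into connected components after removing ego.

    edges: iterable of (u, v, year) tuples with anonymized node ids
           (hypothetical input format, assumed for this sketch).
    Returns a list of (sorted_members, first_year, last_year) tuples,
    ordered by first_year. Only links among ego's direct collaborators
    are used for connectivity.
    """
    adj = defaultdict(set)     # links that do not involve ego
    years = defaultdict(list)  # collaborator -> years of links with ego
    for u, v, year in edges:
        if ego in (u, v):
            other = v if u == ego else u
            if other != ego:   # ignore self-loops on ego
                years[other].append(year)
        else:
            adj[u].add(v)
            adj[v].add(u)

    seen, groups = set(), []
    for start in years:
        if start in seen:
            continue
        # Depth-first search restricted to ego's collaborators.
        stack, members = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            members.append(node)
            for nxt in adj[node]:
                if nxt in years and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        span = [y for m in members for y in years[m]]
        groups.append((sorted(members), min(span), max(span)))
    return sorted(groups, key=lambda g: g[1])


# Toy anonymized collaboration data (invented for illustration only).
demo_edges = [
    ("n0", "a", 2001), ("n0", "b", 2002), ("a", "b", 2002),
    ("n0", "c", 2010), ("n0", "d", 2011), ("c", "d", 2011),
]
demo_groups = neighbor_components(demo_edges, "n0")
```

In this toy input, the collaborators of `n0` fall into two groups that share no links and are active in separate periods. A pattern like that might suggest that the reference `n0` merges the records of more than one person, although a single person who changed institutions or research areas could produce the same picture, so such a signal would at most be one feature among several.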

Footnotes
1 http://dblp.org/search/index.php
2 http://arnetminer.org

Metadata
Title
Name disambiguation from link data in a collaboration graph using temporal and topological features
Authors
Tanay Kumar Saha
Baichuan Zhang
Mohammad Al Hasan
Publication date
1 December 2015
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 1/2015
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-015-0249-1
