
2017 | OriginalPaper | Book chapter

Doctoral Advisor or Medical Condition: Towards Entity-Specific Rankings of Knowledge Base Properties

Authors: Simon Razniewski, Vevake Balaraman, Werner Nutt

Published in: Advanced Data Mining and Applications

Publisher: Springer International Publishing


Abstract

In knowledge bases such as Wikidata, a large set of properties can be asserted for entities, ranging from generic ones such as name and place of birth to highly profession-specific or background-specific ones such as doctoral advisor or medical condition. Determining a preference or ranking within this large set is a challenge in tasks such as prioritisation of edits or natural-language generation. Most previous approaches to ranking knowledge base properties are purely data-driven and, as we show, mistake frequency for interestingness. In this work, we develop a human-annotated dataset of 350 preference judgments among pairs of knowledge base properties for fixed entities. From this set, we isolate a subset of pairs for which humans show a high level of agreement (87.5% on average). We show, however, that baseline and state-of-the-art techniques achieve only 61.3% precision in predicting human preferences for this subset. We then develop a technique based on a combination of general frequency, applicability to similar entities and semantic similarity that achieves 74% precision. The preference dataset is available at https://www.kaggle.com/srazniewski/wikidatapropertyranking.
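To make the evaluation setting in the abstract concrete, the following is a minimal sketch of how pairwise preference predictions might be scored against human judgments. It is not the authors' implementation: the weighted-sum combination, the function names (`combine`, `pairwise_precision`), the placeholder entity and all numeric values are assumptions invented for illustration, and none of the numbers come from the dataset.

```python
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[str, str], float]   # (entity, property) -> score
Judgment = Tuple[str, str, str, str]   # (entity, prop_a, prop_b, human choice "a" or "b")


def combine(scorers: List[Scorer], weights: List[float]) -> Scorer:
    """Weighted sum of several signals (an assumed way of combining them)."""
    def score(entity: str, prop: str) -> float:
        return sum(w * s(entity, prop) for s, w in zip(scorers, weights))
    return score


def pairwise_precision(judgments: List[Judgment], score: Scorer) -> float:
    """Fraction of pairs where the higher-scored property matches the human choice."""
    hits = 0
    for entity, prop_a, prop_b, human in judgments:
        predicted = "a" if score(entity, prop_a) >= score(entity, prop_b) else "b"
        hits += predicted == human
    return hits / len(judgments)


# Invented toy values, purely illustrative (hypothetical entity and scores).
ENTITY = "Some Physicist"
GLOBAL_FREQ: Dict[str, float] = {"place of birth": 0.9, "doctoral advisor": 0.1}
SIMILAR_FREQ: Dict[str, float] = {"place of birth": 0.5, "doctoral advisor": 0.8}
SEMANTIC_SIM: Dict[str, float] = {"place of birth": 0.1, "doctoral advisor": 0.9}


def global_frequency(entity: str, prop: str) -> float:
    return GLOBAL_FREQ[prop]


def similar_entity_frequency(entity: str, prop: str) -> float:
    return SIMILAR_FREQ[prop]


def semantic_similarity(entity: str, prop: str) -> float:
    return SEMANTIC_SIM[prop]


# One invented judgment: the annotator prefers "doctoral advisor" (option "a").
judgments: List[Judgment] = [(ENTITY, "doctoral advisor", "place of birth", "a")]

frequency_only = combine([global_frequency], [1.0])
all_signals = combine(
    [global_frequency, similar_entity_frequency, semantic_similarity],
    [1.0, 1.0, 1.0],
)

print(pairwise_precision(judgments, frequency_only))
print(pairwise_precision(judgments, all_signals))
```

In this toy setup, a frequency-only scorer prefers the globally common property, while the combined scorer can favour the entity-specific one. That mirrors the abstract's point about mistaking frequency for interestingness, without reproducing the reported figures.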

Footnotes
3
For instance, as of March 21, 2016, there were 2202 properties, while as of February 7, 2017, there are 2719 according to https://tools.wmflabs.org/hay/propbrowse/.
 
6
Not to be confused with ensemble learning, a machine learning approach in which consecutive instances of the same classifier are trained specifically on records that previous instances predicted wrongly. Ensemble learning requires a sufficient amount of labeled training data, which is not available in our case.
 
Metadata
Title
Doctoral Advisor or Medical Condition: Towards Entity-Specific Rankings of Knowledge Base Properties
Authors
Simon Razniewski
Vevake Balaraman
Werner Nutt
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-69179-4_37
