
2018 | OriginalPaper | Chapter

Representativeness of Knowledge Bases with the Generalized Benford’s Law

Authors: Arnaud Soulet, Arnaud Giacometti, Béatrice Markhoff, Fabian M. Suchanek

Published in: The Semantic Web – ISWC 2018

Publisher: Springer International Publishing


Abstract

Knowledge bases (KBs) such as DBpedia, Wikidata, and YAGO contain a huge number of entities and facts. Several recent works induce rules or calculate statistics on these KBs. Most of these methods assume that the data is a representative sample of the studied universe. Unfortunately, KBs are biased because they are built by crowdsourcing and by opportunistically aggregating whatever databases are available. This paper aims to approximate the representativeness of a relation within a knowledge base. For this, we use the generalized Benford's law, which indicates the distribution that the facts of a relation are expected to follow. We then compute the minimum number of facts that have to be added to make the KB representative of the real world. Experiments show that our unsupervised method applies to a large number of relations. For numerical relations where ground truths exist, the estimated representativeness proves to be a reliable indicator.
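To make the idea in the abstract concrete, here is a minimal sketch in Python. It is not the authors' algorithm. It assumes one common parameterization of the generalized Benford's law, in which the probability of first digit d is ((d+1)^α − d^α) / (10^α − 1) for α ≠ 0, with the classical law log10(1 + 1/d) as the limit α → 0. The function names (`generalized_benford`, `first_digit`, `facts_to_add`) and the simple counting rule for missing facts are illustrative assumptions.

```python
import math


def generalized_benford(alpha: float) -> list[float]:
    """First-digit probabilities for d = 1..9.

    Assumed parameterization: P(d) = ((d+1)^alpha - d^alpha) / (10^alpha - 1);
    alpha = 0 is treated as the classical Benford's law.
    """
    if abs(alpha) < 1e-12:
        return [math.log10(1 + 1 / d) for d in range(1, 10)]
    norm = 10 ** alpha - 1
    return [((d + 1) ** alpha - d ** alpha) / norm for d in range(1, 10)]


def first_digit(n: int) -> int:
    """Leading decimal digit of a positive integer."""
    if n <= 0:
        raise ValueError("expected a positive integer")
    return int(str(n)[0])


def facts_to_add(counts: list[int], probs: list[float]) -> int:
    """Smallest number of extra facts so that every observed first-digit
    count fits under the expected distribution (hypothetical rule:
    find the smallest total N with N * probs[d] >= counts[d] for all d)."""
    observed = sum(counts)
    # The small tolerance guards against floating-point overshoot in ceil.
    needed_total = math.ceil(max(c / p for c, p in zip(counts, probs)) - 1e-9)
    return max(0, needed_total - observed)


# Usage: digit 1 is over-represented relative to a uniform law (alpha = 1).
counts = [10, 5, 5, 5, 5, 5, 5, 5, 5]
missing = facts_to_add(counts, generalized_benford(1.0))
representativeness = sum(counts) / (sum(counts) + missing)
```

Under this sketch, the ratio of observed facts to the completed total always lies between 0 and 1, which is one simple way to read a representativeness score.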

Footnotes
1
Unlike \(\alpha \), the representativeness varies only between 0 and 1.
 
Metadata
Title: Representativeness of Knowledge Bases with the Generalized Benford’s Law
Authors: Arnaud Soulet, Arnaud Giacometti, Béatrice Markhoff, Fabian M. Suchanek
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-030-00671-6_22
