2016 | OriginalPaper | Chapter

Large Scale Semantic Relation Discovery: Toward Establishing the Missing Link Between Wikipedia and Semantic Network

Authors: Xianpei Han, Xiliang Song, Le Sun

Published in: Knowledge Graph and Semantic Computing: Semantic, Knowledge, and Linked Big Data

Publisher: Springer Singapore

Abstract

Wikipedia has become the largest knowledge repository on the Web. However, most of its semantic knowledge is recorded in natural language, which is readable by humans but largely opaque to computer processing. To establish the missing link between Wikipedia and a semantic network, this paper proposes a relation discovery method that can: (1) discover and characterize a large collection of relations from Wikipedia by exploiting relation pattern regularity, relation distribution regularity, and relation instance redundancy; and (2) annotate the hyperlinks between Wikipedia articles with the discovered semantic relations. In total, the method discovers 14,299 relations, 105,661 relation patterns, and 5,214,175 relation instances from Wikipedia, a resource that should be valuable for many NLP and AI tasks.
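
The abstract does not spell out the algorithm, so the following is only a toy illustration of the general idea of using instance redundancy to label hyperlinks, not the authors' method. It assumes hypothetical input triples of (source article, linked target article, textual pattern between them), keeps only patterns supported by several distinct entity pairs, and labels each link with its most frequent surviving pattern. The data, the function names, and the `min_support` threshold are all assumptions made for this sketch.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (source article, linked target article, pattern between them).
MENTIONS = [
    ("Paris", "France", "is the capital of"),
    ("Berlin", "Germany", "is the capital of"),
    ("Rome", "Italy", "is the capital of"),
    ("Paris", "France", "is located in"),
    ("Paris", "France", "is the capital of"),
    ("Rome", "Italy", "hosted a match against"),
]


def frequent_patterns(mentions, min_support=2):
    """Keep patterns supported by at least `min_support` distinct entity pairs."""
    pairs_by_pattern = defaultdict(set)
    for source, target, pattern in mentions:
        pairs_by_pattern[pattern].add((source, target))
    return {p for p, pairs in pairs_by_pattern.items() if len(pairs) >= min_support}


def annotate_links(mentions, min_support=2):
    """Label each (source, target) hyperlink with its most frequent retained pattern."""
    kept = frequent_patterns(mentions, min_support)
    votes = defaultdict(Counter)
    for source, target, pattern in mentions:
        if pattern in kept:
            votes[(source, target)][pattern] += 1
    # Links whose patterns were all filtered out receive no label.
    return {link: counts.most_common(1)[0][0] for link, counts in votes.items()}
```

Calling `annotate_links(MENTIONS)` labels each of the three city-to-country links with the one pattern that recurs across distinct pairs. A real system would also need to cluster paraphrased patterns into relations, which this sketch leaves out.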

Metadata

Title: Large Scale Semantic Relation Discovery: Toward Establishing the Missing Link Between Wikipedia and Semantic Network
Authors: Xianpei Han, Xiliang Song, Le Sun
Copyright Year: 2016
Publisher: Springer Singapore
DOI: https://doi.org/10.1007/978-981-10-3168-7_6
