
2018 | OriginalPaper | Book chapter

Exploiting Wikipedia-Based Information-Rich Taxonomy for Extracting Location, Creator and Membership Related Information for ConceptNet Expansion

Written by: Marek Krawczyk, Rafal Rzepka, Kenji Araki

Published in: Human Language Technology. Challenges for Computer Science and Linguistics

Publisher: Springer International Publishing

Abstract

In this paper we present a method for automatically extracting five types of assertions from Japanese Wikipedia XML dump files: IsA assertions (hyponymy relations), AtLocation assertions (the location of an object or place), LocatedNear assertions (neighboring locations), CreatedBy assertions (the creator of an object) and MemberOf assertions (group membership). We use the Hyponymy extraction tool v1.0, which analyzes the definition, category and hierarchy structures of Wikipedia articles to extract IsA assertions and produce an information-rich taxonomy. From this taxonomy we extract additional information, in this case AtLocation, LocatedNear, CreatedBy and MemberOf assertions, using our original method. The experiments show that both methods produce satisfactory results: we acquired 5,866,680 IsA assertions with 96.0% reliability, 131,760 AtLocation assertion pairs with 93.5% reliability, 6,217 LocatedNear assertion pairs with 98.5% reliability, 270,230 CreatedBy assertion pairs with 78.5% reliability and 21,053 MemberOf assertions with 87.0% reliability. Our method surpassed the baseline system in both precision and the number of acquired assertions.

Footnotes
7. Curly brackets were used to mark the tags’ representations.

8. To measure the level of agreement between judges, we used Randolph’s free-marginal multirater kappa instead of Fleiss’ fixed-marginal multirater kappa, because of the high-agreement, low-kappa paradox.

9. We adjusted the number of evaluated pairs to balance the proportion between the total number of pairs and the test sample.
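
The paradox mentioned in footnote 8 can be illustrated with a minimal Python sketch. It follows the standard definitions of the two statistics: both share the same observed agreement, but the free-marginal kappa takes chance agreement as 1/k for k categories, whereas Fleiss' kappa estimates it from the observed category proportions. The function names and the ratings below are hypothetical and chosen only for illustration; they are not the paper's evaluation data or code.

```python
from typing import Sequence

Counts = Sequence[Sequence[int]]  # counts[i][j]: judges who put item i in category j


def observed_agreement(counts: Counts) -> float:
    """Share of agreeing judge pairs, averaged over items."""
    if not counts:
        raise ValueError("need at least one rated item")
    n_items = len(counts)
    n_judges = sum(counts[0])
    if n_judges < 2 or any(sum(row) != n_judges for row in counts):
        raise ValueError("each item needs the same number (>= 2) of judges")
    agreeing = sum(c * (c - 1) for row in counts for c in row)
    return agreeing / (n_items * n_judges * (n_judges - 1))


def randolph_kappa_free(counts: Counts) -> float:
    """Free-marginal multirater kappa: chance agreement is 1 / k."""
    p_chance = 1.0 / len(counts[0])
    return (observed_agreement(counts) - p_chance) / (1.0 - p_chance)


def fleiss_kappa(counts: Counts) -> float:
    """Fixed-marginal multirater kappa: chance agreement from category shares.

    Undefined (division by zero) when every rating falls in one category.
    """
    total = sum(sum(row) for row in counts)
    n_categories = len(counts[0])
    p_chance = sum(
        (sum(row[j] for row in counts) / total) ** 2 for j in range(n_categories)
    )
    return (observed_agreement(counts) - p_chance) / (1.0 - p_chance)


# Hypothetical sample: 3 judges label 10 assertions as [correct, incorrect].
# Nine assertions are unanimously "correct"; one gets a 2-to-1 split.
ratings = [[3, 0]] * 9 + [[2, 1]]

print(f"free-marginal kappa: {randolph_kappa_free(ratings):.3f}")
print(f"Fleiss' kappa:       {fleiss_kappa(ratings):.3f}")
```

With these made-up counts, the judges agree on 56 of 60 judge pairs, yet Fleiss' kappa comes out slightly negative (-1/29) because almost all ratings fall in one category, while the free-marginal kappa is 13/15, roughly 0.87.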
 
References
1. Lenat, D.B.: CYC: a large-scale investment in knowledge infrastructure. Commun. ACM 38(11), 33–38 (1995)
2. Suchanek, F.M., Kasneci, G., Weikum, G.: YAGO: a core of semantic knowledge. In: Proceedings of the 16th International Conference on World Wide Web, pp. 697–706 (2007)
3. Liu, H., Singh, P.: ConceptNet - a practical commonsense reasoning tool-kit. BT Technol. J. 22(4), 211–226 (2004)
4. Singh, P., Lin, T., Mueller, E.T., Lim, G., Perkins, T., Zhu, W.L.: Open mind common sense: knowledge acquisition from the general public. In: On the Move to Meaningful Internet Systems 2002: CoopIS, DOA, and ODBASE, pp. 1223–1237 (2002)
5. Speer, R.H., Havasi, C., Treadway, K.N., Lieberman, H.: Finding your way in a multi-dimensional semantic space with Luminoso. In: Proceedings of the 15th International Conference on Intelligent User Interfaces, pp. 385–388 (2010)
6. Cambria, E., Hussain, A., Havasi, C., Eckl, C.: SenticSpace: visualizing opinions and sentiments in a multi-dimensional vector space. In: Knowledge-Based and Intelligent Information and Engineering Systems, pp. 385–393 (2010)
7. Korner, S.J., Brumm, T.: RESI - a natural language specification improver. In: IEEE International Conference on Semantic Computing, pp. 1–8 (2009)
8. Nakahara, K., Yamada, S.: Development and evaluation of a web-based game for common-sense knowledge acquisition in Japan. Unisys Technol. Rev. 107, 295–305 (2011)
9. Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Hruschka Jr., E.R., Mitchell, T.M.: Toward an architecture for never-ending language learning. In: AAAI, vol. 5, p. 3 (2010)
10. Schubert, L.: Can we derive general world knowledge from texts? In: Proceedings of the Second International Conference on Human Language Technology Research, pp. 94–97 (2002)
11. Mendes, P.N., Jakob, M., García-Silva, A., Bizer, C.: DBpedia spotlight: shedding light on the web of documents. In: Proceedings of the 7th International Conference on Semantic Systems, pp. 1–8 (2011)
12. Krawczyk, M., Rzepka, R., Araki, K.: Extracting ConceptNet knowledge triplets from Japanese Wikipedia. In: Proceedings of the 21st Annual Meeting of the Association for Natural Language Processing, pp. 1052–1055 (2015)
13. Sumida, A., Torisawa, K.: Hacking Wikipedia for hyponymy relation acquisition. In: IJCNLP, vol. 8, pp. 883–888 (2008)
14. Sumida, A., Yoshinaga, N., Torisawa, K.: Boosting precision and recall of hyponymy relation acquisition from hierarchical layouts in Wikipedia. In: LREC (2008)
15. Yamada, I., Hashimoto, C., Oh, J., Torisawa, K., Kuroda, K., De Saeger, S., Tsuchida, M., Kazama, J.: Generating information-rich taxonomy from Wikipedia. In: 4th International Universal Communication Symposium (IUCS), pp. 97–104 (2010)
16. Randolph, J.J.: Free-Marginal Multirater Kappa (multirater K [free]): An Alternative to Fleiss’ Fixed-Marginal Multirater Kappa (2005). Online Submission
Metadata
Title
Exploiting Wikipedia-Based Information-Rich Taxonomy for Extracting Location, Creator and Membership Related Information for ConceptNet Expansion
Written by
Marek Krawczyk
Rafal Rzepka
Kenji Araki
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-93782-3_19