Skip to main content

2018 | OriginalPaper | Buchkapitel

QA4IE: A Question Answering Based Framework for Information Extraction

verfasst von : Lin Qiu, Hao Zhou, Yanru Qu, Weinan Zhang, Suoheng Li, Shu Rong, Dongyu Ru, Lihua Qian, Kewei Tu, Yong Yu

Erschienen in: The Semantic Web – ISWC 2018

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Information Extraction (IE) refers to automatically extracting structured relation tuples from unstructured texts. Common IE solutions, including Relation Extraction (RE) and open IE systems, can hardly handle cross-sentence tuples, and are severely restricted by limited relation types as well as informal relation specifications (e.g., free-text based relation tuples). In order to overcome these weaknesses, we propose a novel IE framework named QA4IE, which leverages the flexible question answering (QA) approaches to produce high quality relation triples across sentences. Based on the framework, we develop a large IE benchmark with high quality human evaluation. This benchmark contains 293K documents, 2M golden relation triples, and 636 relation types. We compare our system with some IE baselines on our benchmark and the results show that our system achieves great improvements.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Abadi, M., et al.: Tensorflow: a system for large-scale machine learning. In: OSDI, vol. 16, pp. 265–283 (2016) Abadi, M., et al.: Tensorflow: a system for large-scale machine learning. In: OSDI, vol. 16, pp. 265–283 (2016)
2.
Zurück zum Zitat Agichtein, E., Gravano, L.: Snowball: extracting relations from large plain-text collections. In: Proceedings of the Fifth ACM Conference on Digital Libraries, pp. 85–94. ACM (2000) Agichtein, E., Gravano, L.: Snowball: extracting relations from large plain-text collections. In: Proceedings of the Fifth ACM Conference on Digital Libraries, pp. 85–94. ACM (2000)
3.
Zurück zum Zitat Angeli, G., Premkumar, M.J.J., Manning, C.D.: Leveraging linguistic structure for open domain information extraction. In: ACL, vol. 1, pp. 344–354 (2015) Angeli, G., Premkumar, M.J.J., Manning, C.D.: Leveraging linguistic structure for open domain information extraction. In: ACL, vol. 1, pp. 344–354 (2015)
5.
Zurück zum Zitat Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: International Conference on Learning Representations (ICLR) (2015) Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: International Conference on Learning Representations (ICLR) (2015)
6.
Zurück zum Zitat Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S.: Dbpedia-a crystallization point for the web of data. Web Semant. Sci. Serv. Agents World Wide Web 7(3), 154–165 (2009)CrossRef Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S.: Dbpedia-a crystallization point for the web of data. Web Semant. Sci. Serv. Agents World Wide Web 7(3), 154–165 (2009)CrossRef
7.
Zurück zum Zitat Chen, D., Fisch, A., Weston, J., Bordes, A.: Reading Wikipedia to answer open-domain questions. In: ACL, vol. 1, pp. 1870–1879 (2017) Chen, D., Fisch, A., Weston, J., Bordes, A.: Reading Wikipedia to answer open-domain questions. In: ACL, vol. 1, pp. 1870–1879 (2017)
8.
Zurück zum Zitat Del Corro, L., Gemulla, R.: ClausIE: clause-based open information extraction. In: Proceedings of International Conference on World Wide Web, pp. 355–366 (2013) Del Corro, L., Gemulla, R.: ClausIE: clause-based open information extraction. In: Proceedings of International Conference on World Wide Web, pp. 355–366 (2013)
9.
Zurück zum Zitat Gupta, R., Halevy, A., Wang, X., Whang, S.E., Wu, F.: Biperpedia: an ontology for search applications. Proc. VLDB Endow. 7(7), 505–516 (2014)CrossRef Gupta, R., Halevy, A., Wang, X., Whang, S.E., Wu, F.: Biperpedia: an ontology for search applications. Proc. VLDB Endow. 7(7), 505–516 (2014)CrossRef
10.
Zurück zum Zitat He, L., Lewis, M., Zettlemoyer, L.: Question-answer driven semantic role labeling: using natural language to annotate natural language. In: EMNLP, pp. 643–653 (2015) He, L., Lewis, M., Zettlemoyer, L.: Question-answer driven semantic role labeling: using natural language to annotate natural language. In: EMNLP, pp. 643–653 (2015)
11.
Zurück zum Zitat Hermann, K.M., et al.: Teaching machines to read and comprehend. In: Advances in Neural Information Processing Systems, pp. 1693–1701 (2015) Hermann, K.M., et al.: Teaching machines to read and comprehend. In: Advances in Neural Information Processing Systems, pp. 1693–1701 (2015)
12.
Zurück zum Zitat Hewlett, D., et al.: WikiReading: a novel large-scale language understanding task over Wikipedia. In: ACL, vol. 1, pp. 1535–1545 (2016) Hewlett, D., et al.: WikiReading: a novel large-scale language understanding task over Wikipedia. In: ACL, vol. 1, pp. 1535–1545 (2016)
13.
Zurück zum Zitat Hill, F., Bordes, A., Chopra, S., Weston, J.: The goldilocks principle: reading children’s books with explicit memory representations. arXiv preprint arXiv:1511.02301 (2015) Hill, F., Bordes, A., Chopra, S., Weston, J.: The goldilocks principle: reading children’s books with explicit memory representations. arXiv preprint arXiv:​1511.​02301 (2015)
14.
Zurück zum Zitat Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9, 1735–1780 (1997)CrossRef Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9, 1735–1780 (1997)CrossRef
15.
Zurück zum Zitat Hu, M., Peng, Y., Qiu, X.: Reinforced mnemonic reader for machine comprehension. CoRR, abs/1705.02798 (2017) Hu, M., Peng, Y., Qiu, X.: Reinforced mnemonic reader for machine comprehension. CoRR, abs/1705.02798 (2017)
16.
Zurück zum Zitat Ji, H., Grishman, R.: Knowledge base population: successful approaches and challenges. In: ACL, pp. 1148–1158 (2011) Ji, H., Grishman, R.: Knowledge base population: successful approaches and challenges. In: ACL, pp. 1148–1158 (2011)
17.
Zurück zum Zitat Kim, Y., Jernite, Y., Sontag, D., Rush, A.M.: Character-aware neural language models. In: AAAI, pp. 2741–2749 (2016) Kim, Y., Jernite, Y., Sontag, D., Rush, A.M.: Character-aware neural language models. In: AAAI, pp. 2741–2749 (2016)
18.
Zurück zum Zitat Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., Dyer, C.: Neural architectures for named entity recognition. In: Proceedings of NAACL-HLT, pp. 260–270 (2016) Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., Dyer, C.: Neural architectures for named entity recognition. In: Proceedings of NAACL-HLT, pp. 260–270 (2016)
19.
Zurück zum Zitat Le, Q.V., Jaitly, N., Hinton, G.E.: A simple way to initialize recurrent networks of rectified linear units. arXiv preprint arXiv:1504.00941 (2015) Le, Q.V., Jaitly, N., Hinton, G.E.: A simple way to initialize recurrent networks of rectified linear units. arXiv preprint arXiv:​1504.​00941 (2015)
20.
Zurück zum Zitat Lee, T., Wang, Z., Wang, H., Hwang, S.W.: Attribute extraction and scoring: a probabilistic approach. In: 29th International Conference on Data Engineering, pp. 194–205 (2013) Lee, T., Wang, Z., Wang, H., Hwang, S.W.: Attribute extraction and scoring: a probabilistic approach. In: 29th International Conference on Data Engineering, pp. 194–205 (2013)
21.
Zurück zum Zitat Levy, O., Seo, M., Choi, E., Zettlemoyer, L.: Zero-shot relation extraction via reading comprehension. In: CoNLL, pp. 333–342 (2017) Levy, O., Seo, M., Choi, E., Zettlemoyer, L.: Zero-shot relation extraction via reading comprehension. In: CoNLL, pp. 333–342 (2017)
22.
Zurück zum Zitat Lin, Y., Liu, Z., Sun, M., Liu, Y., Zhu, X.: Learning entity and relation embeddings for knowledge graph completion. In: AAAI, vol. 15, pp. 2181–2187 (2015) Lin, Y., Liu, Z., Sun, M., Liu, Y., Zhu, X.: Learning entity and relation embeddings for knowledge graph completion. In: AAAI, vol. 15, pp. 2181–2187 (2015)
23.
Zurück zum Zitat Ling, X., Weld, D.S.: Fine-grained entity recognition. In: AAAI (2012) Ling, X., Weld, D.S.: Fine-grained entity recognition. In: AAAI (2012)
24.
Zurück zum Zitat Liu, X., Shen, Y., Duh, K., Gao, J.: Stochastic answer networks for machine reading comprehension. arXiv preprint arXiv:1712.03556 (2017) Liu, X., Shen, Y., Duh, K., Gao, J.: Stochastic answer networks for machine reading comprehension. arXiv preprint arXiv:​1712.​03556 (2017)
25.
Zurück zum Zitat Luo, G., Huang, X., Lin, C.Y., Nie, Z.: Joint entity recognition and disambiguation. In: EMNLP, pp. 879–888 (2015) Luo, G., Huang, X., Lin, C.Y., Nie, Z.: Joint entity recognition and disambiguation. In: EMNLP, pp. 879–888 (2015)
26.
Zurück zum Zitat Ma, X., Hovy, E.: End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In: ACL, vol. 1, pp. 1064–1074 (2016) Ma, X., Hovy, E.: End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In: ACL, vol. 1, pp. 1064–1074 (2016)
27.
Zurück zum Zitat Mintz, M., Bills, S., Snow, R., Jurafsky, D.: Distant supervision for relation extraction without labeled data. In: ACL, pp. 1003–1011 (2009) Mintz, M., Bills, S., Snow, R., Jurafsky, D.: Distant supervision for relation extraction without labeled data. In: ACL, pp. 1003–1011 (2009)
28.
Zurück zum Zitat Moro, A., Raganato, A., Navigli, R.: Entity linking meets word sense disambiguation: a unified approach. TACL 2, 231–244 (2014) Moro, A., Raganato, A., Navigli, R.: Entity linking meets word sense disambiguation: a unified approach. TACL 2, 231–244 (2014)
29.
Zurück zum Zitat Pan, B., Li, H., Zhao, Z., Cao, B., Cai, D., He, X.: MEMEN: multi-layer embedding with memory networks for machine comprehension. arXiv preprint arXiv:1707.09098 (2017) Pan, B., Li, H., Zhao, Z., Cao, B., Cai, D., He, X.: MEMEN: multi-layer embedding with memory networks for machine comprehension. arXiv preprint arXiv:​1707.​09098 (2017)
30.
Zurück zum Zitat Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: EMNLP, pp. 1532–1543 (2014) Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: EMNLP, pp. 1532–1543 (2014)
31.
Zurück zum Zitat Pyysalo, S., Ginter, F., Heimonen, J., Björne, J., Boberg, J., Järvinen, J., Salakoski, T.: BioInfer: a corpus for information extraction in the biomedical domain. BMC Bioinform. 8(1), 50 (2007)CrossRef Pyysalo, S., Ginter, F., Heimonen, J., Björne, J., Boberg, J., Järvinen, J., Salakoski, T.: BioInfer: a corpus for information extraction in the biomedical domain. BMC Bioinform. 8(1), 50 (2007)CrossRef
32.
Zurück zum Zitat Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: EMNLP, pp. 2383–2392 (2016) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: EMNLP, pp. 2383–2392 (2016)
33.
Zurück zum Zitat Ren, X., et al.: CoType: joint extraction of typed entities and relations with knowledge bases. In: Proceedings of the 26th International Conference on World Wide Web, pp. 1015–1024 (2017) Ren, X., et al.: CoType: joint extraction of typed entities and relations with knowledge bases. In: Proceedings of the 26th International Conference on World Wide Web, pp. 1015–1024 (2017)
35.
Zurück zum Zitat Roth, B., Conforti, C., Poerner, N., Karn, S., Schütze, H.: Neural architectures for open-type relation argument extraction. arXiv preprint arXiv:1803.01707 (2018) Roth, B., Conforti, C., Poerner, N., Karn, S., Schütze, H.: Neural architectures for open-type relation argument extraction. arXiv preprint arXiv:​1803.​01707 (2018)
36.
Zurück zum Zitat Schmitz, M., Bart, R., Soderland, S., Etzioni, O., et al.: Open language learning for information extraction. In: EMNLP, pp. 523–534 (2012) Schmitz, M., Bart, R., Soderland, S., Etzioni, O., et al.: Open language learning for information extraction. In: EMNLP, pp. 523–534 (2012)
37.
Zurück zum Zitat Seo, M., Kembhavi, A., Farhadi, A., Hajishirzi, H.: Bidirectional attention flow for machine comprehension. arXiv preprint arXiv:1611.01603 (2016) Seo, M., Kembhavi, A., Farhadi, A., Hajishirzi, H.: Bidirectional attention flow for machine comprehension. arXiv preprint arXiv:​1611.​01603 (2016)
38.
Zurück zum Zitat Shen, W., Wang, J., Han, J.: Entity linking with a knowledge base: issues, techniques, and solutions. IEEE Trans. Knowl. Data Eng. 27(2), 443–460 (2015)CrossRef Shen, W., Wang, J., Han, J.: Entity linking with a knowledge base: issues, techniques, and solutions. IEEE Trans. Knowl. Data Eng. 27(2), 443–460 (2015)CrossRef
39.
Zurück zum Zitat Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)MathSciNetMATH Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)MathSciNetMATH
41.
Zurück zum Zitat Stanovsky, G., Dagan, I.: Creating a large benchmark for open information extraction. In: EMNLP, pp. 2300–2305 (2016) Stanovsky, G., Dagan, I.: Creating a large benchmark for open information extraction. In: EMNLP, pp. 2300–2305 (2016)
42.
Zurück zum Zitat Sukhbaatar, S., Weston, J., Fergus, R., et al.: End-to-end memory networks. In: Advances in Neural Information Processing Systems, pp. 2440–2448 (2015) Sukhbaatar, S., Weston, J., Fergus, R., et al.: End-to-end memory networks. In: Advances in Neural Information Processing Systems, pp. 2440–2448 (2015)
43.
Zurück zum Zitat Tjong Kim Sang, E.F., De Meulder, F.: Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In: Proceedings of NAACL-HLT, pp. 142–147 (2003) Tjong Kim Sang, E.F., De Meulder, F.: Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In: Proceedings of NAACL-HLT, pp. 142–147 (2003)
44.
Zurück zum Zitat Vinyals, O., Fortunato, M., Jaitly, N.: Pointer networks. In: Advances in Neural Information Processing Systems, pp. 2692–2700 (2015) Vinyals, O., Fortunato, M., Jaitly, N.: Pointer networks. In: Advances in Neural Information Processing Systems, pp. 2692–2700 (2015)
45.
Zurück zum Zitat Vrandečić, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10), 78–85 (2014)CrossRef Vrandečić, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10), 78–85 (2014)CrossRef
46.
47.
Zurück zum Zitat Wang, W., Yang, N., Wei, F., Chang, B., Zhou, M.: Gated self-matching networks for reading comprehension and question answering. In: ACL, vol. 1, pp. 189–198 (2017) Wang, W., Yang, N., Wei, F., Chang, B., Zhou, M.: Gated self-matching networks for reading comprehension and question answering. In: ACL, vol. 1, pp. 189–198 (2017)
48.
Zurück zum Zitat Xu, K., Feng, Y., Huang, S., Zhao, D.: Semantic relation classification via convolutional neural networks with simple negative sampling. In: EMNLP, pp. 536–540 (2015) Xu, K., Feng, Y., Huang, S., Zhao, D.: Semantic relation classification via convolutional neural networks with simple negative sampling. In: EMNLP, pp. 536–540 (2015)
49.
Zurück zum Zitat Yahya, M., Whang, S., Gupta, R., Halevy, A.: ReNoun: fact extraction for nominal attributes. In: EMNLP, pp. 325–335 (2014) Yahya, M., Whang, S., Gupta, R., Halevy, A.: ReNoun: fact extraction for nominal attributes. In: EMNLP, pp. 325–335 (2014)
51.
Zurück zum Zitat Zeng, D., Liu, K., Chen, Y., Zhao, J.: Distant supervision for relation extraction via piecewise convolutional neural networks. In: EMNLP, pp. 1753–1762 (2015) Zeng, D., Liu, K., Chen, Y., Zhao, J.: Distant supervision for relation extraction via piecewise convolutional neural networks. In: EMNLP, pp. 1753–1762 (2015)
52.
Zurück zum Zitat Zheng, S., Wang, F., Bao, H., Hao, Y., Zhou, P., Xu, B.: Joint extraction of entities and relations based on a novel tagging scheme. In: ACL, vol. 1, pp. 1227–1236 (2017) Zheng, S., Wang, F., Bao, H., Hao, Y., Zhou, P., Xu, B.: Joint extraction of entities and relations based on a novel tagging scheme. In: ACL, vol. 1, pp. 1227–1236 (2017)
Metadaten
Titel
QA4IE: A Question Answering Based Framework for Information Extraction
verfasst von
Lin Qiu
Hao Zhou
Yanru Qu
Weinan Zhang
Suoheng Li
Shu Rong
Dongyu Ru
Lihua Qian
Kewei Tu
Yong Yu
Copyright-Jahr
2018
DOI
https://doi.org/10.1007/978-3-030-00671-6_12

Premium Partner