2019 | Original Paper | Book chapter

Neural Article Pair Modeling for Wikipedia Sub-article Matching

Authors: Muhao Chen, Changping Meng, Gang Huang, Carlo Zaniolo

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing

Abstract

Nowadays, editors tend to separate different subtopics of a long Wikipedia article into multiple sub-articles. This separation seeks to improve human readability. However, it also has a deleterious effect on many Wikipedia-based tasks that rely on the article-as-concept assumption, which requires each entity (or concept) to be described by exactly one article. This underlying assumption significantly simplifies knowledge representation and extraction, and it is vital to many existing technologies such as automated knowledge base construction, cross-lingual knowledge alignment, semantic search, and data lineage of Wikipedia entities. In this paper we provide an approach to match the scattered sub-articles back to their corresponding main articles, with the intent of facilitating automated Wikipedia curation and processing. The proposed model adopts a hierarchical learning structure that combines multiple variants of neural document pair encoders with a comprehensive set of explicit features. A large crowdsourced dataset is created to support the evaluation and feature extraction for the task. On this dataset, the proposed model achieves promising cross-validation results and significantly outperforms previous approaches. Large-scale serving on the entire English Wikipedia also demonstrates the practicability and scalability of the proposed model by effectively extracting a vast collection of newly paired main and sub-articles. Code related to this paper is available at: https://github.com/muhaochen/subarticle.
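
The idea of combining a document pair representation with explicit features can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that): `toy_encode` is a hypothetical stand-in that averages hashed token vectors in place of a neural encoder, the title-overlap feature is an assumed example of an explicit feature, and the weights are untrained placeholders.

```python
import hashlib
import math

DIM = 8  # toy embedding size, chosen arbitrarily for illustration


def toy_encode(text):
    """Hypothetical stand-in for a neural document encoder.

    Averages a deterministic hashed vector per token; a real system
    would use a learned encoder instead.
    """
    tokens = text.lower().split()
    vec = [0.0] * DIM
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        for i in range(DIM):
            vec[i] += digest[i] / 255.0
    n = max(len(tokens), 1)
    return [v / n for v in vec]


def explicit_features(main_title, sub_title):
    """Assumed example of an explicit feature: Jaccard overlap of title tokens."""
    a = set(main_title.lower().split())
    b = set(sub_title.lower().split())
    return [len(a & b) / max(len(a | b), 1)]


def pair_vector(main_title, sub_title):
    """Concatenate both encodings with the explicit features."""
    return (
        toy_encode(main_title)
        + toy_encode(sub_title)
        + explicit_features(main_title, sub_title)
    )


def score(pair_vec, weights, bias=0.0):
    """Logistic output layer over the combined vector."""
    z = sum(w * x for w, x in zip(pair_vec, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


vec = pair_vector("World War II", "Timeline of World War II")
untrained = [0.0] * len(vec)  # placeholder weights; training is out of scope here
probability = score(vec, untrained)
```

With untrained weights the score carries no information; the point is only the shape of the pipeline: encode each article, append explicit features, and classify the concatenated vector as a main/sub-article pair or not.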

Metadata
Title
Neural Article Pair Modeling for Wikipedia Sub-article Matching
Authors
Muhao Chen
Changping Meng
Gang Huang
Carlo Zaniolo
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-10997-4_1