Published in: Journal of Intelligent Information Systems 1/2019

04-06-2018

EVE: explainable vector based embedding technique using Wikipedia

Authors: M. Atif Qureshi, Derek Greene

Abstract

We present an unsupervised explainable vector embedding technique, called EVE, which is built upon the structure of Wikipedia. The proposed model defines the dimensions of a semantic vector representing a concept using human-readable labels, which makes it readily interpretable. Specifically, each vector is constructed using the Wikipedia category graph structure together with the Wikipedia article link structure. To test the effectiveness of the proposed model, we consider its usefulness in three fundamental tasks: 1) intruder detection, to evaluate its ability to identify a non-coherent vector from a list of coherent vectors; 2) ability to cluster, to evaluate its tendency to group related vectors together while keeping unrelated vectors in separate clusters; and 3) sorting relevant items first, to evaluate its ability to rank vectors (items) relevant to the query at the top of the result list. For each task, we also propose a strategy to generate a task-specific human-interpretable explanation from the model. These demonstrate the overall effectiveness of the explainable embeddings generated by EVE. Finally, we compare EVE with the Word2Vec, FastText, and GloVe embedding techniques across the three tasks, and report improvements over the state-of-the-art.
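To make the idea of labeled dimensions concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes sparse vectors stored as label-to-weight dictionaries and uses plain cosine similarity. The toy labels and weights and the helper names `cosine`, `find_intruder`, `rank`, and `explain` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy vectors. Every dimension name is a human-readable label
# (a Wikipedia-style category or the concept's own article). Labels and
# weights are invented for illustration and are not taken from EVE.
VECTORS = {
    "cat":   {"Category:Felines": 0.9, "Category:Pets": 0.6, "Article:Cat": 1.0},
    "lion":  {"Category:Felines": 0.9, "Category:Big cats": 0.7, "Article:Lion": 1.0},
    "tiger": {"Category:Felines": 0.8, "Category:Big cats": 0.8, "Article:Tiger": 1.0},
    "piano": {"Category:Musical instruments": 0.9, "Article:Piano": 1.0},
}


def cosine(u, v):
    """Cosine similarity between two sparse label -> weight vectors."""
    dot = sum(w * v.get(label, 0.0) for label, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def find_intruder(names, vectors=VECTORS):
    """Intruder detection: the item least similar, on average, to the rest."""
    def mean_similarity(name):
        others = [n for n in names if n != name]
        return sum(cosine(vectors[name], vectors[o]) for o in others) / len(others)
    return min(names, key=mean_similarity)


def rank(query, candidates, vectors=VECTORS):
    """Sorting relevant items first: order candidates by similarity to the query."""
    return sorted(candidates,
                  key=lambda c: cosine(vectors[query], vectors[c]),
                  reverse=True)


def explain(a, b, vectors=VECTORS, top=3):
    """Explanation: shared labeled dimensions, strongest contribution first."""
    u, v = vectors[a], vectors[b]
    shared = {label: u[label] * v[label] for label in u if label in v}
    return sorted(shared, key=shared.get, reverse=True)[:top]
```

With these toy values, `find_intruder(["cat", "lion", "tiger", "piano"])` picks `piano`, and `explain("lion", "tiger")` returns the shared category labels. Because the dimensions carry names, the explanation is simply the list of labels on which two vectors overlap.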

Footnotes
2
Besides the basic similarity with PageRank
 
3
With an intuition to penalize untrusted pages (or spam)
 
4
This can be an exact match or a partial best match using an information retrieval algorithm
 
5
This is the most relevant dimension defining the concept, namely the article itself.
 
6
In the case of the best-match strategy, where more than one article is mapped to a concept (i.e., $A_{concept_1}, A_{concept_2}, \ldots$), the score computed for each of the top-k articles is further scaled by that article's relevance score; the scaled vectors are then combined by vector addition and normalized again.
 
7
In the case of a partial best match, it is the relevance score returned by the BM25 algorithm.
 
8
Computed simply as 1 - normalized similarity score over each dimension.
 
9
The full set of experimental visualizations is available at http://mlg.ucd.ie/eve/
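
The combination step described in footnote 6 can be sketched as follows. This is an illustrative reading of the footnote, not the authors' code. The function names `l2_normalize` and `combine_top_k` are hypothetical, the input vectors are assumed to be label-to-weight dictionaries, and L2 normalization is an assumption, since the footnote does not say which norm is used.

```python
import math


def l2_normalize(vec):
    """Scale a sparse label -> weight vector to unit L2 length (assumed norm)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {label: w / norm for label, w in vec.items()} if norm else dict(vec)


def combine_top_k(article_vectors, relevance, k=2):
    """Merge the vectors of the k most relevant articles into one concept vector.

    article_vectors: article name -> sparse vector (label -> weight)
    relevance:       article name -> relevance score (e.g. from a retrieval model)
    """
    top = sorted(relevance, key=relevance.get, reverse=True)[:k]
    combined = {}
    for article in top:
        for label, weight in article_vectors[article].items():
            # Scale each article's contribution by its relevance, then add.
            combined[label] = combined.get(label, 0.0) + relevance[article] * weight
    # Normalize again after the vector addition.
    return l2_normalize(combined)
```

For example, two hypothetical articles with vectors `{"x": 1.0}` and `{"y": 1.0}` and relevance scores 3 and 4 combine to a unit-length vector with weights 0.6 on `x` and 0.8 on `y`.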
 
Metadata
Title: EVE: explainable vector based embedding technique using Wikipedia
Authors: M. Atif Qureshi, Derek Greene
Publication date: 04-06-2018
Publisher: Springer US
Published in: Journal of Intelligent Information Systems, Issue 1/2019
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI: https://doi.org/10.1007/s10844-018-0511-x
