2019 | OriginalPaper | Chapter

8. Building Chatbot Thesaurus

Author: Boris Galitsky

Published in: Developing Enterprise Chatbots

Publisher: Springer International Publishing

Abstract

We implement a scalable mechanism for building a thesaurus of entities, intended to improve the relevance of a chatbot. Thesaurus construction starts from seed entities and mines available source domains for new entities associated with them. New entities are formed by applying machine learning of syntactic parse trees (their generalization) to the search results for existing entities, which yields the commonalities between those results. These commonality expressions then become parameters of existing entities and are turned into new entities at the next learning iteration. To match natural language expressions between source and target domains, we use syntactic generalization, an operation that finds the set of maximal common sub-trees of the parse trees of these expressions.
The thesaurus and syntactic generalization are applied to relevance improvement in search and to text similarity assessment. We evaluate the search relevance improvement in vertical and horizontal domains and observe a significant contribution of the learned thesaurus in the former and a noticeable contribution of a hybrid system in the latter. We also perform an industrial evaluation of thesaurus-based and syntactic generalization-based text relevance assessment and conclude that the proposed algorithm for automated thesaurus learning is suitable for integration into chatbots. The proposed algorithm is implemented as a component of the Apache OpenNLP project.
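
The iterative procedure outlined in the abstract can be illustrated with a deliberately simplified sketch. It is not the chapter's implementation and not the Apache OpenNLP component: instead of generalizing parse trees, it approximates generalization by the maximal contiguous word runs shared by two search snippets, and it replaces web search with a tiny in-memory corpus. All names here (`common_runs`, `extend_thesaurus`, `toy_search`, `CORPUS`, `STOP_WORDS`) are hypothetical and introduced only for illustration.

```python
from itertools import combinations

# Assumption: a small stop-word list stands in for real linguistic filtering.
STOP_WORDS = {"the", "a", "of", "for", "is", "on", "in", "to", "and"}


def common_runs(a, b, min_len=1):
    """Return the contiguous word runs shared by token lists a and b.

    This is a flat stand-in for syntactic generalization, which in the
    chapter operates on parse trees rather than on plain word sequences.
    """
    runs = set()
    for i in range(len(a)):
        for j in range(len(b)):
            # Skip positions that only continue an already counted run.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k >= min_len:
                runs.add(tuple(a[i:i + k]))
    return runs


def extend_thesaurus(seed_paths, search, iterations=2):
    """Grow entity paths by generalizing search results for each path."""
    thesaurus = set(seed_paths)
    frontier = list(seed_paths)
    for _ in range(iterations):
        next_frontier = []
        for path in frontier:
            snippets = [s.lower().split() for s in search(" ".join(path))]
            # Commonalities between pairs of results suggest new entities.
            for s1, s2 in combinations(snippets, 2):
                for run in common_runs(s1, s2):
                    for word in run:
                        if word in STOP_WORDS or word in path:
                            continue
                        new_path = path + (word,)
                        if new_path not in thesaurus:
                            thesaurus.add(new_path)
                            next_frontier.append(new_path)
        # Paths found in this iteration are explored in the next one.
        frontier = next_frontier
    return thesaurus


# Assumption: a toy corpus replaces mining of real source domains.
CORPUS = {
    "tax deduct": [
        "how to deduct tax from rental income",
        "you can deduct tax on rental income this year",
    ],
}


def toy_search(query):
    """Return canned snippets for a query, or nothing if it is unknown."""
    return CORPUS.get(query, [])


if __name__ == "__main__":
    for entity_path in sorted(extend_thesaurus([("tax", "deduct")], toy_search)):
        print(" -> ".join(entity_path))
```

With the toy corpus, the seed path `tax -> deduct` is extended by the words that both snippets share outside the seed itself (`rental` and `income`). Those extended paths become the queries of the next iteration, which is the loop the abstract describes; a faithful version would compare parse trees and use a real search backend.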

Metadata
Title
Building Chatbot Thesaurus
Author
Boris Galitsky
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-04299-8_8