
2019 | OriginalPaper | Chapter

Efficient Answer-Annotation for Frequent Questions

Authors : Markus Zlabinger, Navid Rekabsaz, Stefan Zlabinger, Allan Hanbury

Published in: Experimental IR Meets Multilinguality, Multimodality, and Interaction

Publisher: Springer International Publishing


Abstract

Ground truth is a crucial resource for building effective question-answering (Q-A) systems. When no suitable ground truth is available, as is often the case for domain-specific Q-A systems (e.g. customer support, tourism) or for languages other than English, new ground truth can be created through human annotation. However, the annotation process in which a human annotator looks up the corresponding answer label for each question from an answer catalog (the Sequential approach) is usually time-consuming and costly. In this paper, we propose a new approach in which the annotator first manually groups questions that share the same intent as a candidate question, and then labels the entire group in one step (the Group-Wise approach). To retrieve same-intent questions effectively, we evaluate various unsupervised semantic similarity methods from the recent literature and implement the most effective one in our annotation approach. We then compare the Group-Wise approach with the Sequential approach with respect to answer look-ups, annotation time, and label quality. Based on 500 German customer-support questions, we show that the Group-Wise approach requires 51% fewer answer look-ups, is 41% more time-efficient, and retains the same label quality as the Sequential approach. Note that the described approach is limited to Q-A systems in which frequently asked questions occur.
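The retrieval step at the heart of the Group-Wise approach (finding same-intent candidates for a given question) can be sketched as follows. This is a minimal, hypothetical stand-in using token-overlap cosine similarity; the abstract does not say which of the evaluated unsupervised semantic similarity methods was the most effective, so the `cosine` scoring function and the `threshold` value here are illustrative assumptions, not the paper's method.

```python
import math
from collections import Counter

def cosine(a, b):
    """Token-overlap cosine similarity between two questions (a crude
    stand-in for the unsupervised semantic similarity methods evaluated
    in the paper)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    num = sum(ca[t] * cb[t] for t in set(ca) & set(cb))
    den = (math.sqrt(sum(v * v for v in ca.values()))
           * math.sqrt(sum(v * v for v in cb.values())))
    return num / den if den else 0.0

def same_intent_candidates(query_idx, pool, threshold=0.3):
    """Return indices of questions in `pool` similar enough to
    pool[query_idx] to be offered to the annotator for grouping,
    ranked by descending similarity."""
    q = pool[query_idx]
    ranked = sorted(
        ((j, cosine(q, p)) for j, p in enumerate(pool) if j != query_idx),
        key=lambda t: -t[1],
    )
    return [j for j, s in ranked if s >= threshold]

questions = [
    "how do i reset my password",
    "i forgot my password how can i reset it",
    "what are your opening hours",
]
```

The annotator would then confirm or reject each retrieved candidate and assign one answer label to the whole confirmed group, replacing one answer look-up per question with one look-up per group.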


Footnotes

2. Questions are translated from German.
Metadata
Title
Efficient Answer-Annotation for Frequent Questions
Authors
Markus Zlabinger
Navid Rekabsaz
Stefan Zlabinger
Allan Hanbury
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-28577-7_8
