2017 | OriginalPaper | Book chapter

Sentence Paraphrase Graphs: Classification Based on Predictive Models or Annotators’ Decisions?

Authors: Ekaterina Pronoza, Elena Yagunova, Nataliya Kochetkova

Published in: Advances in Computational Intelligence

Publisher: Springer International Publishing

Abstract

As part of our project ParaPhraser on the identification and classification of Russian paraphrases, we have collected a corpus of more than 8000 sentence pairs annotated as precise paraphrases, loose paraphrases or non-paraphrases. The corpus is annotated via crowdsourcing by naive native speakers of Russian, but from an expert's point of view our complex paraphrase detection model can be more successful at predicting the paraphrase class than a naive native speaker.

Our paraphrase corpus is collected from news headlines and therefore can be considered a summarized news stream describing the most important events. By building a graph of paraphrases, we can detect such events.

In this paper we construct two such graphs: one based on the current human annotation and one based on the predictions of the complex model. We compare and analyze the structure of the graphs and show that the model graph has larger connected components, which give a more complete picture of the important events than the human annotation graph. The predictive model thus appears to be better than human annotators at capturing full information about the important events in the news collection.
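The graph comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the label strings (`"precise"`, `"loose"`, `"non"`), the function names, the toy headline identifiers, and the choice to draw an edge for both precise and loose paraphrases are all assumptions made for illustration.

```python
from collections import defaultdict


def build_graph(pairs, labels, paraphrase_labels=("precise", "loose")):
    """Build an undirected graph: nodes are sentences, edges join paraphrase pairs.

    Assumption: both precise and loose paraphrases count as edges.
    """
    graph = defaultdict(set)
    for (a, b), label in zip(pairs, labels):
        # Register both sentences as nodes even when no edge is added.
        graph[a]
        graph[b]
        if label in paraphrase_labels:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Return the connected components as sets, largest first."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical toy data: the same sentence pairs labelled twice.
pairs = [("h1", "h2"), ("h2", "h3"), ("h4", "h5")]
human_labels = ["precise", "non", "loose"]    # hypothetical annotator decisions
model_labels = ["precise", "loose", "loose"]  # hypothetical model predictions

human_components = connected_components(build_graph(pairs, human_labels))
model_components = connected_components(build_graph(pairs, model_labels))

print([len(c) for c in human_components])
print([len(c) for c in model_components])
```

In this toy example the model labels one more pair as a paraphrase, so two components of the human graph merge into a single larger one in the model graph. Component size is the structural property the abstract compares between the two graphs.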

http://scikit-learn.org.

Since the second half of the corpus is already annotated, we do not actually need any prediction here. However, to be able to compare the graphs we have to construct them on the same data, which is why we use the model's predictions.

Moreover, we only work with news headlines, and better results in detecting the same events could be achieved by also taking into account the bodies of the news reports. We believe that the current results (i.e., model performance) are acceptable for building an adequate paraphrase graph based on the corpus.

https://www.yworks.com/products/yed.

Title: Sentence Paraphrase Graphs: Classification Based on Predictive Models or Annotators’ Decisions?
Authors: Ekaterina Pronoza, Elena Yagunova, Nataliya Kochetkova
Publisher: Springer International Publishing
Book: Advances in Computational Intelligence
Print ISBN: 978-3-319-62433-4
Electronic ISBN: 978-3-319-62434-1
Copyright year: 2017
DOI: https://doi.org/10.1007/978-3-319-62434-1_4
