2017 | Original Paper | Book chapter

Gradually Improving the Computation of Semantic Textual Similarity in Portuguese

Authors: Hugo Gonçalo Oliveira, Ana Oliveira Alves, Ricardo Rodrigues

Published in: Progress in Artificial Intelligence

Publisher: Springer International Publishing

Abstract

There is much research on Semantic Textual Similarity (STS) in English, especially since its inclusion in the SemEval evaluations. For other languages, such research is less common, mostly because benchmarks are unavailable. Recently, the ASSIN shared task targeted STS in Portuguese and released training and test collections. This paper describes an incremental approach to ASSIN, where the computed similarity is gradually improved by exploiting different features (e.g., token overlap, semantic relations, chunks, and negation) and approaches. The best reported results, obtained with a supervised approach, would have placed second overall in ASSIN.
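
To make two of the feature types named in the abstract concrete, the following is a minimal sketch of how a token-overlap score and a negation cue could be computed for a sentence pair. It is an illustration only, not the authors' implementation: the function names, the use of the Jaccard coefficient, the regex tokenizer, and the small negation word list are all assumptions introduced here.

```python
import re

# Hypothetical, illustrative list of Portuguese negation words (not from the paper).
NEGATION_WORDS = {"não", "nunca", "jamais", "nem", "nenhum", "nenhuma"}


def tokenize(sentence: str) -> set[str]:
    """Lowercase the sentence and return its set of word tokens."""
    return set(re.findall(r"\w+", sentence.lower()))


def token_overlap(a: str, b: str) -> float:
    """Jaccard coefficient between the token sets of two sentences."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        return 1.0  # two empty sentences are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def negation_mismatch(a: str, b: str) -> bool:
    """True when exactly one of the two sentences contains a negation word."""
    has_neg_a = bool(tokenize(a) & NEGATION_WORDS)
    has_neg_b = bool(tokenize(b) & NEGATION_WORDS)
    return has_neg_a != has_neg_b
```

In a supervised setting such as the one the abstract mentions, values like these would typically be a few of many features given to a regression model trained on the annotated sentence pairs.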

Metadata
Title
Gradually Improving the Computation of Semantic Textual Similarity in Portuguese
Authors
Hugo Gonçalo Oliveira
Ana Oliveira Alves
Ricardo Rodrigues
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-65340-2_68