2017 | OriginalPaper | Book chapter

Unsupervised Approaches for Computing Word Similarity in Portuguese

Written by: Hugo Gonçalo Oliveira

Published in: Progress in Artificial Intelligence

Publisher: Springer International Publishing

Abstract

This paper presents several approaches for computing word similarity in Portuguese. It is motivated by the recent availability of state-of-the-art distributional models of Portuguese words, which add to several lexical knowledge bases (LKBs) that have been available for this language for a longer time. These resources were exploited to answer word similarity tests, which have also recently become available for Portuguese. We conclude that there are several valid approaches for this task, but no single one outperforms all the others in every test. For instance, distributional models seem to capture relatedness better, whereas LKBs are better suited for computing genuine similarity.

Metadata
Title
Unsupervised Approaches for Computing Word Similarity in Portuguese
Written by
Hugo Gonçalo Oliveira
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-65340-2_67