2018 | OriginalPaper | Book chapter

An Automatic Approach for WordNet Enrichment Applied to Arabic WordNet

Authors: Mohamed Seghir Hadj Ameur, Ahlem Chérifa Khadir, Ahmed Guessoum

Published in: Arabic Language Processing: From Theory to Practice

Publisher: Springer International Publishing


Abstract

This paper introduces an automatic method for extending existing WordNets via machine translation. Our proposal uses the hierarchical skeleton of the English Princeton WordNet (PWN) as a backbone for extending their taxonomies. We apply it to the Arabic WordNet (AWN), enriching it with new synsets and providing vocalizations and usage examples for each inserted lemma. Around 12000 new potential synsets can be added to AWN with a precision of at least 93%. As such, the coverage of AWN in terms of synsets can be increased from 11269 to around 24000, a very promising achievement on the path to enriching the Arabic WordNet.
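The figures in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: it treats "around 12000" as exactly 12000 and the precision bound as exactly 93%, and the function names are hypothetical, not part of the paper.

```python
# Figures quoted in the abstract; the rounded ones are taken at face value.
EXISTING_SYNSETS = 11269   # current AWN synset count
NEW_SYNSETS = 12000        # "around 12000" new potential synsets (assumed exact)
MIN_PRECISION_PCT = 93     # "precision of at least 93%"


def projected_coverage(existing: int, added: int) -> int:
    """Total synset count if every candidate synset were inserted."""
    return existing + added


def min_correct(added: int, precision_pct: int) -> int:
    """Lower bound on correct new synsets, using integer arithmetic."""
    return added * precision_pct // 100


total = projected_coverage(EXISTING_SYNSETS, NEW_SYNSETS)
correct = min_correct(NEW_SYNSETS, MIN_PRECISION_PCT)
print(total, correct)
```

Under these assumptions the total is 23269, which the abstract rounds to "around 24000", and at least 11160 of the added synsets would be correct.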


Footnotes
2
The version of AWN used in this work is available at http://globalwordnet.org/arabic-wordnet/awn-browse/.
 
3
A lemma is said to be non-ambiguous if it appears in only one synset; it is considered ambiguous otherwise.
 
7
The vocalization toolkit is available at https://github.com/Ycfx/Arabic-Diacritizer under the GNU License.
 
8
We vocalize the entire sentence to obtain a more reliable vocalization since the vocalization model proposed in [22] uses a statistical model which tends to give better results when a wider context is available.
 
9
The WordNet taxonomy is not a strict hierarchy; in other words, a synset in the taxonomy can have multiple parents.
 
12
The source code for fast_align is publicly available at http://github.com/clab/fastalign.
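Footnotes 3 and 9 each state a structural property that is easy to make concrete. The following is a minimal sketch on toy data: the synset identifiers, lemmas, hypernym links and function names are all hypothetical and are not taken from AWN or PWN. It shows (a) the ambiguity test, where a lemma is ambiguous if it appears in more than one synset, and (b) ancestor collection in a taxonomy where a synset may have several parents.

```python
from collections import defaultdict

# Hypothetical toy data: synset id -> lemmas (not real AWN/PWN content).
SYNSETS = {
    "s1": ["bank", "depository"],
    "s2": ["bank", "riverside"],
    "s3": ["knife"],
}

# Hypothetical hypernym links: synset -> list of parents (may be several).
PARENTS = {
    "knife": ["edge_tool", "weapon"],
    "edge_tool": ["tool"],
    "weapon": ["instrument"],
    "tool": [],
    "instrument": [],
}


def ambiguous_lemmas(synsets):
    """Return the lemmas that occur in more than one synset (footnote 3)."""
    membership = defaultdict(set)
    for synset_id, lemmas in synsets.items():
        for lemma in lemmas:
            membership[lemma].add(synset_id)
    return {lemma for lemma, ids in membership.items() if len(ids) > 1}


def ancestors(node, parents):
    """Collect every ancestor of a node, following all parents (footnote 9)."""
    seen = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


print(sorted(ambiguous_lemmas(SYNSETS)))
print(sorted(ancestors("knife", PARENTS)))
```

Because a synset can have multiple parents, the traversal keeps a `seen` set so that an ancestor reachable along two paths is visited only once.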
 
References
1. Morato, J., Marzal, M.A., Lloréns, J., Moreiro, J.: Wordnet applications. In: Proceedings of GWC, pp. 20–23 (2004)
2. Fellbaum, C.: WordNet. Wiley Online Library, New York (1998)
3. Black, W., Elkateb, S., Rodriguez, H., Alkhalifa, M., Vossen, P., Pease, A., Fellbaum, C.: Introducing the Arabic wordnet project. In: Proceedings of the Third International WordNet Conference, pp. 295–300 (2006)
4. Rodríguez, H., Farwell, D., Ferreres, J., Bertran, M., Alkhalifa, M., Martí, M.A.: Arabic wordnet: Semi-automatic extensions using Bayesian inference. In: LREC (2008)
5. Alkhalifa, M., Rodríguez, H.: Automatically extending NE coverage of Arabic wordnet using Wikipedia. In: Proceedings of the 3rd International Conference on Arabic Language Processing CITALA2009, Rabat, Morocco (2009)
6. Abouenour, L., Bouzoubaa, K., Rosso, P.: On the evaluation and improvement of Arabic wordnet coverage and usability. Lang. Resour. Eval. 47(3), 891–917 (2013)
7. Saveski, M., Trajkovski, I.: Automatic construction of wordnets by using machine translation and language modeling. In: Proceedings of the 13th International Multiconference Information Society on Seventh Language Technologies Conference, vol. C (2010)
8. Cilibrasi, R.L., Vitanyi, P.M.: The Google similarity distance. IEEE Trans. Knowl. Data Eng. 19(3), 370–383 (2007)
9. Montazery, M., Faili, H.: Automatic Persian wordnet construction. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 846–850. Association for Computational Linguistics (2010)
10. Mousavi, Z., Faili, H.: Persian Wordnet Construction using Supervised Learning. ArXiv e-prints, April 2017
11. Niemi, J., Lindén, K., Hyvärinen, M., et al.: Using a bilingual resource to add synonyms to a wordnet. In: Proceedings of the Global Wordnet Conference (2012)
12. Lindén, K., Niemi, J.: Is it possible to create a very large wordnet in 100 days? An evaluation. Lang. Resour. Eval. 48(2), 191–201 (2014)
13. Al Tarouti, F., Kalita, J.: Enhancing automatic wordnet construction using word embeddings. In: Proceedings of the Workshop on Multilingual and Cross-lingual Methods in NLP (2016)
14. Cavalli-Sforza, V., Saddiki, H., Bouzoubaa, K., Abouenour, L., Maamouri, M., Goshey, E.: Bootstrapping a wordnet for an Arabic dialect from other wordnets and dictionary resources. In: 2013 ACS International Conference on Computer Systems and Applications (AICCSA), pp. 1–8. IEEE (2013)
15. Boudabous, M.M., Kammoun, N.C., Khedher, N., Belguith, L.H., Sadat, F.: Arabic wordnet semantic relations enrichment through morpho-lexical patterns. In: 2013 1st International Conference on Communications, Signal Processing, and their Applications (ICCSPA), pp. 1–6. IEEE (2013)
16. Al-Yahya, M., Al-Malak, S., Aldhubayi, L.: Ontological lexicon enrichment: The badea system for semi-automated extraction of antonymy relations from Arabic language corpora. Malays. J. Comput. Sci. 29(1), 56–73 (2016)
17. Koehn, P.: Statistical Machine Translation. Cambridge University Press, Cambridge (2009)
18. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 311–318. Association for Computational Linguistics (2002)
19. Powers, D.M.: Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. J. Mach. Learn. Technol. 2(1), 37–63 (2011)
20. Glover, F.: Future paths for integer programming and links to artificial intelligence. Comput. Oper. Res. 13(5), 533–549 (1986)
23. Diab, M.: Second generation AMIRA tools for Arabic processing: Fast and robust tokenization, POS tagging, and base phrase chunking. In: 2nd International Conference on Arabic Language Resources and Tools, vol. 110 (2009)
24. Pasha, A., Al-Badrashiny, M., Diab, M.T., El Kholy, A., Eskander, R., Habash, N., Pooleery, M., Rambow, O., Roth, R.: MADAMIRA: A fast, comprehensive tool for morphological analysis and disambiguation of Arabic. In: LREC, vol. 14, pp. 1094–1101 (2014)
25. Dyer, C., Chahuneau, V., Smith, N.A.: A simple, fast, and effective reparameterization of IBM model 2. Association for Computational Linguistics (2013)
26. Brown, P.F., Pietra, V.J.D., Pietra, S.A.D., Mercer, R.L.: The mathematics of statistical machine translation: Parameter estimation. Comput. Linguist. 19(2), 263–311 (1993)
Metadata
Title
An Automatic Approach for WordNet Enrichment Applied to Arabic WordNet
Authors
Mohamed Seghir Hadj Ameur
Ahlem Chérifa Khadir
Ahmed Guessoum
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-73500-9_1