Published in: International Journal of Speech Technology 2/2016

29.10.2015 | Special Issue Article

Time-sensitive Arabic multiword expressions extraction from social networks

Authors: Daoud Daoud, Akram Al-Kouz, Mohammad Daoud

Abstract

In this paper, we present a comprehensive approach for extracting and relating Arabic multiword expressions (MWEs) from social networks. We collected and processed 15 million tweets to form our data set. Because Arabic is complex to process and resources for it are scarce, we built an experimental system that extracts and relates similar MWEs using statistical methods. We introduce a new metric for measuring valid MWEs in social networks. We compare the results obtained from our experimental system against a semantic graph obtained from a web knowledge base.
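The abstract does not say which statistical measures the system uses. Purely as an illustration of statistical MWE extraction, the sketch below ranks adjacent word pairs by pointwise mutual information (PMI), a common association score. The function name `pmi_bigrams`, the `min_count` threshold, and the three toy tweets are assumptions made for this example; only the expression “عاصفة الحزم” comes from the article's footnote. It is not the authors' method or their new metric.

```python
import math
from collections import Counter

def pmi_bigrams(docs, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration; not the metric from the paper.
    `docs` is a list of token lists, one per tweet.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in docs:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs only

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest PMI first: pairs that co-occur more often than chance predicts.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy, made-up tweets (whitespace tokenization, no stemming or normalization).
tweets = [
    "عاصفة الحزم تبدأ اليوم".split(),
    "أخبار عاصفة الحزم اليوم".split(),
    "اليوم أخبار جديدة".split(),
]
ranked = pmi_bigrams(tweets)
for pair, score in ranked:
    print(" ".join(pair), round(score, 3))
```

On this toy input, only the pair that occurs at least twice survives the `min_count` filter. A real pipeline for Arabic would also need normalization and stemming before counting, which this sketch omits.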

Footnotes
2
Google Translate renders “عاصفة الحزم” in English as “Storm packets”, which is unrelated to the source MWE. This clearly demonstrates the need to treat an MWE as a single unit.
Metadata
Title
Time-sensitive Arabic multiword expressions extraction from social networks
Authors
Daoud Daoud
Akram Al-Kouz
Mohammad Daoud
Publication date
29.10.2015
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-015-9315-3
