Published in: Cognitive Computation 4/2018

24.03.2018

A Hybrid Approach for Arabic Text Summarization Using Domain Knowledge and Genetic Algorithms

Authors: Qasem A. Al-Radaideh, Dareen Q. Bataineh

Abstract

Text summarization is the process of producing a shorter version of a given text. Automatic summarization techniques have been applied to various domains, such as medical, political, news, and legal texts, and this work has shown that adopting domain-relevant features can improve summarization performance. Despite extensive research on domain-based summarization in English and other languages, such work is lacking for Arabic because of the shortage of existing knowledge bases. This paper presents a hybrid, single-document text summarization approach, abbreviated ASDKGA. The approach combines domain knowledge, statistical features, and genetic algorithms to extract the important points of Arabic political documents. ASDKGA is tested on two corpora: the KALIMAT corpus and the Essex Arabic Summaries Corpus (EASC). The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) framework is used to compare the summaries generated automatically by ASDKGA with human-written summaries, and the approach is also compared against three other Arabic text summarization approaches. ASDKGA showed promising results on Arabic political documents, with an average F-measure of 0.605 at a compression ratio of 40%.
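To make the idea of genetic-algorithm sentence extraction under a compression ratio more concrete, here is a minimal sketch. It is not the ASDKGA method itself: the per-sentence `scores` are hypothetical stand-ins for a weighted mix of statistical and domain-knowledge features, the compression ratio is assumed to mean the fraction of sentences kept, and the operators (tournament selection, one-point crossover, bit-flip mutation, elitism) are generic GA choices, not ones confirmed by the abstract. The function name `ga_select` and all its parameters are illustrative.

```python
import random


def ga_select(scores, ratio=0.4, pop_size=30, generations=60,
              mutation_rate=0.05, seed=0):
    """Pick sentence indices for an extractive summary with a simple GA.

    Hypothetical sketch: `scores` are per-sentence importance values; a
    chromosome is a 0/1 vector marking selected sentences; fitness is the
    total score of selected sentences, and any chromosome that exceeds the
    length budget (ratio * number of sentences) gets fitness zero.
    """
    n = len(scores)
    if n == 0:
        return []
    rng = random.Random(seed)
    budget = max(1, round(ratio * n))  # assumed: ratio = fraction of sentences kept

    def fitness(chrom):
        if sum(chrom) > budget:
            return 0.0
        return sum(s for s, bit in zip(scores, chrom) if bit)

    def random_chrom():
        # Start from a feasible chromosome with exactly `budget` sentences.
        chosen = set(rng.sample(range(n), budget))
        return [1 if i in chosen else 0 for i in range(n)]

    def tournament(pop, k=3):
        # Tournament selection: best of k randomly drawn chromosomes.
        return max(rng.sample(pop, k), key=fitness)

    population = [random_chrom() for _ in range(pop_size)]
    for _ in range(generations):
        next_gen = [max(population, key=fitness)]  # elitism: keep the best
        while len(next_gen) < pop_size:
            a, b = tournament(population), tournament(population)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen

    best = max(population, key=fitness)
    return [i for i, bit in enumerate(best) if bit]


scores = [0.9, 0.1, 0.8, 0.2, 0.7]
print(ga_select(scores, ratio=0.4))
```

With five sentences and a 40% ratio, the budget is two sentences; on this toy input the highest-scoring feasible subset is sentences 0 and 2. Penalizing over-length chromosomes with zero fitness is one simple way to enforce the compression ratio; repair operators or soft penalties are common alternatives.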
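The F-measure reported above comes from ROUGE, which scores a system summary by its n-gram overlap with a human reference. The following is a simplified ROUGE-N sketch, assuming whitespace tokenization, no stemming or stop-word removal, and a single reference; the abstract does not say which ROUGE variant or settings were used, and the function name `rouge_n` is illustrative rather than the official package's API.

```python
from collections import Counter


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    """Simplified ROUGE-N recall, precision, and F-measure for one reference.

    Assumptions: whitespace tokenization, no stemming or stop-word removal.
    """
    def ngrams(text: str) -> Counter:
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Clipped overlap: each n-gram counts at most min(count in cand, count in ref).
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f": f}


print(rouge_n("the cat sat on the mat", "the cat is on the mat"))
```

Here five of the six reference unigrams are matched, so recall, precision, and F-measure are all 5/6. The official ROUGE toolkit additionally supports multiple references, stemming, and other variants such as ROUGE-L.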

Metadata
Title
A Hybrid Approach for Arabic Text Summarization Using Domain Knowledge and Genetic Algorithms
Authors
Qasem A. Al-Radaideh
Dareen Q. Bataineh
Publication date
24.03.2018
Publisher
Springer US
Published in
Cognitive Computation / Issue 4/2018
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-018-9547-z