Abstract
Cross-language information retrieval (CLIR) deals with retrieving relevant documents in one language using queries expressed in another language. Because CLIR tools rely on translation techniques, they are challenged by the properties of highly derivational and inflectional languages such as Arabic. Much work has been done on CLIR for different languages, including Arabic. In this article, we introduce the reader to the motivations for solving some of the problems related to Arabic CLIR approaches. The evaluation of these approaches is discussed starting from the 2001 and 2002 TREC Arabic CLIR tracks, which aimed to evaluate CLIR systems objectively. We also study many other research works to highlight the problems that remain unresolved or require further investigation. These works are discussed in light of an in-depth study of the specificities and tasks of Arabic information retrieval (IR). Particular attention is given to translation techniques and CLIR resources, which are key challenges for Arabic CLIR. To advance research in this field, we discuss how a new standard collection can improve Arabic IR and CLIR tracks.
- C.-J. Lee, C.-H. Chen, S.-H. Kao, and P.-J. Cheng. 2010. To translate or not to translate? In Proceedings of the 33rd ACM SIGIR Conference. ACM, New York, NY, 651--658. DOI:10.1145/1835449.1835558 Google ScholarDigital Library
- G.-A. Levow. 2003. Issues in pre- and post-translation document expansion: Untranslatable cognates and missegmented words. In Proceedings of the 6th International Workshop on Information Retrieval with Asian Languages. 77--83. DOI:http://doi.acm.org/10.1145/1118935.1118945 Google ScholarDigital Library
- G.-A. Levow, D. W. Oard, and P. Resnik. 2005. Dictionary-based techniques for cross-language information retrieval. Information Processing and Management 41, 3, 523--547. DOI:http://dx.doi.org/10.1016/j.ipm.2004.06.012 Google ScholarDigital Library
- D. Lewandowski. 2012. Web Search Engine Research, Vol. 4. Emerald Group Publishing.Google Scholar
- Linguistic Data Consortium. 2001. Arabic Newswire Part 1. Retrieved December 1, 2015, from http://catalog. ldc.upenn.edu/LDC2001T55Google Scholar
- Linguistic Data Consortium. 2003. Arabic Gigaword. Retrieved December 1, 2015, from http://catalog.ldc. upenn.edu/LDC2003T12Google Scholar
- Linguistic Data Consortium. 2006. Arabic Gigaword Second Edition. Retrieved December 1, 2015, from http://catalog.ldc.upenn.edu/LDC2006T02Google Scholar
- M. Maamouri and C. Cieri. 2002. Resources for Arabic natural language processing. In Proceedings of the International Symposium on Processing Arabic. 125--146.Google Scholar
- W. Magdy. 2013. TweetMogaz: A news portal of tweets. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1095--1096. DOI:10.1145/2484028.2484212 Google ScholarDigital Library
- W. Magdy, A. Ali, and K. Darwish. 2012. A summarization tool for time-sensitive social media. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM’12). 2695--2697. DOI:10.1145/2396761.2398730 Google ScholarDigital Library
- S. Mallat, M. A. Ben Mohamed, E. Hkiri, A. Zouaghi, and M. Zrigui. 2014. Semantic and contextual knowledge representation for lexical disambiguation: Case of Arabic-French query translation. Journal of Computing and Information Technology 22, 3, 191--215. DOI:10.2498/cit.1002234Google ScholarCross Ref
- J. Mayfield, P. McNamee, C. Costello, C. D. Piatko, and A. Banerjee. 2001. JHU/APL at TREC 2001: Experiments in filtering and in Arabic, video, and Web retrieval. In Proceedings of the 10th Text Retrieval Conference (TREC’01). 322--330. http://trec.nist.gov/pubs/trec10/papers/jhuapl01.pdf.Google Scholar
- P. McNamee and J. Mayfield. 2002a. Scalable multilingual information access. In Advances in Cross-Language Information Retrieval. Lecture Notes in Computer Science, Vol. 2785. Springer, 207--218. DOI:http://dx.doi.org/10.1007/978-3-540-45237-9_17Google Scholar
- P. McNamee and J. Mayfield. 2002b. Comparing cross-language query expansion techniques by degrading translation resources. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 159--166. DOI:10.1145/564376.564406 Google ScholarDigital Library
- P. McNamee, C. D. Piatko, and J. Mayfield. 2002. JHU/APL at TREC 2002: Experiments in filtering and Arabic retrieval. In Proceedings of the 11th Text Retrieval Conference (TREC’02). 358--363. http://trec.nist.gov/pubs/trec11/papers/jhuapl.mcnamee.pdf.Google Scholar
- M. Moussa, M. W. Fakhr, and K. Darwish. 2012. Statistical denormalization for Arabic text. In Proceedings of the 11th Conference on Natural Language Processing (KONVENS’12). 228--232. http://www.oegai.at/konvens2012/proceedings/32_moussa12p/32_moussa12p.pdf.Google Scholar
- S. H. Mustafa. 2005. Character contiguity in n-gram--based word matching: The case for Arabic text. Information Processing and Management 41, 4, 819--827. DOI:http://dx.doi.org/10.1016/j.ipm.2004.02.003 Google ScholarDigital Library
- D. W. Oard and F. C. Gey. 2002. The TREC-2002 Arabic/English CLIR track. In Proceedings of the 11th Text Retrieval Conference (TREC’02). 17--26. http://trec.nist.gov/pubs/trec11/papers/OVERVIEW.gey.ps.gz.Google Scholar
- D. W. Oard, D. He, and J. Wang. 2008. User-assisted query translation for interactive cross-language information retrieval. Information Processing and Management 44, 1, 181--211. DOI:10.1016/j.ipm.2006.12.009 Google ScholarDigital Library
- D. W. Oard, G.-A. Levow, and C. I. Cabezas. 2000. CLEF experiments at Maryland: Statistical stemming and back off translation. In Cross-Language Information Retrieval and Evaluation. Lecture Notes in Computer Science, Vol. 2069. Springer, 176--187. DOI:http://dx.doi.org/10.1007/3-540-44645-1_17 Google ScholarDigital Library
- J. Olive, C. Christianson, and J. McCary. 2011. Handbook of Natural Language Processing and Machine Translation. Springer. Google ScholarDigital Library
- I. Ounis, C. Macdonald, J. Lin, and I. Soboroff. 2011. Overview of the TREC-2011 microblog track. In Proceedings of the 2011 Text Retrieval Conference (TREC’11). http://trec.nist.gov/pubs/trec20/papers/MICROBLOG.OVERVIEW.pdf.Google Scholar
- A. Pasha, M. Al-Badrashiny, M. Altantawy, N. Habash, M. Pooleery, O. Rambow, and R. M. Roth. 2013. DIRA: Dialectal Arabic information retrieval assistant. In Proceedings of International Joint Conference on Natural Language Processing (IJCNLP’13): System Demonstrations. 13--16. http://aclweb.org/anthology/I/I13/I13-2004.pdf.Google Scholar
- A. Pasha, M. Al-Badrashiny, M. Diab, A. El Kholy, R. Eskander, N. Habash, M. Pooleery, O. Rambow, and R. M. Roth. 2014. MADAMIRA: A fast, comprehensive tool for morphological analysis and disambiguation of Arabic. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC’14). 1094--1101. http://www.lrec-conf.org/proceedings/lrec2014/pdf/593_Paper.pdf.Google Scholar
- A. Peñas, E. H. Hovy, P. Forner, A. Rodrigo, R. Sutcliffe, and R. Morante. 2013. QA4MRE 2011-2013: Overview of question answering for machine reading evaluation. In Information Access Evaluation, Multilinguality, Multimodality, and Visualization. Lecture Notes in Computer Science, Vol. 8138. Springer, 303--320. DOI:10.1007/978-3-642-40802-1_29Google ScholarDigital Library
- A. Peñas, E. H. Hovy, P. Forner, A. Rodrigo, R. Sutcliffe, C. Sporleder, C. Forascu, Y. Benajiba, and P. Osenova. 2012. Overview of QA4MRE at CLEF 2012: Question answering for machine reading evaluation. In Proceedings of the 2012 Conference and Labs of the Evaluation Forum (CLEF’12). DOI:10.1.1.360.5198Google Scholar
- A. Pirkola. 1998. The effects of query structure and dictionary setups in dictionary-based cross-language information retrieval. In Proceedings of the 21st ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 55--63. DOI:http://doi.acm.org/10.1145/290941.290957 Google ScholarDigital Library
- A. Pirkola, T. Hedlund, H. Keskustalo, and K. Järvelin. 2001. Dictionary-based cross-language information retrieval: Problems, methods, and research findings. Information Retrieval 4, 3--4, 209--230. DOI:http://dx.doi.org/10.1023/A:1011994105352 Google ScholarDigital Library
- A. Pirkola, H. Keskustalo, E. Leppänen, A.-P. Känsälä, and K. Järvelin. 2002. Targeted s-gram matching: A novel n-gram matching technique for cross- and mono-lingual word form variants. Information Research 7, 2. http://InformationR.net/ir/7-2/paper126.html.Google Scholar
- A. Pirkola, D. Puolamäki, and K. Järvelin. 2003. Applying query structuring in cross-language retrieval. Information Processing and Management 39, 3, 391--402. DOI:http://dx.doi.org/10.1016/S0306-4573(02)00091-2 Google ScholarDigital Library
- M. F. Porter. 1980. An algorithm for suffix stripping. Program 14, 3, 130--137. DOI:10.1108/eb046814Google ScholarCross Ref
- M. F. Porter. 2006. The English (Porter2) Stemming Algorithm. Retrieved December 1, 2015, from http://snowball.tartarus.org/algorithms/english/stemmer.html.Google Scholar
- M. Rammel, M. Sanan, and K. Zreik. 2011. Improving Arabic information retrieval system using n-gram method. WSEAS Transactions on Computers 10, 4, 125--133. DOI:http://dl.acm.org/citation.cfm?id=2001184.2001187 Google ScholarDigital Library
- R. Roth, O. Rambow, N. Habash, M. T. Diab, and C. Rudin. 2008. Arabic morphological tagging, diacritization, and lemmatization using lexeme models and feature ranking. In Proceedings of the Association for Computational Linguistics Conference (ACL’08). 117--120. http://www.aclweb.org/anthology/P08-2030. Google ScholarDigital Library
- H. Sajjad, K. Darwish, and Y. Belinkov. 2013. Translating dialectal Arabic to English. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL’13). 1--6. http://aclweb.org/anthology/P/P13/P13-2001.pdf.Google Scholar
- X. Saralegi and M. de Lacalle. 2010. Dictionary and monolingual corpus-based query translation for Basque-English CLIR. In Proceedings of the International Conference on Language Resources and Evaluation (LREC’10). 1353--1358. http://www.lrec-conf.org/proceedings/lrec2010/pdf/63_Paper.pdf.Google Scholar
- G. Salton. 1973. Experiments in multi-lingual information retrieval. Information Processing Letters 2, 1, 6--11. DOI:http://dx.doi.org/10.1016/0020-0190(73)90017-3Google ScholarCross Ref
- J. Savoy. 2002. Report on CLEF-2002 experiments: Combining multiple sources of evidence. In Advances in Cross-Language Information Retrieval. Lecture Notes in Computer Science, Vol. 2785. Springer, 66--90.Google Scholar
- J. Savoy and Y. Rasolofo. 2002. Report on the TREC-11 experiment: Arabic, named page and topic distillation searches. In Proceedings of the 11th Text Retrieval Conference (TREC’02). 765--774. http://trec.nist.gov/pubs/trec11/papers/uneuchatel.pdf.Google Scholar
- M. Q. Shatnawi, Q. Q. Abuein, and O. Darwish. 2011. Verification Hadith correctness in Islamic Web pages using information retrieval techniques. In Proceedings of the International Conference on Information and Communication Systems. http://www.icics.info/icics/proceeding/icics.paper/64.pdf.Google Scholar
- N. Soudani, I. Bounhas, B. Elayeb, and Y. Slimani. 2014a. Toward an Arabic ontology for Arabic word sense disambiguation based on normalized dictionaries. In Proceedings of the 13th International Conference on Ontologies, Databases, and Applications of Semantics (ODBASE’14). 655--658. DOI:10.1007/978-3-662-45550-0_68Google ScholarDigital Library
- N. Soudani, I. Bounhas, B. Elayeb, and Y. Slimani. 2014b. An LMF-based normalization approach of Arabic Islamic dictionaries for Arabic word sense disambiguation: Application on hadith. In Proceedings of the 2nd International Conference on Islamic Applications in Computer Science and Technologies (IMAN’14).Google Scholar
- N. Soudani, I. Bounhas, B. Elayeb, and Y. Slimani. 2014c. Generic normalization approach of Arabic dictionaries for Arabic word sense disambiguation. In Proceedings of Cinquième Journées Francophones sur les Ontologies (JFO’14). 309--315.Google Scholar
- S. Strassel, M. A. Przybocki, K. Peterson, Z. Song, and K. Maeda. 2008. Linguistic resources and evaluation techniques for evaluation of cross-document automatic content extraction. In Proceedings of the 2008 International Conference on Language Resources and Evaluation (LREC’08). http://www.itl.nist.gov/iad/mig//publications/storage_paper/ACEXDOC_FinalPaperV3_NIST.pdf.Google Scholar
- P. Sujatha and P. Dhavachelvan. 2011. A review on the cross and multilingual information retrieval. International Journal of Web and Semantic Technology 2, 4, 115--124. DOI:10.5121/ijwest.2011.2409Google ScholarCross Ref
- T. Talvensaari. 2008. Effects of aligned corpus quality and size in corpus-based CLIR. In Proceedings of the 30th European Conference on Information Retrieval (ECIR’08). 114--125. DOI:10.1007/978-3-540-78646-7_13 Google ScholarCross Ref
- J. Toivonen, A. Pirkola, H. Keskustalo, K. Visala, and K. Järvelin. 2005. Translating cross-lingual spelling variants using transformation rules. Information Processing and Management 41, 4, 859--872. DOI:http://dx.doi.org/10.1016/j.ipm.2004.02.001 Google ScholarDigital Library
- S. Tomlinson. 2002. Experiments in named page finding and Arabic retrieval with Hummingbird SearchServerTM at TREC 2002. In Proceedings of the 11th Text Retrieval Conference (TREC’02). 248--259. http://trec.nist.gov/pubs/trec11/papers/hummingbird.tomlinson.pdf.Google Scholar
- F. Türe and E. Boschee. 2014. Learning to translate: A query-specific combination approach for cross-lingual information retrieval. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP’14). 589--599. http://emnlp2014.org/papers/pdf/EMNLP2014064.pdf.Google ScholarCross Ref
- F. Türe, J. Lin, and D. W. Oard. 2012. Combining statistical translation techniques for cross-language information retrieval. In Proceedings of the 24th International Conference on Computational Linguistics (COLING’12): Technical Papers. 2685--2702.Google Scholar
- R. Udupa, K. Saravanan, A. Bakalov, and A. Bhole. 2009. “They are out there, if you know where to look”: Mining transliterations of OOV query terms for cross-language information retrieval. In Proceedings of the 31st European Conference on Information Retrieval (ECIR’09). 437--448. DOI:10.1007/978-3-642-00958-7_39 Google ScholarDigital Library
- E. M. Voorhees. 2002. Overview of TREC 2002. In Proceedings of the 11th Text Retrieval Conference (TREC’02). 1--15. http://trec.nist.gov/pubs/trec11/papers/OVERVIEW.11.pdf.Google Scholar
- E. M. Voorhees and D. Harman. 2001. Overview of TREC 2001. In Proceedings of the 10th Text Retrieval Conference (TREC’01). 1--15. http://trec.nist.gov/pubs/trec10/papers/overview_10.pdf.Google Scholar
- J. Wang and D. W. Oard. 2006. Combining bidirectional translation and synonymy for cross-language information retrieval. In Proceedings of the 29th ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 202--209. DOI:10.1145/1148170.1148208 Google ScholarDigital Library
- D. Wu, D. He, H. Ji, and R. Grishman. 2008. A study of using an out-of-box commercial MT system for query translation in CLIR. In Proceedings the 2nd ACM Workshop on Improving Non English Web Searching (iNEWS’08). ACM, New York, NY, 71--76. DOI:10.1145/1460027.1460038 Google ScholarDigital Library
- J. Xu, A. Fraser, J. Makhoul, M. Noamany, and G. Osman. 2001a. UN Arabic English Parallel Text Version 1.0 beta [CD-ROM]. Linguistic Data Consortium, Philadelphia, PA.Google Scholar
- J. Xu, A. Fraser, and R. M. Weischedel. 2001b. TREC 2001: Cross-lingual retrieval at BBN. In Proceedings of the 10th Text Retrieval Conference (TREC’01). 68--77. http://trec.nist.gov/pubs/trec11/ypapers/bbn.xu.cross.pdf.Google Scholar
- J. Xu, A. Fraser, and R. M. Weischedel. 2002. Empirical studies in strategies for Arabic retrieval. In Proceedings of the 25th ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 269--274. DOI:http://doi.acm.org/10.1145/564376.564424 Google ScholarDigital Library
- J. Xu and R. M. Weischedel. 2005. Empirical studies on the impact of lexical resources on CLIR performance. Information Processing and Management 41, 3, 475--487. DOI:10.1016/j.ipm.2004.06.009 Google ScholarDigital Library
- Z. Yahya, A. M. Taufik, A. Azreen, and A. K. Rabiah. 2013. Query translation using concepts similarity based on Quran ontology for cross-language information retrieval. Journal of Computer Science 9, 7, 889--897. DOI:10.3844/jcssp.2013.889.897Google ScholarCross Ref
- Y. Yang, M. Rogati, and B. Kisiel. 2005. Combining categorization-based and corpus-based approaches for CLIR. In Proceedings of the 2005 FLAIRS Conference. 295--300. http://www.aaai.org/Papers/FLAIRS/2005/Flairs05-049.pdf.Google Scholar
- R. Zajac, A. Malki, and A. Abdelali. 2001. Arabic-English NLP at CRL. In Proceedings of the Arabic NLP Workshop (ACL/EACL’01). http://www.elsnet.org/arabic2001/zajac.pdf.Google Scholar
- H. Zeng, M. A. Alhossaini, L. Ding, R. Fikes, and D. L. McGuinness. 2006. Computing trust from revision history. In Proceedings of the 2006 International Conference on Privacy, Security, and Trust: Bridge the Gap between PST Technologies and Business Services. Article No. 8. DOI:10.1145/1501434.1501445 Google ScholarDigital Library
- Y. Zhang, P. Vines, and J. Zobel. 2005. Chinese OOV translation and post-translation query expansion in Chinese-English cross-lingual information retrieval. ACM Transactions on Asian Language Information Processing 4, 2, 57--77. DOI:http://doi.acm.org/10.1145/1105696.1105697 Google ScholarDigital Library
- D. Zhou, M. Truran, T. J. Brailsford, and H. Ashman. 2008. A hybrid technique for English-Chinese cross language information retrieval. ACM Transactions on Asian Language Information Processing 7, 2, Article No. 5. DOI:http://doi.acm.org/10.1145/1362782.1362784. Google ScholarDigital Library
- D. Zhou, M. Truran, T. J. Brailsford, V. Wade, and H. Ashman. 2012. Translation techniques in cross-language information retrieval. ACM Computing Surveys 45, 1, 1--44. DOI:http://doi.acm.org/10.1145/2379776.2379777 Google ScholarDigital Library
- I. Zitouni (Ed.): 2014. Natural Language Processing of Semitic Languages. Springer-Verlag, Berlin, Germany. Google ScholarDigital Library