Published in: International Journal on Digital Libraries 1/2021

7 September 2020

A crowdsourcing approach to construct mono-lingual plagiarism detection corpus

Authors: Habibollah Asghari, Omid Fatemi, Salar Mohtaj, Heshaam Faili

Abstract

Plagiarism detection deals with detecting plagiarized fragments among textual documents. The availability of digital documents in online libraries makes plagiarism easier but, on the other hand, also makes it easier to detect with automatic plagiarism detection systems. Large-scale plagiarism corpora with a wide variety of plagiarism cases are needed to evaluate different detection methods in different languages. Plagiarism detection corpora play an important role in evaluating and tuning plagiarism detection systems. Despite their importance, few corpora have been developed for low-resource languages. In this paper, we propose HAMTA, a Persian plagiarism detection corpus. To simulate real cases of plagiarism, manually paraphrased texts are used to compile the corpus. To obtain the manual plagiarism cases, a crowdsourcing platform is developed and crowd workers are asked to paraphrase fragments of text. Moreover, artificial methods are used to scale up the proposed corpus by automatically generating cases of text re-use. The evaluation results indicate a high correlation between the proposed corpus and the PAN state-of-the-art English plagiarism detection corpus.

Metadata
Title
A crowdsourcing approach to construct mono-lingual plagiarism detection corpus
Authors
Habibollah Asghari
Omid Fatemi
Salar Mohtaj
Heshaam Faili
Publication date
7 September 2020
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Digital Libraries / Issue 1/2021
Print ISSN: 1432-5012
Electronic ISSN: 1432-1300
DOI
https://doi.org/10.1007/s00799-020-00294-4
