Published in: Cognitive Computation 6/2017

22.08.2017

A Novel Technique for Detecting Plagiarism in Documents Exploiting Information Sources

Authors: Mansi Sahi, Vishal Gupta

Abstract

Plagiarism occurs when someone uses another person’s work without due acknowledgment. Text similarity underlies several related tasks, such as web document retrieval, information mining, and searching for related articles. Several approaches have been introduced for detecting plagiarism in text documents, based on the syntactic structure of the text, string similarity, fingerprinting, the semantic meaning underlying the text, and so on. The main limitation of current plagiarism detection systems is that they fail to detect difficult cases of plagiarism. The proposed approach is a hybrid of semantic and syntactic similarity between text documents. It exploits linguistic information sources non-linearly, using a lexical database to find the relatedness between text documents, and uses semantic knowledge to perform cognitive-inspired computing. The framework is capable of detecting intelligent plagiarism cases such as verbatim copying, paraphrasing, rewording within a sentence, and sentence transformation. The approach has been evaluated on the standard PAN-PC-11 dataset. The experiments show that the technique outperforms other strong baseline techniques in terms of precision, recall, F-measure, and plagiarism detection (PlagDet) score.
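
The evaluation measures named in the abstract can be illustrated with a short sketch. It assumes the definition of PlagDet commonly used in the PAN evaluations, namely the F-measure divided by log2(1 + granularity), where granularity (at least 1) penalizes reporting one plagiarism case as several fragments. The function names and the input values are hypothetical and are not results from the paper.

```python
import math


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """PlagDet score, assumed here as F1 / log2(1 + granularity).

    A granularity of 1 means every plagiarism case was reported as a
    single detection, so the score equals the F-measure.
    """
    if granularity < 1:
        raise ValueError("granularity must be >= 1")
    return f_measure(precision, recall) / math.log2(1 + granularity)


if __name__ == "__main__":
    # Illustrative values only; not taken from the paper.
    print(plagdet(precision=0.8, recall=0.8, granularity=1.0))
```

With a granularity of 1 the denominator is log2(2) = 1, so PlagDet reduces to the plain F-measure; larger granularity values lower the score even when precision and recall are unchanged.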

Metadata
Title
A Novel Technique for Detecting Plagiarism in Documents Exploiting Information Sources
Authors
Mansi Sahi
Vishal Gupta
Publication date
22.08.2017
Publisher
Springer US
Published in
Cognitive Computation / Issue 6/2017
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-017-9502-4
