Published in: Cognitive Computation 6/2017

22-08-2017

A Novel Technique for Detecting Plagiarism in Documents Exploiting Information Sources

Authors: Mansi Sahi, Vishal Gupta

Abstract

Plagiarism occurs when someone uses another person's work without due acknowledgment. Text similarity plays a role in several fields, such as web document retrieval, information mining, and searching for related articles. Several approaches have been introduced for detecting plagiarism in text documents, based on the syntactic structure of the text, string similarity, fingerprinting, the semantic meaning underlying the text, and so on. A key limitation of current plagiarism detection systems is that they fail to detect difficult cases of plagiarism. The proposed approach is a hybrid of semantic and syntactic similarity between text documents. It exploits linguistic information sources non-linearly, using a lexical database to find the relatedness between text documents, and uses semantic knowledge to perform cognitively inspired computing. The framework can detect plagiarism cases ranging from verbatim copying to intelligent forms such as paraphrasing, rewording within a sentence, and sentence transformation. The approach has been evaluated on the standard PAN-PC-11 dataset. The experiments show that the technique outperforms other strong baseline techniques in terms of precision, recall, F-measure, and plagiarism detection (PlagDet) score.
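The abstract names two computable pieces: a word-level semantic similarity that combines lexical-database information sources non-linearly, and the PlagDet score used for evaluation. The following is an illustrative sketch under stated assumptions, not the authors' implementation. It assumes the non-linear combination follows Li, Bandar and McLean (2003), where word similarity is exp(-αl)·tanh(βh) for shortest path length l and lowest-common-subsumer depth h in the lexical hierarchy; the paper's exact formula is not given here, and α = 0.2, β = 0.6 are assumed values. For PlagDet it uses the PAN definition F1 / log2(1 + granularity), simplified to character spans within a single suspicious document (the full PAN measure also matches the source-document side). The function names `li_word_similarity` and `plagdet` are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical span type: a character interval [start, end) in the suspicious
# document. Spans are assumed non-empty.
Span = Tuple[int, int]


def li_word_similarity(path_len: float, subsumer_depth: float,
                       alpha: float = 0.2, beta: float = 0.6) -> float:
    """Word similarity in the style of Li, Bandar and McLean (2003).

    path_len: shortest path length between the two word senses.
    subsumer_depth: depth of their lowest common subsumer.
    exp(-alpha * l) decays with distance; tanh(beta * h), i.e.
    (e^{bh} - e^{-bh}) / (e^{bh} + e^{-bh}), grows with depth and saturates at 1.
    alpha and beta are assumed values, not taken from this paper.
    """
    return math.exp(-alpha * path_len) * math.tanh(beta * subsumer_depth)


def _chars(spans: List[Span]) -> set:
    # Union of all character offsets covered by the given spans.
    return set().union(*(range(s, e) for s, e in spans))


def _overlaps(a: Span, b: Span) -> bool:
    # True if the two half-open intervals share at least one character.
    return a[0] < b[1] and b[0] < a[1]


def plagdet(cases: List[Span],
            detections: List[Span]) -> Tuple[float, float, float, float, float]:
    """Return (precision, recall, F1, granularity, PlagDet) for one document."""
    if not cases or not detections:
        return 0.0, 0.0, 0.0, 1.0, 0.0
    case_chars, det_chars = _chars(cases), _chars(detections)
    # Precision: average share of each detection that lies inside a true case.
    prec = sum(len(set(range(*r)) & case_chars) / (r[1] - r[0])
               for r in detections) / len(detections)
    # Recall: average share of each true case covered by some detection.
    rec = sum(len(set(range(*s)) & det_chars) / (s[1] - s[0])
              for s in cases) / len(cases)
    # Granularity: mean number of detections per case that was detected at all.
    hits = [sum(_overlaps(s, r) for r in detections) for s in cases]
    hits = [h for h in hits if h > 0]
    gran = sum(hits) / len(hits) if hits else 1.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0
    return prec, rec, f1, gran, f1 / math.log2(1 + gran)
```

For example, `plagdet([(0, 100)], [(0, 50), (50, 100)])` has perfect precision and recall, but because one case is reported as two detections its granularity is 2, so PlagDet drops to 1/log2(3) ≈ 0.63. This penalty for fragmented detections is why PlagDet is reported alongside precision, recall, and F-measure.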

Metadata
Title
A Novel Technique for Detecting Plagiarism in Documents Exploiting Information Sources
Authors
Mansi Sahi
Vishal Gupta
Publication date
22-08-2017
Publisher
Springer US
Published in
Cognitive Computation / Issue 6/2017
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-017-9502-4