Published in: Discover Computing 4-5/2021

14.07.2021

Using word semantic concepts for plagiarism detection in text documents

Written by: Chia-Yang Chang, Shie-Jue Lee, Chih-Hung Wu, Chih-Feng Liu, Ching-Kuan Liu


Abstract

Plagiarism is a common problem in the modern age. With the advance of the Internet, it has become increasingly convenient to access other people’s writings and publications. When someone uses the content of a text in an improper way, plagiarism may occur. Plagiarism infringes intellectual property rights, so it is a serious problem. However, detecting plagiarism effectively is a challenging task. Traditional methods, such as the vector space model or bag-of-words, fall short of providing a good solution because they cannot handle the semantics of words satisfactorily. In this paper, we propose a new method for plagiarism detection. We use Word2vec to transform words into word vectors that reveal the semantic relationships among different words. Using these word vectors, words are clustered into concepts. Documents and their paragraphs are then represented in terms of concepts, so that plagiarism detection can be done more effectively. A number of experiments are conducted to demonstrate the good performance of the proposed method.
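The pipeline outlined in the abstract (word vectors, then concepts, then concept-based document comparison) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the 2-D vectors in `WORD_VECTORS` are hand-picked toy values standing in for trained Word2vec embeddings, the greedy threshold clustering in `cluster_words` is a simple stand-in and not necessarily the clustering used in the paper, and the function names and the 0.95 threshold are hypothetical.

```python
import math
from collections import Counter

# Toy 2-D vectors standing in for trained Word2vec embeddings (assumed values).
WORD_VECTORS = {
    "car": (0.90, 0.10),
    "automobile": (0.88, 0.15),
    "fast": (0.10, 0.90),
    "quick": (0.12, 0.92),
    "rain": (-0.70, 0.70),
    "drizzle": (-0.68, 0.72),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_words(vectors, threshold=0.95):
    """Greedy stand-in clustering: a word joins the first concept whose seed
    vector is similar enough, otherwise it starts a new concept."""
    seeds = []
    word_to_concept = {}
    for word, vec in vectors.items():
        for idx, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                word_to_concept[word] = idx
                break
        else:
            seeds.append(vec)
            word_to_concept[word] = len(seeds) - 1
    return word_to_concept


def concept_vector(text, word_to_concept, n_concepts):
    """Represent a text as counts of the concepts its known words belong to."""
    counts = Counter(
        word_to_concept[w] for w in text.lower().split() if w in word_to_concept
    )
    return [counts.get(i, 0) for i in range(n_concepts)]


word_to_concept = cluster_words(WORD_VECTORS)
n_concepts = max(word_to_concept.values()) + 1

source = concept_vector("fast car", word_to_concept, n_concepts)
paraphrase = concept_vector("quick automobile", word_to_concept, n_concepts)
unrelated = concept_vector("rain drizzle", word_to_concept, n_concepts)

sim_paraphrase = cosine(source, paraphrase)
sim_unrelated = cosine(source, unrelated)
print(f"paraphrase: {sim_paraphrase:.2f}, unrelated: {sim_unrelated:.2f}")
```

In this toy setting, "fast car" and "quick automobile" share no words, so a plain bag-of-words comparison would find no overlap, while their concept representations coincide. With real embeddings, a library clustering routine and a tuned decision threshold would replace the hand-picked pieces.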

Metadata
Title
Using word semantic concepts for plagiarism detection in text documents
Written by
Chia-Yang Chang
Shie-Jue Lee
Chih-Hung Wu
Chih-Feng Liu
Ching-Kuan Liu
Publication date
14.07.2021
Publisher
Springer Netherlands
Published in
Discover Computing / Issue 4-5/2021
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-021-09394-4
