2018 | OriginalPaper | Chapter

Reducing Computational Effort for Plagiarism Detection with Approximate String Matching

Authors: Tetsuya Nakatoh, Toshiro Minami

Published in: Recent Advances on Soft Computing and Data Mining

Publisher: Springer International Publishing

Abstract

Large numbers of documents are now created in digital form and distributed worldwide. Digital materials can be published and copied at very low cost. As a result, illegal copying of documents is spreading, and plagiarism has become a significant social issue, so there is a strong need for systems that detect it. We have developed a new plagiarism detection method that compares documents using approximate string matching, together with a technique that reduces the computation time of the comparison. In this paper, we demonstrate the usefulness of the proposed method through experiments, evaluating it in terms of precision and recall.
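
The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea and not the authors' method. It is a naive Python sketch that scores every alignment of a passage against a text by counting matching characters, flags alignments above a threshold, and computes precision and recall for a set of detections. The function names and the `min_ratio` threshold are assumptions made for this example.

```python
def match_scores(text: str, pattern: str) -> list[int]:
    """Count matching characters for each alignment of pattern in text.

    Naive O(n*m) version; the speed-up technique mentioned in the
    abstract is not reproduced here.
    """
    n, m = len(text), len(pattern)
    return [
        sum(1 for j in range(m) if text[i + j] == pattern[j])
        for i in range(n - m + 1)
    ]


def suspicious_positions(text: str, pattern: str, min_ratio: float = 0.8) -> list[int]:
    """Return alignments where at least min_ratio of the characters match.

    min_ratio is a hypothetical threshold chosen for illustration.
    """
    m = len(pattern)
    if m == 0:
        return []
    scores = match_scores(text, pattern)
    return [i for i, s in enumerate(scores) if s >= min_ratio * m]


def precision_recall(detected: set, actual: set) -> tuple[float, float]:
    """Precision and recall of detected items against the true items."""
    true_positives = len(detected & actual)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall
```

For instance, `suspicious_positions("the quick brown fox", "quack")` flags position 4, where "quick" differs from "quack" in a single character. Tolerating a bounded number of mismatches is what lets such a comparison catch lightly edited copies that exact matching would miss.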

Metadata
DOI
https://doi.org/10.1007/978-3-319-72550-5_41
