
2020 | Original Paper | Book Chapter

A Robust Approach to Plagiarism Detection in Handwritten Documents

Authors: Om Pandey, Ishan Gupta, Bhabani S. P. Mishra

Published in: Advances in Visual Computing

Publisher: Springer International Publishing


Abstract

Plagiarism detection is widely used to assess the quality and originality of work. In this paper, we address the problem of predicting similarities among a collection of documents, a technique with widespread uses in academic institutions. We propose a simple yet effective method for detecting plagiarism that uses a robust word detection and segmentation procedure followed by a convolutional neural network (CNN) and bidirectional long short-term memory (biLSTM) pipeline to extract the text. Our approach also extracts and encodes common patterns in handwriting, such as scratches, to improve accuracy in real-world use cases. The information extracted from multiple documents is then compared using similarity metrics to find the documents that have been plagiarized from a source. Extensive experiments show that this approach may help simplify the examination process and can act as a cheap, viable alternative to many modern approaches for detecting plagiarism in handwritten documents.
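The abstract leaves the comparison metrics unspecified. A minimal sketch of that final step, assuming each page has already been transcribed to plain text by the recognition pipeline and using cosine similarity over word-count vectors as a stand-in metric, could look like the following. The function names, the tokenizer, and the 0.8 threshold are hypothetical choices for illustration, not details taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the transcribed text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts (0.0 to 1.0)."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    # Only words present in both texts contribute to the dot product.
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty transcription shares nothing with anything
    return dot / (norm_a * norm_b)


def flag_plagiarized(source: str, submissions: dict[str, str],
                     threshold: float = 0.8) -> list[tuple[str, float]]:
    """Return (document id, score) pairs whose similarity to the source meets
    the threshold, most similar first. The 0.8 default is hypothetical."""
    scored = [(doc_id, cosine_similarity(source, text))
              for doc_id, text in submissions.items()]
    flagged = [pair for pair in scored if pair[1] >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    source = "The mitochondria is the powerhouse of the cell"
    submissions = {
        "alice": "the mitochondria is the powerhouse of the cell",
        "bob": "photosynthesis converts light into chemical energy",
    }
    print(flag_plagiarized(source, submissions))  # only "alice" is flagged
```

Because word-count vectors ignore word order, reordered copying still scores highly. The threshold is kept below 1.0 on the assumption that recognition errors in handwriting will make even a verbatim copy transcribe slightly differently.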


Metadata

Copyright year: 2020

DOI: https://doi.org/10.1007/978-3-030-64559-5_54
