2021 | OriginalPaper | Chapter

A Deep Network Model for Paraphrase Detection in Punjabi

Authors: Arwinder Singh, Gurpreet Singh Josan

Published in: Recent Innovations in Computing

Publisher: Springer Singapore

Abstract

A paraphrase is a text that conveys the same meaning as another text using different expressions. Paraphrasing matters in NLP because it underlies many applications, such as information retrieval, information extraction, machine translation, query expansion, question answering, summarization and plagiarism detection. Paraphrase detection determines whether two given texts are semantically similar. Although the literature on paraphrase detection is extensive, there is no established algorithm for paraphrase detection in the Punjabi language. This paper develops a new paraphrase detection model for Punjabi. We use two deep learning methods to map sentences to vectors, and these vectors are then used to detect paraphrases. Unlike other implementations of paraphrase detection, our model is simple and efficient. Qualitative and quantitative evaluations demonstrate the efficiency of the model, which can be applied to various NLP applications. The proposed model is trained on Quora's question pair dataset, which opens new directions for paraphrasing in Indian languages.
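The idea of comparing sentence vectors can be illustrated with a minimal sketch. The abstract does not say how the vectors are compared or classified, so the cosine-similarity score, the `threshold` value, the function names and the toy vectors below are all assumptions made for illustration, not the authors' method.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_u * norm_v)


def is_paraphrase(vec_a, vec_b, threshold=0.8):
    """Hypothetical decision rule: similar enough vectors count as paraphrases.

    The threshold is an assumed value; in practice it would be tuned
    on labelled sentence pairs.
    """
    return cosine_similarity(vec_a, vec_b) >= threshold


# Toy sentence vectors standing in for the output of a sentence encoder.
q1 = [0.9, 0.1, 0.3]
q2 = [0.8, 0.2, 0.4]
q3 = [-0.1, 0.9, -0.5]

print(is_paraphrase(q1, q2))  # q1 and q2 point in nearly the same direction
print(is_paraphrase(q1, q3))  # q1 and q3 point in different directions
```

In a trained model the decision is usually learned from labelled pairs rather than set by a fixed threshold; the threshold here only keeps the sketch short.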

Metadata
Title
A Deep Network Model for Paraphrase Detection in Punjabi
Authors
Arwinder Singh
Gurpreet Singh Josan
Copyright Year
2021
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-15-8297-4_15