Published in: Soft Computing 11/2020

12-11-2019 | Focus

A deep extraction model for an unseen keyphrase detection

Authors: Amin Ghazi Zahedi, Morteza Zahedi, Mansoor Fateh

Abstract

A keyphrase represents the basic concepts of a text. Many natural language processing tasks require high-quality keyphrases to be extracted. Previous studies on text modeling have not treated the meanings and concepts associated with a text as particularly significant. According to recent research, documents in the same cluster have much in common, especially in keyphrases that do not appear directly in a text document. The proposed model is therefore built around keyphrases that are absent from the document, which we call unseen keyphrases. The model extracts the basic concepts of a text by using estimates from the same text and by adding keyphrases to the hidden layers of a deep network during training. The main purpose of this structure is first to make unseen keyphrases visible and then to use an RNN to predict them. The proposed method substantially addresses both the failure to represent basic concepts and the unseen keyphrase problem. This study provides new insight into the concept of a text. The mechanism works by highlighting the role of unseen keyphrases, which are surfaced directly without the need for external knowledge. The method is tested on four public datasets in this field. The results show an average improvement of 12% over established methods such as TF-IDF, KEA, and RNN.
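
To make the notion of an unseen keyphrase concrete, the following minimal Python sketch splits a document's reference keyphrases into those that occur verbatim in the text and those that do not. The function names, the tokenization rule, and the sample document are illustrative assumptions and are not taken from the paper. The sketch only shows how the two groups can be told apart and does not reproduce the proposed model.

```python
import re
from typing import List, Tuple


def normalize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(doc_tokens: List[str], phrase_tokens: List[str]) -> bool:
    """Return True if phrase_tokens occurs as a contiguous run in doc_tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


def split_keyphrases(document: str, keyphrases: List[str]) -> Tuple[List[str], List[str]]:
    """Partition keyphrases into (seen, unseen) relative to the document text."""
    doc_tokens = normalize(document)
    seen: List[str] = []
    unseen: List[str] = []
    for phrase in keyphrases:
        if contains_phrase(doc_tokens, normalize(phrase)):
            seen.append(phrase)
        else:
            unseen.append(phrase)
    return seen, unseen


# Hypothetical example document and reference keyphrases.
document = "We train a recurrent network to rank candidate phrases extracted from abstracts."
keyphrases = ["recurrent network", "keyphrase extraction", "candidate phrases"]

seen, unseen = split_keyphrases(document, keyphrases)
print("seen:", seen)
print("unseen:", unseen)
```

Matching here is exact at the token level, so a keyphrase that appears only in an inflected form counts as unseen. A stemmer could be added before matching if a looser definition is wanted.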

Metadata
Title: A deep extraction model for an unseen keyphrase detection
Authors: Amin Ghazi Zahedi, Morteza Zahedi, Mansoor Fateh
Publication date: 12-11-2019
Publisher: Springer Berlin Heidelberg
Published in: Soft Computing, Issue 11/2020
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI: https://doi.org/10.1007/s00500-019-04486-2
