2019 | OriginalPaper | Book chapter

Effectively Representing Short Text via the Improved Semantic Feature Space Mapping

Authors: Ting Tuo, Huifang Ma, Haijiao Liu, Jiahui Wei

Published in: Trends and Applications in Knowledge Discovery and Data Mining

Publisher: Springer International Publishing

Abstract

Short text representation (STR) has attracted increasing interest with the rapid growth of Web and social media data in short-text form. In this paper, we present a new method that uses an improved semantic feature space mapping to represent short texts effectively. First, terms are semantically clustered based on statistical analysis and word2vec, and the semantic feature space is represented by the resulting cluster centers. Next, the context information of terms is integrated with the semantic feature space, and three improved similarity calculation methods are built on this basis. Finally, a text mapping matrix is constructed for short text representation learning. Experiments on both Chinese and English test collections show that the proposed method captures the semantic information of short texts well and represents short texts reasonably and effectively.
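The general pipeline summarized in the abstract (cluster term vectors, treat the cluster centers as feature dimensions, map each text onto those dimensions) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: toy 2-D vectors stand in for word2vec embeddings, plain k-means (seeded deterministically with the first k vectors) stands in for the paper's semantic clustering, and each text is mapped by taking, for every cluster center, the maximum cosine similarity to any of its terms. The statistical analysis, the integration of context information, and the three improved similarity measures described by the authors are not reproduced here; the names `kmeans`, `map_text` and `mapping_matrix` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def kmeans(vectors: List[Vector], k: int, iters: int = 20) -> List[Vector]:
    """Plain Euclidean k-means, seeded with the first k vectors (an assumption,
    not the paper's clustering). The returned centers span the feature space."""
    centers = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            # Assign each term vector to its nearest center.
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            groups[j].append(v)
        for j, g in enumerate(groups):
            if g:  # Keep the previous center if a cluster becomes empty.
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def map_text(tokens: List[str], emb: Dict[str, Vector],
             centers: List[Vector]) -> Vector:
    """Hypothetical mapping: for each center, the highest cosine similarity
    to any in-vocabulary token. Out-of-vocabulary tokens are ignored."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * len(centers)
    return [max(cosine(v, c) for v in vecs) for c in centers]


def mapping_matrix(texts: List[List[str]], emb: Dict[str, Vector],
                   centers: List[Vector]) -> List[Vector]:
    """Stack one row per text into a (num_texts x num_features) matrix."""
    return [map_text(t, emb, centers) for t in texts]


# Toy embeddings standing in for word2vec output (made-up values).
emb = {
    "apple": [1.0, 0.1],
    "goal": [0.0, 1.0],
    "banana": [0.9, 0.0],
    "match": [0.1, 0.9],
}
centers = kmeans(list(emb.values()), k=2)
M = mapping_matrix([["apple", "banana"], ["goal", "match"], ["xyz"]], emb, centers)
```

Max-pooling over terms is used here because short texts contain few words, so a single strongly related term is enough to activate a semantic dimension; averaging the similarities would be an equally plausible choice, and the paper's own similarity measures may differ from both.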

Metadata
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-26142-9_27