2018 | OriginalPaper | Book chapter

Introducing Semantics in Short Text Classification

Authors: Ameni Bouaziz, Célia da Costa Pereira, Christel Dartigues-Pallez, Frédéric Precioso

Published in: Computational Linguistics and Intelligent Text Processing

Publisher: Springer International Publishing

Abstract

To overcome the difficulties that shortness and sparseness cause in short text classification, an enrichment process is classically proposed: topics (word clusters) are extracted from external knowledge sources using Latent Dirichlet Allocation, and all the words associated with topics that encompass words of the short text are added to its initial content. We propose (i) an explicit representation of a two-level enrichment method, in which enrichment is performed either with respect to each word in the text or with respect to the global semantic meaning of the short text, and (ii) a new kind of semantic Random Forest in which semantic relations between features are taken into account at node level, rather than at tree level as recently proposed in the literature, to avoid potential tree correlation. We demonstrate that our enrichment method is valid not only for Random Forest based methods but also for other methods such as MaxEnt, SVM and Naive Bayes.
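
The two enrichment levels described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy `TOPICS` dictionary stands in for word clusters that would come from Latent Dirichlet Allocation, the function names are hypothetical, and the abstract does not say how the global meaning of a text is measured, so picking the topic with the largest word overlap is an assumption made here for illustration only.

```python
from typing import Dict, List

# Hypothetical toy topics standing in for LDA output (topic id -> word cluster).
TOPICS: Dict[int, List[str]] = {
    0: ["match", "team", "goal", "league"],
    1: ["stock", "market", "bank", "share"],
    2: ["goal", "plan", "strategy", "market"],
}


def enrich_word_level(text: str, topics: Dict[int, List[str]]) -> List[str]:
    """Word level: for each word, append the other words of every topic containing it."""
    tokens = text.lower().split()
    enriched = list(tokens)
    for token in tokens:
        for words in topics.values():
            if token in words:
                enriched.extend(w for w in words if w != token)
    return enriched


def enrich_text_level(text: str, topics: Dict[int, List[str]]) -> List[str]:
    """Text level (assumed criterion): append the words of the topic that
    shares the most words with the text as a whole."""
    tokens = text.lower().split()
    token_set = set(tokens)
    best_id = max(topics, key=lambda t: len(token_set & set(topics[t])))
    if not token_set & set(topics[best_id]):
        return tokens  # no topic shares a word with the text: nothing to add
    return tokens + [w for w in topics[best_id] if w not in token_set]


if __name__ == "__main__":
    print(enrich_word_level("team goal", TOPICS))
    print(enrich_text_level("team goal", TOPICS))
```

In this sketch the word-level variant keeps repeated words, so a word shared by several matching topics gains weight in a bag-of-words representation, while the text-level variant adds one topic only. Both choices are illustrative and may differ from what the chapter does.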

Metadata
Title
Introducing Semantics in Short Text Classification
Authors
Ameni Bouaziz
Célia da Costa Pereira
Christel Dartigues-Pallez
Frédéric Precioso
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-75487-1_34
