2018 | Supplement | Book chapter

Inference Algorithms in Latent Dirichlet Allocation for Semantic Classification

Written by: Wan Mohammad Aflah Mohammad Zubir, Izzatdin Abdul Aziz, Jafreezal Jaafar, Mohd Hilmi Hasan

Published in: Applied Computational Intelligence and Mathematical Methods

Publisher: Springer International Publishing

Abstract

Latent Dirichlet Allocation (LDA) has been implemented as a semantic classifier that arranges data for efficient retrieval. However, learning or inferring the posterior distribution of the model is non-trivial. Inferring the distribution directly can cause the computation time to increase exponentially because of the coupling of the hyperparameters. Several inference algorithms have been combined with LDA to address this issue. The inference algorithm used in this work is Gibbs sampling. Research using Gibbs sampling shows promising results in comparison to other inference algorithms, especially in terms of performance. Even so, computing the topic distribution of the data still takes a long time, and there is still room for improvement in the time the algorithm needs to complete it. The performance of the algorithm was evaluated on two datasets. The results show that Gibbs sampling as the inference algorithm provides a better prediction of the optimal number of topics in the data than Variational Expectation Maximization (VEM).
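
To make the role of the inference step concrete, the following is a minimal sketch of collapsed Gibbs sampling for LDA in its commonly described form. It is not taken from the chapter: the function name `lda_gibbs`, the hyperparameter defaults `alpha` and `beta`, the iteration count, and the toy corpus are all illustrative assumptions. Each sweep removes a token's current topic assignment from the count tables, resamples a topic from the full conditional, and restores the counts.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, hypothetical API).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns the (doc_topic, topic_word) count tables after the final sweep.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # n_{d,k}
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n_{k,w}
    topic_total = [0] * num_topics                              # n_k
    assign = []                                                 # z_{d,i}

    # Random initialisation of the topic assignment of every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments),
                # up to a normalising constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the new assignment and restore the counts.
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


# Toy corpus (assumed for illustration): word ids 0-2 and 3-5 never co-occur.
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2], [5, 3, 4, 5]]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2, vocab_size=6)
```

Every token is revisited on every sweep, so the cost grows with corpus size, number of topics, and number of iterations, which is consistent with the abstract's remark that computing the topic distribution remains slow. Repeating the run for several values of `num_topics` and comparing a held-out score is one common way to look for a suitable number of topics; the chapter's exact procedure is not given on this page.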

Metadata
Title
Inference Algorithms in Latent Dirichlet Allocation for Semantic Classification
Written by
Wan Mohammad Aflah Mohammad Zubir
Izzatdin Abdul Aziz
Jafreezal Jaafar
Mohd Hilmi Hasan
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-67621-0_16