
2018 | Supplement | Chapter

Inference Algorithms in Latent Dirichlet Allocation for Semantic Classification

Authors: Wan Mohammad Aflah Mohammad Zubir, Izzatdin Abdul Aziz, Jafreezal Jaafar, Mohd Hilmi Hasan

Published in: Applied Computational Intelligence and Mathematical Methods

Publisher: Springer International Publishing


Abstract

Latent Dirichlet Allocation (LDA) has been implemented as a semantic classifier that arranges data for efficient retrieval. However, learning or inferring the posterior distribution of the model is non-trivial. Inferring the distribution directly can cause computation time to grow exponentially because of the coupling of the hyperparameters. Several inference algorithms have been implemented together with LDA to address this issue. The inference algorithm used in this research work is Gibbs sampling. Research using Gibbs sampling shows promising results in comparison to other inference algorithms, especially in terms of performance. Nevertheless, computing the topic distribution of the data still takes a long time, so there is still room for improvement in the time the algorithm takes to complete the topic distribution. The performance of the algorithm was evaluated using two datasets. Results show that Gibbs sampling as the inference algorithm provides a better prediction of the optimal number of topics in the data than Variational Expectation Maximization (VEM).
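
To illustrate the kind of inference the abstract refers to, here is a minimal sketch of collapsed Gibbs sampling for LDA in plain Python. It is not the authors' implementation. The function name `lda_gibbs`, the toy corpus, and the values of `alpha`, `beta`, and `iters` are assumptions chosen only for illustration.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs is a list of documents, each a list of integer word ids in
    range(vocab_size). alpha and beta are symmetric Dirichlet
    hyperparameters; their default values here are assumptions.
    """
    rng = random.Random(seed)
    ndk = [[0] * num_topics for _ in docs]               # document-topic counts
    nkw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    nk = [0] * num_topics                                # tokens assigned to each topic
    z = []                                               # topic assignment of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        assignments = []
        for w in doc:
            k = rng.randrange(num_topics)
            assignments.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(assignments)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Conditional p(z = t | all other assignments), up to a constant:
                # (ndk + alpha) * (nkw + beta) / (nk + V * beta)
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Point estimates of the document-topic and topic-word distributions
    # computed from the final counts.
    theta = [
        [(ndk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(nkw[k][w] + beta) / (nk[k] + vocab_size * beta) for w in range(vocab_size)]
        for k in range(num_topics)
    ]
    return theta, phi


# Toy corpus (hypothetical): four short documents over a five-word vocabulary.
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 2], [0, 1, 1, 0], [4, 3, 3, 4]]
theta, phi = lda_gibbs(docs, num_topics=2, vocab_size=5)
```

Each sweep resamples the topic of one token at a time from its conditional distribution given all other assignments, so the sampler never evaluates the full posterior directly. The cost of one sweep grows with the number of tokens times the number of topics, which is consistent with the abstract's remark that computing the topic distribution remains time-consuming.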


Metadata

Title: Inference Algorithms in Latent Dirichlet Allocation for Semantic Classification

Authors: Wan Mohammad Aflah Mohammad Zubir, Izzatdin Abdul Aziz, Jafreezal Jaafar, Mohd Hilmi Hasan

Copyright Year: 2018

DOI: https://doi.org/10.1007/978-3-319-67621-0_16
