Published in: Information Systems Frontiers 5/2020

21.07.2020

Online Variational Learning of Dirichlet Process Mixtures of Scaled Dirichlet Distributions

Authors: Narges Manouchehri, Hieu Nguyen, Pantea Koochemeshkian, Nizar Bouguila, Wentao Fan


Abstract

Data clustering, as an unsupervised method, has long attracted attention, and a large class of tasks can be formulated as clustering problems. Mixture models, a branch of clustering methods, have been used in various fields of research such as computer vision and pattern recognition. Applying these models requires addressing several problems: finding a distribution that properly fits the data, determining the model complexity, and estimating the model parameters. In this paper, we apply the scaled Dirichlet distribution to tackle the first challenge and propose a novel online variational method that mitigates the other two issues simultaneously. The effectiveness of the proposed work is evaluated on four challenging real applications, namely text and image spam categorization, and diabetes and hepatitis detection.
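The abstract's first challenge, fitting proportional data with a suitable density, rests on the scaled Dirichlet distribution. As a minimal sketch (the function name and vector conventions below are illustrative, not from the paper), the standard scaled Dirichlet parameterization uses a shape vector α and a scale vector β over a point x on the unit simplex; setting β to all ones recovers the ordinary Dirichlet:

```python
import numpy as np
from scipy.special import gammaln

def scaled_dirichlet_logpdf(x, alpha, beta):
    """Log-density of the scaled Dirichlet distribution.

    x: point on the simplex (positive components summing to 1)
    alpha: positive shape vector, same length as x
    beta: positive scale vector, same length as x
    """
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    a_plus = alpha.sum()
    # Normalizing constant Gamma(a+)/prod Gamma(alpha_j), in log space
    log_norm = gammaln(a_plus) - gammaln(alpha).sum()
    return (log_norm
            + np.sum(alpha * np.log(beta))         # prod beta_j^{alpha_j}
            + np.sum((alpha - 1.0) * np.log(x))    # prod x_j^{alpha_j - 1}
            - a_plus * np.log(np.dot(beta, x)))    # (sum beta_j x_j)^{-a+}
```

With β = (1, …, 1) the scale terms vanish (since Σx_j = 1), so the value coincides with the usual Dirichlet log-density; this makes the reduction easy to check numerically.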


Metadata
Title
Online Variational Learning of Dirichlet Process Mixtures of Scaled Dirichlet Distributions
Authors
Narges Manouchehri
Hieu Nguyen
Pantea Koochemeshkian
Nizar Bouguila
Wentao Fan
Publication date
21.07.2020
Publisher
Springer US
Published in
Information Systems Frontiers / Issue 5/2020
Print ISSN: 1387-3326
Electronic ISSN: 1572-9419
DOI
https://doi.org/10.1007/s10796-020-10027-2
