Published in: Knowledge and Information Systems 1/2019

19.05.2018 | Regular Paper

Multi-label classification using stacked hierarchical Dirichlet processes with reduced sampling complexity

Authors: Sophie Burkhardt, Stefan Kramer


Abstract

Nonparametric topic models based on hierarchical Dirichlet processes (HDPs) allow the number of topics to be discovered automatically from the data. The computational complexity of standard Gibbs sampling for model training is linear in the total number of topics. Recently, it was reduced to linear in the number of topics per word using a technique called alias sampling combined with Metropolis–Hastings (MH) sampling. We propose a different proposal distribution for the MH step, based on the observation that distributions on the upper level of the hierarchy change more slowly than the document-specific distributions at the lower level. This reduces the sampling complexity further, making it linear in the number of topics per document. By utilizing a single global distribution, we are able to further improve the test-set log-likelihood of this approximation. Furthermore, we propose a novel model of stacked HDPs utilizing this sampling method. An extensive analysis reveals the importance of correctly setting the hyperparameters for classification and shows the convergence properties of our method. Experiments demonstrate the effectiveness of the proposed approach for multi-label classification compared to previous Dependency-LDA models.
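The combination the abstract refers to pairs Walker's alias method (O(1) draws from a precomputed discrete distribution) with a Metropolis–Hastings correction that compensates for the proposal being stale. A minimal sketch, assuming generic function names and toy distributions of my own choosing, not the paper's actual sampler:

```python
import random

def build_alias_table(probs):
    """Walker's alias method: O(K) preprocessing, then O(1) per draw."""
    K = len(probs)
    scaled = [p * K for p in probs]
    alias, prob = [0] * K, [0.0] * K
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        # move the leftover mass of the large bucket back into a worklist
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in large + small:  # numerical leftovers get probability 1
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias, rng=random):
    """O(1) sample: pick a bucket uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]

def mh_step(current, target, proposal, prob, alias, rng=random):
    """One independence Metropolis-Hastings step: draw a candidate from the
    (possibly stale) proposal via the alias table, accept with the MH ratio."""
    cand = alias_draw(prob, alias, rng)
    accept = (target[cand] * proposal[current]) / (target[current] * proposal[cand])
    return cand if rng.random() < min(1.0, accept) else current
```

Because the proposal enters only through the acceptance ratio, the alias table can be rebuilt infrequently while the chain still targets the exact distribution; this is why a slowly changing upper-level distribution makes a good proposal.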


Footnotes
1
See Buntine and Hutter [3] for an efficient way to compute ratios of these numbers. They can be precomputed once and subsequently retrieved in O(1). Note that it may be necessary to store large values sparsely if the number of tokens in a restaurant becomes very large.
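Such a table is conveniently precomputed in log space, so that the large values the footnote warns about never overflow and ratios reduce to exponentiated differences. A hedged sketch assuming the generalized Stirling number recurrence used for Poisson–Dirichlet models (function name and layout are illustrative; `a` is the discount parameter, and `a = 0` recovers the unsigned Stirling numbers of the first kind):

```python
import math

def log_stirling_table(N, a):
    """Log-space table of generalized Stirling numbers S^a_{n,m} via the
    recurrence S^a_{n+1,m} = S^a_{n,m-1} + (n - m*a) * S^a_{n,m},
    with S^a_{0,0} = 1. Storing logs avoids overflow for large counts."""
    NEG_INF = float("-inf")
    S = [[NEG_INF] * (N + 1) for _ in range(N + 1)]
    S[0][0] = 0.0  # log S_{0,0} = log 1
    for n in range(N):
        for m in range(1, n + 2):
            if n - m * a > 0 and S[n][m] != NEG_INF:
                grow = S[n][m] + math.log(n - m * a)
            else:
                grow = NEG_INF
            new = S[n][m - 1]
            # log-sum-exp of the two recurrence terms
            if grow == NEG_INF:
                S[n + 1][m] = new
            elif new == NEG_INF:
                S[n + 1][m] = grow
            else:
                hi = max(grow, new)
                S[n + 1][m] = hi + math.log(math.exp(grow - hi) + math.exp(new - hi))
    return S
```

A ratio such as S_{n,m+1}/S_{n,m} is then `math.exp(S[n][m+1] - S[n][m])`, retrieved in O(1) as the footnote describes.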
 
2
This improved method can also be applied if \(a>0\), i.e., when we are dealing with a hierarchical Poisson–Dirichlet topic model. In this case, we need to divide q by \((b_1+M_d)\) and multiply this factor back in when subtracting q from p.
 
3
See Papanikolaou et al. [15] for a formal justification of this approach.
 
References
1.
Antoniak CE (1974) Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann Stat 2(6):1152–1174
2.
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3(Jan):993–1022
4.
Buntine WL, Mishra S (2014) Experiments with non-parametric topic models. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’14, ACM, New York, NY, USA, pp 881–890
5.
Burkhardt S, Kramer S (2017) Multi-label classification using stacked hierarchical Dirichlet processes with reduced sampling complexity. In: ICBK 2017—international conference on big knowledge, IEEE, pp 1–8
6.
Burkhardt S, Kramer S (2017) Online sparse collapsed hybrid variational-Gibbs algorithm for hierarchical Dirichlet process topic models. In: Ceci M, Hollmén J, Todorovski L, Vens C, Džeroski S (eds) Proceedings of ECML-PKDD 2017. Springer International Publishing, Cham, pp 189–204
7.
Chen C, Du L, Buntine W (2011) Sampling table configurations for the hierarchical Poisson–Dirichlet process. In: Gunopulos D, Hofmann T, Malerba D, Vazirgiannis M (eds) Proceedings of ECML-PKDD. Springer, Heidelberg, pp 296–311
8.
Katakis I, Tsoumakas G, Vlahavas I (2008) Multilabel text classification for automated tag suggestion. In: ECML-PKDD discovery challenge, vol 75
9.
Lewis DD, Yang Y, Rose TG, Li F (2004) RCV1: a new benchmark collection for text categorization research. J Mach Learn Res 5:361–397
10.
Li AQ, Ahmed A, Ravi S, Smola AJ (2014) Reducing the sampling complexity of topic models. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’14, ACM, New York, NY, USA, pp 891–900
11.
Li C, Cheung WK, Ye Y, Zhang X, Chu D, Li X (2015) The author-topic-community model for author interest profiling and community discovery. Knowl Inf Syst 44(2):359–383
12.
Li W (2007) Pachinko allocation: DAG-structured mixture models of topic correlations. Ph.D. thesis, University of Massachusetts Amherst
13.
Loza Mencía E, Fürnkranz J (2010) Efficient multilabel classification algorithms for large-scale problems in the legal domain. In: Francesconi E, Montemagni S, Peters W, Tiscornia D (eds) Semantic processing of legal texts—where the language of law meets the law of language. Lecture notes in artificial intelligence, vol 6036, 1st edn. Springer, pp 192–215
14.
Nam J, Kim J, Loza Mencía E, Gurevych I, Fürnkranz J (2014) Large-scale multi-label text classification—revisiting neural networks. In: Calders T, Esposito F, Hüllermeier E, Meo R (eds) Proceedings of ECML-PKDD, part II. Springer, Heidelberg, pp 437–452
15.
Papanikolaou Y, Foulds JR, Rubin TN, Tsoumakas G (2015) Dense distributions from sparse samples: improved Gibbs sampling parameter estimators for LDA. ArXiv e-prints
16.
Prabhu Y, Varma M (2014) FastXML: a fast, accurate and stable tree-classifier for extreme multi-label learning. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’14, ACM, New York, NY, USA, pp 263–272
17.
Ramage D, Hall D, Nallapati R, Manning CD (2009) Labeled LDA: a supervised topic model for credit attribution in multi-labeled corpora. In: Proceedings of the 2009 conference on empirical methods in natural language processing: volume 1, EMNLP ’09, Association for Computational Linguistics, Stroudsburg, PA, USA, pp 248–256
18.
Ramage D, Manning CD, Dumais S (2011) Partially labeled topic models for interpretable text mining. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’11, ACM, New York, NY, USA, pp 457–465
19.
Read J, Pfahringer B, Holmes G, Frank E (2011) Classifier chains for multi-label classification. Mach Learn 85(3):333–359
20.
Ren L, Dunson DB, Carin L (2008) The dynamic hierarchical Dirichlet process. In: Proceedings of the 25th international conference on machine learning (ICML), ACM, pp 824–831
22.
Rubin TN, Chambers A, Smyth P, Steyvers M (2012) Statistical topic models for multi-label document classification. Mach Learn 88(1–2):157–208
23.
Salakhutdinov R, Tenenbaum JB, Torralba A (2013) Learning with hierarchical-deep models. IEEE Trans Pattern Anal Mach Intell 35(8):1958–1971
24.
Shimosaka M, Tsukiji T, Tominaga S, Tsubouchi K (2016) Coupled hierarchical Dirichlet process mixtures for simultaneous clustering and topic modeling. In: Frasconi P, Landwehr N, Manco G, Vreeken J (eds) Proceedings of ECML-PKDD. Springer International Publishing, Cham, pp 230–246
25.
26.
Tsoumakas G, Katakis I, Vlahavas IP (2008) Effective and efficient multilabel classification in domains with large number of labels. In: ECML/PKDD 2008 workshop on mining multidimensional data
27.
Wood F, Archambeau C, Gasthaus J, James L, Teh YW (2009) A stochastic memoizer for sequence data. In: Proceedings of the 26th international conference on machine learning (ICML), ACM, pp 1129–1136
28.
Yen IEH, Huang X, Ravikumar P, Zhong K, Dhillon I (2016) PD-Sparse: a primal and dual sparse approach to extreme multiclass and multilabel classification. In: Proceedings of the 33rd international conference on machine learning, pp 3069–3077
29.
Zuo Y, Zhao J, Xu K (2016) Word network topic model: a simple but general solution for short and imbalanced texts. Knowl Inf Syst 48(2):379–398
Metadata
Title
Multi-label classification using stacked hierarchical Dirichlet processes with reduced sampling complexity
Authors
Sophie Burkhardt
Stefan Kramer
Publication date
19.05.2018
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 1/2019
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-018-1204-z
