Published in: Knowledge and Information Systems 1/2019

19.05.2018 | Regular Paper

Multi-label classification using stacked hierarchical Dirichlet processes with reduced sampling complexity

Authors: Sophie Burkhardt, Stefan Kramer


Abstract

Nonparametric topic models based on hierarchical Dirichlet processes (HDPs) allow the number of topics to be discovered automatically from the data. The computational complexity of standard Gibbs sampling for model training is linear in the total number of topics. Recently, it was reduced to linear in the number of topics per word using a technique called alias sampling combined with Metropolis–Hastings (MH) sampling. We propose a different proposal distribution for the MH step, based on the observation that distributions on the upper level of the hierarchy change more slowly than the document-specific distributions at the lower level. This reduces the sampling complexity further, making it linear in the number of topics per document. By utilizing a single global distribution, we are able to further improve the test-set log-likelihood of this approximation. Furthermore, we propose a novel model of stacked HDPs utilizing this sampling method. An extensive analysis reveals the importance of correctly setting the hyperparameters for classification and shows the convergence properties of our method. Experiments demonstrate the effectiveness of the proposed approach for multi-label classification compared to previous Dependency-LDA models.
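The combination the abstract refers to pairs Walker's alias method (O(1) draws from a precomputed discrete distribution) with a Metropolis–Hastings correction that compensates for the proposal being stale. A minimal sketch, assuming generic function names and toy distributions of my own choosing, not the paper's actual sampler:

```python
import random

def build_alias_table(probs):
    """Walker's alias method: O(K) preprocessing, then O(1) per draw."""
    K = len(probs)
    scaled = [p * K for p in probs]
    alias, prob = [0] * K, [0.0] * K
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        # move the leftover mass of the large bucket back into a worklist
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in large + small:  # numerical leftovers get probability 1
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias, rng=random):
    """O(1) sample: pick a bucket uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]

def mh_step(current, target, proposal, prob, alias, rng=random):
    """One independence Metropolis-Hastings step: draw a candidate from the
    (possibly stale) proposal via the alias table, accept with the MH ratio."""
    cand = alias_draw(prob, alias, rng)
    accept = (target[cand] * proposal[current]) / (target[current] * proposal[cand])
    return cand if rng.random() < min(1.0, accept) else current
```

Because the proposal enters only through the acceptance ratio, the alias table can be rebuilt infrequently while the chain still targets the exact distribution; this is why a slowly changing upper-level distribution makes a good proposal.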


Footnotes
1
See Buntine and Hutter [3] for an efficient way to compute ratios of these numbers. They can be precomputed once and subsequently retrieved in O(1). Note that it may be necessary to store large values sparsely if the number of tokens in a restaurant becomes very large.
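Such a table is conveniently precomputed in log space, so that the large values the footnote warns about never overflow and ratios reduce to exponentiated differences. A hedged sketch assuming the generalized Stirling number recurrence used for Poisson–Dirichlet models (function name and layout are illustrative; `a` is the discount parameter, and `a = 0` recovers the unsigned Stirling numbers of the first kind):

```python
import math

def log_stirling_table(N, a):
    """Log-space table of generalized Stirling numbers S^a_{n,m} via the
    recurrence S^a_{n+1,m} = S^a_{n,m-1} + (n - m*a) * S^a_{n,m},
    with S^a_{0,0} = 1. Storing logs avoids overflow for large counts."""
    NEG_INF = float("-inf")
    S = [[NEG_INF] * (N + 1) for _ in range(N + 1)]
    S[0][0] = 0.0  # log S_{0,0} = log 1
    for n in range(N):
        for m in range(1, n + 2):
            if n - m * a > 0 and S[n][m] != NEG_INF:
                grow = S[n][m] + math.log(n - m * a)
            else:
                grow = NEG_INF
            new = S[n][m - 1]
            # log-sum-exp of the two recurrence terms
            if grow == NEG_INF:
                S[n + 1][m] = new
            elif new == NEG_INF:
                S[n + 1][m] = grow
            else:
                hi = max(grow, new)
                S[n + 1][m] = hi + math.log(math.exp(grow - hi) + math.exp(new - hi))
    return S
```

A ratio such as S_{n,m+1}/S_{n,m} is then `math.exp(S[n][m+1] - S[n][m])`, retrieved in O(1) as the footnote describes.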
 
2
This improved method can also be applied if \(a>0\), i.e., when we are dealing with a hierarchical Poisson–Dirichlet topic model. In this case, we need to divide q by \((b_1+M_d)\) and multiply this factor back in when subtracting q from p.
 
3
See Papanikolaou et al. [15] for a formal justification of this approach.
 
References
1.
Antoniak CE (1974) Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann Stat 2(6):1152–1174
2.
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3(Jan):993–1022
4.
Buntine WL, Mishra S (2014) Experiments with non-parametric topic models. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’14, ACM, New York, NY, USA, pp 881–890
5.
Burkhardt S, Kramer S (2017) Multi-label classification using stacked hierarchical Dirichlet processes with reduced sampling complexity. In: ICBK 2017—international conference on big knowledge, IEEE, pp 1–8
6.
Burkhardt S, Kramer S (2017) Online sparse collapsed hybrid variational-Gibbs algorithm for hierarchical Dirichlet process topic models. In: Ceci M, Hollmén J, Todorovski L, Vens C, Džeroski S (eds) Proceedings of ECML-PKDD 2017. Springer International Publishing, Cham, pp 189–204
7.
Chen C, Du L, Buntine W (2011) Sampling table configurations for the hierarchical Poisson–Dirichlet process. In: Gunopulos D, Hofmann T, Malerba D, Vazirgiannis M (eds) Proceedings of ECML-PKDD. Springer, Heidelberg, pp 296–311
8.
Katakis I, Tsoumakas G, Vlahavas I (2008) Multilabel text classification for automated tag suggestion. In: ECML-PKDD discovery challenge, vol 75
9.
Lewis DD, Yang Y, Rose TG, Li F (2004) RCV1: a new benchmark collection for text categorization research. J Mach Learn Res 5:361–397
10.
Li AQ, Ahmed A, Ravi S, Smola AJ (2014) Reducing the sampling complexity of topic models. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’14, ACM, New York, NY, USA, pp 891–900
11.
Li C, Cheung WK, Ye Y, Zhang X, Chu D, Li X (2015) The author-topic-community model for author interest profiling and community discovery. Knowl Inf Syst 44(2):359–383
12.
Li W (2007) Pachinko allocation: DAG-structured mixture models of topic correlations. Ph.D. thesis, University of Massachusetts Amherst
13.
Loza Mencía E, Fürnkranz J (2010) Efficient multilabel classification algorithms for large-scale problems in the legal domain. In: Francesconi E, Montemagni S, Peters W, Tiscornia D (eds) Semantic processing of legal texts—where the language of law meets the law of language. Lecture notes in artificial intelligence, vol 6036, 1st edn. Springer, pp 192–215
14.
Nam J, Kim J, Loza Mencía E, Gurevych I, Fürnkranz J (2014) Large-scale multi-label text classification—revisiting neural networks. In: Calders T, Esposito F, Hüllermeier E, Meo R (eds) Proceedings of ECML-PKDD, part II. Springer, Heidelberg, pp 437–452
15.
Papanikolaou Y, Foulds JR, Rubin TN, Tsoumakas G (2015) Dense distributions from sparse samples: improved Gibbs sampling parameter estimators for LDA. ArXiv e-prints
16.
Prabhu Y, Varma M (2014) FastXML: a fast, accurate and stable tree-classifier for extreme multi-label learning. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’14, ACM, New York, NY, USA, pp 263–272
17.
Ramage D, Hall D, Nallapati R, Manning CD (2009) Labeled LDA: a supervised topic model for credit attribution in multi-labeled corpora. In: Proceedings of the 2009 conference on empirical methods in natural language processing: volume 1, EMNLP ’09, Association for Computational Linguistics, Stroudsburg, PA, USA, pp 248–256
18.
Ramage D, Manning CD, Dumais S (2011) Partially labeled topic models for interpretable text mining. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’11, ACM, New York, NY, USA, pp 457–465
19.
Read J, Pfahringer B, Holmes G, Frank E (2011) Classifier chains for multi-label classification. Mach Learn 85(3):333–359
20.
Ren L, Dunson DB, Carin L (2008) The dynamic hierarchical Dirichlet process. In: Proceedings of the 25th international conference on machine learning (ICML), ACM, pp 824–831
22.
Rubin TN, Chambers A, Smyth P, Steyvers M (2012) Statistical topic models for multi-label document classification. Mach Learn 88(1–2):157–208
23.
Salakhutdinov R, Tenenbaum JB, Torralba A (2013) Learning with hierarchical-deep models. IEEE Trans Pattern Anal Mach Intell 35(8):1958–1971
24.
Shimosaka M, Tsukiji T, Tominaga S, Tsubouchi K (2016) Coupled hierarchical Dirichlet process mixtures for simultaneous clustering and topic modeling. In: Frasconi P, Landwehr N, Manco G, Vreeken J (eds) Proceedings of ECML-PKDD. Springer International Publishing, Cham, pp 230–246
25.
26.
Tsoumakas G, Katakis I, Vlahavas IP (2008) Effective and efficient multilabel classification in domains with large number of labels. In: ECML/PKDD 2008 workshop on mining multidimensional data
27.
Wood F, Archambeau C, Gasthaus J, James L, Teh YW (2009) A stochastic memoizer for sequence data. In: Proceedings of the 26th international conference on machine learning (ICML), ACM, pp 1129–1136
28.
Yen IEH, Huang X, Ravikumar P, Zhong K, Dhillon I (2016) PD-Sparse: a primal and dual sparse approach to extreme multiclass and multilabel classification. In: Proceedings of the 33rd international conference on machine learning, pp 3069–3077
29.
Zuo Y, Zhao J, Xu K (2016) Word network topic model: a simple but general solution for short and imbalanced texts. Knowl Inf Syst 48(2):379–398
Metadata
Title
Multi-label classification using stacked hierarchical Dirichlet processes with reduced sampling complexity
Authors
Sophie Burkhardt
Stefan Kramer
Publication date
19.05.2018
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 1/2019
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-018-1204-z
