Published in: Journal of Intelligent Information Systems 1/2016

01.02.2016

Labelset topic model for multi-label document classification

Authors: Ximing Li, Jihong Ouyang, Xiaotang Zhou


Abstract

It has recently been suggested that assuming independence between labels is not suitable for real-world multi-label classification. To account for label dependencies, this paper proposes a supervised topic modeling algorithm, the labelset topic model (LsTM). Our algorithm uses two labelset layers to capture label dependencies. LsTM offers two major advantages over existing supervised topic modeling algorithms: it is straightforward to interpret, and it allows words to be assigned to combinations of labels rather than to a single label. We have performed extensive experiments on several well-known multi-label datasets. Experimental results indicate that the proposed model performs on par with, and often better than, state-of-the-art methods, both qualitatively and quantitatively.
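A central modelling idea in the abstract is that a word may be credited to a combination of labels rather than to one label. The snippet below is a hypothetical illustration of that contrast only. It is not the LsTM generative process, which the abstract does not specify. It enumerates the candidate labelsets that a word in a document could be assigned to. With `max_size=1` it reproduces the single-label restriction used by models such as Labeled LDA, where each word's topic is one of the document's observed labels. Larger values admit label combinations. The function name `candidate_labelsets` and the `max_size` cap are assumptions made for illustration.

```python
from itertools import combinations


def candidate_labelsets(doc_labels, max_size=2):
    """Enumerate non-empty label combinations a word could be assigned to.

    Hypothetical illustration (not the LsTM model itself): with max_size=1
    every candidate is a single label, as in single-label assignment;
    larger max_size also yields subsets of co-occurring labels.
    """
    labels = sorted(set(doc_labels))  # deduplicate, fix a stable order
    result = []
    # Collect all subsets of size 1..max_size (capped at the label count).
    for size in range(1, min(max_size, len(labels)) + 1):
        result.extend(frozenset(c) for c in combinations(labels, size))
    return result


doc = ["economics", "politics", "trade"]
single = candidate_labelsets(doc, max_size=1)  # one label per word
combos = candidate_labelsets(doc, max_size=2)  # singletons plus pairs
print(len(single), len(combos))  # 3 6
```

The number of subsets grows combinatorially with the number of document labels. A size cap such as the assumed `max_size` is one simple way to keep the candidate set tractable.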


Metadata
Title
Labelset topic model for multi-label document classification
Authors
Ximing Li
Jihong Ouyang
Xiaotang Zhou
Publication date
01.02.2016
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 1/2016
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-014-0352-1
