
2018 | Original Paper | Book Chapter

Exploiting Instance Relationship for Effective Extreme Multi-label Learning

Authors: Feifei Li, Hongyan Liu, Jun He, Xiaoyong Du

Published in: Database Systems for Advanced Applications

Publisher: Springer International Publishing

Abstract

Extreme multi-label classification is an important data mining technique that labels each unseen instance with a subset of labels drawn from a very large label set. It has wide applications, and many methods have been proposed in recent years. Existing methods either compress the label space or train a classifier on instance features; among these, tree-based classifiers offer better efficiency and accuracy. In many real-world applications, however, instances are not independent, and the relationships between them carry useful information. How to use these relationships in extreme multi-label classification has received little study. Exploiting them may improve prediction accuracy, especially when the feature space is very sparse. In this paper, we study how to use the similarity between instances to build more accurate tree-based extreme multi-label classifiers. To this end, we incorporate instance relationships into state-of-the-art models in two ways: feature engineering and collaborative labeling. Extensive experiments on three real-world datasets demonstrate that the proposed method achieves higher accuracy than the state-of-the-art models.
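
The abstract does not spell out how collaborative labeling is computed, so the following is only a rough sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes cosine similarity over sparse feature dictionaries, similarity-weighted label votes from the k most similar training instances, and a linear blend with the scores of some base classifier. The function names, `k`, and the mixing weight `alpha` are all hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {index: value} dicts."""
    dot = sum(val * v.get(idx, 0.0) for idx, val in u.items())
    norm_u = sqrt(sum(val * val for val in u.values()))
    norm_v = sqrt(sum(val * val for val in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def neighbor_label_scores(x, train_X, train_Y, k=2):
    """Similarity-weighted label votes from the k training instances most similar to x."""
    ranked = sorted(
        ((cosine(x, xi), yi) for xi, yi in zip(train_X, train_Y)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in ranked)
    scores = {}
    if total == 0.0:
        return scores  # no similar instance found, so no votes
    for sim, labels in ranked:
        for label in labels:
            scores[label] = scores.get(label, 0.0) + sim / total
    return scores


def collaborative_scores(base_scores, neighbor_scores, alpha=0.5):
    """Blend base-classifier scores with neighbor votes (alpha is an assumed weight)."""
    labels = set(base_scores) | set(neighbor_scores)
    return {
        label: alpha * base_scores.get(label, 0.0)
        + (1.0 - alpha) * neighbor_scores.get(label, 0.0)
        for label in labels
    }


if __name__ == "__main__":
    # Toy data: three training instances with sparse features and label sets.
    train_X = [{0: 1.0, 1: 1.0}, {1: 1.0, 2: 1.0}, {3: 1.0}]
    train_Y = [{"a", "b"}, {"b"}, {"c"}]
    x = {0: 1.0, 1: 1.0}

    votes = neighbor_label_scores(x, train_X, train_Y, k=2)
    # Stand-in for the output of a tree-based classifier on x (made-up numbers).
    base = {"a": 0.2, "c": 0.6}
    print(collaborative_scores(base, votes, alpha=0.5))
```

In the same spirit, the neighbor votes could be appended to an instance's feature vector before training, which is one possible reading of the feature-engineering route; that, too, is an assumption rather than something the abstract states.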

Metadata
Title
Exploiting Instance Relationship for Effective Extreme Multi-label Learning
Authors
Feifei Li
Hongyan Liu
Jun He
Xiaoyong Du
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-91458-9_27