Published in: International Journal on Digital Libraries 1/2021

12-10-2020

Multilabel graph-based classification for missing labels

Authors: Yasunobu Sumikawa, Tatsurou Miyazaki

Abstract

Assigning multiple labels to digital data is becoming easier because Internet users can do it collaboratively. The process remains challenging, however, especially when each datum carries several labels, because some suitable labels may be missed. Missing labels lead to inaccurate classification. In this study, we propose a novel graph-based multi-label classifier that yields stable, high-accuracy results even when labels are missing from the training data. The core of the algorithm is to smooth the label values of each training datum using its top-k most similar data: their label values are propagated and averaged to generate values for the missing labels. In experimental evaluations, we used multi-labeled document and image datasets and measured micro-averaged F-scores for eight classifiers. As we incrementally removed correct labels from the two datasets, the proposed algorithm tended to maintain its F-scores, whereas the scores of the other classifiers decreased. In addition, we evaluated the algorithm on Wikipedia, a real dataset that includes missing labels, to determine how well it predicts the correct labels and how useful its predictions are as initial decisions for manual annotation. We confirmed that LPAC is useful not only for automatic annotation but also for facilitating decision making in the initial manual category assignment.
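
The smoothing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the similarity measure, the value of k, or how propagated values are combined with observed labels. The sketch assumes cosine similarity, a hypothetical function name `smooth_labels`, and a rule that keeps every observed label while filling the remaining entries with the mean label value of the top-k neighbours.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def smooth_labels(features, labels, k=1):
    """Fill possibly missing labels from each item's top-k most similar items.

    features: list of feature vectors, one per training item
    labels:   list of 0/1 label vectors, one per training item
    Returns a list of float label vectors. Observed labels are kept;
    other entries receive the neighbours' average (an assumed rule).
    """
    n = len(features)
    smoothed = []
    for i in range(n):
        # Rank all other items by similarity to item i, most similar first.
        ranked = sorted(
            ((cosine(features[i], features[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        neighbours = [j for _, j in ranked[:k]]
        row = []
        for c in range(len(labels[i])):
            if neighbours:
                avg = sum(labels[j][c] for j in neighbours) / len(neighbours)
            else:
                avg = 0.0  # no neighbours available, nothing to propagate
            # Never lower an observed label; only raise missing ones.
            row.append(max(float(labels[i][c]), avg))
        smoothed.append(row)
    return smoothed


# Toy data: items 0 and 1 are near-duplicates, but item 1 lacks label 1.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 1]]
smoothed = smooth_labels(features, labels, k=1)
```

In this toy example the second item receives a nonzero value for the label it was missing, because its nearest neighbour carries that label, while the two items in the other cluster are left unchanged. A classifier trained on the smoothed values would then see fewer spurious negatives than one trained on the raw labels.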


Metadata
Title
Multilabel graph-based classification for missing labels
Authors
Yasunobu Sumikawa
Tatsurou Miyazaki
Publication date
12-10-2020
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Digital Libraries / Issue 1/2021
Print ISSN: 1432-5012
Electronic ISSN: 1432-1300
DOI
https://doi.org/10.1007/s00799-020-00295-3
