Published in: Data Mining and Knowledge Discovery, Issue 1/2006

01.01.2006

Data Clustering with Partial Supervision

Written by: ABDELHAMID BOUCHACHIA, WITOLD PEDRYCZ


Abstract

Clustering with partial supervision finds its application in situations where data is neither entirely nor accurately labeled. This paper discusses a semi-supervised clustering algorithm based on a modified version of the fuzzy C-Means (FCM) algorithm. The objective function of the proposed algorithm consists of two components. The first concerns traditional unsupervised clustering while the second tracks the relationship between classes (available labels) and the clusters generated by the first component. The balance between the two components is tuned by a scaling factor. Comprehensive experimental studies are presented. First, the discrimination of the proposed algorithm is discussed before its reformulation as a classifier is addressed. The induced classifier is evaluated on completely labeled data and validated by comparison against some fully supervised classifiers, namely support vector machines and neural networks. This classifier is then evaluated and compared against three semi-supervised algorithms in the context of learning from partly labeled data. In addition, the behavior of the algorithm is discussed and the relation between classes and clusters is investigated using a linear regression model. Finally, the complexity of the algorithm is briefly discussed.
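
The two-component objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so everything below is an assumption made for illustration: the first term is the standard fuzzy C-Means cost, the second is a generic penalty on the mismatch between memberships and label-derived targets for labeled points, and the names `unsupervised_term`, `supervision_term`, `objective`, `F`, `labeled`, and `alpha` are hypothetical. It is not the authors' objective function.

```python
def sq_dist(x, v):
    """Squared Euclidean distance between a point and a cluster center."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def unsupervised_term(X, V, U, m=2.0):
    """Standard FCM cost: sum over clusters i and points k of u_ik^m * d_ik^2."""
    return sum(
        U[i][k] ** m * sq_dist(X[k], V[i])
        for i in range(len(V))
        for k in range(len(X))
    )


def supervision_term(X, V, U, F, labeled):
    """Assumed penalty: for labeled points only, the squared gap between the
    membership u_ik and its label-derived target f_ik, weighted by d_ik^2."""
    return sum(
        (U[i][k] - F[i][k]) ** 2 * sq_dist(X[k], V[i])
        for i in range(len(V))
        for k in range(len(X))
        if labeled[k]
    )


def objective(X, V, U, F, labeled, alpha, m=2.0):
    """Two-component cost; alpha is the scaling factor balancing the terms."""
    return unsupervised_term(X, V, U, m) + alpha * supervision_term(X, V, U, F, labeled)


# Toy data: two 1-D points, two clusters, only the first point is labeled.
X = [[0.0], [1.0]]
V = [[0.0], [1.0]]
U = [[0.5, 0.5], [0.5, 0.5]]   # U[i][k]: membership of point k in cluster i
F = [[1.0, 0.0], [0.0, 1.0]]   # target memberships derived from labels
labeled = [True, False]

j_unsupervised_only = objective(X, V, U, F, labeled, alpha=0.0)
j_with_supervision = objective(X, V, U, F, labeled, alpha=2.0)
```

Setting `alpha` to zero reduces the sketch to plain fuzzy C-Means, while larger values make disagreement with the available labels more costly.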


Metadata
Title
Data Clustering with Partial Supervision
Authors
ABDELHAMID BOUCHACHIA
WITOLD PEDRYCZ
Publication date
01.01.2006
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery / Issue 1/2006
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-005-0019-1
