Skip to main content

2020 | OriginalPaper | Buchkapitel

Efficient Algorithms for Constrained Clustering with Side Information

verfasst von : Zhendong Hao, Longkun Guo, Pei Yao, Peihuang Huang, Huihong Peng

Erschienen in: Parallel Architectures, Algorithms and Programming

Verlag: Springer Singapore

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Clustering as an unsupervised machine learning method has broad applications within the area of data science and natural language processing. In this paper, we use background knowledge or side information of the data as constraints to improve clustering accuracy. Following the representation method as in [15], we first format the side information as must-link set and cannot-link set. Then we propose a constrained k-means algorithm for clustering the data. The key idea of our algorithm for clustering must-link data sets is to treat each set as a data with large volume, which is, to assign a set of must-link data as a whole to the center closest to its mass center. In contrast, the key for clustering cannot-link data set is to transform the assignment of the involved data points to the computation of a minimum weight perfect matching. At last, we carried out numerical simulation to evaluate our algorithms for constrained k-means on UCI datasets. The experimental results demonstrate that our method outperforms the previous constrained k-means as well as the classical k-means in both clustering accuracy and runtime.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035. Society for Industrial and Applied Mathematics (2007) Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035. Society for Industrial and Applied Mathematics (2007)
2.
Zurück zum Zitat Bahmani, B., Moseley, B., Vattani, A., Kumar, R., Vassilvitskii, S.: Scalable k-means++. Proc. VLDB Endow. 5(7), 622–633 (2012)CrossRef Bahmani, B., Moseley, B., Vattani, A., Kumar, R., Vassilvitskii, S.: Scalable k-means++. Proc. VLDB Endow. 5(7), 622–633 (2012)CrossRef
3.
Zurück zum Zitat Cao, X., Zhang, C., Zhou, C., Huazhu, F., Foroosh, H.: Constrained multi-view video face clustering. IEEE Trans. Image Process. 24(11), 4381–4393 (2015)MathSciNetCrossRef Cao, X., Zhang, C., Zhou, C., Huazhu, F., Foroosh, H.: Constrained multi-view video face clustering. IEEE Trans. Image Process. 24(11), 4381–4393 (2015)MathSciNetCrossRef
4.
Zurück zum Zitat Chehreghan, A., Abbaspour, R.A.: An improvement on the clustering of high-resolution satellite images using a hybrid algorithm. J. Indian Soc. Remote Sens. 45(4), 579–590 (2017)CrossRef Chehreghan, A., Abbaspour, R.A.: An improvement on the clustering of high-resolution satellite images using a hybrid algorithm. J. Indian Soc. Remote Sens. 45(4), 579–590 (2017)CrossRef
5.
Zurück zum Zitat Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier (2011) Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier (2011)
6.
Zurück zum Zitat Jothi, R., Mohanty, S.K., Ojha, A.: Dk-means: a deterministic k-means clustering algorithm for gene expression analysis. Pattern Anal. Appl. 22(2), 649–667 (2019)MathSciNetCrossRef Jothi, R., Mohanty, S.K., Ojha, A.: Dk-means: a deterministic k-means clustering algorithm for gene expression analysis. Pattern Anal. Appl. 22(2), 649–667 (2019)MathSciNetCrossRef
7.
Zurück zum Zitat Lai, Y., Liu, J.: Optimization study on initial center of k-means algorithm. Comput. Eng. Appl. 44(10), 147–149 (2008) Lai, Y., Liu, J.: Optimization study on initial center of k-means algorithm. Comput. Eng. Appl. 44(10), 147–149 (2008)
8.
Zurück zum Zitat Liu, H., Shao, M., Ding, Z., Yun, F.: Structure-preserved unsupervised domain adaptation. IEEE Trans. Knowl. Data Eng. 31(4), 799–812 (2018)CrossRef Liu, H., Shao, M., Ding, Z., Yun, F.: Structure-preserved unsupervised domain adaptation. IEEE Trans. Knowl. Data Eng. 31(4), 799–812 (2018)CrossRef
9.
Zurück zum Zitat MacQueen, J. et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, vol. 1, pp. 281–297 (1967) MacQueen, J. et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, vol. 1, pp. 281–297 (1967)
10.
Zurück zum Zitat Marroquin, J.L., Girosi, F.: Some extensions of the k-means algorithm for image segmentation and pattern classification. Technical report, MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB (1993) Marroquin, J.L., Girosi, F.: Some extensions of the k-means algorithm for image segmentation and pattern classification. Technical report, MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB (1993)
11.
Zurück zum Zitat Mashtalir, S.V., Stolbovyi, M.I., Yakovlev, S.V.: Clustering video sequences by the method of harmonic k-means. Cybern. Syst. Anal. 55(2), 200–206 (2019)CrossRef Mashtalir, S.V., Stolbovyi, M.I., Yakovlev, S.V.: Clustering video sequences by the method of harmonic k-means. Cybern. Syst. Anal. 55(2), 200–206 (2019)CrossRef
12.
Zurück zum Zitat Melnykov, V., Zhu, X.: An extension of the k-means algorithm to clustering skewed data. Comput. Stat. 34(1), 373–394 (2019)MathSciNetCrossRef Melnykov, V., Zhu, X.: An extension of the k-means algorithm to clustering skewed data. Comput. Stat. 34(1), 373–394 (2019)MathSciNetCrossRef
13.
Zurück zum Zitat Tang, J., Chang, Y., Aggarwal, C., Liu, H.: A survey of signed network mining in social media. ACM Comput. Surv. (CSUR) 49(3), 42 (2016)CrossRef Tang, J., Chang, Y., Aggarwal, C., Liu, H.: A survey of signed network mining in social media. ACM Comput. Surv. (CSUR) 49(3), 42 (2016)CrossRef
14.
Zurück zum Zitat Wagstaff, K., Cardie, C.: Clustering with instance-level constraints. In: AAAI/IAAI, vol. 1097, pp. 577–584 (2000) Wagstaff, K., Cardie, C.: Clustering with instance-level constraints. In: AAAI/IAAI, vol. 1097, pp. 577–584 (2000)
15.
Zurück zum Zitat Wagstaff, K., Cardie, C., Rogers, S., Schrödl, S., et al.: Constrained k-means clustering with background knowledge. In: Icml, vol. 1, pp. 577–584 (2001) Wagstaff, K., Cardie, C., Rogers, S., Schrödl, S., et al.: Constrained k-means clustering with background knowledge. In: Icml, vol. 1, pp. 577–584 (2001)
16.
Zurück zum Zitat Zhang, L., Jin, M.: A constrained clustering-based blind detector for spatial modulation. IEEE Commun. Lett. 23(7), 1170–1173 (2019)CrossRef Zhang, L., Jin, M.: A constrained clustering-based blind detector for spatial modulation. IEEE Commun. Lett. 23(7), 1170–1173 (2019)CrossRef
Metadaten
Titel
Efficient Algorithms for Constrained Clustering with Side Information
verfasst von
Zhendong Hao
Longkun Guo
Pei Yao
Peihuang Huang
Huihong Peng
Copyright-Jahr
2020
Verlag
Springer Singapore
DOI
https://doi.org/10.1007/978-981-15-2767-8_25

Neuer Inhalt