Published in: International Journal of Machine Learning and Cybernetics, Issue 1/2012

1 March 2012 | Original Article

Soft subspace clustering with an improved feature weight self-adjustment mechanism

Authors: Gongde Guo, Si Chen, Lifei Chen



Abstract

Traditional clustering algorithms often break down on high-dimensional data. Soft subspace clustering has become an effective means of finding clusters hidden in different subspaces of such data. However, most existing soft subspace clustering algorithms contain parameters that are difficult for users to determine in real-world applications. This paper proposes a new soft subspace clustering algorithm, SC-IFWSA, which uses an improved feature weight self-adjustment mechanism (IFWSA) to adaptively update the weights of all features for each cluster according to each feature's importance to clustering quality, and which does not require users to set any parameter values. In addition, SC-IFWSA overcomes a limitation of the traditional FWSA mechanism, which may fail to calculate feature weights in some particular cases. Experimental results on ten data sets, compared against related approaches, demonstrate the effectiveness and feasibility of the proposed method.
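The abstract does not state the IFWSA update rule, so the following is only a minimal Python/NumPy sketch of the general idea of per-cluster feature weights that adjust themselves during a k-means-style loop. It is not the authors' method. Assumptions, all hypothetical: a cluster's weight for feature j is taken to be proportional to a separation-to-compactness ratio (squared distance of the cluster center from the global mean on j, divided by the members' mean squared deviation on j); a small `eps` keeps that ratio finite when a feature has zero spread inside a cluster, which is one way a plain ratio can fail to produce a weight (this guard is an illustration, not the paper's fix); and the names `per_cluster_weights` and `soft_subspace_kmeans` are invented for this example.

```python
import numpy as np


def per_cluster_weights(X, labels, centers, eps=1e-12):
    """Hypothetical per-cluster feature weights from a separation/compactness ratio.

    Returns a (k, d) array whose rows are non-negative and sum to 1.
    """
    k, d = centers.shape
    global_mean = X.mean(axis=0)
    W = np.empty((k, d))
    for c in range(k):
        members = X[labels == c]
        # Compactness: mean squared deviation of the cluster's members on each feature.
        if len(members):
            a = ((members - centers[c]) ** 2).mean(axis=0)
        else:
            a = np.zeros(d)
        # Separation: squared distance of this cluster's center from the global mean.
        b = (centers[c] - global_mean) ** 2
        # Assumed eps guard: keeps b / a finite when a feature has zero spread
        # inside the cluster (a plain b / a would give inf or nan there).
        ratio = b / (a + eps)
        total = ratio.sum()
        # Fall back to uniform weights if no feature separates this cluster at all.
        W[c] = ratio / total if total > 0 else np.full(d, 1.0 / d)
    return W


def soft_subspace_kmeans(X, k, n_iter=20, seed=0):
    """Hypothetical k-means loop with per-cluster weighted squared distances."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Farthest-first initialisation: a random first center, then repeatedly
    # the point farthest from all centers chosen so far.
    centers = [X[rng.integers(n)]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.array(centers)
    W = np.full((k, d), 1.0 / d)  # start from uniform weights
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Weighted squared distance from every point to every center, shape (n, k).
        diff2 = (X[:, None, :] - centers[None, :, :]) ** 2
        labels = (diff2 * W[None, :, :]).sum(axis=2).argmin(axis=1)
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
        # Re-derive each cluster's feature weights from the new partition.
        W = per_cluster_weights(X, labels, centers)
    return labels, centers, W
```

Because the weights enter only through each cluster's distance computation, every cluster can emphasise a different subset of features, which is what makes the weighting "soft subspace". In this sketch `k` and `n_iter` are the usual k-means inputs and `eps` is purely a numerical guard. On two clusters separated only along the first feature, both clusters should end up placing nearly all of their weight on that feature.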

Metadata
Title
Soft subspace clustering with an improved feature weight self-adjustment mechanism
Authors
Gongde Guo
Si Chen
Lifei Chen
Publication date
1 March 2012
Publisher
Springer-Verlag
Published in
International Journal of Machine Learning and Cybernetics / Issue 1/2012
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-011-0038-8
