Published in: Neural Computing and Applications 2/2014

1 August 2014 | Original Article

Adaptive subspace learning: an iterative approach for document clustering

Authors: Xian Wu, Xiaoming Chen, Xiang Li, Lingli Zhou, Jianhuang Lai

Abstract

Clustering performance in document space can suffer from the high dimensionality of the vectors: high-dimensional vectors carry a great deal of redundant information, which can make the similarity between vectors inaccurate. It is therefore desirable to derive a low-dimensional subspace that contains less redundant information, so that document vectors can be grouped more reasonably. In general, subspace learning and clustering are treated as two independent steps, so there is no way to assess whether the subspace suits the clustering method, or vice versa. To overcome this drawback, this paper combines subspace learning and clustering into an iterative procedure named adaptive subspace learning (ASL). First, the subspace learning step uses the initial cluster indicators to increase the intracluster similarity and the intercluster separability of the vectors. Affinity propagation is then adopted to partition the vectors into a specified number of clusters, which updates the cluster indicators, and subspace learning is repeated. Through this iterative optimization, the subspace obtained by ASL becomes more suitable for the clustering. The proposed method is evaluated on the NG20, Classic3 and K1b datasets, and the results are shown to be superior to those of conventional document clustering methods.
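
The alternation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the subspace objective, so a simple scatter-difference criterion (between-cluster minus within-cluster scatter) is assumed here, and a plain k-means step stands in for affinity propagation with a fixed number of clusters. All function names and parameters (`learn_subspace`, `cluster`, `adaptive_subspace_clustering`, `dim`, `n_rounds`) are hypothetical.

```python
import numpy as np


def learn_subspace(X, labels, dim):
    """Assumed criterion: maximise tr(W^T (Sb - Sw) W) with orthonormal W."""
    d = X.shape[1]
    mean = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-cluster scatter
    Sb = np.zeros((d, d))  # between-cluster scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Sb - Sw is symmetric, so eigh returns orthonormal eigenvectors.
    vals, vecs = np.linalg.eigh(Sb - Sw)
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top]


def cluster(Z, k, init_labels, n_iter=20):
    """Stand-in clustering step (k-means); the paper uses affinity propagation."""
    labels = init_labels.copy()
    for _ in range(n_iter):
        centers = np.stack([
            # Fall back to a data point if a cluster happens to be empty.
            Z[labels == c].mean(axis=0) if np.any(labels == c) else Z[c]
            for c in range(k)
        ])
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


def adaptive_subspace_clustering(X, k, dim, n_rounds=10, seed=0):
    """Alternate subspace learning and clustering until the labels stop changing."""
    rng = np.random.default_rng(seed)
    # Initial cluster indicators from clustering in the original space.
    labels = cluster(X, k, rng.integers(0, k, size=len(X)))
    for _ in range(n_rounds):
        W = learn_subspace(X, labels, dim)      # subspace from current indicators
        new_labels = cluster(X @ W, k, labels)  # re-cluster in the subspace
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, W
```

Calling `adaptive_subspace_clustering(X, k=3, dim=2)` on an `(n_documents, n_features)` array returns the final cluster labels and the projection matrix `W`. Swapping `cluster` for a fixed-K affinity propagation routine would bring the sketch closer to the procedure the abstract describes.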

Metadata
Title
Adaptive subspace learning: an iterative approach for document clustering
Authors
Xian Wu
Xiaoming Chen
Xiang Li
Lingli Zhou
Jianhuang Lai
Publication date
1 August 2014
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 2/2014
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-013-1486-8