
2017 | OriginalPaper | Book chapter

Topic-Level Clustering on Web Resources

Authors: Shiyu Zhao, Fu Lee Wang, Leung Pun Wong

Published in: Emerging Technologies for Education

Publisher: Springer International Publishing


Abstract

The rapid development of the Internet, social media, and news portals has made a large amount of information available on a wide range of subjects. Faced with such an abundance of resources, it is valuable to develop effective clustering approaches. However, traditional clustering models do not perform well enough on web resources because of the high dimensionality of the data. In this paper, we propose a clustering model based on a topic model and density peaks. Our model combines the biterm topic model with clustering by fast search and find of density peaks: it first extracts a set of topical features from the original documents based on the co-occurrence of word pairs, and then performs cluster analysis on these topical features. Web resources are thus transformed from raw data into clusters, and an evaluation of the clustering results for the center part verifies the effectiveness of the proposed method.
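The abstract describes the feature-extraction stage only in outline. As a rough illustration, the sketch below assumes the standard biterm topic model (Yan et al., WWW 2013): every unordered pair of words in a document is treated as a biterm, topics are learned over the corpus-wide pool of biterms by collapsed Gibbs sampling, and a document's topical feature vector is then inferred as P(z|d) = Σ_b P(z|b) P(b|d). The function names (`extract_biterms`, `train_btm`, `doc_topics`), the hyperparameter defaults, the iteration count, and the choice to pair all words in a document (no window) are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical helper names; hyperparameter defaults are illustrative assumptions.


def extract_biterms(tokens):
    """All unordered word-id pairs co-occurring in one short text (assumed: no window)."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def train_btm(docs, num_topics, vocab_size, alpha=1.0, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for a biterm topic model (illustrative sketch).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi): corpus-level topic proportions and topic-word distributions.
    """
    rng = random.Random(seed)
    biterms = [b for doc in docs for b in extract_biterms(doc)]
    n_z = [0] * num_topics                                # biterms assigned per topic
    n_wz = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    assign = []
    for wi, wj in biterms:                                # random initial topics
        z = rng.randrange(num_topics)
        assign.append(z)
        n_z[z] += 1
        n_wz[z][wi] += 1
        n_wz[z][wj] += 1
    for _ in range(iters):
        for idx, (wi, wj) in enumerate(biterms):
            z = assign[idx]                               # remove current assignment
            n_z[z] -= 1
            n_wz[z][wi] -= 1
            n_wz[z][wj] -= 1
            # P(z | rest) proportional to
            # (n_z + alpha)(n_wi|z + beta)(n_wj|z + beta) / (2 n_z + M beta)^2
            weights = [
                (n_z[k] + alpha) * (n_wz[k][wi] + beta) * (n_wz[k][wj] + beta)
                / (2 * n_z[k] + vocab_size * beta) ** 2
                for k in range(num_topics)
            ]
            z = rng.choices(range(num_topics), weights=weights)[0]
            assign[idx] = z
            n_z[z] += 1
            n_wz[z][wi] += 1
            n_wz[z][wj] += 1
    theta = [(n_z[k] + alpha) / (len(biterms) + num_topics * alpha)
             for k in range(num_topics)]
    phi = [[(n_wz[k][w] + beta) / (2 * n_z[k] + vocab_size * beta)
            for w in range(vocab_size)] for k in range(num_topics)]
    return theta, phi


def doc_topics(doc, theta, phi):
    """Topical features P(z|d) = sum_b P(z|b) P(b|d), with P(b|d) the biterm frequency."""
    counts = Counter(extract_biterms(doc))
    k_topics = len(theta)
    total = sum(counts.values())
    if total == 0:                                       # fewer than two words: uniform
        return [1.0 / k_topics] * k_topics
    features = [0.0] * k_topics
    for (wi, wj), c in counts.items():
        post = [theta[k] * phi[k][wi] * phi[k][wj] for k in range(k_topics)]
        norm = sum(post)
        for k in range(k_topics):
            features[k] += post[k] / norm * c / total
    return features
```

For example, training on `[[0, 1, 2], [0, 1], [3, 4, 5], [4, 5]]` with `num_topics=2, vocab_size=6` and then calling `doc_topics([0, 1, 2], theta, phi)` yields a two-element probability vector for the first document. Vectors of this kind are far lower-dimensional than raw term vectors, which is the motivation the abstract gives for clustering on topical features.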

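For the clustering stage, the following is a minimal sketch of clustering by fast search and find of density peaks (Rodriguez and Laio, 2014), applied to feature vectors such as per-document topic proportions. Each point receives a local density ρ (the number of neighbours within a cutoff distance d_c) and a distance δ to the nearest point of higher density; points with large γ = ρ·δ are taken as cluster centres, and every other point inherits the label of its nearest denser neighbour. Euclidean distance, the hard cutoff kernel, index-based tie-breaking, and fixing the number of centres up front (the hypothetical `n_clusters` parameter) are assumptions made here for simplicity; the abstract does not state which distance measure or centre-selection rule the authors use.

```python
import math


def density_peaks(points, n_clusters, d_c):
    """Density-peaks clustering sketch (assumed: Euclidean distance, cutoff kernel)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density rho: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Visit points from highest to lowest density; equal densities ordered by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    nearest_higher = [-1] * n
    delta[order[0]] = max(dist[order[0]])  # densest point: distance to farthest point
    for pos in range(1, n):
        i = order[pos]
        j = min(order[:pos], key=lambda k: dist[i][k])  # nearest denser point
        delta[i] = dist[i][j]
        nearest_higher[i] = j
    # Cluster centres: the n_clusters points with the largest gamma = rho * delta.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: -gamma[i])[:n_clusters]
    if order[0] not in centers:
        centers[-1] = order[0]  # the densest point has no denser neighbour to follow
    labels = [-1] * n
    for label, c in enumerate(centers):
        labels[c] = label
    # In decreasing density, each non-centre copies its nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, rho, delta
```

With two well-separated groups, `[(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]`, and `d_c=0.5`, `n_clusters=2`, the function returns the labels `[0, 0, 0, 1, 1, 1]`. The full pairwise distance matrix makes this O(n²) in time and memory, which is acceptable for a sketch but worth keeping in mind for large web collections.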

Metadata
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-52836-6_60