
2018 | Original Paper | Book Chapter

An Approach to Fuzzy Hierarchical Clustering of Short Text Fragments Based on Fuzzy Graph Clustering


Abstract

This paper presents a novel approach to fuzzy hierarchical clustering of short text fragments. Datasets containing large, or even huge, numbers of short text fragments have become common. Short messages of various kinds, titles of papers and news headlines are examples of such fragments. The authors consider a similar object: a dataset of key process indicators from the Strategic Planning System of the Russian Federation.
A fuzzy clustering approach is proposed to reveal the structure and thematic variety of the dataset. A fuzzy graph is chosen as the model because it is the most natural representation of a connected set of words. Finally, the clustering yields a hierarchy, which is a desirable structure for presenting a large amount of information.
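The idea of obtaining a hierarchy from a fuzzy graph can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the fuzzy graph is given as a dictionary of edge membership values in [0, 1] and uses the classical λ-cut technique, in which edges weaker than a threshold are dropped and the remaining connected components form the clusters at that level. The function names, the example fragments and the similarity values are all hypothetical.

```python
def lambda_cut_components(nodes, edges, lam):
    """Clusters at level lam: connected components of the crisp graph
    that keeps only edges with membership >= lam (union-find)."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), weight in edges.items():
        if weight >= lam:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_u] = root_v

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    # Sort clusters by their sorted member lists for a deterministic order.
    return sorted((frozenset(g) for g in groups.values()), key=lambda g: sorted(g))


def hierarchy(nodes, edges, levels):
    """Map each threshold (highest first) to its clusters; lower
    thresholds merge the clusters found at higher ones."""
    return {lam: lambda_cut_components(nodes, edges, lam)
            for lam in sorted(levels, reverse=True)}


# Hypothetical fragments and similarity values, for illustration only.
nodes = ["birth rate", "death rate", "road length", "road repairs"]
edges = {
    ("birth rate", "death rate"): 0.8,
    ("road length", "road repairs"): 0.7,
    ("death rate", "road length"): 0.2,
}
tree = hierarchy(nodes, edges, levels=[0.9, 0.5, 0.1])

for lam, clusters in tree.items():
    print(lam, [sorted(c) for c in clusters])
```

Reading the result from the highest threshold to the lowest gives the hierarchy: four singletons at 0.9, two thematic clusters at 0.5, and a single cluster at 0.1. In this sketch each fragment belongs to exactly one cluster per level; the fuzziness lies only in the edge memberships.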


Footnotes
1
Here and below, all examples are translated from Russian into English, so some language-specific features may be lost.

2
For the Russian language and fairly large text corpora, a reasonable value lies in the range [0.4, 0.5].

3
A reasonable value lies in the range [0.001, 0.05].

4
The Python source code is available on GitHub (https://github.com/PavelDudarin/sentence-clustering). There are two modules: one for working with RusVectores and one for the clustering algorithm itself.
 
Metadata
Title
An Approach to Fuzzy Hierarchical Clustering of Short Text Fragments Based on Fuzzy Graph Clustering
Authors
Pavel V. Dudarin
Nadezhda G. Yarushkina
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-68321-8_30