
2018 | OriginalPaper | Chapter

An Approach to Fuzzy Hierarchical Clustering of Short Text Fragments Based on Fuzzy Graph Clustering


Abstract

This paper presents a novel approach to fuzzy hierarchical clustering of short text fragments. Datasets containing large, even huge, numbers of short text fragments have become common; short messages and paper or news headlines are typical examples. The authors consider another dataset of this kind: the key process indicators of the Strategic Planning System of the Russian Federation.
A fuzzy clustering approach is proposed to reveal the structure and thematic variety of such a dataset. A fuzzy graph is chosen as the model because it is the most natural representation of a connected set of words. Finally, the clustering yields a hierarchy, which is a desirable structure for presenting a large amount of information.
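
The idea of obtaining a hierarchy from a fuzzy graph can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes a classical threshold (λ-cut) scheme: edges whose membership degree is below λ are dropped, and the connected components of the remaining crisp graph form the clusters at that level. Lowering λ merges clusters, which gives a nested hierarchy. The words, membership degrees, thresholds, and function names below are all hypothetical.

```python
def lambda_cut_components(nodes, edges, lam):
    """Clusters = connected components after keeping edges with membership >= lam."""
    parent = {n: n for n in nodes}  # union-find forest, one tree per cluster

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), mu in edges.items():
        if mu >= lam:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_u] = root_v

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), set()).add(n)
    # Sort members and clusters so the result is deterministic.
    return sorted(sorted(c) for c in clusters.values())


def hierarchy(nodes, edges, thresholds):
    """Map each threshold (highest first) to the clusters of its lambda-cut."""
    return {lam: lambda_cut_components(nodes, edges, lam)
            for lam in sorted(thresholds, reverse=True)}


# Hypothetical toy fuzzy graph: words as nodes, similarity as edge membership.
nodes = ["budget", "tax", "revenue", "school", "teacher"]
edges = {
    ("budget", "tax"): 0.8,
    ("tax", "revenue"): 0.7,
    ("budget", "revenue"): 0.6,
    ("school", "teacher"): 0.9,
    ("revenue", "school"): 0.2,
}

levels = hierarchy(nodes, edges, [0.85, 0.65, 0.1])
for lam, clusters in levels.items():
    print(lam, clusters)
```

In this toy example the strictest threshold joins only the most similar pair, the middle one separates a finance-related group from an education-related group, and the loosest one merges everything into a single root cluster.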


Footnotes
1
Here and below, all examples are translated from Russian into English, so some language-specific features may be lost.
 
2
For the Russian language and fairly large text corpora, a reasonable value lies in the range [0.4, 0.5].
 
3
A reasonable value lies in the range [0.001, 0.05].
 
4
The Python source code is available on GitHub (https://github.com/PavelDudarin/sentence-clustering). There are two modules: one for working with RusVectores and one for the clustering algorithm itself.
 
Metadata
Title
An Approach to Fuzzy Hierarchical Clustering of Short Text Fragments Based on Fuzzy Graph Clustering
Authors
Pavel V. Dudarin
Nadezhda G. Yarushkina
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-68321-8_30
