Published in: Cluster Computing 5/2019

16-10-2017

Research on multi-feature fusion algorithm for subject words extraction and summary generation of text

Authors: Gui-Xian Xu, Hai-Shen Yao, Changzhi Wang



Abstract

Subject words give a brief indication of what a text is about, and an automatic summary reflects its theme and core content. This paper studies a multi-feature fusion algorithm for subject word extraction and summary generation on Tibetan web text. First, Tibetan web pages are collected and preprocessed to extract the useful information. Second, the BCCF word segmentation algorithm is used to segment the text into words. A multi-feature fusion algorithm is then proposed to extract the subject words: it combines several factors, such as a word's frequency, length, and type, to compute each word's weight and select the subject words of the text. For summary generation, a sentence weight calculation is designed based on word frequency, position, and other features. The summary generation algorithm computes the weight of each sentence, removes redundant sentences, and assembles the remaining ones into the summary. Experiments show that the multi-feature fusion algorithm performs well for both subject word extraction and summary generation. The work contributes to research on Tibetan information processing.
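The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract gives no formulas, so the linear combination, the coefficients (alpha, beta, gamma, 0.7/0.3), the TYPE_WEIGHT table, the reading of "position" as sentence position, and the Jaccard-overlap redundancy test are all assumptions made for illustration. The input is assumed to be already segmented and tagged with word types; the BCCF segmentation step is not reproduced here.

```python
from collections import Counter

# Hypothetical per-type weights; the paper's actual values are not in the abstract.
TYPE_WEIGHT = {"noun": 1.0, "verb": 0.6, "other": 0.3}


def word_weights(tokens, alpha=0.6, beta=0.2, gamma=0.2):
    """Fuse frequency, length and type into one weight per word.

    tokens: list of (word, word_type) pairs from an already segmented text.
    alpha, beta, gamma: assumed mixing coefficients (not from the paper).
    """
    freq = Counter(word for word, _ in tokens)
    word_type = dict(tokens)  # last type seen for each word
    max_freq = max(freq.values())
    max_len = max(len(word) for word in freq)
    return {
        word: alpha * freq[word] / max_freq
        + beta * len(word) / max_len
        + gamma * TYPE_WEIGHT.get(word_type[word], TYPE_WEIGHT["other"])
        for word in freq
    }


def subject_words(tokens, k=3):
    """Return the k highest-weighted words (ties broken alphabetically)."""
    weights = word_weights(tokens)
    return sorted(weights, key=lambda w: (-weights[w], w))[:k]


def jaccard(a, b):
    """Word-set overlap between two sentences, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(sentences, weights, n=2, max_overlap=0.5):
    """Pick n sentence indices by weight, skipping redundant sentences.

    sentences: list of word lists. A sentence is treated as redundant when
    its overlap with an already chosen sentence exceeds max_overlap.
    """
    scored = []
    for i, sent in enumerate(sentences):
        if not sent:
            continue
        content = sum(weights.get(w, 0.0) for w in sent) / len(sent)
        position = 1.0 / (i + 1)  # assumed: earlier sentences score higher
        scored.append((0.7 * content + 0.3 * position, i))
    scored.sort(key=lambda item: (-item[0], item[1]))

    chosen = []
    for _, i in scored:
        words = set(sentences[i])
        if all(jaccard(words, set(sentences[j])) <= max_overlap for j in chosen):
            chosen.append(i)
        if len(chosen) == n:
            break
    return sorted(chosen)  # restore document order


if __name__ == "__main__":
    # Toy English stand-in for segmented Tibetan text.
    sents = [["tibetan", "text", "summary"],
             ["tibetan", "text", "summary"],
             ["word", "weight"]]
    toks = [(w, "noun") for s in sents for w in s]
    print(subject_words(toks, k=2))
    print(summarize(sents, word_weights(toks), n=2))
```

In the toy input the second sentence duplicates the first, so the redundancy filter skips it and the summary falls back to the next best sentence. Swapping in a different similarity measure or different coefficients only changes the two scoring expressions.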


Metadata
Title
Research on multi-feature fusion algorithm for subject words extraction and summary generation of text
Authors
Gui-Xian Xu
Hai-Shen Yao
Changzhi Wang
Publication date
16-10-2017
Publisher
Springer US
Published in
Cluster Computing, Special Issue 5/2019
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-1219-3
