
2015 | OriginalPaper | Book chapter

An Improvised Extractive Approach to Hindi Text Summarization

Authors: K. Vimal Kumar, Divakar Yadav

Published in: Information Systems Design and Intelligent Applications

Publisher: Springer India


Abstract

Text summarization is the task of condensing one or more source texts into a shorter text without losing the significant information they contain. A text summarization tool compresses the text and presents only its important content to the user, so that decisions can be made in less time while the core of the document is still understood. This paper focuses on an extractive approach and its implementation in Java. The extractive approach selects significant sentences using a thematic approach. Before the thematic words are selected, Hindi stop words are removed and the remaining words are stemmed to obtain their root forms. Stop-word elimination removes semantically empty words from the input document, and stemming groups together words that share the same root (radix). The system scores each sentence by the occurrences of the roots of thematic words, and the highest-scoring sentences are added to the summary. The generated summary is then post-processed by removing extraneous phrases from the selected sentences, to bring them closer to a human-generated summary. The accuracy of the system is tested with a technique called the Expert Game, in which experts underline and extract the most interesting or informative fragments of the text; the precision and recall of the system's summary are then measured against the experts' extract. Based on this testing, the system was found to be 85 % accurate.
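The abstract describes the pipeline only at a high level. As a minimal sketch of one plausible reading of it (not the authors' Java implementation), the Python code below removes stop words, strips suffixes with a longest-match rule, takes the most frequent stems as thematic words, scores each sentence by its count of thematic-stem occurrences, and returns the top-scoring sentences in document order. The stop-word and suffix lists are tiny hypothetical samples, the danda-based sentence splitter and the parameters `n_thematic` and `n_sentences` are assumptions, and the extraneous-phrase removal step is not modeled because the abstract does not describe how it works.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny resources for illustration only; a real
# system would load a complete Hindi stop-word list and a full suffix table.
HINDI_STOP_WORDS = {"है", "हैं", "का", "की", "के", "में", "और", "से", "को", "यह", "एक", "पर", "ने"}
HINDI_SUFFIXES = ["ियों", "ाओं", "ों", "ें", "ी", "ा", "े"]

SENTENCE_ENDS = "।?!"  # danda plus Western end marks (an assumption)
PUNCTUATION = ",;:\"'()“”‘’" + SENTENCE_ENDS


def split_sentences(text):
    """Split text after each sentence-ending mark, keeping the mark."""
    pieces = re.findall(r"[^।?!]+[।?!]?", text)
    return [p.strip() for p in pieces if p.strip()]


def stem(word, suffixes=HINDI_SUFFIXES, min_stem=2):
    """Strip the longest matching suffix, keeping at least `min_stem` characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def content_stems(sentence, stop_words=HINDI_STOP_WORDS, suffixes=HINDI_SUFFIXES):
    """Whitespace-tokenize, strip punctuation, drop stop words, then stem."""
    # str.split() is used instead of a \w+ regex because Python's \w excludes
    # combining marks such as Devanagari vowel signs, which would split words.
    words = (w.strip(PUNCTUATION) for w in sentence.split())
    return [stem(w, suffixes) for w in words if w and w not in stop_words]


def summarize(text, n_thematic=5, n_sentences=2,
              stop_words=HINDI_STOP_WORDS, suffixes=HINDI_SUFFIXES):
    """Return the top-scoring sentences, in their original document order."""
    sentences = split_sentences(text)
    stems = [content_stems(s, stop_words, suffixes) for s in sentences]

    # Thematic words: the most frequent content stems in the whole document.
    freq = Counter(st for sent in stems for st in sent)
    thematic = {st for st, _ in freq.most_common(n_thematic)}

    # Score = number of thematic-stem occurrences in the sentence.
    scores = [sum(st in thematic for st in sent) for sent in stems]

    # Rank by score (earlier sentence wins ties), then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return [sentences[i] for i in sorted(ranked[:n_sentences])]


if __name__ == "__main__":
    sample = ("भारत एक विशाल देश है। भारत में अनेक भाषाएं बोली जाती हैं। "
              "हिंदी भारत की राजभाषा है। यहाँ की संस्कृति प्राचीन है।")
    for sentence in summarize(sample, n_thematic=3, n_sentences=2):
        print(sentence)
```

Ranking by score and then re-sorting the chosen indices keeps the summary in the order the sentences appear in the source, which usually reads more naturally than score order.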

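For the Expert Game evaluation, precision and recall can be computed at the sentence level. The sketch below assumes that both the system summary and the expert extract are lists of sentences copied from the same document, so a match is an exact string match after whitespace normalization; the abstract does not say how expert fragments were matched or how the reported 85 % figure was derived, and the function name `precision_recall` is hypothetical.

```python
def _normalize(sentence):
    """Collapse runs of whitespace so formatting differences do not block a match."""
    return " ".join(sentence.split())


def precision_recall(system_sentences, expert_sentences):
    """Sentence-level precision and recall of a system extract against an expert extract.

    precision = |system ∩ expert| / |system|
    recall    = |system ∩ expert| / |expert|
    """
    system = {_normalize(s) for s in system_sentences}
    expert = {_normalize(s) for s in expert_sentences}
    overlap = len(system & expert)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(expert) if expert else 0.0
    return precision, recall


if __name__ == "__main__":
    system = ["s1", "s2", "s3", "s4"]
    expert = ["s1", "s2", "s3", "x", "y"]
    print(precision_recall(system, expert))  # (0.75, 0.6)
```

For example, a 4-sentence system summary that shares 3 sentences with a 5-sentence expert extract has a precision of 3/4 = 0.75 and a recall of 3/5 = 0.6.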

Metadata
Title
An Improvised Extractive Approach to Hindi Text Summarization
Authors
K. Vimal Kumar
Divakar Yadav
Copyright year
2015
Publisher
Springer India
DOI
https://doi.org/10.1007/978-81-322-2250-7_28
