2018 | OriginalPaper | Chapter

Keyword Extraction from Hindi Documents Using Document Statistics and Fuzzy Modelling

Authors: Sifatullah Siddiqi, Aditi Sharan

Published in: Information Systems Design and Intelligent Applications

Publisher: Springer Singapore

Abstract

In this paper, we put forward a novel unsupervised, domain-independent and corpus-independent approach to automatic keyword extraction. The approach combines two document statistics of a word, its frequency and its spatial distribution, to extract keywords. We extract keywords from Hindi documents using these statistics and use fuzzy logic to combine them effectively, framing fuzzy rules for keyword extraction. The main advantages of our approach are that it uses fuzzy membership for the variables instead of crisp thresholds, and that the fuzzy membership boundaries are set in a corpus-independent way. The work is especially significant because it has been implemented and tested on Hindi, a resource-poor and underrepresented language.
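
The abstract does not give the membership functions or the rule base, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that spatial distribution is measured as the standard deviation of the gaps between successive occurrences of a word divided by the mean gap, that "high" membership is a linear ramp between the document's own mean and maximum of each statistic, and that a single rule (high frequency AND high clustering) is combined with `min`. The names `word_statistics`, `ramp` and `keyword_scores` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def word_statistics(tokens):
    """Return {word: (frequency, sigma)} for one tokenized document.

    sigma is the standard deviation of the gaps between successive
    occurrences divided by the mean gap (an assumed clustering measure).
    """
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)
    stats = {}
    for word, pos in positions.items():
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        # With fewer than two gaps there is no meaningful spread; use 0.
        sigma = pstdev(gaps) / mean(gaps) if len(gaps) >= 2 else 0.0
        stats[word] = (len(pos), sigma)
    return stats


def ramp(x, low, high):
    """Fuzzy membership in 'high': 0 at or below low, 1 at or above high."""
    if high <= low:
        return 1.0 if x >= high else 0.0
    return min(1.0, max(0.0, (x - low) / (high - low)))


def keyword_scores(tokens):
    """Score each word in [0, 1] using one illustrative fuzzy rule."""
    stats = word_statistics(tokens)
    freqs = [f for f, _ in stats.values()]
    sigmas = [s for _, s in stats.values()]
    # Assumed choice: boundaries come from this document alone
    # (mean and maximum of each statistic), so no corpus is needed.
    f_lo, f_hi = mean(freqs), max(freqs)
    s_lo, s_hi = mean(sigmas), max(sigmas)
    scores = {}
    for word, (f, s) in stats.items():
        # Rule: IF frequency is high AND clustering is high THEN keyword.
        scores[word] = min(ramp(f, f_lo, f_hi), ramp(s, s_lo, s_hi))
    return scores


if __name__ == "__main__":
    doc = "kw kw kw the b c the d e the f g the kw kw kw".split()
    ranked = sorted(keyword_scores(doc).items(), key=lambda kv: -kv[1])
    print(ranked[:3])
```

In this toy document the evenly spaced word `the` scores low despite being frequent, while the clustered word `kw` scores high. Tokens are ordinary Python strings, so Devanagari words can be passed in unchanged once the text has been tokenized.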


Metadata

Title: Keyword Extraction from Hindi Documents Using Document Statistics and Fuzzy Modelling
Authors: Sifatullah Siddiqi, Aditi Sharan
Copyright Year: 2018
Publisher: Springer Singapore
DOI: https://doi.org/10.1007/978-981-10-7512-4_35