
2017 | Original Paper | Book Chapter

Information Retrieval for Gujarati Language Using Cosine Similarity Based Vector Space Model

Authors: Rajnish M. Rakholia, Jatinderkumar R. Saini

Published in: Proceedings of the 5th International Conference on Frontiers in Intelligent Computing: Theory and Applications

Publisher: Springer Singapore


Abstract

Retrieving the most relevant documents from the web for a user query is a crucial task for an Information Retrieval (IR) system, particularly for resource-poor languages. This paper presents a cosine-similarity-based Vector Space Document Model (VSDM) for information retrieval in the Gujarati language. VSDM is widely used in information retrieval and document classification; each document is represented as a vector, and each dimension corresponds to a separate term. The influence and relevance of documents with respect to a user query are measured using cosine similarity in the vector space, where the set of documents is treated as a set of vectors. The present work treats the user query as free-order text, i.e., the word sequence does not affect the results of the IR system. Technically, this is a Natural Language Processing (NLP) application in which stop-word removal, Term Frequency (TF) calculation, Normalized Term Frequency (NF) calculation, and Inverse Document Frequency (IDF) calculation were performed on 1360 files in text and PDF formats, and precision and recall values of 78 % and 86 %, respectively, were recorded. To the best of our knowledge, this is the first IR task in the Gujarati language using cosine-similarity-based calculations.
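The pipeline named in the abstract (stop-word removal, normalized TF, IDF, cosine similarity) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact formulas, so the length-normalized TF, the `1 + log(N / df)` IDF variant, the whitespace tokenizer, the tiny `STOP_WORDS` set, and all function names are assumptions chosen for clarity.

```python
import math
from collections import Counter

# Illustrative stop-word set only (assumption); the paper's Gujarati list is not given here.
STOP_WORDS = {"છે", "અને", "આ"}


def tokenize(text):
    """Split on whitespace and drop stop-words (assumed tokenization)."""
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def normalized_tf(tokens):
    """Term count divided by document length (one common normalization)."""
    total = len(tokens)
    if total == 0:
        return {}
    return {term: count / total for term, count in Counter(tokens).items()}


def inverse_document_frequency(docs_tokens):
    """IDF as 1 + log(N / df); other variants exist and the paper may use one."""
    n_docs = len(docs_tokens)
    doc_freq = Counter(term for tokens in docs_tokens for term in set(tokens))
    return {term: 1.0 + math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the corpus are ignored."""
    return {t: w * idf[t] for t, w in normalized_tf(tokens).items() if t in idf}


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (document index, score) pairs, highest score first."""
    docs_tokens = [tokenize(doc) for doc in documents]
    idf = inverse_document_frequency(docs_tokens)
    query_vec = tfidf_vector(tokenize(query), idf)
    scores = [
        (i, cosine_similarity(query_vec, tfidf_vector(tokens, idf)))
        for i, tokens in enumerate(docs_tokens)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Because the query is reduced to a bag of terms before scoring, `rank("ગુજરાતી ભાષા", docs)` and `rank("ભાષા ગુજરાતી", docs)` produce the same ordering, which matches the free-order query behaviour described in the abstract.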


Metadata
Title
Information Retrieval for Gujarati Language Using Cosine Similarity Based Vector Space Model
Authors
Rajnish M. Rakholia
Jatinderkumar R. Saini
Copyright Year
2017
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-3156-4_1
