
2016 | OriginalPaper | Book chapter

Evolving an Algorithm to Generate Sparse Inverted Index Using Hadoop and Pig

Abstract

Nowadays, with the explosion of information, users mostly prefer keyword search to access data. Inverted indexing plays an important role in efficient search over large data sets. Current keyword-based search techniques face two problems. First, large data sets are mostly unstructured and do not fit existing database systems. Second, inverted indexes usually require a large amount of storage, and the compression techniques used so far are not efficient because they increase processing time. To overcome these problems, a distributed framework for large data sets such as Hadoop is needed, in which the required resources can be shared and accessed easily. In the proposed work, runs of consecutive document IDs in the inverted index are merged into intervals to save memory. For this, we developed UDFs (User Defined Functions) for stemming and stop-word removal, used from Pig Latin to build the sparse inverted index. The results show that the proposed method is more efficient than existing techniques.
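The interval idea in the abstract can be illustrated with a small single-machine sketch. It is not the authors' Pig/Hadoop implementation: the stop-word list, the suffix-stripping `stem` function (a crude stand-in for a real stemmer such as Porter's), the sample documents, and all function names are hypothetical, and document IDs are assumed to be integers. Each posting list is stored as (start, end) runs of consecutive IDs, so a term that appears in documents 1, 2 and 3 costs one pair instead of three entries.

```python
import bisect
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "in", "is", "for", "on", "with"}

# Crude suffix stripping; a stand-in for a real stemmer, not the chapter's UDF.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lower-case, split on non-letters, drop stop words, then stem."""
    words = "".join(c.lower() if c.isalpha() else " " for c in text).split()
    return [stem(w) for w in words if w not in STOP_WORDS]


def build_index(docs):
    """Map each term to the sorted list of integer doc IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def to_intervals(doc_ids):
    """Collapse a sorted list of doc IDs into (start, end) runs of consecutive IDs."""
    intervals = []
    for d in doc_ids:
        if intervals and d == intervals[-1][1] + 1:
            intervals[-1] = (intervals[-1][0], d)  # extend the current run
        else:
            intervals.append((d, d))  # start a new run
    return intervals


def build_sparse_index(docs):
    """Inverted index whose posting lists are stored as intervals."""
    return {term: to_intervals(ids) for term, ids in build_index(docs).items()}


def contains(intervals, doc_id):
    """Check whether doc_id falls inside any interval, using binary search."""
    starts = [s for s, _ in intervals]
    i = bisect.bisect_right(starts, doc_id) - 1
    return i >= 0 and intervals[i][0] <= doc_id <= intervals[i][1]


# Hypothetical sample documents keyed by integer ID.
docs = {
    1: "Hadoop indexes the data",
    2: "Pig scripts run on Hadoop",
    3: "Hadoop clusters",
    5: "Indexing with Pig",
}
sparse = build_sparse_index(docs)
print(sparse["hadoop"])  # [(1, 3)]
```

In a Hadoop deployment, the tokenizing and stemming step would roughly correspond to UDFs called from a Pig script, and the per-term grouping to Pig's GROUP operator. The interval encoding pays off only when posting lists contain long runs of consecutive IDs; for scattered IDs each isolated entry becomes a pair and can take more space than a plain list.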

Metadata
Title
Evolving an Algorithm to Generate Sparse Inverted Index Using Hadoop and Pig
Authors
Sonam Sharma
Shailendra Singh
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-30927-9_49