2016 | OriginalPaper | Book chapter

Open-Source Search Engines in the Cloud

Abstract

The key to successfully analyzing the petabytes of textual data available at our fingertips is to do it in the cloud. Today, several extensions exist that bring Lucene, the open-source de facto standard among textual search engine libraries, to the cloud. These extensions take three main directions: implementing scalable distribution of the indices over the file system, storing them in NoSQL databases, and porting them to inherently distributed ecosystems. In this work, we evaluate the existing efforts in terms of distribution, high availability, fault tolerance, manageability, and high performance. We are committed to using common open-source technology only, so we restrict our evaluation to publicly available open-source libraries and eventually fix their bugs. For each system under investigation, we build a benchmarking system by indexing the whole Wikipedia content and submitting hundreds of simultaneous search requests. By measuring the performance of both indexing and searching operations, we report on the most favorable combination of open-source libraries that can be installed in the cloud.

Metadata
Title
Open-Source Search Engines in the Cloud
Author
Khaled Nagi
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-52758-1_7
