Published in: Cluster Computing 2/2017

20-12-2016

SmallClient for big data: an indexing framework towards fast data retrieval

Authors: Aisha Siddiqa, Ahmad Karim, Victor Chang



Abstract

Numerous applications continuously generate massive amounts of data, and it has become critical to extract useful information while maintaining acceptable computing performance. The objective of this work is to design an indexing framework that minimizes indexing overhead and improves query execution and data search performance with the best overall computing performance. We propose SmallClient, an indexing framework to speed up query execution. SmallClient has three modules: block creation, index creation, and query execution. The block creation module improves data retrieval performance with minimal data uploading overhead. The index creation module allows the maximum number of indexes on a dataset, increasing the index hit ratio while keeping indexing overhead low. Finally, the query execution module lets incoming queries use these indexes. The evaluation shows that SmallClient outperforms a Hadoop full scan by more than 90% in search performance. Meanwhile, the indexing overhead of SmallClient is reduced to approximately 50% and 80% for index size and indexing time, respectively.
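
The three-module structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the data layout or index structure, so the fixed-size blocks, the per-attribute inverted index (value to block ids), and the function names `create_blocks`, `create_index`, and `execute_query` are all assumptions made for illustration. The sketch only shows the general idea that a query on an indexed attribute reads fewer blocks than a full scan.

```python
from collections import defaultdict


def create_blocks(records, block_size):
    """Hypothetical block creation: split records into fixed-size blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def create_index(blocks, attributes):
    """Hypothetical index creation: for each attribute, map value -> block ids."""
    index = {attr: defaultdict(set) for attr in attributes}
    for block_id, block in enumerate(blocks):
        for record in block:
            for attr in attributes:
                index[attr][record[attr]].add(block_id)
    return index


def execute_query(blocks, index, attr, value):
    """Hypothetical query execution.

    Uses the index when the attribute is indexed; otherwise falls back to
    scanning every block. Returns (matching records, number of blocks read).
    """
    if attr in index:
        block_ids = sorted(index[attr].get(value, ()))
    else:
        block_ids = list(range(len(blocks)))  # full scan
    hits = [r for b in block_ids for r in blocks[b] if r[attr] == value]
    return hits, len(block_ids)


# Toy dataset (assumed for illustration): 12 records in 3 blocks of 4.
records = [{"id": i, "city": "A" if i < 4 else "B", "level": i % 3} for i in range(12)]
blocks = create_blocks(records, block_size=4)
index = create_index(blocks, attributes=["city"])

indexed_hits, indexed_reads = execute_query(blocks, index, "city", "A")
scan_hits, scan_reads = execute_query(blocks, index, "level", 0)
print(len(indexed_hits), indexed_reads)
print(len(scan_hits), scan_reads)
```

In this toy setup, the query on the indexed attribute `city` touches only the blocks listed in the index, while the query on the unindexed attribute `level` has to read every block. Indexing more attributes raises the chance that an incoming query hits an index, at the cost of a larger index and longer indexing time, which is the trade-off the abstract refers to as indexing overhead.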

Metadata

Title: SmallClient for big data: an indexing framework towards fast data retrieval
Authors: Aisha Siddiqa, Ahmad Karim, Victor Chang
Publication date: 20-12-2016
Publisher: Springer US
Published in: Cluster Computing, Issue 2/2017
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-016-0712-4
