Published in: Frontiers of Information Technology & Electronic Engineering 8/2017

1 August 2017 | Review

Big data storage technologies: a survey

Authors: Aisha Siddiqa, Ahmad Karim, Abdullah Gani


Abstract

Industry is pushing hard to develop more feasible and viable tools for storing data that is growing rapidly in volume, velocity, and diversity, termed ‘big data’. The structural shift of storage mechanisms from traditional data management systems to NoSQL technology is driven by the need to meet big data storage requirements. However, the available big data storage technologies fall short of providing consistent, scalable, and available solutions for continuously growing heterogeneous data. Storage is the first step of big data analytics for real-world applications such as scientific experiments, healthcare, social networks, and e-business. Amazon, Google, and Apache are among the industry standards for big data storage solutions, yet the literature does not report an in-depth survey of the storage technologies available for big data that investigates their performance and magnitude gains. The primary objective of this paper is to conduct a comprehensive investigation of state-of-the-art storage technologies available for big data. A well-defined taxonomy of big data storage technologies is presented to help data analysts and researchers understand and select the storage mechanism that best fits their needs. To evaluate the performance of different storage architectures, we compare and analyze the existing approaches using Brewer’s CAP theorem. The significance and applications of storage technologies and their support for other categories are discussed. Several future research challenges are highlighted to expedite the deployment of reliable and scalable storage systems.


Metadata
Title
Big data storage technologies: a survey
Authors
Aisha Siddiqa
Ahmad Karim
Abdullah Gani
Publication date
1 August 2017
Publisher
Zhejiang University Press
Published in
Frontiers of Information Technology & Electronic Engineering / Issue 8/2017
Print ISSN: 2095-9184
Electronic ISSN: 2095-9230
DOI
https://doi.org/10.1631/FITEE.1500441
