
2018 | OriginalPaper | Book chapter

Evaluating Several Design Patterns and Trends in Big Data Warehousing Systems

Authors: Carlos Costa, Maribel Yasmina Santos

Published in: Advanced Information Systems Engineering

Publisher: Springer International Publishing


Abstract

Big Data characteristics, namely volume, variety and velocity, currently highlight the severe limitations of traditional Data Warehouses (DWs). Their strict relational model, costly scalability and, sometimes, inefficient performance open the way for emerging techniques and technologies. Recently, the concept of Big Data Warehousing has been gaining traction, aiming to study and propose new ways of dealing with Big Data challenges in Data Warehousing contexts. The Big Data Warehouse (BDW) can be seen as a flexible, scalable and highly performant system that uses Big Data techniques and technologies to support mixed and complex analytical workloads (e.g., streaming analysis, ad hoc querying, data visualization, data mining, simulations) in several emerging contexts such as Smart Cities and Industry 4.0. However, because the topic is still at an almost embryonic stage, ambiguity of constructs and a lack of common approaches still prevail. In this paper, we discuss and evaluate several design patterns and trends in Big Data Warehousing systems, including data modelling techniques (e.g., star schemas, flat tables, nested structures) and streaming considerations for BDWs (e.g., Hive vs. NoSQL databases), aiming to foster and align future research and to help practitioners in this area.
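The abstract names three data modelling options: star schemas, flat tables and nested structures. The sketch below contrasts them on one tiny dataset. It is written in plain Python rather than Hive, and it is not the authors' implementation or benchmark. The store/product/sales data and every function name are hypothetical and were chosen for illustration only. The same aggregate ("total sales per city") is computed three ways. A star schema needs a key lookup (a join) into a dimension. A flat table copies dimension attributes into every row. A nested layout stores one record per store with its sales embedded as a list.

```python
from collections import defaultdict

# Hypothetical toy data (not taken from the paper).

# Star schema: a fact table with foreign keys and measures,
# plus dimension tables holding descriptive attributes.
dim_store = {1: {"city": "Lisbon"}, 2: {"city": "Porto"}}
dim_product = {10: {"category": "food"}, 20: {"category": "toys"}}
fact_sales = [
    {"store_id": 1, "product_id": 10, "amount": 5.0},
    {"store_id": 1, "product_id": 20, "amount": 7.5},
    {"store_id": 2, "product_id": 10, "amount": 2.5},
]


def sales_by_city_star(facts, stores):
    """Aggregate per city; each fact row needs a lookup (join) into the store dimension."""
    totals = defaultdict(float)
    for row in facts:
        city = stores[row["store_id"]]["city"]  # the "join"
        totals[city] += row["amount"]
    return dict(totals)


def flatten(facts, stores, products):
    """Fully denormalized (flat) table: attributes are repeated in every row."""
    return [
        {
            "city": stores[r["store_id"]]["city"],
            "category": products[r["product_id"]]["category"],
            "amount": r["amount"],
        }
        for r in facts
    ]


def sales_by_city_flat(rows):
    """Join-free aggregation over the flat table."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["city"]] += row["amount"]
    return dict(totals)


def nest_by_store(facts, stores, products):
    """One record per store with its sales as a nested list
    (similar in spirit to a Hive ARRAY<STRUCT<...>> column)."""
    nested = {}
    for r in facts:
        rec = nested.setdefault(
            r["store_id"], {"city": stores[r["store_id"]]["city"], "sales": []}
        )
        rec["sales"].append(
            {"category": products[r["product_id"]]["category"], "amount": r["amount"]}
        )
    return list(nested.values())


def sales_by_city_nested(records):
    """Aggregate by summing the nested sales of each store record."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["city"]] += sum(s["amount"] for s in rec["sales"])
    return dict(totals)


if __name__ == "__main__":
    flat = flatten(fact_sales, dim_store, dim_product)
    nested = nest_by_store(fact_sales, dim_store, dim_product)
    print(sales_by_city_star(fact_sales, dim_store))
    print(sales_by_city_flat(flat))
    print(sales_by_city_nested(nested))
```

All three layouts produce the same totals. What differs is where the cost lands. The star schema pays at query time through joins. The flat table pays in storage redundancy. The nested layout groups related facts inside a single record.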


Metadata
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-91563-0_28
