Skip to main content

2018 | OriginalPaper | Buchkapitel

SINGLE vs. MapReduce vs. Relational: Predicting Query Execution Time

verfasst von : Maryam Abbasi, Pedro Martins, José Cecílio, João Costa, Pedro Furtado

Erschienen in: Beyond Databases, Architectures and Structures. Facing the Challenges of Data Proliferation and Growing Variety

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Over the past decade’s several new concepts emerged to organize and query data over large Data Warehouse (DW) system with the same primary objective, that is, optimize processing speed. More recently, with the rise of BigData concept, storage cost lowered significantly, and performance (random accesses) increased, particularly with modern SSD disks. This paper introduces and tested a storage alternative which goes against current data normalization premises, where storage space is no longer a concern. By de-normalizing the entire data schema (transparent to the user) it is proposed a new concept system where query execution time must be entirely predictable, independently of its complexity, called, SINGLE. The proposed data model also allows easy partitioning and distributed processing to enable execution parallelism, boosting performance, as happens in MapReduce. TPC-H benchmark is used to evaluate storage space and query performance. Results show predictable performance when comparing with approaches based on a normalized relational schema, and MapReduce oriented.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Chaudhuri, S., Das, G., Narasayya, V.: Optimized stratified sampling for approximate query processing. ACM Trans. Database Syst. (TODS) 32(2), 9 (2007)CrossRef Chaudhuri, S., Das, G., Narasayya, V.: Optimized stratified sampling for approximate query processing. ACM Trans. Database Syst. (TODS) 32(2), 9 (2007)CrossRef
2.
Zurück zum Zitat Cheng, D., Zhou, X., Lama, P., Wu, J., Jiang, C.: Cross-platform resource scheduling for Spark and MapReduce on YARN. IEEE Trans. Comput. 66, 1341–1353 (2017)MathSciNetCrossRef Cheng, D., Zhou, X., Lama, P., Wu, J., Jiang, C.: Cross-platform resource scheduling for Spark and MapReduce on YARN. IEEE Trans. Comput. 66, 1341–1353 (2017)MathSciNetCrossRef
4.
Zurück zum Zitat DeWitt, D.J., Katz, R.H., Olken, F., Shapiro, L.D., Stonebraker, M.R., Wood, D.A.: Implementation techniques for main memory database systems, vol. 14. ACM (1984) DeWitt, D.J., Katz, R.H., Olken, F., Shapiro, L.D., Stonebraker, M.R., Wood, D.A.: Implementation techniques for main memory database systems, vol. 14. ACM (1984)
5.
Zurück zum Zitat Harris, E.P., Ramamohanarao, K.: Join algorithm costs revisited. VLDB J.—Int. J. Very Large Data Bases 5(1), 064–084 (1996)CrossRef Harris, E.P., Ramamohanarao, K.: Join algorithm costs revisited. VLDB J.—Int. J. Very Large Data Bases 5(1), 064–084 (1996)CrossRef
6.
Zurück zum Zitat Kimball, R.: The Data Warehouse Lifecycle Toolkit. Wiley, Hoboken (2008) Kimball, R.: The Data Warehouse Lifecycle Toolkit. Wiley, Hoboken (2008)
7.
Zurück zum Zitat Lamb, A., et al.: The vertica analytic database: C-store 7 years later. Proc. VLDB Endow. 5(12), 1790–1801 (2012)CrossRef Lamb, A., et al.: The vertica analytic database: C-store 7 years later. Proc. VLDB Endow. 5(12), 1790–1801 (2012)CrossRef
8.
Zurück zum Zitat Lemire, D., Kaser, O., Aouiche, K.: Sorting improves word-aligned bitmap indexes. Data Knowl. Eng. 69(1), 3–28 (2010)CrossRef Lemire, D., Kaser, O., Aouiche, K.: Sorting improves word-aligned bitmap indexes. Data Knowl. Eng. 69(1), 3–28 (2010)CrossRef
9.
Zurück zum Zitat Mutharaju, R., Maier, F., Hitzler, P.: A MapReduce algorithm for SC. In: 23rd International Workshop on Description Logics DL2010, p. 456 (2010) Mutharaju, R., Maier, F., Hitzler, P.: A MapReduce algorithm for SC. In: 23rd International Workshop on Description Logics DL2010, p. 456 (2010)
11.
Zurück zum Zitat Patel, J.M., Carey, M.J., Vernon, M.K.: Accurate modeling of the hybrid hash join algorithm. In: ACM SIGMETRICS Performance Evaluation Review, vol. 22, pp. 56–66. ACM (1994) Patel, J.M., Carey, M.J., Vernon, M.K.: Accurate modeling of the hybrid hash join algorithm. In: ACM SIGMETRICS Performance Evaluation Review, vol. 22, pp. 56–66. ACM (1994)
12.
Zurück zum Zitat Pavlo, A., et al.: A comparison of approaches to large-scale data analysis. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, pp. 165–178. ACM (2009) Pavlo, A., et al.: A comparison of approaches to large-scale data analysis. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, pp. 165–178. ACM (2009)
13.
Zurück zum Zitat Pinto, Y.: A framework for systematic database denormalization. Glob. J. Comput. Sci. Technol. 9(4), 44–52 (2009) Pinto, Y.: A framework for systematic database denormalization. Glob. J. Comput. Sci. Technol. 9(4), 44–52 (2009)
15.
Zurück zum Zitat Sanders, G.L., Shin, S.: Denormalization effects on performance of RDBMS. In: Proceedings of the 34th Annual Hawaii International Conference on System Sciences 2001, p. 9. IEEE (2001) Sanders, G.L., Shin, S.: Denormalization effects on performance of RDBMS. In: Proceedings of the 34th Annual Hawaii International Conference on System Sciences 2001, p. 9. IEEE (2001)
16.
Zurück zum Zitat Zaker, M., Phon-Amnuaisuk, S., Haw, S.C.: Optimizing the data warehouse design by hierarchical denormalizing. In: Proceedings of the 8th Conference on Applied Computer Scince, pp. 131–138. World Scientific and Engineering Academy and Society (WSEAS) (2008) Zaker, M., Phon-Amnuaisuk, S., Haw, S.C.: Optimizing the data warehouse design by hierarchical denormalizing. In: Proceedings of the 8th Conference on Applied Computer Scince, pp. 131–138. World Scientific and Engineering Academy and Society (WSEAS) (2008)
Metadaten
Titel
SINGLE vs. MapReduce vs. Relational: Predicting Query Execution Time
verfasst von
Maryam Abbasi
Pedro Martins
José Cecílio
João Costa
Pedro Furtado
Copyright-Jahr
2018
DOI
https://doi.org/10.1007/978-3-319-99987-6_5