Published in: Distributed and Parallel Databases 3/2015

1 September 2015

Formal representation of the SS-DB benchmark and experimental evaluation in EXTASCID

Authors: Yu Cheng, Florin Rusu

Abstract

Evaluating the performance of scientific data processing systems is difficult, given the plethora of application-specific solutions in this landscape and the lack of a generally accepted benchmark. The dual structure of scientific data, coupled with the complex nature of the processing, further complicates the evaluation. SS-DB is the first attempt to define a general benchmark for complex scientific processing over raw and derived data. However, it has failed to attract sufficient attention because of its ambiguous plain-language specification and the extraordinary SciDB results. In this paper, we remedy the shortcomings of the original SS-DB specification by providing a formal representation in terms of ArrayQL algebra operators and ArrayQL/SciQL constructs. These are the first formal representations of the SS-DB benchmark. Starting from the formal representation, we give a reference implementation and present benchmark results in EXTASCID, a novel system for scientific data processing. EXTASCID is complete, in that it provides native support for both array and relational data, and extensible, in that it can execute arbitrary user code inside the system by means of a configurable metaoperator. These features yield significant improvements over SciDB in data loading, in extracting derived data, and in operations over derived data.
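
As a minimal sketch of what a configurable metaoperator for user code might look like, the Python below assumes a user-defined-aggregate interface in the style of the Generalized Linear Aggregate (GLA) from the GLADE framework, with init, accumulate, merge, and terminate steps. The names Aggregate, ThresholdMean, and run_metaoperator are hypothetical and are not EXTASCID's actual API; the example aggregate (count and mean of array cells above a threshold) is only illustrative.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List, Sequence, Tuple

# Hypothetical sketch: these names are illustrative, not EXTASCID's API.


class Aggregate(ABC):
    """User code plugged into a generic metaoperator (GLA-style interface, assumed)."""

    @abstractmethod
    def init(self) -> None: ...

    @abstractmethod
    def accumulate(self, item) -> None: ...

    @abstractmethod
    def merge(self, other: "Aggregate") -> None: ...

    @abstractmethod
    def terminate(self): ...


class ThresholdMean(Aggregate):
    """Count and mean of cell values strictly above a threshold."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.init()

    def init(self) -> None:
        # Reset the partial state.
        self.count = 0
        self.total = 0.0

    def accumulate(self, item: Tuple[int, int, float]) -> None:
        # item is (x, y, value): one array cell in coordinate form.
        _, _, value = item
        if value > self.threshold:
            self.count += 1
            self.total += value

    def merge(self, other: "ThresholdMean") -> None:
        # Combine another partial state into this one (associative, commutative).
        self.count += other.count
        self.total += other.total

    def terminate(self) -> Tuple[int, float]:
        mean = self.total / self.count if self.count else 0.0
        return self.count, mean


def run_metaoperator(make_agg, chunks: Iterable[Sequence]):
    """Accumulate each chunk into its own aggregate, then merge the partial states."""
    partials: List[Aggregate] = []
    for chunk in chunks:
        agg = make_agg()
        for item in chunk:
            agg.accumulate(item)
        partials.append(agg)
    if not partials:
        return make_agg().terminate()
    result = partials[0]
    for partial in partials[1:]:
        result.merge(partial)
    return result.terminate()


# Usage: a tiny 2 x 3 "image" split into two chunks.
image = [[1.0, 7.0, 3.0], [9.0, 2.0, 8.0]]
cells = [(x, y, v) for x, row in enumerate(image) for y, v in enumerate(row)]
chunks = [cells[:3], cells[3:]]
print(run_metaoperator(lambda: ThresholdMean(5.0), chunks))
```

Because partial states merge associatively, a driver of this shape can process chunks independently (for example, in parallel) and combine them at the end, which is what allows a single generic operator to host arbitrary user aggregates.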

Metadata
Publisher
Springer US
Print ISSN: 0926-8782
Electronic ISSN: 1573-7578
DOI
https://doi.org/10.1007/s10619-014-7149-7
