Published in: The Journal of Supercomputing 7/2021

2 January 2021

A scalable array storage for efficient maintenance of future data

Authors: Mehnuma Tabassum Omar, K. M. Azharul Hasan, Tatsuo Tsuji

Abstract

Array-based storage systems have attracted renewed interest in applications that handle large volumes of data because they are easy to maintain. However, conventional array storage schemes lack scalability for dynamic data: they must reallocate the whole array when its size limit is exceeded. Conventional array storage is therefore difficult to use when the data grows over time. To keep up with the velocity of future data, the array storage must be dynamic, that is, able to expand as the data grows. Moreover, the address space of an array-based storage system overflows quickly when the dimension lengths and the number of dimensions are large. Index array models provide a dynamic storage system, but retrieval from an index array model performs worse than retrieval from the conventional schemes. In this paper, we demonstrate an index array-based scalable array storage that maintains growing future data at runtime. The key idea is to convert an n-dimensional array into 2 dimensions and organize the array elements into ordered collections called segments. These segments divide the large allocation into smaller ones, which delays address space overflow. The retrieval performance of the proposed scheme exceeds that of other existing array systems. Because it converts an n-dimensional array into 2 dimensions, it needs only 2 indices to maintain scalability, which also reduces the index overhead. The scheme also shows better storage management performance than other approaches.
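
The abstract does not say how the n-dimensional array is converted into 2 dimensions. The following minimal sketch shows one possible mapping, to illustrate why such a conversion can delay address space overflow. It assumes that the first half of the dimensions is linearized row-major into a row index and the second half into a column index. The function names `to_2d` and `address_space` and the splitting rule are hypothetical and are not taken from the paper. Segments are not modeled.

```python
from math import prod


def to_2d(coords, shape):
    """Map n-dimensional coordinates to a (row, col) pair.

    Assumption (not from the paper): dimensions [0, n//2) are linearized
    row-major into `row`, dimensions [n//2, n) into `col`.
    """
    if len(coords) != len(shape):
        raise ValueError("coords and shape must have the same length")
    if any(not 0 <= c < s for c, s in zip(coords, shape)):
        raise IndexError("coordinate out of range")

    def linearize(cs, ss):
        # Standard row-major offset within one group of dimensions.
        offset = 0
        for c, s in zip(cs, ss):
            offset = offset * s + c
        return offset

    split = len(shape) // 2
    row = linearize(coords[:split], shape[:split])
    col = linearize(coords[split:], shape[split:])
    return row, col


def address_space(shape):
    """Return (size of a flat 1-D address space, (rows, cols) of the 2-D form)."""
    split = len(shape) // 2
    return prod(shape), (prod(shape[:split]), prod(shape[split:]))


if __name__ == "__main__":
    print(to_2d((1, 2, 3, 4), (4, 5, 6, 7)))

    # Eight dimensions of length 1024: the flat address needs 80 bits,
    # while each of the two indices needs only 40 bits.
    flat, (rows, cols) = address_space((1024,) * 8)
    print(flat > 2**64, rows < 2**64, cols < 2**64)
```

Under this assumed mapping, a single linear address must cover the product of all dimension lengths, whereas each of the two indices covers only the product of its own group. That is why each index can stay within a 64-bit range after the flat address no longer fits.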

Metadata
Title
A scalable array storage for efficient maintenance of future data
Authors
Mehnuma Tabassum Omar
K. M. Azharul Hasan
Tatsuo Tsuji
Publication date
2 January 2021
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 7/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03554-x
