Published in: Wireless Personal Communications 3/2020

02-05-2020

Pseudo-Cache-Based IoT Small Files Management Framework in HDFS Cluster

Authors: Isma Farah Siddiqui, Nawab Muhammad Faseeh Qureshi, Bhawani Shankar Chowdhry, Muhammad Aslam Uqaili


Abstract

Internet of Things (IoT) devices generate an enormous number of files, which fall into two categories: (1) large files and (2) small files. The Hadoop Distributed File System (HDFS) processes datasets using a default compression technique, Hadoop Archives (HAR), to build data chunks of 64, 128, and 256 MB. This technique works for normal batch processing; however, when a streaming chunk of an IoT dataset is considered, it raises issues not addressed before: (1) improper file wrapping, (2) random access latency, (3) a slower Namenode, and (4) wasted block volume. This paper proposes a novel technique, the pseudo-cache-based small files management framework (PSFMF), which bypasses the default HAR with its logical file association mechanism and avoids using large amounts of memory to build HDFS blocks. The evaluation shows that PSFMF reduces memory consumption, increases MapReduce performance, and reduces task workload over the HDFS cluster.
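
The abstract does not spell out how PSFMF associates small files with blocks, so the following is only a minimal illustrative sketch of the general idea of packing many small files into block-sized containers with a lookup index. It is not the authors' algorithm. The names `SmallFilePacker`, `LogicalBlock`, `add`, and `read` are hypothetical, and everything is held in memory rather than written to HDFS.

```python
from dataclasses import dataclass, field

# One of the block sizes named in the abstract (128 MB); an assumption here.
BLOCK_SIZE = 128 * 1024 * 1024


@dataclass
class LogicalBlock:
    """A block-sized container holding many small files back to back."""
    block_id: int
    data: bytearray = field(default_factory=bytearray)


class SmallFilePacker:
    """Hypothetical packer: appends small files to the current block and
    records (block_id, offset, length) so each file stays individually
    addressable."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = [LogicalBlock(0)]
        self.index = {}  # file name -> (block_id, offset, length)

    def add(self, name, payload):
        if len(payload) > self.block_size:
            raise ValueError("payload is larger than one block")
        current = self.blocks[-1]
        # Open a new block when the payload would overflow the current one.
        if len(current.data) + len(payload) > self.block_size:
            current = LogicalBlock(len(self.blocks))
            self.blocks.append(current)
        offset = len(current.data)
        current.data.extend(payload)
        self.index[name] = (current.block_id, offset, len(payload))

    def read(self, name):
        # Random access: one index lookup, then a slice of a single block.
        block_id, offset, length = self.index[name]
        return bytes(self.blocks[block_id].data[offset:offset + length])


# Tiny block size so the rollover to a second block is visible.
packer = SmallFilePacker(block_size=10)
packer.add("sensor-a.json", b"12345")
packer.add("sensor-b.json", b"678901")
packer.add("sensor-c.json", b"xy")
```

In this sketch the number of blocks grows with the total volume of data rather than with the number of files, which is the property that relieves pressure on per-file metadata. Reading a single file costs one index lookup plus a slice, which is the kind of random access the abstract says HAR handles slowly on streaming data.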


Metadata
Title
Pseudo-Cache-Based IoT Small Files Management Framework in HDFS Cluster
Authors
Isma Farah Siddiqui
Nawab Muhammad Faseeh Qureshi
Bhawani Shankar Chowdhry
Muhammad Aslam Uqaili
Publication date
02-05-2020
Publisher
Springer US
Published in
Wireless Personal Communications / Issue 3/2020
Print ISSN: 0929-6212
Electronic ISSN: 1572-834X
DOI
https://doi.org/10.1007/s11277-020-07312-3
