Published in: Cluster Computing 5/2019

21-08-2017

Detection of duplicated data with minimum overhead and secure data transmission for sensor big data

Authors: S. Beulah, F. Ramesh Dhanaseelan

Abstract

Big data refers to data sets that are difficult to handle with traditional data processing applications because of their speed, size and variety. Big data are generated by activities, sensing devices, mobile devices, the Internet, RFID readers and similar sources. One of the key sources of big data is sensor data. A significant share of sensor data is either redundant or nearly identical, which creates the need for de-duplication of sensor data. Sensor data must also be stored for further processing or analysis, which requires end-to-end security. This paper proposes a lightweight method for detecting similar data using pattern analysis and matching. A distributed encoding process is proposed to provide end-to-end security for the generated data with reduced communication overhead. The data received at the processing server are decoded, analyzed and matched against patterns to remove similar and duplicated data. The results show that the proposed system secures data during transmission using lightweight processes. Duplicated and similar data are detected efficiently by an inline process before the data enter storage. Experimental results are given in support of the proposed concept.
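
The abstract does not spell out the pattern analysis and matching step, so the following is only a minimal sketch of the general idea of inline de-duplication, not the authors' algorithm. It assumes each sensor reading is a tuple of floats and that two readings count as "similar" when every value falls into the same quantization bucket. The names `pattern_of`, `deduplicate` and the `step` parameter are hypothetical and chosen for illustration.

```python
from typing import Iterable, Iterator, Sequence, Set, Tuple

Reading = Sequence[float]


def pattern_of(reading: Reading, step: float = 0.5) -> Tuple[int, ...]:
    """Map a reading to a coarse pattern by bucketing each value.

    Assumption: readings whose values land in the same buckets are
    treated as similar. The bucket width `step` is illustrative.
    """
    return tuple(int(value // step) for value in reading)


def deduplicate(readings: Iterable[Reading], step: float = 0.5) -> Iterator[Reading]:
    """Yield only readings whose pattern has not been seen before.

    The check runs inline, one reading at a time, so duplicated and
    similar readings are dropped before they would reach storage.
    """
    seen: Set[Tuple[int, ...]] = set()
    for reading in readings:
        key = pattern_of(reading, step)
        if key in seen:
            continue  # exact or near duplicate: skip it
        seen.add(key)
        yield reading


# Hypothetical stream of (temperature, humidity) readings.
stream = [(21.0, 40.1), (21.1, 40.2), (21.0, 40.1), (25.3, 38.0)]
kept = list(deduplicate(stream))
```

With this bucketing, the second and third readings share a pattern with the first and are dropped. A known limitation of this simple scheme is that two close readings that straddle a bucket boundary are not matched; a real system would need a more robust similarity measure.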

Metadata
Title
Detection of duplicated data with minimum overhead and secure data transmission for sensor big data
Authors
S. Beulah
F. Ramesh Dhanaseelan
Publication date
21-08-2017
Publisher
Springer US
Published in
Cluster Computing / Issue Special Issue 5/2019
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-1079-x
