Published in: Journal of Intelligent Information Systems 3/2020

08-10-2019

A utility based approach for data stream anonymization

Authors: Ugur Sopaoglu, Osman Abul



Abstract

Data streams are good models of the dynamic, online, fast, and high-volume data requirements of today’s businesses. However, the sensitivity of the data is often an obstacle to deploying many data stream applications. To address this challenge, many privacy-preserving models, including k-anonymity, have been adapted to data streams. Existing data stream anonymization frameworks already address how to preserve as much data quality as possible under bounded delays. In this work, our main motivation is to minimize average delay while keeping data quality high. We claim that, in data stream processing tasks, data utility is a function of both data quality and data aging. However, there is a tradeoff between optimizing for data aging and optimizing for data quality. To this end, we present a tunable data stream k-anonymization framework and an algorithm named UBDSA (Utility Based Approach for Data Stream Anonymization). To attain high-quality anonymity groups, UBDSA also introduces a new distance metric named CAIL (Cardinality Aware Information Loss). Our experimental evaluations compare the performance of UBDSA with existing approaches from the literature, and the results show its merit in terms of lower average delay and lower information loss.
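The abstract frames utility as a tradeoff between data quality (information lost by generalizing records into anonymity groups) and data aging (how long records wait before publication). The Python sketch below illustrates that tradeoff under stated assumptions only: numeric quasi-identifiers with known (min, max) domains, a simple range-based information-loss measure (not the paper's CAIL metric, which the abstract does not define), a generic greedy "oldest record plus nearest neighbours" grouping step, and a hypothetical weight `alpha`. The names `Record`, `range_info_loss`, `generalize`, `average_delay`, `nearest_group`, `utility`, and `alpha` are illustrative and are not UBDSA's actual interface.

```python
from dataclasses import dataclass
from typing import List, Tuple

Domain = List[Tuple[float, float]]  # hypothetical (min, max) per numeric QI attribute


@dataclass(frozen=True)
class Record:
    arrival: int            # arrival time on a hypothetical integer clock
    qi: Tuple[float, ...]   # numeric quasi-identifier values


def range_info_loss(group: List[Record], domain: Domain) -> float:
    """Average normalized interval width over QI attributes.

    A common range-based information-loss measure; NOT the paper's CAIL metric.
    """
    total = 0.0
    for j, (lo, hi) in enumerate(domain):
        values = [r.qi[j] for r in group]
        width = max(values) - min(values)
        total += width / (hi - lo) if hi > lo else 0.0
    return total / len(domain)


def generalize(group: List[Record]) -> List[Tuple[float, float]]:
    """Publish each QI attribute as the group's [min, max] interval."""
    dims = len(group[0].qi)
    return [(min(r.qi[j] for r in group), max(r.qi[j] for r in group))
            for j in range(dims)]


def average_delay(group: List[Record], now: int) -> float:
    """Mean time the group's records waited before being published at `now`."""
    return sum(now - r.arrival for r in group) / len(group)


def nearest_group(buffer: List[Record], k: int, domain: Domain) -> List[Record]:
    """Greedy grouping (generic, hypothetical): seed with the oldest buffered
    record, then repeatedly add the record whose inclusion yields the lowest
    range-based loss, until the group has k members (k-anonymity)."""
    if len(buffer) < k:
        raise ValueError("need at least k buffered records")
    seed = min(buffer, key=lambda r: r.arrival)
    group = [seed]
    rest = [r for r in buffer if r is not seed]
    while len(group) < k:
        best = min(rest, key=lambda r: range_info_loss(group + [r], domain))
        group.append(best)
        rest.remove(best)
    return group


def utility(group: List[Record], domain: Domain, now: int,
            max_delay: float, alpha: float) -> float:
    """Hypothetical tunable utility in [0, 1]: alpha weights data quality,
    (1 - alpha) weights freshness (the inverse of data aging)."""
    quality = 1.0 - range_info_loss(group, domain)
    freshness = 1.0 - min(average_delay(group, now) / max_delay, 1.0)
    return alpha * quality + (1.0 - alpha) * freshness


# Example with two QI attributes and k = 3
domain = [(0.0, 100.0), (0.0, 10.0)]
buffer = [Record(0, (20.0, 1.0)), Record(1, (80.0, 9.0)),
          Record(2, (22.0, 2.0)), Record(3, (25.0, 1.5))]
group = nearest_group(buffer, k=3, domain=domain)
print(generalize(group))  # [(20.0, 25.0), (1.0, 2.0)]
print(utility(group, domain, now=5, max_delay=10, alpha=0.5))
```

In this toy run the greedy step leaves the outlying record (arrival 1) in the buffer, which keeps generalization loss low but makes that record age further. Deciding when to publish such records anyway is the quality-versus-aging tension the abstract describes; varying `alpha` shifts the score between the two terms.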


Metadata
Title
A utility based approach for data stream anonymization
Authors
Ugur Sopaoglu
Osman Abul
Publication date
08-10-2019
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 3/2020
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-019-00577-6
