Swipe to navigate through the articles of this issue
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Data streams are good models to characterize dynamic, on-line, fast and high-volume data requirements of today’s businesses. However, sensitivity of data is usually an obstacle for deployment of many data streams applications. To address this challenging issue, many privacy preserving models, including k-anonymity, have been adapted to data streams. Data stream anonymization frameworks have already addressed how to preserve data quality as much as possible under bounded delays. In this work, our main motivation is to minimize average delay while keeping data quality high. It is our claim that data utility is a function of both data quality and data aging in data streams processing tasks. However, there is a tradeoff between data aging and data quality optimizations. To this end, we present a tunable data stream k-anonymization framework and an algorithm named UBDSA (Utility Based Approach for Data Stream Anonymization). To attain high quality anonymity groups, UBDSA also introduces a new distance metric, named CAIL (Cardinality Aware Information Loss). Our experimental evaluations compare performance of UBDSA with the literature, and the results show its merit in terms of better average delay and information loss.
Please log in to get access to this content
To get access to this content you need the following product:
Abul, O., Bonchi, F., Nanni, M. (2008). Never walk alone: uncertainty for anonymity in moving objects databases. In Proc. of 24th international conference on data engineering (ICDE).
Adult. (2019). Uci machine learning repository. ftp://ftp.ics.uci.edu/pub/.
Aggarwal, C.C. (2003). A framework for diagnosing changes in evolving data streams. In Proceedings of the 2003 ACM SIGMOD international conference on management of data, SIGMOD ’03 (pp. 575–586). New York: ACM. http://doi.acm.org/10.1145/872757.872826.
Aggarwal, C.C. (2005). On k-anonymity and the curse of dimensionality. In Proceedings of the 31st international conference on very large data bases. VLDB Endowment (pp. 901–909).
Aggarwal, G., Feder, T., Kenthapadi, K., Motwani, R., Panigrahy, R., Thomas, D., Zhu, A. (2005). Approximation algorithms for k-anonymity. Journal of Privacy Technology (JOPT).
Apache Spark. (2019). Unified analytics engine for big data. https://spark.apache.org/.
Atzori, M., Bonchi, F., Giannotti, F., Pedreschi, D. (2008). Anonymity preserving pattern discovery. VLDB Journal, 17(4), 703–727. CrossRef
Cao, F., Estert, M., Qian, W., Zhou, A. (2006). Density-based clustering over an evolving data stream with noise. In Proceedings of the 2006 SIAM international conference on data mining. SIAM (pp. 328–339).
Cao, J., Carminati, B., Ferrari, E., Tan, K.L. (2011). Castle: continuously anonymizing data streams. IEEE Transactions on Dependable and Secure Computing, 8(3), 337–352. CrossRef
Domingos, P., & Hulten, G. (2000). Mining high-speed data streams. In Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining. ACM (pp. 71–80).
Fung, B.C., Wang, K., Yu, P.S. (2005). Top-down specialization for information and privacy preservation. In 21st International conference on data engineering, 2005. ICDE 2005. Proceedings. IEEE (pp. 205–216).
Gaber, M.M., Zaslavsky, A., Krishnaswamy, S. (2009). Data stream mining. In Data mining and knowledge discovery handbook. Springer (pp. 759–787).
Gedik, B., & Liu, L. (2008). Protecting location privacy with personalized k-anonymity: architecture and algorithms. IEEE Transactions on Mobile Computing, 7 (1), 1–18. CrossRef
Guo, K., & Zhang, Q. (2013). Fast clustering-based anonymization approaches with time constraints for data streams. Knowledge-Based Systems, 46, 95–108. CrossRef
Hu, X., Sun, Z., Wu, Y., Hu, W., Dong, J. (2009). K-anonymity based on sensitive tuples. In 2009 First international workshop on database technology and applications. IEEE (pp. 91–94).
Kim, S., Sung, M.K., Chung, Y.D. (2014). A framework to preserve the privacy of electronic health data streams. Journal of Biomedical Informatics, 50, 95–106. CrossRef
Koukis, D., Antonatos, S., Antoniades, D., Markatos, E.P., Trimintzios, P. (2006). A generic anonymization framework for network traffic. In IEEE International Conference on Communications, 2006. ICC’06. IEEE, (Vol. 5 pp. 2302–2309).
Kumar, S.N., & et al. (2013). Sensitive attributes based privacy preserving in data mining using k-anonymity. International Journal of Computer Applications, 84(13), 1–6. CrossRef
LeFevre, K., DeWitt, D.J., Ramakrishnan, R. (2006). Mondrian multidimensional k-anonymity. In Proceedings of the 22nd international conference on data engineering, 2006. ICDE’06. IEEE (pp. 25–25).
Li, N., Li, T., Venkatasubramanian, S. (2007). t-closeness: privacy beyond k-anonymity and l-diversity. In IEEE 23rd International conference on data engineering, 2007. ICDE 2007. IEEE (pp. 106–115).
Li, J., Ooi, B.C., Wang, W. (2008). Anonymizing streaming data for privacy protection. In IEEE 24th international conference on data engineering, 2008. ICDE 2008. IEEE (pp. 1367–1369).
Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramaniam, M. (2006). l-diversity: privacy beyond k-anonymity. In Proceedings of the 22nd international conference on data engineering, 2006. ICDE’06. IEEE (pp. 24–24).
MapReduce. (2019). Mapreduce tutorial. Apache. https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html.
Meyerson, A., & Williams, R. (2004). On the complexity of optimal k-anonymity. In Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. ACM (pp. 223–228).
Mohamed, M.A., Nagi, M.H., Ghanem, S.M. (2016). A clustering approach for anonymizing distributed data streams. In 2016 11th international conference on computer engineering & systems (ICCES). IEEE (pp. 9–16).
Mohammadian, E., Noferesti, M., Jalili, R. (2014). Fast: fast anonymization of big data streams. In Proceedings of the 2014 international conference on big data science and computing. ACM (p. 23).
Nergiz, M.E., Atzori, M., Saygin, Y., Guc, B. (2009). Towards trajectory anonymization a generalization based approach. Transactions on Data Privacy, 2(106), 47–75. MathSciNet
Otgonbayar, A., Pervez, Z., Dahal, K. (2016). Toward anonymizing iot data streams via partitioning. In 2016 IEEE 13th International conference on mobile ad hoc and sensor systems (MASS). IEEE (pp. 331–336).
Otgonbayar, A., Pervez, Z., Dahal, K., Eager, S. (2018). K-varp: K-anonymity for varied data streams via partitioning. Information Sciences, 467, 238–255. CrossRef
Sakpere, A.B., & Kayem, A.V. (2015). Adaptive buffer resizing for efficient anonymization of streaming data with minimal information loss. In 2015 international conference on information systems security and privacy (ICISSP). IEEE (pp. 1–11).
Sopaoglu, U., & Abul, O. (2017). A top-down k-anonymization implementation for apache spark. In 2017 IEEE International conference on big data (Big Data). IEEE (pp. 4513–4521).
Telco. (2019). Telco customer dataset. https://www.kaggle.com/blastchar/telco-customer-churn.
Wang, K., Yu, P.S., Chakraborty, S. (2004). Bottom-up generalization: a data mining solution to privacy protection. In Fourth IEEE international conference on data mining, 2004. ICDM’04. IEEE (pp. 249–256).
Wang, W., Li, J., Ai, C., Li, Y. (2007). Privacy protection on sliding window of data streams. In International conference on collaborative computing: networking, applications and worksharing, 2007. CollaborateCom 2007. IEEE (pp. 213–221).
Wang, P., Lu, J., Zhao, L., Yang, J. (2010). B-castle: an efficient publishing algorithm for k-anonymizing data streams. In 2010 second WRI global congress on intelligent systems (GCIS). IEEE, (Vol. 2 pp. 132–136).
Zakerzadeh, H., & Osborn, S.L. (2011). Faanst: fast anonymizing algorithm for numerical streaming data. In Data privacy management and autonomous spontaneous security. Springer (pp. 36–50).
Zakerzadeh, H., & Osborn, S.L. (2013). Delay-sensitive approaches for anonymizing numerical streaming data. International Journal of Information Security, 12(5), 423–437. CrossRef
Zhang, J., Yang, J., Zhang, J., Yuan, Y. (2010). Kids: k-anonymization data stream base on sliding window. In 2010 2nd International conference on future computer and Communication (ICFCC). IEEE, (Vol. 2 pp. V2–311).
Zhang, X., Yang, L.T., Liu, C., Chen, J. (2014b). A scalable two-phase top-down specialization approach for data anonymization using mapreduce on cloud. IEEE Transactions on Parallel and Distributed Systems, 25(2), 363–373. CrossRef
Zhou, A., Cao, F., Qian, W., Jin, C. (2008). Tracking clusters in evolving data streams over sliding windows. Knowledge and Information Systems, 15(2), 181–214. CrossRef
- A utility based approach for data stream anonymization
- Publication date
- Springer US
Journal of Intelligent Information Systems
Integrating Artificial Intelligence and Database Technologies
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
Neuer Inhalt/© ITandMEDIA