Data Mining and Knowledge Discovery

28 June 2017

The best privacy defense is a good privacy offense: obfuscating a search engine user’s profile

Authors: Jörg Wicker, Stefan Kramer

Published in: Data Mining and Knowledge Discovery | Issue 5/2017

Abstract

User privacy on the internet is an important and unsolved problem. So far, no sufficient and comprehensive solution has been proposed that helps users protect their privacy while using the internet. Data are collected and assembled by numerous service providers. Existing solutions have focused on the service providers’ side, having them store encrypted or transformed data that can still be used for analysis. This has a major flaw: it relies on the service providers to do so, and the user has no way of actively protecting his or her privacy. In this work, we suggest a new approach that empowers the user to take advantage of the same tool the other side has, namely data mining, to produce data that obfuscates the user’s profile. We apply this approach to search engine queries: feedback from the search engine, in the form of personalized advertisements, is used in an algorithm similar to reinforcement learning to generate new queries that potentially confuse the search engine. We evaluated the approach on a real-world data set. While evaluation is hard, our results indicate that it is possible to influence the profile that the search engine generates for the user. This shows that it is feasible to defend a user’s privacy from a new and more practical perspective.
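
The feedback loop outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It swaps in a plain epsilon-greedy bandit as a stand-in for the "algorithm similar to reinforcement learning", and it replaces the search engine with a simulated one. The category names, the reward values, and the functions `choose_category`, `update`, `simulated_ad_feedback` and `run` are all hypothetical and chosen only for illustration. The reward is assumed to measure how far the advertisements shown after a generated query are from the user's true interest category (higher means better obfuscation).

```python
import random


def choose_category(values, epsilon, rng):
    """Epsilon-greedy choice over candidate query categories."""
    categories = sorted(values)
    if rng.random() < epsilon:
        # Explore: pick any category uniformly at random.
        return rng.choice(categories)
    # Exploit: pick the category with the highest estimated reward.
    return max(categories, key=lambda c: values[c])


def update(values, counts, category, reward):
    """Update the running mean of the rewards observed for a category."""
    counts[category] += 1
    values[category] += (reward - values[category]) / counts[category]


def simulated_ad_feedback(category, rng):
    """Hypothetical stand-in for ad feedback from a search engine.

    Returns a reward in [0, 1]; the base values are invented for this sketch.
    """
    base = {"sports": 0.2, "travel": 0.8, "cooking": 0.5}[category]
    return min(1.0, max(0.0, base + rng.uniform(-0.1, 0.1)))


def run(steps=500, epsilon=0.1, seed=0):
    """Issue `steps` generated queries and learn which category obfuscates best."""
    rng = random.Random(seed)
    categories = ["sports", "travel", "cooking"]
    values = {c: 0.0 for c in categories}
    counts = {c: 0 for c in categories}
    for _ in range(steps):
        category = choose_category(values, epsilon, rng)
        reward = simulated_ad_feedback(category, rng)
        update(values, counts, category, reward)
    return values, counts


if __name__ == "__main__":
    values, counts = run()
    print(values)
    print(counts)
```

In this toy setting the loop gradually concentrates its queries on the category whose simulated feedback is farthest from the user's assumed interest. A real deployment would have to derive the reward from the advertisements actually returned, which is the hard part that the paper addresses.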

Although AOL retracted the data, several websites still provide access to it and continue to analyze it; see, e.g., http://www.aolstalker.com/.

This resembles the expected value for the distance between the user interest category \(\kappa _i\) and the assignment to an interest category by the search engine, with the difference that the categories are not mutually exclusive and thus the probabilities do not sum to one.

In the terminology of Ceci et al., we are thus using a so-called proper training set, not a hierarchical training set. Another notable difference from standard hierarchical text categorization is that our training set consists of queries, not of full documents.

The implementation is available upon request.

Detailed results and statistics on the results are given in the supplementary material.

This could only happen if the same action with regard to the user interest category were chosen, which is not the case. This action is almost never chosen, as it never receives high scores (details not shown here).

Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: Proceedings of the 2000 ACM SIGMOD international conference on management of data. ACM, New York, pp 439–450

Aldeen YAAS, Salleh M, Razzaque MA (2015) A comprehensive review on privacy preserving data mining. SpringerPlus 4(1):694

Barreno M, Nelson B, Joseph AD, Tygar J (2010) The security of machine learning. Mach Learn 81(2):121–148

Barreno M, Nelson B, Sears R, Joseph AD, Tygar JD (2006) Can machine learning be secure? In: Proceedings of the 2006 ACM symposium on information, computer and communications security. ACM, New York, pp 16–25

Beato F, Conti M, Preneel B (2013) Friend in the middle (fim): tackling de-anonymization in social networks. In: IEEE international conference on pervasive computing and communications workshops (PERCOM Workshops), pp 279–284

Biggio B, Nelson B, Laskov P (2012) Poisoning attacks against support vector machines. In: Proceedings of the 29th international conference on machine learning (ICML-12), pp 1807–1814

Bilenko M, Richardson M (2011) Predictive client-side profiles for personalized advertising. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, New York, pp 413–421

Ceci M, Malerba D (2007) Classifying web documents in a hierarchy of categories: a comprehensive study. J Intell Inf Syst 28(1):37–78

Eckersley P (2010) How unique is your web browser? In: Atallah MJ, Hopper NJ (eds) Privacy enhancing technologies: 10th international symposium, PETS 2010, Berlin, Germany, July 21–23, 2010. Springer, Berlin, pp 1–18

Frénay B, Verleysen M (2014) Classification in the presence of label noise: a survey. IEEE Trans Neural Netw Learn Syst 25(5):845–869

Gervais A, Shokri R, Singla A, Capkun S, Lenders V (2014) Quantifying web-search privacy. In: Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, CCS ’14. ACM, New York, pp 966–977

Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor Newslett 11(1):10–18

Howe DC, Nissenbaum H (2009) Trackmenot: resisting surveillance in web search. In: Kerr I, Steeves V, Lucock C (eds) Lessons from the identity trail: anonymity, privacy, and identity in a networked society, vol 23. Oxford University, Oxford, pp 417–436

Huang L, Joseph AD, Nelson B, Rubinstein BI, Tygar J (2011) Adversarial machine learning. In: Proceedings of the 4th ACM workshop on security and artificial intelligence. ACM, New York, pp 43–58

Kargupta H, Datta S, Wang Q, Sivakumar K (2003) On the privacy preserving properties of random data perturbation techniques. In: Third IEEE international conference on data mining, pp 99–106

Klivans AR, Long PM, Servedio RA (2009) Learning halfspaces with malicious noise. J Mach Learn Res 10:2715–2740

Lowd D, Meek C (2005) Adversarial learning. In: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. ACM, New York, pp 641–647

Nikiforakis N, Joosen W, Livshits B (2015) Privaricator: Deceiving fingerprinters with little white lies. In: Proceedings of the 24th international conference on world wide web. International world wide web conferences steering committee, pp 820–830

Nikiforakis N, Kapravelos A, Joosen W, Kruegel C, Piessens F, Vigna G (2013) Cookieless monster: exploring the ecosystem of web-based device fingerprinting. In: IEEE symposium on security and privacy (SP), pp 541–555

Pedreschi D, Bonchi F, Turini F, Verykios VS, Atzori M, Malin B, Moelans B, Saygin Y (2008) Privacy protection: regulations and technologies, opportunities and threats. In: Giannotti F, Pedreschi D (eds) Mobility, data mining and privacy: geographic knowledge discovery. Springer, Berlin, pp 101–119

Purcell K, Brenner J, Rainie L (2012) Search engine use 2012. Technical report, Pew Internet and American Life Project Washington

Rebollo-Monedero D, Forné J, Domingo-Ferrer J (2012) Query profile obfuscation by means of optimal query exchange between users. IEEE Trans Dependable Secure Comput 9(5):641–654

Sánchez D, Castellà-Roca J, Viejo A (2013) Knowledge-based scheme to create privacy-preserving but semantically-related queries for web search engines. Inf Sci 218:17–30

Skarkala ME, Maragoudakis M, Gritzalis S, Mitrou L, Toivonen H, Moen P (2012) Privacy preservation by k-anonymization of weighted social networks. In: Proceedings of the 2012 international conference on advances in social networks analysis and mining (ASONAM 2012), ASONAM ’12. IEEE Computer Society, Washington, DC, pp 423–428

Sutton RS, Barto AG (1998) Reinforcement learning: an introduction, vol 1. MIT Press, Cambridge

Verykios VS, Bertino E, Fovino IN, Provenza LP, Saygin Y, Theodoridis Y (2004) State-of-the-art in privacy preserving data mining. ACM Sigmod Record 33(1):50–57

Viejo A, Sánchez D (2014) Profiling social networks to provide useful and privacy-preserving web search. J Assoc Inf Sci Technol 65(12):2444–2458

Wiering M, Van Otterlo M (2012) Reinforcement learning. In: Adaptation, learning, and optimization, vol 12. Springer Berlin Heidelberg

Xu L, Jiang C, Wang J, Yuan J, Ren Y (2014) Information security in big data: privacy and data mining. IEEE Access 2:1149–1176

Title: The best privacy defense is a good privacy offense: obfuscating a search engine user’s profile
Authors: Jörg Wicker, Stefan Kramer
Publication date: 28 June 2017
Publisher: Springer US
Published in: Data Mining and Knowledge Discovery / Issue 5/2017
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI: https://doi.org/10.1007/s10618-017-0524-z
