
About this book

Vast amounts of data are nowadays collected, stored and processed, in an effort to assist in making a variety of administrative and governmental decisions. These innovative steps considerably improve the speed, effectiveness and quality of decisions. Analyses are increasingly performed by data mining and profiling technologies that statistically and automatically determine patterns and trends. However, when such practices lead to unwanted or unjustified selections, they may result in unacceptable forms of discrimination.

Processing vast amounts of data may lead to situations in which data controllers know many of the characteristics, behaviors and whereabouts of people. In some cases, analysts might know more about individuals than these individuals know about themselves. Judging people by their digital identities sheds a different light on our views of privacy and data protection.

This book discusses discrimination and privacy issues related to data mining and profiling practices. It provides technological and regulatory solutions to problems which arise in these innovative contexts. The book explains that common measures for mitigating privacy and discrimination risks, such as access controls and anonymity, fail to properly resolve privacy and discrimination concerns. Therefore, new solutions focusing on technology design, transparency and accountability are called for and set forth.



Opportunities of Data Mining and Profiling


Data Dilemmas in the Information Society: Introduction and Overview

This chapter provides an introduction to this book and an overview of all chapters. First, it sets out what this book is about: discrimination and privacy issues of data mining and profiling, and solutions (both technological and non-technological) for these issues. A large part of this book is based on research results of a project on how and to what extent legal and ethical rules can be integrated in data mining algorithms to prevent discrimination. Since this is an introductory chapter, it explains what data mining and profiling are and why we need these tools in an information society. Despite this unmistakable need, however, data mining and profiling may also have undesirable effects, particularly discriminatory effects and privacy infringements. This creates dilemmas on how to deal with data mining and profiling. Regulation may take place using laws, norms, market forces and code (i.e., constraints in the architecture of technologies). This chapter concludes with an overview of the structure of this book, containing chapters on the opportunities of data mining and profiling, possible discrimination and privacy issues, practical applications and solutions in code, law, norms and the market.
Bart Custers

What Is Data Mining and How Does It Work?

Due to recent technological developments it has become possible to generate and store increasingly large datasets. It is not the amount of data, however, but the ability to interpret and analyze the data, and to base future policies and decisions on the outcome of the analysis, that determines the value of data. The amounts of data collected nowadays not only offer unprecedented opportunities to improve decision procedures for companies and governments, but also pose great challenges. Many pre-existing data analysis tools did not scale up to the current data sizes. From this need, the research field of data mining emerged. In this chapter we position data mining with respect to other data analysis techniques and introduce the most important classes of techniques developed in the area: pattern mining, classification, and clustering and outlier detection. Related supporting techniques such as pre-processing and database coupling are also discussed.
Toon Calders, Bart Custers

Why Unbiased Computational Processes Can Lead to Discriminative Decision Procedures

Nowadays, more and more decision procedures are supported or even guided by automated processes. An important technique in this automation is data mining. In this chapter we study how such automatically generated decision support models may exhibit discriminatory behavior towards certain groups based upon, e.g., gender or ethnicity. Surprisingly, such behavior may even be observed when sensitive information is removed or suppressed and the whole procedure is guided by neutral arguments such as predictive accuracy only. The reason for this phenomenon is that most data mining methods are based upon assumptions that are not always satisfied in reality, namely, that the data is correct and represents the population well. In this chapter we discuss the implicit modeling assumptions made by most data mining algorithms and show situations in which they are not satisfied. Then we outline three realistic scenarios in which an unbiased process can lead to discriminatory models. The effects of the implicit assumptions not being fulfilled are illustrated by examples. The chapter concludes with an outline of the main challenges and problems to be solved.
Toon Calders, Indrė Žliobaitė
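
The claim that discrimination can survive the removal of sensitive information can be illustrated with a small sketch. Everything below is an assumption made for illustration: the records, the attribute names (`group`, `postcode`) and the majority-vote "model" are hypothetical and are not taken from the chapter. The point is only that a rule learnt without ever reading the sensitive attribute can still treat the groups differently when another attribute acts as a proxy.

```python
from collections import defaultdict

# Hypothetical historical records: (group, postcode, hired).
# `group` is the sensitive attribute; `postcode` correlates with it.
records = (
    [("a", "north", 1)] * 8 + [("a", "north", 0)] * 2
    + [("a", "south", 1)] * 1 + [("a", "south", 0)] * 1
    + [("b", "north", 1)] * 1 + [("b", "north", 0)] * 1
    + [("b", "south", 1)] * 2 + [("b", "south", 0)] * 8
)


def train_without_sensitive(rows):
    """Learn a majority-vote rule per postcode; the group column is never read."""
    counts = defaultdict(lambda: [0, 0])  # postcode -> [negatives, positives]
    for _group, postcode, label in rows:
        counts[postcode][label] += 1
    return {pc: int(pos > neg) for pc, (neg, pos) in counts.items()}


def positive_rate(rows, model, group):
    """Share of one group's members that the model would accept."""
    preds = [model[pc] for g, pc, _ in rows if g == group]
    return sum(preds) / len(preds)


model = train_without_sensitive(records)
rate_a = positive_rate(records, model, "a")
rate_b = positive_rate(records, model, "b")
gap = rate_a - rate_b  # difference in acceptance rates between the groups

print(model, round(rate_a, 3), round(rate_b, 3), round(gap, 3))
```

In this toy data the model accepts everyone from one postcode and rejects everyone from the other, so the gap between the groups simply mirrors how unevenly they are spread over the postcodes.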

Possible Discrimination and Privacy Issues


A Comparative Analysis of Anti-Discrimination and Data Protection Legislations

Departing from the ECJ’s Huber case, in which Germany was condemned for discriminatory processing of personal data and which suggests that there is a strong kinship between data protection and discrimination issues, this chapter attempts to further compare the two fundamental rights: non-discrimination and data protection.
Beyond their place in the EU legal order, their respective object or scope, this chapter will contend that these two human rights increasingly turn to the same mode of operation, including, inter alia, reliance upon administrative structures and procedures, and the endowment of citizens with a bundle of individual rights. We will argue that this similarity can be understood in the light of their nature as regulatory human rights, that is, embodying the logic of negative freedom.
The final section will examine situations of overlap between the rights, building upon the Huber and Test-Achats cases. This will lead to final conclusions on how to best articulate these rights.
Raphaël Gellert, Katja de Vries, Paul de Hert, Serge Gutwirth

The Discovery of Discrimination

Discrimination discovery from data consists in the extraction of discriminatory situations and practices hidden in a large amount of historical decision records. We discuss the challenging problems in discrimination discovery, and present, in a unified form, a framework based on classification rules extraction and filtering on the basis of legally-grounded interestingness measures. The framework is implemented in the publicly available DCUBE tool. As a running example, we use a public dataset on credit scoring.
Dino Pedreschi, Salvatore Ruggieri, Franco Turini
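
To make the idea of filtering classification rules by an interestingness measure concrete, here is a minimal sketch. It assumes one measure from the discrimination-discovery literature, often called extended lift: the confidence of a rule with a sensitive item in its premise divided by the confidence of the same rule without it. Whether the chapter uses exactly this measure is an assumption, the records and attribute names are hypothetical, and the code does not use or imitate the DCUBE tool.

```python
def confidence(rows, premise, outcome):
    """Fraction of rows matching `premise` that also match `outcome`."""
    covered = [r for r in rows if premise.items() <= r.items()]
    if not covered:
        return None
    hits = [r for r in covered if outcome.items() <= r.items()]
    return len(hits) / len(covered)


def extended_lift(rows, sensitive, context, outcome):
    """conf(sensitive + context -> outcome) / conf(context -> outcome)."""
    base = confidence(rows, context, outcome)
    with_sensitive = confidence(rows, {**context, **sensitive}, outcome)
    if not base or with_sensitive is None:
        return None  # undefined when the context never leads to the outcome
    return with_sensitive / base


# Hypothetical credit decisions for one city.
rows = (
    [{"gender": "f", "city": "X", "credit": "denied"}] * 3
    + [{"gender": "f", "city": "X", "credit": "granted"}] * 1
    + [{"gender": "m", "city": "X", "credit": "denied"}] * 1
    + [{"gender": "m", "city": "X", "credit": "granted"}] * 3
)

lift = extended_lift(
    rows,
    sensitive={"gender": "f"},
    context={"city": "X"},
    outcome={"credit": "denied"},
)
print(lift)
```

A value above 1 means that adding the sensitive item to the rule raises the chance of the negative decision within the same context. A filtering step would keep the rules whose value exceeds a threshold chosen on legal grounds.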

Discrimination Data Analysis: A Multi-disciplinary Bibliography

Discrimination data analysis has been investigated for the last fifty years in a large body of social, legal, and economic studies. Recently, discrimination discovery and prevention have become a blooming research topic in the knowledge discovery community. This chapter provides a multi-disciplinary annotated bibliography of the literature on discrimination data analysis, with the aim of providing a common basis for researchers from multiple disciplines. We cover legal, sociological, economic and computer science references.
Andrea Romei, Salvatore Ruggieri

Risks of Profiling and the Limits of Data Protection Law

Profiling and automated decision-making may pose risks to individuals. Possible risks that flow from profiling and automated decision-making include discrimination, de-individualisation and stereotyping. To mitigate these risks, the right to privacy is traditionally invoked. However, given the rapid technological developments in the area of profiling, it is questionable whether the right to informational privacy and data protection law provide an adequate level of protection and are effective in balancing different interests when it comes to profiling. To answer the question as to whether data protection law can adequately protect us against the risks of profiling, I will discuss the role of data protection law in the context of profiling and automated decision-making. First, the specific risks associated with profiling and automated decision-making are explored. From there I examine how data protection law addresses these risks. Next I discuss possible limitations and possible drawbacks of data protection law when it comes to the issue of profiling and automated decision-making. I conclude with several suggestions for making current data protection law more effective in dealing with the risks of profiling. These include more focus on the actual goals of data processing and ‘ethics by design’.
Bart Schermer

Practical Applications


Explainable and Non-explainable Discrimination in Classification

Nowadays more and more decisions in lending, recruitment, grant or study applications are being partially automated based on computational models (classifiers) premised on historical data. If the historical data was discriminating towards socially and legally protected groups, a model learnt over this data will make discriminatory decisions in the future. As a solution, most of the discrimination-free modeling techniques force the treatment of the sensitive groups to be equal and do not take into account that some differences may be explained by other factors and thus justified. For example, disproportional recruitment rates for males and females may be explainable by the fact that more males have higher education; treating males and females equally will introduce reverse discrimination, which may be undesirable as well. Given that the law or domain experts specify which factors are discriminatory (e.g. gender, marital status) and which can be used for explanation (e.g. education), this chapter presents a methodology for quantifying the tolerable difference in treatment of the sensitive groups. We show how to measure which part of the difference is explainable and present local learning techniques that remove exactly the illegal discrimination, allowing the differences in decisions to be present as long as they are explainable.
Faisal Kamiran, Indrė Žliobaitė
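
The split between explainable and illegal differences can be sketched numerically. The sketch below rests on a loudly stated assumption: within each education level, the "fair" acceptance rate is taken to be the average of the two groups' rates, and the explainable difference is what would remain if both groups were judged at that fair rate. This is one possible formalization and may differ from the one used in the chapter; the records are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (gender, education, accepted).
records = (
    [("m", "high", 1)] * 6 + [("m", "high", 0)] * 2
    + [("m", "low", 1)] * 1 + [("m", "low", 0)] * 1
    + [("f", "high", 1)] * 2 + [("f", "high", 0)] * 2
    + [("f", "low", 1)] * 3 + [("f", "low", 0)] * 3
)


def rate(rows):
    """Acceptance rate of a non-empty list of records."""
    return sum(label for *_, label in rows) / len(rows)


def split_discrimination(rows, favored="m", deprived="f"):
    """Return (total, explainable, illegal) differences in acceptance rate."""
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[0]].append(row)
    fav, dep = by_group[favored], by_group[deprived]

    total = rate(fav) - rate(dep)
    explainable = 0.0
    for level in {edu for _, edu, _ in rows}:
        fav_e = [r for r in fav if r[1] == level]
        dep_e = [r for r in dep if r[1] == level]
        # Assumption: every education level occurs in both groups.
        fair_rate = (rate(fav_e) + rate(dep_e)) / 2
        share_gap = len(fav_e) / len(fav) - len(dep_e) / len(dep)
        explainable += share_gap * fair_rate
    return total, explainable, total - explainable


total, explainable, illegal = split_discrimination(records)
print(round(total, 3), round(explainable, 3), round(illegal, 3))
```

Under these assumptions only part of the raw gap is attributable to the difference in education levels; the remainder is the part a discrimination-aware technique would aim to remove.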

Knowledge-Based Policing: Augmenting Reality with Respect for Privacy

Contemporary information-led policing (ILP) and its derivative, knowledge-based policing (KBP), fail to deliver value at the edge of action. In this chapter we will argue that by designing augmented realities, information may become as intertwined with action as it can ever get. To this end, however, the positivist epistemological foundation of the synthesized world (and ILP and KBP for that matter) has to be brought into line with the interpretive-constructivist epistemological perspective of everyday policing. Using a real-world example from the Dutch National Police Services Agency (KLPD), we illustrate how augmented reality may be used to identify and intercept criminals red-handed. Subsequently we discuss how the required data processing can be brought into line with the legislative requirements of subsidiarity, proportionality, and the linkage between ends and means, followed by a discussion about the consequences for, among other things, privacy, discrimination, and legislation.
Jan-Kees Schakel, Rutger Rienks, Reinier Ruissen

Combining and Analyzing Judicial Databases

To monitor crime and law enforcement, databases of several organizations, covering different parts of the criminal justice system, have to be integrated. Combined data from different organizations may then be analyzed, for instance, to investigate how specific groups of suspects move through the system. Such insight is useful for several reasons, for example, to define an effective and coherent safety policy. To integrate or relate judicial data two approaches are currently employed: a data warehouse and a dataspace approach. The former is useful for applications that require combined data on an individual level. The latter is suitable for data with a higher level of aggregation. However, developing applications that exploit combined judicial data is not without risk. One important issue while handling such data is the protection of the privacy of individuals. Therefore, several precautions have to be taken in the data integration process: use aggregate data, follow the Dutch Personal Data Protection Act, and filter out privacy-sensitive results. Another issue is that judicial data is essentially different from data in exact or technical sciences. Therefore, data mining should be used with caution, in particular to avoid incorrect conclusions and to prevent discrimination and stigmatization of certain groups of individuals.
Susan van den Braak, Sunil Choenni, Sicco Verwer

Solutions in Code


Privacy-Preserving Data Mining Techniques: Survey and Challenges

This chapter presents a brief summary and review of Privacy-preserving Data Mining (PPDM). The review of the existing approaches is structured along a tentative taxonomy of PPDM as a field. The main axes of this taxonomy specify what kind of data is being protected, and what the ownership of the data is (centralized or distributed). We comment on the relationship between PPDM and preventing discriminatory use of data mining techniques. We round off the chapter by discussing some of the new challenges facing PPDM as a field.
Stan Matwin

Techniques for Discrimination-Free Predictive Models

In this chapter, we give an overview of the techniques we have developed for constructing discrimination-free classifiers. In discrimination-free classification the goal is to learn a predictive model that classifies future data objects as accurately as possible, yet the predicted labels should be uncorrelated to a given sensitive attribute. For example, the task could be to learn a gender-neutral model that predicts whether a potential client of a bank has a high income or not. The techniques we developed for discrimination-aware classification can be divided into three categories: (1) removing the discrimination directly from the historical dataset before an off-the-shelf classification technique is applied; (2) changing the learning procedures themselves by restricting the search space to non-discriminatory models; and (3) adjusting the discriminatory models, learnt by off-the-shelf classifiers on discriminatory historical data, in a post-processing phase. Experiments show that even with such a strong constraint as discrimination-freeness, very accurate models can still be learnt. In particular, we study a case of income prediction, where the available historical data exhibits a wage gap between the genders. Due to legal restrictions, however, our predictions should be gender-neutral. The discrimination-aware techniques succeed in significantly reducing gender discrimination without impairing the accuracy too much.
Faisal Kamiran, Toon Calders, Mykola Pechenizkiy
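
The first category, removing discrimination from the historical data before learning, can be sketched with a reweighing scheme. This is an assumption about what such a pre-processing step might look like, not a statement of the chapter's exact method: each record gets the weight P(s) · P(c) / P(s, c), where s is the sensitive value and c the class label, so that in the weighted data the label is independent of the sensitive attribute. The records and the income example values are hypothetical.

```python
from collections import Counter

# Hypothetical records: (gender, high_income).
rows = [("m", 1)] * 7 + [("m", 0)] * 3 + [("f", 1)] * 3 + [("f", 0)] * 7


def discrimination(rows, weights=None):
    """Weighted P(+ | m) - P(+ | f); unweighted if no weights are given."""
    weights = weights or [1.0] * len(rows)

    def pos_rate(group):
        total = sum(w for (s, _), w in zip(rows, weights) if s == group)
        pos = sum(w for (s, c), w in zip(rows, weights) if s == group and c == 1)
        return pos / total

    return pos_rate("m") - pos_rate("f")


def reweigh(rows):
    """Weight each record by P(s) * P(c) / P(s, c), estimated from counts."""
    n = len(rows)
    s_count = Counter(s for s, _ in rows)
    c_count = Counter(c for _, c in rows)
    sc_count = Counter(rows)
    return [s_count[s] * c_count[c] / (n * sc_count[(s, c)]) for s, c in rows]


before = discrimination(rows)
weights = reweigh(rows)
after = discrimination(rows, weights)
print(round(before, 3), round(after, 3))
```

A classifier that accepts instance weights could then be trained on the weighted records. Under-represented combinations (here, women with a high income) receive weights above 1, over-represented ones weights below 1.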

Direct and Indirect Discrimination Prevention Methods

Along with privacy, discrimination is a very important issue when considering the legal and ethical aspects of data mining. Most people do not want to be discriminated against because of their gender, religion, nationality, age and so on, especially when those attributes are used for making decisions about them, such as giving them a job, loan or insurance. Discovering such potential biases and eliminating them from the training data without harming their decision-making utility is therefore highly desirable. For this reason, anti-discrimination techniques including discrimination discovery and prevention have been introduced in data mining. Discrimination prevention consists of inducing patterns that do not lead to discriminatory decisions even if the original training datasets are inherently biased. In this chapter, focusing on discrimination prevention, we present a taxonomy for classifying and examining discrimination prevention methods. Then, we introduce a group of pre-processing discrimination prevention methods and specify the different features of each approach and how these approaches deal with direct or indirect discrimination. A presentation of metrics used to evaluate the performance of those approaches is also given. Finally, we conclude our study by enumerating interesting future directions in this body of research.
Sara Hajian, Josep Domingo-Ferrer

Introducing Positive Discrimination in Predictive Models

In this chapter we give three solutions for the discrimination-aware classification problem that are based upon Bayesian classifiers. These classifiers model the complete probability distribution by making strong independence assumptions. First we discuss the necessity of having discrimination-free classification for probabilistic models. Then we show three ways to adapt a Naive Bayes classifier in order to make it discrimination-free. The first technique is based upon setting different thresholds for the different communities. The second technique learns a separate model for each of the two communities, while the third incorporates our belief about how discrimination was added to the decisions in the training data as a latent variable. By explicitly modeling the discrimination, we can reverse engineer decisions. Since all three models can be seen as ways to introduce positive discrimination, we end the chapter with a reflection on positive discrimination.
Sicco Verwer, Toon Calders
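
The first technique, different thresholds for different communities, can be sketched independently of the classifier that produces the scores. In the sketch below the scores stand in for the positive-class probabilities of a probabilistic model such as Naive Bayes; they are hypothetical values, and choosing each threshold so that both communities are accepted at the overall rate is an assumption made for illustration, not necessarily the chapter's procedure.

```python
def threshold_for_rate(scores, rate):
    """Score of the last accepted member when the top `rate` share is accepted."""
    k = round(rate * len(scores))
    if k == 0:
        return float("inf")
    return sorted(scores, reverse=True)[k - 1]


def acceptance_rate(scores, threshold):
    """Share of scores at or above the threshold (ties are all accepted)."""
    return sum(s >= threshold for s in scores) / len(scores)


# Hypothetical positive-class probabilities for two communities.
scores_a = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
scores_b = [0.7, 0.5, 0.45, 0.35, 0.3, 0.2, 0.15, 0.05]

# One shared threshold treats the communities differently.
shared = 0.5
rate_a_shared = acceptance_rate(scores_a, shared)
rate_b_shared = acceptance_rate(scores_b, shared)

# Keep the overall number of positive decisions, but equalize the rates.
accepted = sum(s >= shared for s in scores_a + scores_b)
overall = accepted / (len(scores_a) + len(scores_b))
thr_a = threshold_for_rate(scores_a, overall)
thr_b = threshold_for_rate(scores_b, overall)

print(rate_a_shared, rate_b_shared)
print(thr_a, thr_b, acceptance_rate(scores_a, thr_a), acceptance_rate(scores_b, thr_b))
```

Raising the threshold for one community and lowering it for the other is a direct form of positive discrimination, which is why the chapter's closing reflection on that topic applies to this technique in particular.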

Solutions in Law, Norms and the Market


From Data Minimization to Data Minimummization

Data mining and profiling offer great opportunities, but also involve risks related to privacy and discrimination. Both problems are often addressed by implementing data minimization principles, which entail restrictions on gathering, processing and using data. Although data minimization can sometimes help to minimize the scale of damage that may take place in relation to privacy and discrimination, for example when a data leak occurs or when data are being misused, it has several disadvantages as well. Firstly, the dataset loses a rather large part of its value when personal and sensitive data are filtered from it. Secondly, by deleting these data, the context in which the data were gathered and had a certain meaning is lost. This chapter will argue that this loss of contextuality, which is inherent to data mining as such but is aggravated by the use of data minimization principles, gives rise to or aggravates already existing privacy and discrimination problems. Thus, an opposite approach is suggested, namely that of data minimummization, which requires a minimum set of data being gathered, stored and clustered when used in practice. This chapter argues that if the data minimummization principle is not realized, this may lead to quite some inconveniences; on the other hand, if the principle is realized, new techniques can be developed that rely on the context of the data, which may provide for innovative solutions. However, this is far from a solved problem and it requires further research.
Bart van der Sloot

Quality of Information, the Right to Oblivion and Digital Reputation

The aim of this chapter is to focus on the quality of information from a legal point of view. The road map will be as follows: the paper will begin by clarifying the definition of quality of information from a legal point of view; it will then move on to draw a link between the quality of information and fundamental rights with particular reference to digital reputation; and finally it will introduce the time dimension and the right to oblivion.
The analysis conducted here will be a scholarly reflection based both on the European Directive and the Italian Law. It introduces an original perspective concerning three different topics: quality of information, right to oblivion and digital reputation.
Giusella Finocchiaro, Annarita Ricci

Transparency in Data Mining: From Theory to Practice

A broad variety of governmental initiatives are striving to use advanced computerized processes to predict human behavior. This is especially true when the behavioral trends sought generate substantial risks or are difficult to enforce. Data mining applications are the technological tools which make governmental prediction possible. The growing use of predictive practices premised upon the analysis of personal information and powered by data mining, has generated a flurry of negative reactions and responses. A central concern often voiced in this context is the lack of transparency these processes entail. Although echoed across the policy, legal and academic debate, the nature of transparency in this context is unclear and calls for a rigorous analysis. Transparency might pertain to different segments of the data mining and prediction process. This chapter makes initial steps in illuminating the true meaning of transparency in this specific context and provides tools for further examining this issue.
This chapter begins by briefly describing and explaining the practices of data mining, when used to predict future human conduct on the basis of previously collected personal information. It then moves to address the flow of information generated in the prediction process. In doing so, it introduces a helpful taxonomy regarding four distinct segments within the prediction process. Each segment presents unique transparency-related challenges. Thereafter, the chapter provides a brief theoretical analysis seeking the foundations for transparency requirements. The analysis addresses transparency as a tool to enhance government efficiency, facilitate crowdsourcing and promote autonomy. Finally, the chapter concludes by bringing the findings of the two previous sections together. It explains in which contexts the arguments for transparency are strongest, and draws out the implications of these conclusions.
Tal Zarsky

Data Mining as Search: Theoretical Insights and Policy Responses

Data mining has captured the imagination as a tool which could potentially close the intelligence gap constantly deepening between governments and their new targets – terrorists and sophisticated criminals. It should therefore come as no surprise that data mining initiatives are popping up throughout the regulatory framework. The visceral feeling of many in response to the growing use of governmental data mining of personal data is that such practices are extremely problematic. Yet, framing the notions behind the visceral response in the form of legal theory is a difficult task.
This chapter strives to advance the theoretical discussion regarding the proper understanding of the problems data mining practices generate. It does so within the confines of privacy law and interests, which many sense are utterly compromised by governmental data mining practices. Within this broader theoretical realm, the chapter focuses on examining the relevance of a related legal paradigm in privacy law: that of governmental searches. Data mining, the chapter explains, compromises some of the same interests compromised by illegal governmental searches. Yet it does so in a unique and novel way. This chapter introduces three analytical paths for extending the well accepted notion of illegal searches into this novel setting. It also points to the important intricacies every path entails and the difficulties of applying the notion of search to this novel setting. Finally, the chapter briefly explains the policy implications of every theory. Indeed, the manner in which data mining practices are conceptualized directly affects the possible solutions which might be set in place to limit related concerns.
Tal Zarsky

Concise Conclusions


The Way Forward

The growing use of data mining practices by both government and commercial entities leads to both great promises and challenges. They hold the promise of facilitating an information environment which is fair, accurate and efficient. At the same time, they might lead to practices which are both invasive and discriminatory, yet in ways the law has yet to grasp. This point is demonstrated by showing how the common measures for mitigating privacy concerns, such as a priori limiting measures (particularly access controls, anonymity and purpose specification), are increasingly failing as solutions to privacy and discrimination issues in this novel context.
Instead, a focus on (a posteriori) accountability and transparency may be more useful. This requires improved detection of discrimination and privacy violations as well as designing and implementing techniques that are discrimination-free and privacy-preserving. This requires further (technological) research.
But even with further technological research, there may be new situations and new mechanisms through which privacy violations or discrimination may take place. Novel predictive models can prove to be no more than sophisticated tools to mask the “classic” forms of discrimination, by hiding discrimination behind new proxies. Also, discrimination might be transferred to new forms of population segments, dispersed throughout society and only connected by some attributes they have in common. Such groups will lack political force to defend their interests. They might not even know what is happening.
With regard to privacy, the adequacy of the envisaged European legal framework is discussed in the light of data mining and profiling. The European Union is currently revising the data protection legislation. The chapter also addresses the question of whether these new proposals will adequately address the issues raised in this book.
Bart Custers, Toon Calders, Tal Zarsky, Bart Schermer

