Association Rule Hiding for Data Mining

Authors: Aris Gkoulalas-Divanis, Vassilios S. Verykios

Publisher: Springer US

Book Series : Advances in Database Systems

Part of: Springer Professional "Wirtschaft+Technik" , Springer Professional "Technik" , Springer Professional "Wirtschaft"

About this book

Privacy and security risks arising from the application of different data mining techniques to large institutional data repositories have been solely investigated by a new research domain, the so-called privacy preserving data mining. Association rule hiding is a new technique in data mining, which studies the problem of hiding sensitive association rules from within the data.

Association Rule Hiding for Data Mining addresses the problem of "hiding" sensitive association rules, and introduces a number of heuristic solutions. Exact solutions of increased time complexity that have been proposed recently are presented, as well as a number of computationally efficient (parallel) approaches that alleviate time complexity problems, along with a thorough discussion regarding closely related problems (inverse frequent item set mining, data reconstruction approaches, etc.). Unsolved problems, future directions and specific examples are provided throughout this book to help the reader study, assimilate and appreciate the important aspects of this challenging problem.

Association Rule Hiding for Data Mining is designed for researchers, professors and advanced-level students in computer science studying privacy preserving data mining, association rule mining, and data mining. This book is also suitable for practitioners working in this industry.

Frontmatter

Fundamental Concepts

Frontmatter

Chapter 1. Introduction

Abstract

The significant advances in data collection and data storage technologies have provided the means for the inexpensive storage of enormous amounts of transactional data in data warehouses that reside in companies and public sector organizations. Apart from the benefit of using this data per se (e.g., for keeping up to date profiles of the customers and their purchases, maintaining a list of the available products, their quantities and price, etc), the mining of these datasets with the existing data mining tools can reveal invaluable knowledge that was unknown to the data holder beforehand. The extracted knowledge patterns can provide insight to the data holders as well as be invaluable in important tasks, such as decision making and strategic planning. Moreover, companies are often willing to collaborate with other entities who conduct similar business, towards the mutual benefit of their businesses. Significant knowledge patterns can be derived and shared among the partners through the collaborative mining of their datasets. Furthermore, public sector organizations and civilian federal agencies usually have to share a portion of their collected data or knowledge with other organizations having a similar purpose, or even make this data and/or knowledge public in order to comply with certain regulations. For example, in the United States, the National Institutes of Health (NIH) [2] endorses research that leads to significant findings which improve human health and provides a set of guidelines which sanction the timely dissemination of NIH-supported research findings for use by other researchers. At the same time, the NIH acknowledges the need to maintain privacy standards and, thus, requires NIH-sponsored investigators to disclose data collected or studied in a manner that is “free of identifiers that could lead to deductive disclosure of the identity of individual subjects” [2] and deposit it to the Database of Genotype and Phenotype (dbGaP) [45] for broad dissemination.