2011 | OriginalPaper | Buchkapitel
On the Complexity of the l-diversity Problem
verfasst von : Riccardo Dondi, Giancarlo Mauri, Italo Zoppis
Erschienen in: Mathematical Foundations of Computer Science 2011
Verlag: Springer Berlin Heidelberg
Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.
Wählen Sie Textabschnitte aus um mit Künstlicher Intelligenz passenden Patente zu finden. powered by
Markieren Sie Textabschnitte, um KI-gestützt weitere passende Inhalte zu finden. powered by
The problem of publishing personal data without giving up privacy is becoming increasingly important. Different interesting formalizations have been recently proposed in this context, i.e.
k-anonymity
[17,18] and
l-diversity
[12]. These approaches require that the rows in a table are clustered in sets satisfying some constraint, in order to prevent the identification of the individuals the rows belong to. In this paper we focus on the
l-diversity
problem, where the possible attributes are distinguished in
sensible
attributes and
quasi-identifier
attributes. The goal is to partition the set of rows, where for each set
C
of the partition it is required that the number of rows having a specific value in the sensible attribute is at most
$\frac{1}{l}$
|
C
|.
We investigate the approximation and parameterized complexity of
l-diversity
. Concerning the approximation complexity, we prove the following results: (1) the problem is not approximable within factor
c
ln
l
, for some constant
c
> 0, even if the input table consists of two columns; (ii) the problem is APX-hard, even if
l
= 4 and the input table contains exactly 3 columns; (iii) the problem admits an approximation algorithm of factor
m
(where
m
+ 1 is the number of columns in the input table), when the sensitive attribute ranges over an alphabet of constant size. Concerning the parameterized complexity, we prove the following results: (i) the problem is W[1]-hard even if parameterized by the size of the solution,
l
, and the size of the alphabet; (ii) the problem admits a fixed-parameter algorithm when both the maximum number of different values in a column and the number of columns are parameters.