Introduction
Related works
Clinical research data and de-identification
International de-identification methods
De-identification steps and security requirements
Type | 1. Classification | 2. Processing of DID | 3. Processing of QID | 4. Processing of SA (optional) |
---|---|---|---|---|
Structured data | ||||
Methods | Manual (human), Heuristic, Artificial Intelligence | Generalization, Randomization, Elimination | ||
Threats | Misclassification | Misprocessing, Singling out, Linkability, Inference | Homogeneity attack, Background knowledge attack | Similarity attack, Skewness attack |
Unstructured data | ||||
Methods | Manual (human), Heuristic, Artificial Intelligence | Cryptography, Replacement, Elimination | ||
Threats | Misclassification | Misprocessing, Singling out, Linkability, Inference |
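To make the QID and SA columns of the table concrete, the following is a minimal Python sketch of generalization followed by two checks: the size of the smallest equivalence class (k-anonymity) and the detection of classes that are open to a homogeneity attack. The records, column names, and helper functions are hypothetical and chosen only for illustration; they are not part of the framework described here.

```python
from collections import defaultdict


def generalize_age(age, width=10):
    """Generalization: replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def equivalence_classes(records, qid_keys):
    """Group records that share the same quasi-identifier (QID) values."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[key] for key in qid_keys)].append(record)
    return groups


def k_anonymity(records, qid_keys):
    """Size of the smallest equivalence class; k = 1 means a record is singled out."""
    return min(len(group) for group in equivalence_classes(records, qid_keys).values())


def homogeneous_classes(records, qid_keys, sa_key):
    """Equivalence classes whose members all share one sensitive attribute (SA) value."""
    return [
        qid_values
        for qid_values, group in equivalence_classes(records, qid_keys).items()
        if len({record[sa_key] for record in group}) == 1
    ]


# Hypothetical toy records, not taken from any real dataset.
records = [
    {"age": 31, "zip": "120", "diagnosis": "flu"},
    {"age": 35, "zip": "120", "diagnosis": "flu"},
    {"age": 42, "zip": "130", "diagnosis": "asthma"},
    {"age": 47, "zip": "130", "diagnosis": "diabetes"},
]
QID = ["age", "zip"]
generalized = [{**r, "age": generalize_age(r["age"])} for r in records]

k_before = k_anonymity(records, QID)
k_after = k_anonymity(generalized, QID)
exposed = homogeneous_classes(generalized, QID, "diagnosis")
```

In this toy case generalization raises k from 1 to 2, yet one class still contains a single diagnosis, which is why the optional SA step lists the homogeneity attack as a separate threat.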
De-identification framework for the utilization of various clinical research data
Architecture
Service scenario
Data optimization and classification
Structured data | Unstructured data |
---|---|
//S0. Data Optimization OptimizeID(inputdata){ Auth = Authentication(irbapproval, projectid) if(Auth == CORRECT){//equal? ChkResult = CheckValidation(inputdata) if(ChkResult == ERROR) exit else ClassifyID(ChkResult) } exit } | //U0. File Optimization SelectOptiFILE(){ Auth = Authentication(irbapproval, projectid) if(Auth == CORRECT){//equal? SelectedFile = FileSelection(inputfile) ChkFile = CheckValidation(SelectedFile) if(ChkFile == ERROR) exit else ClassifyFILE(ChkFile) } exit } |
//S1. Data Classification ClassifyID(ChkResult) { classifiedID[DID|QID|SA] = locator(ChkResult) if (classifiedID == DID) goto S2 else if (classifiedID == QID){ if (classifiedID == SA) { checkSA = 1 goto S3 } else//NSA goto S2 } } | //U1. File Classification ClassifyFILE(ChkFile) { ClfFile[C|U] = CheckChangeable(ChkFile) if(ClfFile == C)//Changeable File goto U2 else goto U2//Unchangeable File } |
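Steps S0 and S1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the lookup sets stand in for the `locator` step (the heuristic method in the classification column), the `irb_approved` flag stands in for `Authentication(irbapproval, projectid)`, and DID, QID, SA, and NSA are treated as four separate labels. All function and column names are hypothetical.

```python
# Hypothetical lookup sets standing in for the locator step.
DID_COLUMNS = {"name", "patient_id", "phone"}
QID_COLUMNS = {"age", "zip", "sex"}
SA_COLUMNS = {"diagnosis"}


def classify_columns(columns):
    """S1: label each column as DID, QID, SA, or NSA (everything else)."""
    classes = {}
    for column in columns:
        key = column.lower()
        if key in DID_COLUMNS:
            classes[column] = "DID"
        elif key in QID_COLUMNS:
            classes[column] = "QID"
        elif key in SA_COLUMNS:
            classes[column] = "SA"
        else:
            classes[column] = "NSA"
    return classes


def optimize_and_classify(rows, irb_approved):
    """S0: authenticate and validate the input, then hand over to S1."""
    if not irb_approved:
        raise PermissionError("IRB approval could not be verified")
    if not rows:
        raise ValueError("input data is empty")
    columns = list(rows[0])
    if any(list(row) != columns for row in rows):
        raise ValueError("rows do not share the same columns")
    return classify_columns(columns)


rows = [
    {"name": "A. Kim", "age": 31, "diagnosis": "flu", "visit_count": 2},
    {"name": "B. Lee", "age": 35, "diagnosis": "asthma", "visit_count": 1},
]
classes = optimize_and_classify(rows, irb_approved=True)
```

A fixed lookup is the simplest possible classifier and will misclassify unfamiliar column names, which corresponds to the misclassification threat that the adequacy test is meant to catch.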
Processing of structured data (DID and QID)
//S2. Processing of identifier HandleID(id, rule){ setRule = basicRule CheckedRisk = CheckRisk(value1) PermittedRisk = SetRisk(value2) //The rule, CheckedRisk, and PermittedRisk can be customized. For column = 1 to Number of columns{ if(CheckedRisk < PermittedRisk){ goto S3 } if(classifiedID == DID){ returnedId = DeidentificationID(id, setRule) //Cryptography, Replacement, Elimination CheckedRisk = CheckRisk(returnedId) //Risk evaluation (per column) } Next column } goto S3 } | //S3. Processing of quasi-identifier HandleQID(qid, level){ setLevel = basicLevel CheckedRisk = CheckRisk(value1) PermittedRisk = SetRisk(value2) //The level, CheckedRisk, and PermittedRisk can be customized. For column = 1 to Number of columns{ if(CheckedRisk < PermittedRisk){ goto S4 } if(classifiedID == QID){ returnedQid = DeidentificationQID(qid, setLevel) //Suppression, Generalization, Perturbation CheckedRisk = CheckRisk(returnedQid) //Risk evaluation (per column) } Next column } goto S4 } |
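The column-by-column loop of S2 and S3 can be illustrated with a short Python sketch. Several choices here are assumptions rather than part of the framework: identifiers are replaced with a keyed hash (HMAC-SHA256) as one possible form of "Cryptography, Replacement", the risk metric is taken to be 1 divided by the size of the smallest QID equivalence class, and the generalizers, key, and threshold are hypothetical.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymize(value, key):
    """Replace an identifier with a truncated keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def handle_did(rows, did_columns, key):
    """S2: pseudonymize every column classified as DID."""
    return [
        {col: pseudonymize(val, key) if col in did_columns else val
         for col, val in row.items()}
        for row in rows
    ]


def check_risk(rows, qid_columns):
    """Assumed risk metric: 1 / size of the smallest QID equivalence class."""
    class_sizes = Counter(tuple(row[col] for col in qid_columns) for row in rows)
    return 1 / min(class_sizes.values())


def handle_qid(rows, generalizers, permitted_risk):
    """S3: generalize QID columns one by one until the risk is below the limit."""
    qid_columns = list(generalizers)
    for column, generalize in generalizers.items():
        if check_risk(rows, qid_columns) < permitted_risk:
            break  # corresponds to "goto S4"
        rows = [{**row, column: generalize(row[column])} for row in rows]
    return rows, check_risk(rows, qid_columns)


# Hypothetical toy rows and generalization rules.
rows = [
    {"name": "A. Kim", "age": 31, "zip": "12001"},
    {"name": "B. Lee", "age": 35, "zip": "12002"},
    {"name": "C. Park", "age": 42, "zip": "13001"},
    {"name": "D. Choi", "age": 47, "zip": "13002"},
]
generalizers = {
    "age": lambda age: f"{age // 10 * 10}s",
    "zip": lambda zip_code: zip_code[:3] + "**",
}
step_s2 = handle_did(rows, {"name"}, key=b"demo-key-only")
step_s3, final_risk = handle_qid(step_s2, generalizers, permitted_risk=0.6)
```

Re-evaluating the risk before each column means that columns are only generalized while the permitted risk is still exceeded, which preserves as much detail as the threshold allows.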
Processing of unstructured data
//U2. File processing HandleFile(inputfile, rule){ filetype = CheckType(inputfile) if (filetype == C){//Changeable File duplicatedFile = CopyFile(inputfile) OriginalFileEncryption(inputfile, key, currentTime) result1 = DeidentificationFile(duplicatedFile, rule) //Cryptography, Replacement, Elimination } else {//Unchangeable File UnchangeableFileEncryption(inputfile, key, currentTime) } goto U3 } |
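For a changeable text file, the copy-then-de-identify branch of U2 might look like the Python sketch below. It is a sketch under assumptions: the regular expressions, placeholder labels, and file names are hypothetical, only the "Replacement" method is shown, and encryption of the original file is deliberately left out because it depends on the key management in use.

```python
import re
import shutil
import tempfile
from pathlib import Path

# Hypothetical patterns; real clinical notes need far broader coverage.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}-\d{3,4}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def redact_text(text):
    """Replacement: swap each matched identifier for a placeholder label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def handle_changeable_file(path, work_dir):
    """Copy the file and de-identify the copy, leaving the original untouched."""
    source = Path(path)
    duplicate = Path(work_dir) / f"deid_{source.name}"
    shutil.copy2(source, duplicate)
    cleaned = redact_text(duplicate.read_text(encoding="utf-8"))
    duplicate.write_text(cleaned, encoding="utf-8")
    return duplicate


# Usage with a temporary directory so that no real file is modified.
with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp) / "note.txt"
    note.write_text("Seen on 2021-03-04, call 010-1234-5678.", encoding="utf-8")
    deid_copy = handle_changeable_file(note, tmp)
    original_text = note.read_text(encoding="utf-8")
    redacted_text = deid_copy.read_text(encoding="utf-8")
```

Working on a duplicate keeps the original available for encryption and later audit, which matches the order of operations in the pseudocode.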
Adequacy test
//S4, U3. Adequacy test Adequacy(S3|U3){ CheckedClassification = CheckClassification() if(CheckedClassification == INVALID){ goto S1 | U1 //Re-classification } CheckedDeidentification = CheckDeidentification() if(CheckedDeidentification == INVALID){ if(classifiedID == DID){ goto S2 } else if(classifiedID == QID){ goto S3 } else { goto U2 } } Call Dataprovider() } |
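The `goto`-based control flow of the adequacy test can be expressed as a bounded loop. The Python sketch below is only one way to read it: the four callables, the `max_rounds` limit, and the toy dataset are hypothetical, and a real check would evaluate the actual classification and re-identification risk.

```python
def adequacy_test(dataset, check_classification, check_deidentification,
                  reclassify, reprocess, max_rounds=5):
    """Repeat classification or de-identification until both checks pass."""
    for _ in range(max_rounds):
        if not check_classification(dataset):
            dataset = reclassify(dataset)   # corresponds to "goto S1 | U1"
            continue
        if not check_deidentification(dataset):
            dataset = reprocess(dataset)    # corresponds to "goto S2 / S3 / U2"
            continue
        return dataset                      # ready to hand to the data provider
    raise RuntimeError("dataset did not pass the adequacy test")


# Hypothetical toy dataset: a flag for classification and a risk score.
dataset = {"classified": False, "risk": 1.0}

result = adequacy_test(
    dataset,
    check_classification=lambda d: d["classified"],
    check_deidentification=lambda d: d["risk"] <= 0.25,
    reclassify=lambda d: {**d, "classified": True},
    reprocess=lambda d: {**d, "risk": d["risk"] / 2},
)
```

The round limit is an added safeguard so that a dataset which can never satisfy the checks ends in an explicit error instead of looping indefinitely.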