Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
ISO/IEC 27554:2022 establishes a comprehensive framework for de-identification of personally identifiable information (PII). In an era of big data analytics, artificial intelligence, and open data sharing, organizations must balance the utility of data against the privacy rights of individuals. De-identification — the process of removing or modifying PII to reduce the risk of re-identification — is a cornerstone of privacy-preserving data practices. This standard provides a structured methodology covering the entire de-identification lifecycle: policy establishment, risk assessment, technique selection, implementation, re-identification attack testing, and ongoing governance. Unlike earlier guidance documents that focused narrowly on technical anonymization techniques, ISO/IEC 27554 takes a holistic approach that encompasses organizational policies, legal compliance, and technical controls as interconnected elements of a robust de-identification program.
The standard provides detailed technical specifications for a spectrum of de-identification techniques, categorized by their strength and reversibility. Pseudonymization replaces direct identifiers (names, email addresses, national IDs) with pseudonyms. Because the mapping may be retained, it is reversible under controlled conditions. Anonymization transforms data irreversibly so that individuals cannot be identified by the data custodian or any third party. The specific techniques and privacy models covered include:

- generalization, which replaces precise values with broader categories;
- suppression, which removes identifying values entirely;
- perturbation, which adds controlled noise;
- k-anonymity, which ensures that each record is indistinguishable from at least k-1 others on its quasi-identifiers (attributes such as age, ZIP code, and sex that can identify someone in combination);
- l-diversity, which ensures that each such group of indistinguishable records contains at least l well-represented values of the sensitive attribute;
- t-closeness, which ensures that the distribution of a sensitive attribute within each group lies within a distance t of its distribution in the whole dataset; and
- differential privacy, which adds noise calibrated to a query's sensitivity and a privacy budget ε, so that any single individual's presence or absence has only a bounded effect on the output.
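The pseudonymization step described above can be sketched in Python as follows. This is a minimal illustration, not an implementation prescribed by the standard. The keyed HMAC-SHA256 construction, the hypothetical `Pseudonymizer` class, and its in-memory reverse mapping are our own assumptions. The sketch shows why pseudonymization counts as reversible: whoever holds the key and the mapping table can link a pseudonym back to the person.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional


class Pseudonymizer:
    """Keyed pseudonymization of direct identifiers (HMAC-SHA256).

    Hypothetical helper: the key and the reverse mapping are assumed to stay
    with the data custodian, since anyone holding them can re-link records.
    """

    def __init__(self, key: Optional[bytes] = None) -> None:
        # Generate a random 256-bit key when none is supplied.
        self._key = key if key is not None else secrets.token_bytes(32)
        # Custodian-held mapping: pseudonym -> original identifier.
        self._reverse: Dict[str, str] = {}

    def pseudonymize(self, identifier: str) -> str:
        # Deterministic for a given key, so pseudonymized tables still join.
        digest = hmac.new(self._key, identifier.encode("utf-8"), hashlib.sha256)
        # Truncated to 64 bits for readability; fail loudly on a collision.
        pseudonym = "P-" + digest.hexdigest()[:16]
        existing = self._reverse.setdefault(pseudonym, identifier)
        if existing != identifier:
            raise ValueError("pseudonym collision; use a longer digest")
        return pseudonym

    def reidentify(self, pseudonym: str) -> str:
        # Controlled reversal: only possible with the custodian's mapping.
        return self._reverse[pseudonym]


# Usage: drop the name, pseudonymize the email, keep the analytic attribute.
pseudonymizer = Pseudonymizer(key=b"demo-key-not-for-production")
record = {"name": "Alice Smith", "email": "alice@example.org", "diagnosis": "asthma"}
released = {
    "subject_id": pseudonymizer.pseudonymize(record["email"]),
    "diagnosis": record["diagnosis"],
}
```

The same key always yields the same pseudonym, so the released data stays linkable across tables. This preserves utility, but it is also why pseudonymized data is generally still treated as personal data rather than as anonymized data.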
| Technique | Privacy Level | Data Utility | Reversibility | Typical Use Case |
|---|---|---|---|---|
| Pseudonymization | Low-Medium | High | Reversible (with mapping) | Clinical trial data, user analytics |
| Generalization | Medium | Medium-High | Irreversible | Census data, epidemiological studies |
| k-Anonymity (k=5) | Medium | Medium | Irreversible | Health record publishing |
| l-Diversity | Medium-High | Medium | Irreversible | Medical data with sensitive diagnoses |
| Differential Privacy (ε=1) | High | Low-Medium | Irreversible | Statistical databases, ML training |
| Perturbation | Medium-High | Medium | Irreversible | Survey microdata, mobility traces |
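The table's differential-privacy row (ε=1) can be illustrated with the Laplace mechanism applied to a counting query. The sketch below is our own and is not taken from the standard. It assumes the add-or-remove-one-record notion of neighboring datasets, and the function names are hypothetical. Python's `random` module is not a cryptographically secure source, and naive floating-point noise sampling has known weaknesses, so real releases should use a vetted differential-privacy library.

```python
import random
from typing import Callable, Iterable, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent Exp(1) draws is Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def dp_count(
    records: Iterable[dict],
    predicate: Callable[[dict], bool],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Epsilon-differentially private count via the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # A count has L1 sensitivity 1 (one person changes it by at most 1),
    # so the noise scale is sensitivity / epsilon = 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Usage: a noisy count of asthma cases at epsilon = 1.
patients = [{"dx": "asthma"}] * 30 + [{"dx": "flu"}] * 70
noisy = dp_count(patients, lambda r: r["dx"] == "asthma", epsilon=1.0, rng=random.Random(7))
```

The noise scale is 1/ε, so each halving of ε doubles the expected absolute error. For small subgroup counts this noise can swamp the signal, which is the kind of utility cost the table's rating reflects.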
ISO/IEC 27554 emphasizes that de-identification is not merely a technical operation but requires ongoing governance. The standard mandates:

1. a de-identification policy approved at the executive level, defining roles, responsibilities, and escalation procedures;
2. a re-identification risk assessment conducted before any data release, considering the data environment (public release, trusted researcher access, internal use), the availability of auxiliary data, and the motivation and capability of potential attackers;
3. re-identification attack testing using both known attack methodologies (linkage attacks, differencing attacks, reconstruction attacks) and adversarial testing tailored to the specific dataset;
4. a data disclosure review board that approves or rejects data release requests based on the residual re-identification risk; and
5. periodic reassessment as new data sources become publicly available or new re-identification techniques emerge.

The standard provides a quantitative risk scoring methodology that balances the probability of re-identification against the potential harm to affected individuals, enabling organizations to define objective risk thresholds for different data sharing scenarios.
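The risk-assessment and threshold steps can be sketched using equivalence classes over quasi-identifiers. This is a minimal illustration under our own assumptions and is not the standard's scoring method. It uses the common prosecutor-model metric, which assumes the attacker knows the target is in the dataset and assigns each record a re-identification probability of 1 divided by the size of its equivalence class. It also uses a distinct l-diversity check. The harm weight, the generalization rule, and the per-context thresholds in `MAX_RISK_BY_CONTEXT` are hypothetical values chosen only for the example.

```python
from collections import Counter, defaultdict
from typing import List, Sequence, Tuple

# Hypothetical policy: maximum acceptable worst-case per-record risk for each
# release context. Illustrative numbers only, not values from the standard.
MAX_RISK_BY_CONTEXT = {"public": 0.05, "trusted_researcher": 0.2, "internal": 0.5}


def generalize(record: dict) -> dict:
    # Hypothetical generalization: 10-year age bands and a 3-digit ZIP prefix.
    low = (record["age"] // 10) * 10
    return {"age": f"{low}-{low + 9}", "zip": record["zip"][:3] + "**", "dx": record["dx"]}


def class_key(record: dict, qis: Sequence[str]) -> Tuple:
    # Records sharing all quasi-identifier values form one equivalence class.
    return tuple(record[q] for q in qis)


def risk_metrics(records: List[dict], qis: Sequence[str]) -> dict:
    if not records:
        raise ValueError("no records to assess")
    sizes = Counter(class_key(r, qis) for r in records)
    k = min(sizes.values())
    return {
        "k": k,  # the dataset is k-anonymous for this k
        "max_risk": 1.0 / k,  # prosecutor model: risk of the most exposed record
        "avg_risk": len(sizes) / len(records),  # mean of 1/|class| over records
    }


def l_diversity(records: List[dict], qis: Sequence[str], sensitive: str) -> int:
    # Distinct l-diversity: fewest distinct sensitive values in any class.
    groups = defaultdict(set)
    for r in records:
        groups[class_key(r, qis)].add(r[sensitive])
    return min(len(values) for values in groups.values())


def approve_release(records: List[dict], qis: Sequence[str], context: str,
                    harm_weight: float = 1.0) -> Tuple[bool, dict]:
    # Hypothetical scoring rule: inflate worst-case risk by a harm weight
    # (e.g. > 1 for highly sensitive attributes), then compare to the threshold.
    metrics = risk_metrics(records, qis)
    score = metrics["max_risk"] * harm_weight
    return score <= MAX_RISK_BY_CONTEXT[context], metrics


# Usage on a toy dataset with quasi-identifiers age and ZIP code.
QIS = ("age", "zip")
raw = [
    {"age": 34, "zip": "02124", "dx": "asthma"},
    {"age": 36, "zip": "02125", "dx": "flu"},
    {"age": 38, "zip": "02121", "dx": "diabetes"},
    {"age": 41, "zip": "02130", "dx": "flu"},
    {"age": 45, "zip": "02131", "dx": "asthma"},
    {"age": 47, "zip": "02134", "dx": "flu"},
]
release_candidate = [generalize(r) for r in raw]
ok, metrics = approve_release(release_candidate, QIS, "trusted_researcher")
```

On this toy data, generalization raises k from 1 to 3, so the worst-case risk falls from 1 to 1/3. That result passes the hypothetical internal threshold (0.5) but fails the trusted-researcher (0.2) and public (0.05) thresholds, so the request would need further generalization or suppression. The second equivalence class shows why l-diversity matters: it contains only two distinct diagnoses, and two of its three records share "flu".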