No distinction in data protection law is more frequently misapplied in practice than the one between pseudonymisation and anonymisation. Treating them as interchangeable is a compliance error with serious consequences: truly anonymised data falls entirely outside the scope of the GDPR, while pseudonymised data remains personal data subject to all GDPR obligations.
Organisations that classify pseudonymised data as anonymous and treat it accordingly are operating as if GDPR does not apply to data that it very clearly does.
As of 2024, multiple European supervisory authorities have issued enforcement actions against organisations that claimed their data was anonymised, even though re-identification remained technically possible.
The standard for genuine anonymisation is significantly higher than most organisations appreciate. This guide sets out the legal definitions, the technical criteria, and the practical consequences of getting the distinction right or wrong.

Pseudonymisation under GDPR is the process of replacing direct identifiers in a dataset with artificial codes or references, such that the data can no longer be attributed to a specific individual without access to a separately held “key” or mapping table. Pseudonymised data is still personal data under GDPR. All GDPR obligations continue to apply. Pseudonymisation is a security measure, not an escape from the regulation.
GDPR Article 4(5) defines pseudonymisation as: “the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.”
Three elements are critical in this definition:
1. Direct identifiers are replaced. Names, email addresses, national IDs, and other obvious identifiers are replaced with codes, tokens, or reference numbers.
2. A “key” exists. The mapping between the original identifiers and the artificial references is maintained. Re-identification is possible for anyone who has access to this key.
3. The key is held separately and secured. The pseudonymisation is only effective if the key and the pseudonymised data are stored separately and the key is adequately protected.
Because re-identification remains possible, pseudonymised data is personal data under GDPR Article 4(1). The data protection principles in Article 5 apply in full. Data subjects retain all their rights under Articles 15-22. The controller must maintain a lawful basis under Article 6 and (where applicable) satisfy Article 9 conditions for special category data.
What pseudonymisation does provide:
• It is recognised in GDPR Article 25 as an appropriate technical measure to implement data protection by design and by default
• It is listed in GDPR Article 32 as an example of an appropriate security measure
• It reduces the risk of exposure in the event of a breach: an attacker who obtains pseudonymised data without the key cannot directly identify individuals
• It may allow certain further processing under GDPR Article 89 for archiving, research, and statistical purposes
The main pseudonymisation techniques are key-coding (replacing identifiers with reference numbers), encryption (using cryptographic algorithms with a separately held key), tokenisation (replacing sensitive values with random tokens), hash functions (producing fixed-length outputs from input data), and data masking (replacing characters within fields). Each offers different security and reversibility characteristics. The choice of technique depends on the use case and the data sensitivity.
Key-coding is the simplest approach. A lookup table maps original identifiers (names, email addresses, national IDs) to assigned codes (Patient_001, User_4782). The lookup table is the key: held separately, secured, and access-controlled. This approach is widely used in clinical research, where patient identities must be separated from clinical data without losing the ability to link related records.
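A minimal Python sketch of key-coding follows. The function name, the `Patient_` code format, and the sample records are illustrative, not part of any standard; the essential points are that the key table is returned separately from the coded data, and that repeat records for the same person receive the same code so linkability is preserved.

```python
import itertools

def key_code(records, id_field, prefix="Patient_"):
    """Replace the direct identifier in each record with an assigned code.

    Returns (pseudonymised_records, key_table). The key table must be
    stored separately from the data and access-controlled.
    """
    counter = itertools.count(1)
    key_table = {}  # original identifier -> assigned code
    pseudonymised = []
    for record in records:
        original = record[id_field]
        if original not in key_table:  # reuse the code for repeat records
            key_table[original] = f"{prefix}{next(counter):03d}"
        coded = dict(record)
        coded[id_field] = key_table[original]
        pseudonymised.append(coded)
    return pseudonymised, key_table

records = [
    {"name": "Alice Example", "diagnosis": "asthma"},
    {"name": "Bob Example", "diagnosis": "diabetes"},
    {"name": "Alice Example", "diagnosis": "asthma follow-up"},
]
data, key = key_code(records, "name")
# data[0] and data[2] share the code "Patient_001": related records stay linkable
```

Anyone holding `key` can re-identify every record, which is exactly why GDPR treats the result as personal data.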
Encryption transforms identifiers into ciphertext using a cryptographic algorithm and an encryption key. The ciphertext is unreadable without the key. AES-256 is the current standard for symmetric encryption. The encryption key must be stored separately from the encrypted data and subject to robust key management procedures.
Unlike key-coding, encryption of identifiers does not preserve the human-readable structure of the original data. An encrypted name cannot simply be looked up: re-identification requires decryption with the key.
Tokenisation is widely used in payment processing. A sensitive value (such as a credit card number) is replaced with a randomly generated token. The token has no mathematical relationship to the original value: it is a reference to an entry in a secure token vault. Tokenisation is increasingly used for healthcare, identity, and financial data.
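A minimal token vault can be sketched as below. The class name and `tok_` prefix are hypothetical; real payment tokenisation systems add access controls, audit logging, and format-preserving tokens, but the core idea is the same: the token is random, so only the vault can map it back.

```python
import secrets

class TokenVault:
    """Minimal token vault sketch. Tokens carry no mathematical relationship
    to the original value; re-identification requires access to the vault."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenise(self, value):
        if value in self._value_to_token:  # return the existing token for a known value
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)  # cryptographically random token
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenise(self, token):
        return self._token_to_value[token]

vault = TokenVault()
t = vault.tokenise("4111 1111 1111 1111")
assert vault.detokenise(t) == "4111 1111 1111 1111"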
A hash function produces a fixed-length output (the “hash”) from any input. SHA-256 is a commonly used cryptographic hash function. Hashing is technically one-way: you cannot reverse a hash to obtain the original value. However, hashing is not always a safe pseudonymisation technique, because many inputs are predictable. An attacker with a list of candidate values (such as all UK National Insurance numbers or all email addresses) can hash each candidate and compare the results to identify matches: a dictionary attack (when precomputed tables of hashes are used, a rainbow table attack). For pseudonymisation purposes, hashes should therefore be computed with a secret salt or key held separately from the data (for example, a keyed hash such as HMAC), so that an attacker cannot recompute the candidate hashes.
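The attack and the defence can both be shown in a few lines of Python. The NI-style numbers below are made-up example values; the point is that an unsalted SHA-256 of a predictable input is trivially reversed by enumeration, while a keyed hash (HMAC) with a separately held secret key is not.

```python
import hashlib
import hmac
import secrets

def naive_hash(value):
    """Unsalted SHA-256: unsafe for predictable inputs."""
    return hashlib.sha256(value.encode()).hexdigest()

# Dictionary attack: hash every candidate and compare against the observed hash.
candidates = ["AB123456C", "CD789012E", "EF345678G"]  # hypothetical NI-style numbers
observed = naive_hash("CD789012E")
recovered = next(c for c in candidates if naive_hash(c) == observed)
# recovered == "CD789012E": the "one-way" hash has been reversed by enumeration

# Keyed hashing resists this: without the secret key, the attacker cannot
# recompute candidate hashes. The key must be held separately, like any
# pseudonymisation key.
secret_key = secrets.token_bytes(32)

def keyed_hash(value):
    return hmac.new(secret_key, value.encode(), hashlib.sha256).hexdigest()
```

Note that a salt stored alongside the data does not defeat this attack (the attacker has the salt too); the salt or key must itself be kept secret and separate.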
Data masking replaces specific characters in a field with placeholder characters. For example, an email address such as “jane.smith@example.com” might be masked as “j*********@e******.com”. Masking is useful for development and testing environments where realistic data structures are needed but actual personal data must not be exposed.
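A simple masking helper might look like the following. The masking policy (keep the first character of the local part and of the domain, mask the rest, keep the top-level domain) and the example address are illustrative; real masking tools apply configurable per-field rules.

```python
def mask_email(address):
    """Mask an email address under a hypothetical policy: keep the first
    character of the local part and of the domain, and keep the TLD."""
    local, _, domain = address.partition("@")
    name, _, tld = domain.rpartition(".")
    return (
        local[0] + "*" * (len(local) - 1)
        + "@"
        + name[0] + "*" * (len(name) - 1)
        + "." + tld
    )

print(mask_email("jane.smith@example.com"))  # j*********@e******.com
```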
What is data anonymisation, and how does it differ from pseudonymisation?
Anonymisation under GDPR is the process of permanently and irreversibly modifying personal data so that no individual can be identified from the resulting dataset, directly or indirectly, by any means reasonably likely to be used. Truly anonymised data is no longer personal data and falls entirely outside the scope of GDPR. But the bar for genuine anonymisation is extremely high, and many datasets that organisations classify as anonymous are not.
GDPR Recital 26 establishes the standard: “The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.”
The key distinction from pseudonymisation is irreversibility. For anonymisation to be effective and for GDPR to cease applying:
• The process must be permanent. There is no key that could reverse it.
• The data controller itself must not be able to re-identify individuals.
• Re-identification must not be possible through the combination with other reasonably available data sources.
The Article 29 Working Party (now the EDPB) Opinion 05/2014 on Anonymisation Techniques sets out three criteria that anonymisation must satisfy:
1. Singling out: Is it still possible to isolate some or all records which identify an individual? If any record in the dataset can be linked to a single person, the data is not anonymous.
2. Linkability: Is it possible to link at least two records concerning the same data subject, whether in the same database or in two different databases? If records can be linked across datasets to reconstruct a profile, the data is not anonymous.
3. Inference: Is it possible to deduce, with significant probability, the value of an attribute from the values of other attributes? If the dataset allows inferences about specific individuals, it is not truly anonymous.
The re-identification challenge in practice: Research published in Nature Communications (2019) demonstrated that 99.98% of individuals in an “anonymised” mobility dataset could be correctly re-identified using just four spatio-temporal data points. A 2000 study by Harvard researcher Latanya Sweeney demonstrated that 87% of the US population could be uniquely identified using only their 5-digit ZIP code, date of birth, and gender. These findings illustrate why regulators treat anonymisation claims with significant scepticism.
Common anonymisation techniques include k-anonymity, l-diversity, t-closeness, generalisation, data suppression, and noise addition. No single technique guarantees perfect anonymisation, and each involves a trade-off between privacy protection and data utility. The EDPB and the ICO both recommend combining multiple techniques and applying the three-test framework to verify effectiveness.
A dataset satisfies k-anonymity if every record in the dataset is indistinguishable from at least k-1 other records with respect to a defined set of quasi-identifiers (such as age, postcode, gender). The higher the value of k, the harder it is to single out an individual.
Limitation: K-anonymity does not protect against attribute disclosure. If all records in a group of k individuals share the same sensitive attribute (for example, the same medical diagnosis), then membership in the group reveals that attribute.
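The k of a dataset can be measured directly: group records by their quasi-identifier values and take the smallest group size. The helper name and the sample records below are illustrative.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return k for a dataset: the size of the smallest group of records
    sharing identical quasi-identifier values. k == 1 means at least one
    record can be singled out."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

records = [
    {"age_band": "30-39", "region": "North", "diagnosis": "asthma"},
    {"age_band": "30-39", "region": "North", "diagnosis": "diabetes"},
    {"age_band": "40-49", "region": "South", "diagnosis": "asthma"},
]
print(k_anonymity(records, ["age_band", "region"]))  # 1: the 40-49/South record is unique
```

Dropping or generalising the unique record would be needed before any k-anonymity claim above 1 could be made for this toy dataset.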
l-diversity extends k-anonymity by requiring that each equivalence class (group of k records) contains at least l distinct values for each sensitive attribute. This prevents homogeneity attacks.
t-closeness further extends l-diversity by requiring that the distribution of sensitive attributes within each equivalence class closely matches the distribution in the overall dataset. This prevents inference attacks based on skewed group-level distributions.
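An l-diversity check follows the same pattern as a k-anonymity check, but counts distinct sensitive values per equivalence class rather than class sizes. The function name and sample data are illustrative.

```python
from collections import defaultdict

def l_diversity(records, quasi_identifiers, sensitive_attribute):
    """Return l: the smallest number of distinct sensitive values found in
    any equivalence class. l == 1 means some group is vulnerable to a
    homogeneity attack (every member shares the same sensitive value)."""
    classes = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        classes[key].add(r[sensitive_attribute])
    return min(len(values) for values in classes.values())

records = [
    {"age_band": "30-39", "region": "North", "diagnosis": "asthma"},
    {"age_band": "30-39", "region": "North", "diagnosis": "asthma"},
    {"age_band": "40-49", "region": "South", "diagnosis": "asthma"},
    {"age_band": "40-49", "region": "South", "diagnosis": "diabetes"},
]
# The 30-39/North class is 2-anonymous, yet every member has the same
# diagnosis, so group membership alone reveals it:
print(l_diversity(records, ["age_band", "region"], "diagnosis"))  # 1
```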
Generalisation replaces specific values with less precise ones. An exact age (34) becomes a range (30-39). A full postcode becomes a partial postcode or geographic region. A precise timestamp becomes a month and year.
Limitation: Generalisation reduces data utility and may not always prevent singling out, particularly for individuals with unusual combinations of attributes.
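The three generalisations described above (age to band, postcode to outward code, timestamp to month) are each one line of Python. The helper names and band width are illustrative choices.

```python
def generalise_age(age, band_width=10):
    """Replace an exact age with a band, e.g. 34 -> '30-39'."""
    low = (age // band_width) * band_width
    return f"{low}-{low + band_width - 1}"

def generalise_postcode(postcode):
    """Keep only the outward code of a UK-style postcode, e.g. 'SW1A 1AA' -> 'SW1A'."""
    return postcode.split()[0]

def generalise_timestamp(iso_timestamp):
    """Truncate an ISO 8601 timestamp to month precision, e.g. '2024-03-15T09:30' -> '2024-03'."""
    return iso_timestamp[:7]
```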
Suppression removes records or fields that pose excessive re-identification risk. For example, records from very small demographic groups may be suppressed entirely if their rarity makes them identifiable even after generalisation.
Noise addition introduces statistical noise into numerical data, producing plausible but imprecise values. It is useful for aggregate analytics: individual-level values are obscured while population-level statistics remain valid.
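A sketch of simple noise addition, using made-up salary figures: zero-mean Gaussian noise perturbs each individual value, while the mean of the dataset stays approximately correct. The seed and noise scale are arbitrary choices for illustration; choosing the scale in practice is a privacy/utility trade-off.

```python
import random

random.seed(42)  # deterministic for the example

salaries = [30_000, 45_000, 52_000, 61_000, 75_000]

# Add zero-mean Gaussian noise (standard deviation 1,000) to each value:
# individual figures become imprecise, but aggregates remain usable.
noisy = [s + random.gauss(0, 1_000) for s in salaries]

true_mean = sum(salaries) / len(salaries)
noisy_mean = sum(noisy) / len(noisy)
# noisy_mean lands close to true_mean, while no noisy value equals its original
```

Note that ad hoc noise addition alone does not carry a formal privacy guarantee; differential privacy (discussed in the FAQ below) formalises how much noise is enough.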
Choose pseudonymisation when you need to maintain the ability to link data back to individuals in the future (for research follow-up, audit, or accountability purposes), or when complete anonymisation would destroy the data’s utility. Choose anonymisation when sharing data publicly, transferring it to third parties without ongoing relationships, or publishing statistical reports where individual identification is not needed, and the anonymisation can be verified to meet the EDPB’s three-test standard.
Pseudonymisation is typically appropriate when:
• Internal analytics require the ability to link related records over time (cohort studies, longitudinal research)
• Software development or testing environments need realistic data patterns, but must not expose real personal data
• Business processes require accountability (knowing which employee performed an action), but direct identification in analytics is unnecessary
• GDPR Article 89 research or statistical purposes require data that retains linkability while reducing direct identification risk
Anonymisation is typically appropriate when:
• Publishing data publicly, including in reports, academic research, or open data initiatives
• Transferring data to third parties where no ongoing relationship with the individual is required
• Long-term archival storage where the original purposes of the collection have been met, and re-identification is no longer needed
• Regulatory requirements mandate that personal data not leave a jurisdiction, but aggregate insights need to be shared cross-border
Understanding the fundamental distinctions between these two approaches to privacy protection is essential for making informed decisions about data processing strategies.
| Aspect | Pseudonymisation | Anonymisation |
| --- | --- | --- |
| Reversibility | Reversible with access to the key or mapping | Irreversible by design |
| GDPR status | Remains personal data; full compliance required | Not personal data if properly implemented |
| Data subject rights | All GDPR rights apply | No rights apply to truly anonymous data |
| Re-identification risk | Possible with access to mapping/keys | Not reasonably likely by any available means |
| Data utility | High – maintains data relationships | Variable – may reduce analytical value |
| Implementation complexity | Moderate – requires secure key management | High – anonymisation effectiveness must be verified |
| Ongoing obligations | Continuous compliance monitoring required | Minimal once anonymisation is verified |
If we remove a person’s name from a dataset, is it anonymised? Almost certainly not. Removing a name alone while retaining other attributes (date of birth, postcode, employer, medical condition) typically leaves the individual identifiable through a combination of these attributes. The EDPB’s three tests must be satisfied for the full dataset before anonymisation can be claimed.
Does pseudonymised data require a GDPR Article 6 lawful basis? Yes. Pseudonymised data is personal data under GDPR and requires a lawful basis for processing under Article 6 (and Article 9 if it is special category data). Pseudonymisation is a security measure, not a substitute for a lawful basis.
Can we transfer pseudonymised data to a third country outside the EU without safeguards? No. Pseudonymised data is personal data. International transfers of pseudonymised data are subject to the transfer restrictions in Chapter V of the GDPR and require an adequacy decision, Standard Contractual Clauses (SCCs), Binding Corporate Rules, or another appropriate transfer mechanism.
Is hashed data pseudonymised or anonymised? It depends on the hash function used and the nature of the input. An unsalted hash of a predictable input (like a national ID number) can be reversed by an attacker with a list of candidate values. A salted cryptographic hash of a genuinely unpredictable value offers stronger protection. In most practical cases, hashed data where the original values are known to the data controller is pseudonymised, not anonymised.
Can a data breach notification exemption apply to pseudonymised data? Under GDPR Article 34(3)(a), notification to affected individuals may not be required if the controller has implemented appropriate technical protection measures (such as encryption or pseudonymisation) that render the data unintelligible to unauthorised persons. However, the controller must still notify the supervisory authority within 72 hours under Article 33, unless the breach is unlikely to result in a risk to rights and freedoms. Notification to individuals may still be required, even for pseudonymised data, if the re-identification risk is assessed as high.
What is “differential privacy” and is it an anonymisation technique? Differential privacy is a mathematical framework that adds calibrated random noise to query results, ensuring that the presence or absence of any single individual in a dataset has a negligible effect on the output. It provides a provable privacy guarantee and is increasingly used by large technology companies and statistical agencies. It is considered a strong anonymisation-adjacent technique, though its practical implementation requires specialist expertise.
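The core mechanism can be sketched briefly. For a count query (sensitivity 1), adding Laplace noise with scale 1/epsilon satisfies epsilon-differential privacy. The function names below are illustrative, and the Laplace sampler uses standard inverse-transform sampling; production systems use hardened libraries rather than hand-rolled samplers.

```python
import math
import random

def laplace_noise(scale):
    """Sample from Laplace(0, scale) via inverse-transform sampling."""
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count, epsilon):
    """Epsilon-DP count query: a count has sensitivity 1, so Laplace noise
    with scale 1/epsilon satisfies epsilon-differential privacy."""
    return true_count + laplace_noise(1.0 / epsilon)

# Smaller epsilon means more noise and stronger privacy; the analyst sees
# a perturbed count rather than the exact one.
```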
For expert advice on implementing pseudonymisation or assessing your anonymisation approach, contact our team.
About the Author
Ana Mishova
Sales and Business Development Consultant — GDPRLocal
Ana focuses on helping organisations understand their compliance obligations and find the right data protection solutions. At GDPRLocal she works closely with businesses of all sizes, making GDPR and privacy compliance clear, practical, and accessible.