Identifiable data includes any information that can directly or indirectly identify specific individuals, organisations, or entities. This ranges from obvious direct identifiers, such as names and Social Security numbers, to indirect identifiers, such as IP addresses and location data, which can identify a natural person when combined with other data points. Under the General Data Protection Regulation and similar frameworks, processing personal data without proper safeguards can result in severe penalties.
Understanding how to recognise and protect identifiable data is crucial for GDPR, CCPA, and broader data protection compliance across global jurisdictions. Protecting personal data is a core compliance requirement, ensuring that organisations keep information safe from unauthorised access and misuse.
What This Guide Covers
This guide will help you learn about identifiable data types, legal requirements across major frameworks, protection methods including anonymisation and pseudonymisation, and practical compliance strategies.
Whether you’re implementing a new data privacy program or ensuring ongoing compliance with existing data protection requirements, you’ll find specific guidance for recognising identifiable data and meeting regulatory obligations.
Why This Matters
Non-compliance with identifiable data regulations can result in fines of up to €20 million or 4% of annual global turnover, whichever is higher, under the GDPR, with similar penalties in other jurisdictions. Beyond financial risks, mishandling identifiable data can lead to data breaches, loss of customer trust, and operational disruptions that impact business continuity.
What You’ll Learn:
• Recognise identifiable data types across direct and indirect identifier categories
• Understand legal obligations under GDPR, CCPA, and sector-specific regulations
• Implement protection measures, including classification and access controls
• Address common compliance challenges in vendor management and cross-border transfers
Identifiable data refers to any information that can identify a natural person, either directly through explicit identifiers or indirectly in combination with other data points.
An identifiable natural person is someone who can be identified, directly or indirectly, through one or more factors specific to their physical, physiological, genetic, mental, economic, cultural, or social identity. This definition has expanded significantly with advances in data processing capabilities and automated means of analysis.
The concept matters for privacy protection because modern data processing can create identification risks even from seemingly anonymised data. Anonymised data is intended to prevent identification, but perfect anonymisation is challenging, and there remains a risk that individuals could be re-identified if sufficient linkable information is available. When data subjects can be re-identified through linkable information or sophisticated analytical techniques, privacy protections must apply regardless of whether identification was intentional.
Direct identifiers are information that can uniquely identify individuals without additional data points. These include full names, identification numbers (such as Social Security numbers or other unique identifiers), passport numbers, email addresses, telephone numbers, and biometric records such as fingerprints or facial recognition data.
Under most data protection frameworks, direct identifiers constitute personal data that triggers the highest level of protection requirements; they represent the most obvious form of identifiable data that data controllers must recognise and protect.
Indirect identifiers include ZIP codes, birth dates, IP addresses, device IDs, cookie identifiers, and employment information, all of which require combination with other data points to achieve identification. When combined with other information, these quasi-identifiers allow individuals to be recognised or traced. They become particularly important when analysing de-identified datasets, where seemingly anonymous information can still uniquely identify individuals.
Indirect identifiers demonstrate how identification risks extend beyond obvious personal data. Research has shown that 87% of the US population can be uniquely identified from just three data points: gender, ZIP code, and birth date.
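To make this concrete, the minimal sketch below (in Python, with entirely hypothetical records and field names) shows how a k-anonymity check can reveal when combinations of quasi-identifiers single out individuals in an otherwise de-identified dataset.

```python
from collections import Counter

# Hypothetical de-identified records: names removed, but quasi-identifiers remain.
records = [
    {"gender": "F", "zip": "02139", "birth_date": "1985-03-12", "diagnosis": "A"},
    {"gender": "M", "zip": "02139", "birth_date": "1990-07-01", "diagnosis": "B"},
    {"gender": "F", "zip": "02139", "birth_date": "1985-03-12", "diagnosis": "C"},
    {"gender": "F", "zip": "94105", "birth_date": "1972-11-30", "diagnosis": "A"},
]

QUASI_IDENTIFIERS = ("gender", "zip", "birth_date")

def k_anonymity(rows, quasi_ids):
    """Return the size of the smallest group sharing the same quasi-identifier values.

    A result of 1 means at least one person is unique on these fields alone
    and could be singled out by anyone holding a linkable external dataset.
    """
    groups = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return min(groups.values())

print(k_anonymity(records, QUASI_IDENTIFIERS))  # prints 1: two records are unique here
```

A result of k = 1 means at least one record is unique on those three fields alone, which is exactly the risk the 87% figure describes.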
Different regulatory frameworks establish specific categorisation systems for identifiable data, each with distinct protection requirements and compliance obligations, all aimed at preserving the rights of the identifiable person whose data is being processed.
The General Data Protection Regulation defines personal data as any information relating to an identified or identifiable natural person under Article 4(1). This expansive definition covers names, identification numbers, location data, online identifiers, and genetic data, as well as any information that can be linked to a particular person through automated means or other circumstances.
Special categories of sensitive personal data require additional protection under GDPR Article 9, including health data, biometric data for unique identification, political opinions, trade union membership, and sexual orientation. Processing such data generally requires explicit consent or other specific legal grounds, with enhanced security measures. A data protection impact assessment is often needed when processing these special categories to identify and reduce potential risks to individuals’ privacy and personal data security.
The National Institute of Standards and Technology defines personally identifiable information (PII) as any information maintained by an agency about an individual, including direct identifiers like Social Security numbers, driver's license numbers, and financial account numbers, plus any linkable information that could be traced back to a specific person.
Unlike GDPR, US PII standards typically focus on narrower categories of information but maintain a similar protective intent. State laws like the California Consumer Privacy Act have expanded these definitions to more closely align with international standards, creating a complex compliance landscape for organisations operating across multiple jurisdictions.
HIPAA establishes specific protections for health information combined with personal identifiers, creating the category of Protected Health Information (PHI). This includes medical record numbers, health plan beneficiary numbers, biometric identifiers used for healthcare purposes, and any health information that can be linked to individual patients.
The HIPAA Safe Harbour rule identifies 18 specific identifiers that must be removed for health data to be considered de-identified, including names, geographic subdivisions smaller than state level, dates directly related to individuals, and Internet Protocol addresses. De-identification is the process of removing or masking these identifiers; once data has been de-identified in accordance with the Safe Harbour rule, it is no longer treated as identifiable under HIPAA. Another approach is pseudonymisation, in which personal identifiers are replaced with codes, allowing re-linking under controlled conditions while still protecting individual privacy. PHI requires the highest level of protection due to its sensitivity and potential for discrimination.
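As a rough illustration of the Safe Harbour approach, the sketch below removes or generalises a handful of identifier fields from a hypothetical patient record. It covers only a few of the 18 identifier categories and omits conditions the actual rule imposes (for example, population thresholds for retaining three-digit ZIP prefixes).

```python
def deidentify(record: dict) -> dict:
    """Apply a few Safe Harbour-style removals and generalisations.

    Illustrative only: a real implementation must handle all 18 identifier
    categories and the rule's detailed conditions.
    """
    out = dict(record)
    out.pop("name", None)                     # names must be removed entirely
    out.pop("ip_address", None)               # IP addresses must be removed
    out["zip"] = record["zip"][:3] + "00"     # keep at most the 3-digit ZIP prefix
    out["admission_date"] = record["admission_date"][:4]  # keep the year only
    if record.get("age", 0) > 89:             # very high ages are aggregated
        out["age"] = "90 or older"
    return out

# Hypothetical patient record.
patient = {
    "name": "Jane Doe",
    "zip": "02139",
    "admission_date": "2023-04-17",
    "age": 91,
    "ip_address": "203.0.113.7",
    "diagnosis_code": "E11.9",
}

print(deidentify(patient))
```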
Key Points:
• GDPR applies to any information relating to identifiable natural persons, with special protections for sensitive data.
• US PII definitions focus on government-maintained information, but state laws are expanding the scope.
• PHI combines health information with personal identifiers, requiring specialised protection measures.
Effective compliance requires systematic approaches to data classification and protection that address the specific requirements of applicable regulatory frameworks. Managing and restricting access to data, especially identifiable data, is essential to ensure that only authorised individuals can view or handle sensitive information. Sharing identifiable data with other organisations is permitted only in limited circumstances where strict legal, regulatory, and ethical standards are met.
When to use this: Before collecting, processing, or storing any data containing potential identifiers, and as part of ongoing data governance programs.
1. Inventory all data elements: Document every data field you collect, process, or store, and classify each one appropriately; this includes data from third parties, vendors, and automated systems that may contain hidden identifiers. A minimal inventory structure is sketched after this list.
2. Identify direct identifiers: Use regulatory checklists under GDPR Article 4, NIST SP 800-122, and applicable sector-specific standards to systematically identify obvious personal identifiers that require immediate protection.
3. Assess indirect identifier combinations: Evaluate how seemingly anonymous data points could combine to identify individuals, considering both internal data linkages and potential external data sources that could enable re-identification.
4. Apply the appropriate legal framework: Determine which regulations apply based on the data subject’s location, organisational jurisdiction, and data type, recognising that multiple frameworks may apply simultaneously to the same dataset.
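The sketch below shows one way the inventory from step 1 might be structured in Python. The element names, sources, and framework labels are hypothetical, and a real inventory would normally live in a data catalogue rather than in code.

```python
from dataclasses import dataclass, field
from enum import Enum

class IdentifierType(Enum):
    DIRECT = "direct"      # e.g. name, email address, national ID number
    INDIRECT = "indirect"  # e.g. ZIP code, birth date, IP address, device ID
    NONE = "none"

@dataclass
class DataElement:
    """One entry in a hypothetical data inventory."""
    name: str
    source: str                                     # internal system, vendor, third party, ...
    identifier_type: IdentifierType
    frameworks: list = field(default_factory=list)  # e.g. ["GDPR", "CCPA"]

inventory = [
    DataElement("customer_email", "CRM", IdentifierType.DIRECT, ["GDPR", "CCPA"]),
    DataElement("device_id", "analytics vendor", IdentifierType.INDIRECT, ["GDPR"]),
    DataElement("page_view_count", "web logs", IdentifierType.NONE),
]

# Flag every element that needs protection measures before processing continues.
for element in inventory:
    if element.identifier_type is not IdentifierType.NONE:
        print(f"{element.name} ({element.identifier_type.value}) -> {element.frameworks}")
```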
| Feature | Anonymisation | Pseudonymisation |
| --- | --- | --- |
| Legal Status | No longer considered personal data under GDPR | Still constitutes personal data requiring full compliance |
| Reversibility | Intended to be irreversible, though perfect anonymisation is difficult and some re-identification risk remains | Reversible using additional information held separately; re-identification is possible under controlled conditions |
| Protection Level | Standard data security measures | Full GDPR protections, including data subject rights |
| Use Cases | Research datasets, public reporting | Internal analytics, secure processing environments |
• Anonymisation aims to eliminate identification risks by producing data from which individuals cannot be identified, though perfect anonymisation is difficult to achieve and some re-identification risk always remains.
• Pseudonymisation maintains analytical value while requiring ongoing compliance obligations, but pseudonymised data can sometimes be re-identified if additional information is accessible.
Choose anonymisation for public data sharing and research publication, but use pseudonymisation for internal processing where re-identification capabilities may be needed for legitimate business purposes.
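As a minimal sketch of one common pseudonymisation technique, the Python example below replaces a direct identifier with a keyed hash (HMAC-SHA-256). The key name and value are placeholders; in practice the key must be stored and access-controlled separately from the pseudonymised dataset.

```python
import hashlib
import hmac

# In practice this key must be stored separately from the pseudonymised data
# (for example in a key management service) and access-controlled; keeping the
# key alongside the data defeats the purpose of pseudonymisation.
PSEUDONYMISATION_KEY = b"replace-with-a-secret-key-held-separately"  # placeholder

def pseudonymise(identifier: str) -> str:
    """Replace a direct identifier with a stable keyed pseudonym.

    The same input always maps to the same token, so records can still be
    joined for analytics, but reversing the mapping requires the key or a
    lookup table kept under separate controls.
    """
    return hmac.new(PSEUDONYMISATION_KEY, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()

print(pseudonymise("jane.doe@example.com"))
```

Because whoever holds the key (or a separate lookup table) can re-link the tokens to individuals, the output remains personal data under the GDPR, and the full compliance obligations in the table above continue to apply.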
Understanding theoretical requirements differs significantly from implementing effective data protection in organisational environments with diverse data sources and processing activities. Even when handling non-sensitive data, organisations should remain aware of privacy and security risks, as potential misuse or exposure can still cause harm.
Mishandling identifiable data can result in concrete harms such as data breaches and identity theft, so protecting individuals' identities is essential for maintaining trust and regulatory compliance.
Solution: Implement automated scanning tools to detect potential identifiers across structured and unstructured data, while maintaining comprehensive identifier registries that document known identification risks and their combinations.
As noted earlier, research demonstrates that 87% of the US population can be uniquely identified using combinations of just gender, ZIP code, and birth date, highlighting how seemingly innocuous data can become highly identifying when aggregated.
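A minimal sketch of the scanning idea described above: a few illustrative regular expressions run over free text. Real scanning tools cover far more identifier formats, validate candidate matches, and handle structured sources as well; the patterns and sample text here are purely for illustration.

```python
import re

# Illustrative patterns only; production scanners cover many more identifier
# formats and validate matches (for example with checksum rules) to reduce
# false positives.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def scan_for_identifiers(text: str) -> dict:
    """Return any potential direct identifiers found in free text."""
    return {label: pattern.findall(text)
            for label, pattern in PATTERNS.items()
            if pattern.search(text)}

sample = "Contact jane.doe@example.com, SSN 123-45-6789, last login from 203.0.113.7."
print(scan_for_identifiers(sample))
```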
Solution: Establish Standard Contractual Clauses (SCCs) for international transfers and implement adequacy decision frameworks that address specific jurisdiction requirements, particularly in light of the European Court of Justice’s decisions invalidating previous transfer mechanisms.
Post-Schrems II requirements mandate transfer impact assessments that evaluate the legal environment in destination countries and additional safeguards needed to protect European Union data subjects when processing occurs in other countries.
Solution: Implement comprehensive Data Processing Agreements (DPAs) that clearly define responsibilities for protecting any personal data shared with vendors or third parties. Combine this with regular vendor audits that verify ongoing compliance with data protection requirements and security measures. Be aware that data brokers may not always honour removal requests, which complicates compliance efforts.
Under GDPR Article 28, data controllers remain fully liable for processor compliance failures, making vendor management a critical component of overall data protection strategies rather than a transferable risk.
Recognising identifiable data is the fundamental building block of effective privacy protection. Compliance requires ongoing vigilance, as new data sources and processing technologies create evolving identification risks. Safeguarding an individual's identity is vital to prevent misuse of personal information and to ensure compliance with privacy laws. Success depends on systematic approaches that address both obvious direct identifiers and subtle combinations of indirect identifiers that can uniquely identify data subjects.
1. What is identifiable data, and why is it essential to protect it?
Identifiable data includes any information that can directly or indirectly identify a natural person, such as names, identification numbers, IP addresses, or location data. Protecting this data is vital because mishandling it can lead to data breaches, identity theft, and significant legal penalties under regulations like GDPR and CCPA. Proper protection safeguards individuals’ privacy and helps organisations maintain compliance and trust.
2. What is the difference between anonymised data and pseudonymised data?
Anonymised data is processed so that individuals can no longer be identified by any means reasonably likely to be used, which takes it outside the scope of GDPR requirements. However, perfect anonymisation is challenging to achieve. Pseudonymised data replaces direct identifiers with artificial codes, allowing re-identification under controlled conditions. While pseudonymised data still qualifies as personal data under GDPR and requires full compliance, it balances privacy with analytical utility.
3. How can organisations ensure compliance when sharing identifiable data with third parties?
Organisations must implement strict data protection agreements, such as Data Processing Agreements (DPAs), clearly defining responsibilities and security measures for third parties. They should conduct regular audits to verify compliance and guarantee data minimisation principles are followed. Sharing identifiable data is permitted only under limited circumstances, with appropriate legal, regulatory, and ethical safeguards in place.