The Legal Definition: Article 4(1) GDPR
The GDPR casts a deliberately wide net. Article 4(1) defines personal data as:
"Any information relating to an identified or identifiable natural person ('data subject'); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person."
Article 4(1), Regulation (EU) 2016/679
Four elements make this definition work, and each one matters for how you classify data across your organization:
- "Any information": no limit on format. Text, numbers, photos, audio recordings, biometric templates, metadata, and behavioral patterns all qualify.
- "Relating to": the information must concern the individual, either by content (it describes them), purpose (it's used to evaluate them), or result (its processing impacts them).
- "Identified or identifiable": the person doesn't need to be named. If you can single them out by combining data points, that's enough.
- "Natural person": GDPR protects living individuals, not companies. But data about a sole trader or a named employee is personal data.
The practical consequence: if there's a reasonable possibility that anyone (not just you, but any party with access) could link data back to an individual, it's personal data under GDPR. This is why classification errors cascade through your entire compliance program.
Real-World Examples: What Counts and What Doesn't
The boundary between personal data and non-personal data is less obvious than most organizations assume. Here's how common data types classify under GDPR, based on regulatory guidance and CJEU case law:
| Data Type | Personal Data? | Why |
|---|---|---|
| Full name | Yes | Directly identifies an individual |
| Email address containing a person's name | Yes | Identifies a natural person by name |
| Generic role-based email address | No | Generic address, no identifiable person |
| Dynamic IP address | Yes | CJEU Breyer ruling (C-582/14): identifiable with ISP records |
| Cookie ID / device fingerprint | Yes | Online identifier under Recital 30; singles out a user |
| Employee ID number | Yes | Identification number linked to a specific person |
| GPS location data | Yes | Tracks an individual's movements; location identifier |
| Salary data linked to role | Yes | Combined with department/role, identifies the individual |
| Anonymized survey results | It depends | Only non-personal if re-identification is not reasonably possible |
| Pseudonymized customer records | Yes | Re-identification possible with the separately held key |
| CCTV footage | Yes | Images of identifiable individuals |
| Aggregated stats (large dataset) | No | Provided individuals cannot be singled out from the aggregate |
| Genetic test results | Yes (special category) | Article 9 special category: genetic data |
| Trade union membership | Yes (special category) | Article 9 special category: explicitly listed |
The "Mosaic Effect" Trap
Many organizations classify individual data points in isolation (an employee number here, a department code there) and conclude they're not personal data. But GDPR looks at identifiability through combination. When your HR system, payroll platform, and access control logs can be cross-referenced, data that seems anonymous in one system becomes personal data in the aggregate. This is exactly the gap that cross-entity data mapping is designed to close.
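To make the mosaic effect concrete, here is a minimal Python sketch. The three extracts, their field names (`badge_id`, `employee_no`, `job_title`), and every value are invented for illustration; your systems will look different. It only shows how an access log that contains no name becomes linkable to one specific role once it is joined with payroll and HR extracts.

```python
# Hypothetical extracts from three systems; all field names and values are invented.
access_log = [
    {"badge_id": "B-17", "door": "Server room", "time": "2024-03-01T02:14"},
]
payroll = [
    {"employee_no": "E-1001", "badge_id": "B-17", "cost_centre": "FIN-03"},
]
hr = [
    {"employee_no": "E-1001", "department": "Finance", "job_title": "Controller"},
]


def link(access_log, payroll, hr):
    """Join the three extracts on their shared keys (badge_id, then employee_no)."""
    payroll_by_badge = {p["badge_id"]: p for p in payroll}
    hr_by_employee = {h["employee_no"]: h for h in hr}
    linked = []
    for event in access_log:
        p = payroll_by_badge.get(event["badge_id"])
        h = hr_by_employee.get(p["employee_no"]) if p else None
        if p and h:
            # The merged row now carries role details next to the "anonymous" event.
            linked.append({**event, **p, **h})
    return linked


linked_events = link(access_log, payroll, hr)
for row in linked_events:
    print(row["time"], row["door"], row["department"], row["job_title"])
```

On its own, the access-log row only names a badge. After the join it carries a department and a job title, which in a small team can be enough to single out one person.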
Special Categories: Article 9 Data Requires Extra Protection
GDPR treats certain types of personal data as inherently high-risk. Article 9 prohibits processing these categories unless a specific legal exemption applies, and infringements fall into the higher fine tier of Article 83(5): up to EUR 20 million or 4% of total worldwide annual turnover, whichever is higher.
The special categories are:
- Racial or ethnic origin: includes nationality fields in HR systems if they reveal ethnicity
- Political opinions: political party donations, voter registration data
- Religious or philosophical beliefs: dietary preference fields that reveal religion (e.g., "halal" or "kosher" in catering systems)
- Trade union membership: payroll deductions for union dues
- Genetic data: DNA test results, hereditary information
- Biometric data (when used for identification): fingerprint scans, facial recognition templates. Note: biometric data used for authentication (unlocking a phone) may not trigger Article 9 in all interpretations, but fingerprint access logs certainly do
- Health data: sick leave records, disability accommodations, occupational health assessments, insurance claims. This is the most commonly misclassified category in enterprise environments
- Sex life or sexual orientation: HR diversity monitoring fields, beneficiary designations that reveal partner gender
Where Multi-Entity Organizations Get Caught
The most common failure we see in organizations managing privacy across multiple subsidiaries: health data classification inconsistency. A sick leave record is classified as "standard HR data" in one subsidiary and "Article 9 health data" in another. When a supervisory authority audits the group, the inconsistency itself becomes evidence of inadequate governance. This is why a unified data taxonomy across all entities isn't optional; it's the foundation of defensible compliance.
Legal Bases for Processing Special Categories
Processing special category data requires both a legal basis under Article 6 and a separate exemption under Article 9(2). The most commonly relied-upon exemptions include:
- Explicit consent (Article 9(2)(a)): must be freely given, specific, informed, and unambiguous. Implied consent is never sufficient.
- Employment law obligations (Article 9(2)(b)): processing necessary for carrying out obligations under employment and social security law
- Vital interests (Article 9(2)(c)): emergency medical situations where the data subject cannot consent
- Substantial public interest (Article 9(2)(g)): must be proportionate and have safeguards
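The two-layer requirement (an Article 6 basis plus an Article 9(2) exemption) is easy to check mechanically in a processing register. The sketch below is illustrative only: `ProcessingActivity`, `missing_bases`, and the short labels for the legal bases are hypothetical names, not part of any standard or product, and the exemption set lists only the four exemptions discussed above even though Article 9(2) contains more.

```python
from dataclasses import dataclass
from typing import Optional

# The six legal bases of Article 6(1), with invented short labels.
ARTICLE_6_BASES = {
    "consent", "contract", "legal_obligation",
    "vital_interests", "public_task", "legitimate_interests",
}
# Only the exemptions named in this article; Article 9(2) lists further ones.
ARTICLE_9_EXEMPTIONS = {"9(2)(a)", "9(2)(b)", "9(2)(c)", "9(2)(g)"}


@dataclass
class ProcessingActivity:
    name: str
    special_category: bool
    article_6_basis: Optional[str] = None
    article_9_exemption: Optional[str] = None


def missing_bases(activity: ProcessingActivity) -> list:
    """Return a list of documentation gaps for one processing activity."""
    problems = []
    if activity.article_6_basis not in ARTICLE_6_BASES:
        problems.append("no valid Article 6 legal basis recorded")
    if activity.special_category and activity.article_9_exemption not in ARTICLE_9_EXEMPTIONS:
        problems.append("special category data without a recorded Article 9(2) exemption")
    return problems


sick_leave = ProcessingActivity(
    name="Sick leave tracking",
    special_category=True,
    article_6_basis="legal_obligation",
)
print(missing_bases(sick_leave))
```

A check like this only confirms that both fields are filled in; whether the chosen basis and exemption actually hold is a legal assessment that no script can make.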
Pseudonymized vs. Anonymized Data: The Critical Distinction
This is where more compliance programs go wrong than almost anywhere else. The distinction determines whether GDPR applies at all, and the line is far less clear than most organizations assume.
Pseudonymized Data = Still Personal Data
Pseudonymization replaces direct identifiers with artificial ones (tokens, codes, hashes) while keeping the re-identification key separate. GDPR explicitly defines pseudonymization in Article 4(5) and treats pseudonymized data as personal data throughout.
Why? Because re-identification is possible. The key exists somewhere. As long as any party (you, a processor, a data recipient, or an attacker with reasonable effort) could reconnect the pseudonym to the individual, it remains personal data subject to all GDPR obligations.
Pseudonymization is a security measure, not an exemption. It can reduce risk (and GDPR recognizes it as a safeguard in Articles 25 and 32), but it doesn't remove the data from GDPR's scope.
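A short sketch shows why pseudonymized data stays in scope. It uses a keyed hash (HMAC-SHA256 from Python's standard library) as one possible pseudonymization technique; this is an assumption chosen for illustration, not a method GDPR prescribes, and the customer identifiers are invented.

```python
import hashlib
import hmac
import secrets


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace an identifier with a keyed HMAC-SHA256 token."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# The secret key plays the role of the separately held additional information.
key = secrets.token_bytes(32)

customer_ids = ["C-10482", "C-20977"]  # invented identifiers
tokens = {cid: pseudonymize(cid, key) for cid in customer_ids}

# Anyone holding the key and a list of candidate identifiers can rebuild the mapping.
reidentified = {pseudonymize(cid, key): cid for cid in customer_ids}

for cid, token in tokens.items():
    print(token[:12], "->", reidentified[token])
```

The tokens look opaque, but the last step reverses them in one line for whoever holds the key. That reversibility is the reason the dataset remains personal data.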
Anonymized Data = Outside GDPR Scope
Truly anonymized data falls outside GDPR entirely (Recital 26). That requires the anonymization to be irreversible: no party can re-identify individuals using any means reasonably likely to be used.
The test is rigorous: you must consider all means "reasonably likely to be used" for re-identification, including future technological developments, cost of re-identification, and the availability of complementary datasets. The European Data Protection Board (EDPB) has set a high bar, and supervisory authorities have consistently found that datasets organizations believed were anonymous were, in fact, pseudonymous.
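One simple input to such an assessment is the size of the smallest group that shares the same combination of indirect identifiers (often discussed under the name k-anonymity). The sketch below is a rough illustration with invented survey data and a hypothetical helper, `smallest_group`; a large value does not by itself establish anonymity under Recital 26, but a value of 1 is a clear warning sign.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    if not records:
        return 0
    combos = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(combos.values())


# Invented "anonymized" survey extract: no names, but two indirect identifiers.
survey = [
    {"office": "Berlin", "age_band": "30-39", "score": 4},
    {"office": "Berlin", "age_band": "30-39", "score": 2},
    {"office": "Berlin", "age_band": "50-59", "score": 5},
    {"office": "Zurich", "age_band": "30-39", "score": 3},
    {"office": "Zurich", "age_band": "30-39", "score": 4},
]

print(smallest_group(survey, ["office"]))
print(smallest_group(survey, ["office", "age_band"]))
```

With `office` alone, every respondent hides in a group of at least two. Adding `age_band` leaves one Berlin respondent in a group of one, so that person's score can be attributed to them by anyone who knows their office and approximate age.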
Practical Implication for Your ROPA
If your Records of Processing Activities exclude datasets on the assumption they're "anonymized," verify that assumption with a documented re-identification risk assessment. If you're wrong, those datasets should have been in your ROPA all along, and every processing activity involving them has been undocumented. Priverion's ROPA management includes data classification workflows that flag exactly this kind of gap across all group entities.
Criminal Conviction Data: Article 10
Data relating to criminal convictions and offences gets its own rule under Article 10. It's not a special category under Article 9, but it may be processed only under the control of official authority, or when authorized by EU or Member State law that provides appropriate safeguards.
For employers: background checks, criminal record disclosures, and even noting that an employee has a clean record all fall under Article 10. If your subsidiaries in different jurisdictions handle pre-employment screening differently, inconsistent treatment of Article 10 data is a common audit finding.
Children's Data: Enhanced Protections Under Article 8
When an information society service is offered directly to a child and relies on consent, Article 8 requires consent from the holder of parental responsibility for children under 16 (Member States may lower this threshold, but not below 13). The controller must make reasonable efforts to verify that consent is given or authorized by the holder of parental responsibility.
If your organization processes data from minors (educational platforms, family benefit programs, youth services), this adds a classification layer that your data mapping must reflect.
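The Article 8 age rule can be expressed as a small check. In this sketch, `parental_consent_required` and the constant names are hypothetical, and the national threshold is passed in as a parameter rather than hard-coded, because you should confirm each Member State's current age against its national law.

```python
GDPR_DEFAULT_AGE = 16   # Article 8 default threshold
GDPR_MINIMUM_AGE = 13   # lowest threshold a Member State may set


def parental_consent_required(age: int, member_state_threshold: int = GDPR_DEFAULT_AGE) -> bool:
    """Return True if consent must come from the holder of parental responsibility."""
    if not GDPR_MINIMUM_AGE <= member_state_threshold <= GDPR_DEFAULT_AGE:
        raise ValueError("Member State threshold must be between 13 and 16")
    return age < member_state_threshold


print(parental_consent_required(15))                             # default threshold of 16
print(parental_consent_required(15, member_state_threshold=13))  # a state that lowered it
```

The same 15-year-old needs parental consent under the default threshold but not in a Member State that has lowered it to 13, which is why the threshold belongs in per-jurisdiction configuration.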
Mapping Personal Data Across a Multi-Entity Organization
Understanding the definition is the starting point. The real challenge for organizations with multiple subsidiaries is applying that definition consistently across every entity, every system, and every jurisdiction.
Here's the process that works, and what we've seen fail:
What Works: Centralized Taxonomy, Distributed Execution
- Establish a group-wide data classification taxonomy: define personal data categories identically across all entities. "Health data" means the same thing in your Swiss subsidiary and your German one.
- Map processing activities at the entity level: each subsidiary documents its own processing activities using the shared taxonomy. This ensures local accuracy with group-wide consistency.
- Identify cross-entity data flows: where personal data moves between subsidiaries or to third parties, map those flows explicitly. Intra-group transfers still require a legal basis.
- Automate recertification: when a classification changes (e.g., a new data type is added to HR systems), that change must propagate to every entity's ROPA. Manual email chains don't scale.
- Review and audit regularly: data classification isn't a one-time exercise. New systems, new vendors, and regulatory guidance all change what counts as personal data.
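The shared-taxonomy idea in the steps above can be sketched in a few lines. Everything here is an assumption made for illustration: the category labels, the entity codes, the field names, and the helpers `unknown_categories` and `find_inconsistencies` do not come from any particular tool.

```python
from collections import defaultdict

# Hypothetical group-wide taxonomy: the only category labels entities may use.
GROUP_TAXONOMY = {"standard_hr", "article_9_health", "article_10_criminal"}

# Hypothetical per-entity classifications, e.g. exported from each entity's ROPA.
entity_classifications = {
    "DE": {"sick_leave_record": "article_9_health", "employee_id": "standard_hr"},
    "UK": {"sick_leave_record": "standard_hr", "employee_id": "standard_hr"},
}


def unknown_categories(classifications):
    """List (entity, field, category) entries that use a label outside the taxonomy."""
    return [
        (entity, field, category)
        for entity, fields in classifications.items()
        for field, category in fields.items()
        if category not in GROUP_TAXONOMY
    ]


def find_inconsistencies(classifications):
    """Return fields that are classified differently by different entities."""
    by_field = defaultdict(dict)
    for entity, fields in classifications.items():
        for field, category in fields.items():
            by_field[field][entity] = category
    return {f: m for f, m in by_field.items() if len(set(m.values())) > 1}


print(unknown_categories(entity_classifications))
print(find_inconsistencies(entity_classifications))
```

Running a comparison like this whenever an entity updates its register surfaces the sick-leave mismatch immediately, instead of leaving it for an auditor to find.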
What Fails: Decentralized Classification Without Oversight
When each subsidiary defines personal data independently, or when the group DPO has no visibility into subsidiary-level classifications, inconsistencies accumulate silently. The German subsidiary classifies dietary preferences as health data (correct under many DPA interpretations). The UK subsidiary classifies the same field as "standard employee data." The ROPA looks complete in both entities, but the group's compliance posture has a gap that any cross-border audit will find.
This is the exact problem that led to Priverion's founding: a 12-subsidiary enterprise managing GDPR compliance across 47 spreadsheets, with no consistent way to ensure the same data was classified the same way everywhere.


