The EU May Redefine 'Personal Data' for AI Training

Since it took effect in 2018, the GDPR has been widely regarded as the strictest comprehensive data protection law in the world, and one of its quiet strengths has been a deliberately wide definition of “personal data”: if information could plausibly be linked back to a person, by anyone, with any reasonably available means, it counted, full stop. That breadth is exactly what’s now under negotiation.

The European Commission’s “Digital Omnibus” package, working through the EU legislative process in 2026, proposes changes that would narrow that definition and explicitly bless “legitimate interest” as a valid legal basis for training AI models on personal data, with an opt-out rather than an opt-in attached. Privacy advocates have not been quiet about it — noyb founder Max Schrems described the direction as resembling “Trump’ian lawmaking practices taking hold in Brussels.”

Whatever happens to this specific proposal, it’s worth understanding what’s actually being debated, because it changes how much protection anyone can count on for the photos, documents, or other personal information they keep in cloud products with AI features.

Under the GDPR as it stands, the test for “personal data” is broad. If data could be linked to an identifiable person, whether by the company holding it or by some other party using means that aren’t unreasonably difficult to obtain, it’s personal data, regardless of who is actually doing the linking. Recital 26 frames this as taking account of “all the means reasonably likely to be used, such as singling out, either by the controller or by another person.” This expansive standard is part of why GDPR has been considered the global benchmark: it closes the loophole of “this data isn’t personal because we can’t directly identify someone with it,” even when combining the data with other available information plausibly could.

Processing personal data, including reusing it for a new purpose such as training an AI model, requires a legal basis under Article 6 of the GDPR: consent, contractual necessity, or one of several other grounds, including “legitimate interest.” Legitimate interest has always been part of the law, but it comes with built-in friction: it requires balancing the company’s interest against the individual’s rights, and it has been applied cautiously to anything as consequential as AI training.

What the Digital Omnibus Proposes to Change

Two changes are doing most of the work in this proposal:

A narrower definition of personal data. The test would shift to the data controller’s perspective, asking what that specific company can realistically identify, rather than applying the broader “any party, with reasonably available means” standard. In practice, more data could fall outside GDPR’s scope entirely if a company argues it can’t itself directly identify anyone with it, even if the data could be combined with other sources to do so.

Explicit legitimate-interest cover for AI training. The Commission’s proposal would treat AI development and operation as a recognized legitimate interest, provided companies apply “enhanced safeguards” and offer users an unconditional right to opt out. The EU’s own data protection bodies (the EDPB and EDPS) have pushed back gently here, noting that legitimate interest arguably already permits this under current law and doesn’t need a new article — which is itself a telling signal that the change is more about removing ambiguity in companies’ favor than creating new legal ground.

The shift from opt-in to opt-out matters more than it sounds. An opt-in regime means a company has to get affirmative permission before using your data for something new. An opt-out regime means the use is already authorized by default, and the burden falls on you to find the setting, understand what it does, and turn it off — for every product, individually, on your own initiative.

Why This Is Happening Now

The pressure behind this proposal isn’t really about individual users; it’s about competitiveness. European officials have been increasingly vocal about wanting EU companies to build and train AI models without facing more friction than their US and Chinese competitors, who generally operate under lighter or differently structured data protection regimes. Separately, the EU is also proposing to delay parts of the AI Act’s high-risk rules into 2027, which points in the same general direction: easing constraints on AI development relative to where the law currently sits.

That’s a coherent policy goal. It’s also one that trades away some of the specific protection that made “I’m in the EU, so my data has the GDPR’s protection” a meaningfully different statement than the same sentence said about most other jurisdictions.

What This Means If It Passes

More products will be able to use your data for AI training by default, with notice rather than consent. If legitimate interest becomes the standard basis, the practical experience for users shifts from “we’re asking permission” to “we’re telling you, and you can opt out if you go find the setting.”

Less data will count as “personal” in the first place — which means less data will be protected at all. A narrower definition doesn’t just affect AI training; it affects every GDPR right tied to personal data, including access requests, deletion rights, and breach notification obligations. If something falls outside the definition, none of those rights attach to it.

The actual difference between services will increasingly come down to their own commitments, not just the law’s floor. If the legal minimum drops, the gap between “what the law requires” and “what a specific company actually does” gets wider — which makes a company’s own stated policy (not selling data, not training third-party AI, exact retention behavior) matter more than it did when GDPR’s default was already strict for everyone.

Why a Narrower Definition Has Ripple Effects Beyond AI

It’s easy to read this debate as being only about AI training, since that’s the headline use case driving the proposal. But the definition of “personal data” is the foundation the entire GDPR sits on — every right in the regulation (access, correction, deletion, breach notification, the rules around automated decision-making) only applies to information that qualifies as personal data in the first place.

A narrower definition doesn’t selectively exempt AI training while leaving everything else as protected as before. If a category of information falls outside the redefined scope — because a company can argue it can’t, on its own, directly identify someone with it — none of the GDPR’s other protections apply to that data either, not just the AI-training-specific ones. This is why the EU’s own data protection bodies have been cautious about the proposal even though they’re broadly sympathetic to giving AI development a clearer legal basis: narrowing the definition is a much blunter tool than the specific problem (legal clarity for AI training) actually requires.

There’s also a practical irony in the controller-perspective standard. Data that one company can’t identify on its own might be trivially identifiable once combined with another dataset that company has access to, or could acquire. The “any party, with reasonably available means” standard existed specifically to prevent companies from claiming ignorance about identifiability they could resolve with minimal additional effort. Weakening that standard reopens exactly the loophole it was designed to close.

How This Compares to the US Approach

For anyone weighing this against US privacy law, it’s worth noting that the EU isn’t moving all the way to the US model, only partway. The US has no single federal privacy law equivalent to GDPR; instead, state laws such as California’s CCPA (which daftei complies with as a baseline, alongside GDPR) have historically combined a comparably broad definition of personal information with consent and opt-out mechanisms that vary significantly from state to state.

What’s notable is that the EU debate is borrowing one specific feature long associated with the more permissive end of US-style regulation, use by default with an opt-out (framed in the EU as legitimate interest), while keeping much of the GDPR’s procedural apparatus (data protection officers, impact assessments, breach notification timelines) intact. If it passes as proposed, the result would be a hybrid: still more procedurally rigorous than most US state laws, but closer to opt-out defaults on the specific question of AI training than the GDPR was originally designed to allow.

How to Read Any Privacy Policy After This

However the Digital Omnibus negotiations end, a few habits hold up no matter which way EU law moves:

Look for an explicit “we do not train AI on your data” statement, not just compliance language. “We comply with GDPR” is a floor, not a commitment — especially as that floor shifts. A specific, structural statement that a company doesn’t use your content for AI training at all is a stronger signal than a generic compliance claim, because it doesn’t depend on which legal basis happens to be available to them this year.

Check for an opt-out and actually use it, rather than assuming silence means protection. If legitimate-interest-by-default becomes more common, “I didn’t say yes” stops being equivalent to “they can’t use it.” The burden shifts to you, so it’s worth treating new product settings and updated privacy policies as something to actually read rather than dismiss.

Treat jurisdiction as one signal, not the whole answer. “Based in the EU” or “GDPR compliant” has been a reasonably strong privacy signal since the GDPR took effect in 2018. If the underlying law becomes more permissive, that signal weakens, not because the company changed, but because the bar it’s being measured against did.

Where daftei Stands, Regardless of How This Resolves

daftei is built to be GDPR and CCPA compliant as a baseline, not as the entirety of its privacy commitment. Beyond legal compliance, daftei doesn’t train any AI — its own or any third party’s — on user content, doesn’t sell data, and runs no ads, which means there’s no business model pulling toward “legitimate interest” being interpreted as broadly as the law allows. Account deletion is permanent after a 30-day grace window, and that doesn’t change based on which legal basis a regulation happens to authorize this year.

Laws set the floor. What a company actually chooses to do above that floor is the part worth checking — especially in a year when the floor itself is being renegotiated.


What the GDPR Currently Says