GDPR and sovereignty: how your data stays in Europe with RagNight
AI agents managed by major US hyperscalers raise compliance issues that many CIOs prefer to ignore. Let's set things straight.
The real EU AI Act timeline (2025-2027), risk tiers, where an enterprise RAG assistant lands, and a concrete compliance checklist — jargon-free, for DPOs and product teams.
The EU AI Act is too often filed under "a lawyer's problem": an abstract constraint to deal with "later." That framing is a mistake. The regulation already shapes how you design, document, and operate an AI system, including a simple RAG assistant wired to your internal documentation. Teams that build these requirements in from the start gain an edge: they avoid refactoring under pressure, and they turn compliance into a sales argument.
Here is what the AI Act actually changes, when the obligations land, and an operational checklist for an enterprise AI system.
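The AI Act does not regulate "AI" as a block. It classifies uses by the risk they pose to people's rights, and scales obligations accordingly. Four tiers:
- Unacceptable risk: prohibited practices, such as social scoring or emotion recognition in the workplace.
- High risk: uses listed in the regulation (recruitment, credit scoring, access to education, among others), subject to strict requirements before and after deployment.
- Limited risk: systems that interact with people or generate content, subject mainly to transparency obligations.
- Minimal risk: everything else (spam filters, spell-checkers), with no tier-specific obligations.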
The key reflex is not "am I doing AI?" but "which risk tier does each of my uses fall into?" The answer drives everything else.
Some concrete examples you will meet in the enterprise:
The line that matters in practice sits between limited and high risk. That is where most trade-offs happen for a company deploying generative AI.
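The text entered into force on 1 August 2024, but obligations apply in waves:
- 2 February 2025: prohibitions on unacceptable-risk practices and the AI literacy obligation apply.
- 2 August 2025: obligations for general-purpose AI (GPAI) models apply.
- 2 August 2026: most remaining obligations apply, including transparency duties and the requirements for the high-risk uses listed in Annex III.
- 2 August 2027: requirements apply to high-risk AI embedded in products already covered by EU product-safety legislation (Annex I).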
In practice: if you ship in 2026, transparency and risk classification are no longer optional. And if one of your uses is high-risk, the clock on technical documentation is already running.
The AI literacy obligation, in force since February 2025, is the most underestimated. A quickly signed charter does not settle it. The people who design, deploy, and use your AI systems must understand their capabilities, limits, and risks. For a RAG assistant, that means your support agents know the AI can "hallucinate," that it does not replace their judgment, and that they must verify sensitive answers. A role-tailored, logged training session is usually enough — but it has to exist and be documented.
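To make "logged" concrete, here is a minimal Python sketch of a training register that flags the people whose role-tailored session is missing. The role names, module names, and the `untrained` helper are hypothetical illustrations. The AI Act does not prescribe this format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TrainingRecord:
    person: str
    role: str            # hypothetical role name, e.g. "support_agent"
    module: str          # role-tailored module the person completed
    completed_on: date   # kept as evidence for an audit


# Hypothetical mapping from each role to the module that role must complete.
REQUIRED_MODULE = {
    "support_agent": "ai-limits-and-verification",
    "developer": "ai-risks-for-builders",
}


def untrained(staff: dict[str, str], log: list[TrainingRecord]) -> list[str]:
    """Return the people (staff maps name -> role) with no logged completion of their role's module.

    A role missing from REQUIRED_MODULE is reported too, since nobody defined its training.
    """
    completed = {(r.person, r.module) for r in log}
    return sorted(
        person
        for person, role in staff.items()
        if (person, REQUIRED_MODULE.get(role)) not in completed
    )


if __name__ == "__main__":
    staff = {"alice": "support_agent", "bob": "developer"}
    log = [TrainingRecord("alice", "support_agent", "ai-limits-and-verification", date(2025, 3, 1))]
    print(untrained(staff, log))  # ['bob']
```

Treating an unmapped role as "untrained" is deliberate. It surfaces gaps in the training plan instead of silently passing them.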
An internal assistant answering employee questions from your documentation, or a customer-support copilot, most often falls under limited risk. Your main obligations are transparency: tell users they are talking to an AI, make generated content identifiable, and document what the system does and which data it relies on.
But beware the shifts into high risk. The same RAG stack can change tier by use:
Technology does not decide the risk tier: use does. The same RAG pipeline is "limited risk" for an internal FAQ and "high risk" for screening applications.
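The same principle fits in a few lines of Python. This is a minimal sketch that assumes a hypothetical use-to-tier table maintained by your compliance team. The pipeline identifier is accepted but deliberately ignored, because only the use decides the tier.

```python
from enum import IntEnum


class RiskTier(IntEnum):
    """AI Act risk tiers, ordered so that a higher value means stricter obligations."""
    MINIMAL = 0
    LIMITED = 1
    HIGH = 2
    UNACCEPTABLE = 3


# Hypothetical use-to-tier table, for illustration only.
# A real classification must come from a documented legal analysis.
USE_TIERS = {
    "internal_faq": RiskTier.LIMITED,
    "customer_support_copilot": RiskTier.LIMITED,
    "hr_benefits_questions": RiskTier.LIMITED,
    "candidate_screening": RiskTier.HIGH,
}


def classify(pipeline: str, use: str) -> RiskTier:
    """Return the tier of a use. The pipeline argument is ignored on purpose."""
    if use not in USE_TIERS:
        raise ValueError(f"Use {use!r} has not been classified yet")
    return USE_TIERS[use]


if __name__ == "__main__":
    for use in ("internal_faq", "candidate_screening"):
        print(use, classify("rag-v1", use).name)
```

Raising on an unknown use forces every new use through classification before it ships.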
A SaaS company ships an assistant that answers customers from its product docs and resolved-ticket base. It makes no decision about a person: it informs, routes, and escalates to a human when it does not know. Limited risk. Its obligations fit in a few lines: a "You are chatting with an AI assistant" banner, marking of generated answers, an internal page describing the sources used, and a channel to reach a human agent. Nothing crushing for a serious engineering team.
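A sketch of that behavior, assuming a retrieval step that returns scored passages and a `generate` callable that stands in for your LLM call. Both are hypothetical here. Every reply carries a machine-readable `ai_generated` flag, the disclosure text for the banner, and its sources. Weak retrieval triggers a human handoff instead of a guess. The 0.5 threshold is an arbitrary placeholder.

```python
from dataclasses import dataclass, field
from typing import Callable

DISCLOSURE = "You are chatting with an AI assistant."
MIN_SCORE = 0.5  # hypothetical relevance threshold; calibrate it on your own evaluation set


@dataclass
class Passage:
    source: str   # e.g. a doc URL or a resolved-ticket ID
    text: str
    score: float  # retrieval relevance score, assumed to lie in [0, 1]


@dataclass
class AssistantReply:
    text: str
    ai_generated: bool = True     # machine-readable marking of generated content
    disclosure: str = DISCLOSURE  # shown as a banner by the UI
    sources: list[str] = field(default_factory=list)
    escalated_to_human: bool = False


def reply(
    question: str,
    passages: list[Passage],
    generate: Callable[[str, list[str]], str],
) -> AssistantReply:
    """Answer from sufficiently relevant passages, or hand off to a human agent."""
    relevant = [p for p in passages if p.score >= MIN_SCORE]
    if not relevant:
        # The assistant does not know: route to a human instead of guessing.
        return AssistantReply(
            text="I could not find a reliable answer, so I am transferring you to a human agent.",
            escalated_to_human=True,
        )
    answer = generate(question, [p.text for p in relevant])
    return AssistantReply(text=answer, sources=[p.source for p in relevant])
```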
The same company extends the assistant to HR. First version: it answers staff questions about leave and health benefits, still limited risk. Then the team adds a function that ranks and scores applications received for a role, on the same RAG pipeline. At that exact moment, the use shifts to high risk: recruitment and candidate selection are explicitly listed among the high-risk use cases in Annex III of the AI Act. The obligation perimeter changes entirely: a formal risk-management system, full technical documentation, data quality and representativeness requirements, effective human oversight of every decision, long-term logging, and clear information to candidates. A single feature added to an existing product can move the whole project from one regime to another. Hence the importance of reclassifying risk at every functional change, not just at launch.
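Reclassification can be automated as a gate in your release process. The sketch below assumes a hypothetical feature inventory and a deliberately short, non-exhaustive set of sensitive domains. Annex III remains the authoritative list. The code recomputes the system tier as the tier of its riskiest feature and flags any mismatch with the declared classification. It supports a documented legal review; it does not replace one.

```python
from dataclasses import dataclass
from enum import IntEnum


class RiskTier(IntEnum):
    MINIMAL = 0
    LIMITED = 1
    HIGH = 2
    UNACCEPTABLE = 3


# Deliberately short, non-exhaustive illustration; Annex III of the AI Act is the reference.
HIGH_RISK_DOMAINS = {"recruitment", "credit_scoring", "education_admission"}


@dataclass(frozen=True)
class Feature:
    name: str
    domain: str
    evaluates_people: bool   # does it rank, score, or filter natural persons?
    user_facing: bool        # does it interact directly with users?


def feature_tier(f: Feature) -> RiskTier:
    """Simplified rule: a sensitive domain plus evaluation of people means high risk."""
    if f.domain in HIGH_RISK_DOMAINS and f.evaluates_people:
        return RiskTier.HIGH
    return RiskTier.LIMITED if f.user_facing else RiskTier.MINIMAL


def system_tier(features: list[Feature]) -> RiskTier:
    """The whole system falls under the regime of its riskiest feature."""
    return max((feature_tier(f) for f in features), default=RiskTier.MINIMAL)


def needs_review(declared: RiskTier, features: list[Feature]) -> bool:
    """True when the recomputed tier no longer matches the declared classification."""
    return system_tier(features) != declared


if __name__ == "__main__":
    benefits = Feature("benefits_qa", "hr_benefits", evaluates_people=False, user_facing=True)
    ranking = Feature("rank_applications", "recruitment", evaluates_people=True, user_facing=False)
    declared = system_tier([benefits])                  # RiskTier.LIMITED
    print(needs_review(declared, [benefits, ranking]))  # True
```

Run in CI, a `True` result blocks the release until someone updates and signs off on the classification.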
For a high-risk system, technical documentation is not a formality: it is the auditable proof that you control your system. It must cover, among other things:
Good news for RAG practitioners: much of this content already exists as engineering decisions. Your chunking parameters, your hybrid retrieval strategy, your relevance thresholds, your evaluation golden set — all of that is technical documentation, provided you write it as you go rather than reconstructing it in a panic. Version those decisions in the repo, next to the code, and the documentation almost builds itself.
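Here is one way to version those decisions, sketched in Python under stated assumptions. The field names and the `write_decision_record` helper are hypothetical. The record is plain JSON named by its content hash, so every change shows up as a new file in the diff.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class RetrievalDecisions:
    """Engineering choices that double as technical documentation (illustrative fields)."""
    chunk_size_tokens: int
    chunk_overlap_tokens: int
    retrieval: str            # e.g. "hybrid: BM25 + dense"
    relevance_threshold: float
    golden_set_path: str      # evaluation set versioned in the repo
    rationale: str            # why these values were chosen


def write_decision_record(decisions: RetrievalDecisions, out_dir: Path) -> Path:
    """Write the decisions as canonical JSON, named by content hash so changes are traceable."""
    payload = json.dumps(asdict(decisions), sort_keys=True, indent=2)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"retrieval-decisions-{digest}.json"
    path.write_text(payload + "\n", encoding="utf-8")
    return path
```

Commit the generated file in the same pull request as the code change, so the documentation and the implementation never drift apart.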
Transparency is not a sentence buried in your terms of service. A few formulations that work:
The best transparency notice is the one that helps the user while keeping you compliant. Citing your sources ticks the regulatory box and reduces perceived hallucinations.
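A minimal rendering sketch, assuming hypothetical notice wordings and a `Source` type of your own. It appends the notice and numbered citations to every generated answer, and shows a stronger warning when no source backs the answer.

```python
from dataclasses import dataclass

# Hypothetical notice wordings; adapt them to your product and language.
NOTICE = ("Answer generated by an AI assistant from the documents listed below. "
          "Check them before acting on a sensitive topic.")
NO_SOURCE_NOTICE = ("Answer generated by an AI assistant without a supporting document. "
                    "Treat it with caution or ask a human agent.")


@dataclass(frozen=True)
class Source:
    title: str
    url: str


def render_answer(answer: str, sources: list[Source]) -> str:
    """Append a transparency notice and numbered citations to a generated answer."""
    if not sources:
        return f"{answer}\n\n{NO_SOURCE_NOTICE}"
    lines = [answer, "", NOTICE]
    lines += [f"[{i}] {s.title} ({s.url})" for i, s in enumerate(sources, start=1)]
    return "\n".join(lines)
```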
A common mistake is believing GDPR compliance is enough. The two texts pursue different goals and apply cumulatively. GDPR protects personal data (legal basis, minimization, data-subject rights, data protection impact assessment or DPIA). The AI Act governs the AI system itself (risk classification, technical documentation, human oversight, data governance, transparency). A RAG assistant may need a GDPR legal basis, a DPIA if warranted, and AI Act transparency, all at the same time.
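The stacking logic fits in a small function. This sketch uses the obligation labels listed above and simplified, hypothetical inputs. It is not a legal decision tree. It only illustrates that the result is the union of two independent sets.

```python
GDPR_CORE = {"gdpr:legal_basis", "gdpr:minimization", "gdpr:data_subject_rights"}
AI_ACT_HIGH_RISK = {
    "ai_act:technical_documentation",
    "ai_act:human_oversight",
    "ai_act:data_governance",
}


def obligations(
    personal_data: bool,
    dpia_warranted: bool,
    interacts_with_people: bool,
    high_risk: bool,
) -> set[str]:
    """Union of the GDPR and AI Act obligations triggered by a (simplified) system profile."""
    result = {"ai_act:risk_classification"}  # every use must be classified
    if personal_data:
        result |= GDPR_CORE
        if dpia_warranted:
            result.add("gdpr:dpia")
    if interacts_with_people:
        result.add("ai_act:transparency")
    if high_risk:
        result |= AI_ACT_HIGH_RISK
    return result
```

For an internal assistant over HR documents, `obligations(True, True, True, False)` returns the GDPR items (including the DPIA) alongside AI Act transparency. Neither regime's result hides the other's.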
The DPIA and the AI Act technical documentation overlap but are not the same:
On a high-risk RAG project, you run both in parallel and have them cross-reference each other. The DPIA describes the ingested corpus containing personal data and its legal basis; the technical documentation describes how the pipeline processes that corpus, with which guardrails. Running only one leaves you exposed on the other regime.
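You can check the cross-referencing mechanically. A sketch, assuming each deliverable is exported as structured data (the `CorpusSource` and `DpiaEntry` types are hypothetical). It reports two kinds of gaps: personal-data sources that the DPIA does not cover with a legal basis, and DPIA entries that the technical documentation never describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusSource:
    """A corpus source as described in the AI Act technical documentation."""
    source_id: str
    contains_personal_data: bool
    guardrails: tuple[str, ...]   # e.g. ("pseudonymization", "access_control")


@dataclass(frozen=True)
class DpiaEntry:
    """The DPIA's description of a processed source and its GDPR legal basis."""
    source_id: str
    legal_basis: str              # an empty string means no legal basis is recorded


def cross_reference_gaps(tech_doc: list[CorpusSource], dpia: list[DpiaEntry]) -> dict[str, list[str]]:
    """List inconsistencies between the two deliverables, in both directions."""
    covered = {e.source_id for e in dpia if e.legal_basis}
    personal = {s.source_id for s in tech_doc if s.contains_personal_data}
    documented = {s.source_id for s in tech_doc}
    return {
        # personal-data sources the pipeline ingests without DPIA coverage and legal basis
        "missing_from_dpia": sorted(personal - covered),
        # sources the DPIA mentions but the technical documentation never describes
        "missing_from_tech_doc": sorted({e.source_id for e in dpia} - documented),
    }
```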
Good news: a well-built RAG assistant (controlled sources, sourced answers, sovereign hosting, logging) already covers most of this checklist through sound engineering.
Read as a list of bans, the AI Act is scary. Read differently, it formalizes what a serious company should already do: know where its data lives, trace its processing, keep a human in the loop for sensitive decisions, and tell users the truth. Classify your uses now, document as you go, and make transparency a product feature. Compliance then becomes a commercial asset, not technical debt. In a procurement process, being able to produce your risk classification, your technical documentation, and your human-oversight policy on request immediately sets you apart from a competitor just discovering the topic. Handled early, compliance is not a cost: it is a barrier to entry you raise against those who neglected it.