RAG over contracts and internal policies: sourced search, comparison and summaries. Why traceability (document, article, version), versioning and permissions are the precondition for legal use.
Legal teams drown in dense, high-stakes material: contracts, internal policies, case law, standard clauses, precedents. Finding the right clause in the right contract, checking a policy is current, comparing commitments across vendors — each task is time-consuming and errors are costly. It is ideal ground for RAG, provided you respect one non-negotiable requirement in this domain: traceability.
Here is how to build an internal legal assistant that saves time without ever sacrificing rigor.
The legal problem: volume, stakes, zero room for error
Legal documents share three traits: voluminous and heterogeneous (hundreds of contracts, multi-version policies, annexes), high-stakes (a misread clause or stale version can bind the company), and proof-demanding (a lawyer never settles for an answer — they want the exact source: contract, article, version). An assistant that "answers" without citing its precise source is unusable in legal — worse, dangerous.
In legal, an unsourced answer is worthless. The question is not "what does the AI say?" but "in which document, which article, which version?"
What a legal RAG assistant can do
Well-built, it accelerates concrete tasks: contract search ("which vendor contracts contain a non-compete over 24 months?"), compliance checks against current internal policy, clause comparison across contracts, sourced summaries (obligations of a contract with article references), and first-level answers to recurring business questions (NDAs, GDPR, procurement) without engaging a lawyer for every case. The goal is not to replace the lawyer but to spare them tedious searches and take simple requests off their desk.
What a good sourced answer looks like
The difference between a gadget and a working tool lies in the answer format. Compare.
A buyer asks: "Does our service agreement with ACME let us terminate without penalty if the vendor misses deadlines?"
Useless answer: "Yes, you can terminate for delay." No lawyer will act on that.
Workable answer:
Yes, under conditions. The ACME Service Agreement, Article 9.2 "Termination for Breach" (version 2025-03-14, in force), allows termination as of right once a formal notice has gone unanswered for 30 days. The exit penalty in Article 12.1 does not apply here (it covers termination for convenience).
Sources: ACME-MSA · Art. 9.2 · v2025-03-14 ; Art. 12.1 · v2025-03-14.
The second answer cites the document, article, and version, and distinguishes two seemingly conflicting clauses. The lawyer clicks, verifies, validates in two minutes. That is the only acceptable quality bar.
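To make the citation format concrete, here is a minimal Python sketch that renders a "Sources" line in the style of the workable answer above. The `Citation` class and `format_sources` function are hypothetical names introduced for illustration; the only convention taken from the example is that the document identifier is repeated only when it changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical citation record: document, article, version."""
    document_id: str
    article_number: str
    version: str


def format_sources(citations: list[Citation]) -> str:
    """Render citations as 'DOC · Art. N · vVERSION', separated by ' ; '."""
    if not citations:
        # An unsourced answer is treated as a defect, so fail loudly.
        raise ValueError("refusing to emit an answer with no sources")
    parts = []
    previous_document = None
    for citation in citations:
        fields = [f"Art. {citation.article_number}", f"v{citation.version}"]
        # Repeat the document id only when it differs from the previous one.
        if citation.document_id != previous_document:
            fields.insert(0, citation.document_id)
        parts.append(" · ".join(fields))
        previous_document = citation.document_id
    return "Sources: " + " ; ".join(parts) + "."


sources_line = format_sources([
    Citation("ACME-MSA", "9.2", "2025-03-14"),
    Citation("ACME-MSA", "12.1", "2025-03-14"),
])
print(sources_line)
```

Raising on an empty list is a design choice in this sketch: it turns "every answer is sourced" into something the pipeline enforces rather than something the prompt merely requests.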
Structure-preserving chunking
Generic "1000 tokens, 200 overlap" chunking is disastrous on a contract: it cuts a clause in half, separates the article heading from its body, and mixes the end of a liability clause with the start of a confidentiality clause. Retrieval then surfaces unusable fragments, and the citation points to a "piece" with no identity.
A contract must be split by legal unit: article, section, sub-clause. Structure is the unit of meaning, not the token counter. In practice, you parse the hierarchy (Title → Article → Paragraph → Sub-paragraph) and emit one chunk per clause, attaching rich metadata:
{
"text": "Article 9.2 — Termination for Breach. In the event that...",
"metadata": {
"document_id": "ACME-MSA",
"document_title": "ACME Service Agreement",
"doc_type": "service_agreement",
"article_number": "9.2",
"article_title": "Termination for Breach",
"parent_section": "9. Term and Termination",
"version": "2025-03-14",
"effective_from": "2025-04-01",
"effective_to": null,
"status": "in_force",
"counterparty": "ACME SAS",
"confidentiality": "restricted",
"owning_team": "legal-procurement"
}
}
A few rules that make the difference:
- One clause = one chunk (or a small group of chunks if the clause is very long, with overlap inside the clause only, never across two distinct clauses).
- Prefix each chunk with its context — the article number and title, even the parent section, inside the vectorized text. This is the spirit of contextual retrieval: an isolated fragment "the period is 30 days" means nothing; "Article 9.2 Termination — the period is 30 days" is retrievable and citable.
- Keep cross-references — contracts are riddled with "subject to Article 12." Storing those references as metadata lets you fetch the linked clause at answer time.
- Tables and annexes: pricing grids and SLAs deserve separate handling (structured extraction), or they become illegible once flattened into text.
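The rules above can be sketched in a few lines of Python. This is a simplified illustration, not a production parser: it assumes every article starts with a heading of the form "Article 9.2 - Title" alone on a line, and it handles only one level of the hierarchy. The function name `chunk_by_article` and the `cross_references` metadata key are assumptions made for this example.

```python
import re

# Assumed heading convention: "Article 9.2 - Termination for Breach" on its own
# line, with a hyphen or an em dash (\u2014) as separator.
ARTICLE_HEADING = re.compile(
    r"^Article\s+(\d+(?:\.\d+)*)\s*[\u2014-]\s*(.+)$", re.MULTILINE
)
# In-text references such as "subject to Article 12.1".
CROSS_REFERENCE = re.compile(r"\bArticle\s+(\d+(?:\.\d+)*)")


def chunk_by_article(contract_text: str, document_id: str, version: str) -> list[dict]:
    """Emit one chunk per article, prefixed with its number and title."""
    headings = list(ARTICLE_HEADING.finditer(contract_text))
    chunks = []
    for index, heading in enumerate(headings):
        # The body runs until the next heading: no overlap across clauses.
        if index + 1 < len(headings):
            body_end = headings[index + 1].start()
        else:
            body_end = len(contract_text)
        number = heading.group(1)
        title = heading.group(2).strip().rstrip(".")
        body = " ".join(contract_text[heading.end():body_end].split())
        references = sorted(
            {ref for ref in CROSS_REFERENCE.findall(body) if ref != number}
        )
        chunks.append({
            # The context prefix is part of the text that gets embedded.
            "text": f"Article {number}, {title}. {body}",
            "metadata": {
                "document_id": document_id,
                "article_number": number,
                "article_title": title,
                "version": version,
                "cross_references": references,
            },
        })
    return chunks


sample = (
    "Article 9.2 - Termination for Breach\n"
    "Either party may terminate after a formal notice left unanswered "
    "for 30 days, subject to Article 12.1.\n"
    "Article 12.1 - Exit Penalty\n"
    "A penalty applies to termination for convenience.\n"
)
chunks = chunk_by_article(sample, document_id="ACME-MSA", version="2025-03-14")
for chunk in chunks:
    print(chunk["metadata"]["article_number"], chunk["metadata"]["cross_references"])
```

Real contracts need a richer parser (nested sub-paragraphs, annexes, scanned layouts), but the shape of the output stays the same: one clause, one identity, one set of metadata.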
On a legal corpus, chunking is the project. A good model never compensates for chunking that destroys clause structure.
Versioning mechanics: which version is authoritative, and when
This is the most underestimated — and most dangerous — point. A remote-work policy exists in v1 (2023), v2 (2024), and v3 (2026). An amendment changes Article 7 of a contract without touching the rest. Serving the wrong version means answering wrong with confidence.
The rule: each clause carries a validity period, and retrieval is filtered by the relevant date.
- Each version of a document (or of a clause amended by an addendum) carries effective_from and effective_to. The current version has effective_to = null.
- By default, the assistant retrieves only clauses in force as of today. The vector query is paired with a temporal filter.
- For a historical question ("what was our leave policy in June 2024?"), the user can set an as-of date; the filter becomes "clause valid on 2024-06-01."
- An addendum does not "replace" the contract: it closes the old clause's period (effective_to = addendum date) and opens a new clause. History stays queryable but is never served by mistake as current.
-- Retrieve the clause in force as of a given date
SELECT text, metadata
FROM document_chunks
WHERE document_id = 'ACME-MSA'
AND (metadata->>'effective_from')::date <= :as_of_date
AND ( metadata->>'effective_to' IS NULL
OR (metadata->>'effective_to')::date > :as_of_date )
ORDER BY embedding <=> :query_embedding -- pgvector cosine distance, closest first
LIMIT 10;
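The addendum rule (close the old clause's period, open a new clause) can be illustrated with a short in-memory Python sketch. `ClauseVersion`, `apply_addendum` and `in_force` are hypothetical names, and the dates and wording in the example are purely illustrative. The validity test mirrors the SQL filter: a clause is in force when effective_from is on or before the as-of date and effective_to is empty or strictly later.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ClauseVersion:
    document_id: str
    article_number: str
    text: str
    effective_from: date
    effective_to: Optional[date] = None  # None means "still in force"


def apply_addendum(history, document_id, article_number, new_text, addendum_date):
    """Close the open version of the amended article, then append the new one."""
    updated = []
    for clause in history:
        is_target = (
            clause.document_id == document_id
            and clause.article_number == article_number
            and clause.effective_to is None
        )
        if is_target:
            # The old wording stays queryable, but its period is closed.
            clause = replace(clause, effective_to=addendum_date)
        updated.append(clause)
    updated.append(
        ClauseVersion(document_id, article_number, new_text, addendum_date)
    )
    return updated


def in_force(history, as_of):
    """Same predicate as the SQL filter: from <= as_of and (to is null or to > as_of)."""
    return [
        clause for clause in history
        if clause.effective_from <= as_of
        and (clause.effective_to is None or clause.effective_to > as_of)
    ]


# Illustrative data only.
history = [ClauseVersion("ACME-MSA", "7", "original wording", date(2023, 1, 1))]
history = apply_addendum(history, "ACME-MSA", "7", "amended wording", date(2025, 4, 1))

print([c.text for c in in_force(history, date(2024, 6, 1))])
print([c.text for c in in_force(history, date(2025, 4, 1))])
```

Because the interval is half-open, exactly one version of a clause is in force on the addendum date itself, which avoids serving both the old and the new wording for the same day.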
The reflex to encode in the system prompt: if multiple versions surface, the assistant states which one is authoritative and flags the existence of others ("version in force since 2025-04-01; an earlier version existed until that date"). Never a silent merge of two versions.
A clause with no validity date is a time bomb. Versioning is not a refinement: without it, the assistant lies from the first amendment.
Access control applied at retrieval
Not every contract is readable by everyone. A confidential M&A contract, an executive's salary terms, an ongoing dispute: these must never surface in the answer of an unauthorized user — including indirectly, through a summary.
The classic mistake is filtering at display: retrieve everything, then hide. That is a guaranteed leak, because the LLM has already seen the content and can restate it. The filter must apply at retrieval, upstream of the model.
In practice, the user's permissions are injected into the vector query as a hard constraint:
-- The user only has access to certain teams / confidentiality levels
SELECT text, metadata
FROM document_chunks
WHERE metadata->>'owning_team' = ANY(:user_teams)
AND metadata->>'confidentiality' = ANY(:user_clearances)
AND (metadata->>'effective_from')::date <= CURRENT_DATE
AND (metadata->>'effective_to' IS NULL OR (metadata->>'effective_to')::date > CURRENT_DATE)
ORDER BY embedding <=> :query_embedding
LIMIT 10;
A concrete example. A Procurement team member asks the assistant about "our non-solicitation commitments." The corpus contains a relevant clause in an M&A contract owned by the M&A team, classified confidential-board. Since confidential-board is not in their clearances, that clause never even enters the LLM's context. The assistant answers based only on Procurement contracts and — importantly — does not hint that a fuller answer exists elsewhere (which would already be a leak). Filtering at the source guarantees you cannot extract through conversation what you could not open directly.
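The ordering of the steps is what matters, and a small Python sketch makes it visible: permissions are applied to the candidate set before ranking and before anything is placed in the prompt. This is an in-memory stand-in for the SQL query, under stated assumptions: `build_context` is a hypothetical helper, the `score` field stands in for vector similarity, and the team and clearance labels other than `legal-procurement`, `restricted` and `confidential-board` are invented for the example.

```python
def build_context(candidates, user_teams, user_clearances, top_k=10):
    """Filter by permissions first, then rank; only the result reaches the LLM."""
    visible = [
        chunk for chunk in candidates
        if chunk["metadata"]["owning_team"] in user_teams
        and chunk["metadata"]["confidentiality"] in user_clearances
    ]
    # Ranking happens on the already-filtered set, so a highly relevant but
    # unauthorized clause can never displace or influence the context.
    visible.sort(key=lambda chunk: chunk["score"], reverse=True)
    return visible[:top_k]


corpus = [
    {
        "text": "Non-solicitation clause (M&A contract, placeholder text)",
        "score": 0.95,
        "metadata": {"owning_team": "m-and-a", "confidentiality": "confidential-board"},
    },
    {
        "text": "Non-solicitation clause (vendor contract, placeholder text)",
        "score": 0.80,
        "metadata": {"owning_team": "legal-procurement", "confidentiality": "restricted"},
    },
]

context = build_context(
    corpus,
    user_teams={"legal-procurement"},
    user_clearances={"internal", "restricted"},
)
print([chunk["text"] for chunk in context])
```

In this example the M&A clause has the best score, yet it is dropped before ranking, so the model has nothing to restate or hint at.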
Mini-scenario: comparing a clause across two contracts
Comparison is one of the highest-value uses, and a good test of the architecture. Question: "Compare the limitation of liability clause between the ACME contract and the Globex contract."
A well-built pipeline proceeds like this:
- Per-document targeted query — instead of one global search, the assistant runs two filtered retrievals: document_id = ACME-MSA and document_id = Globex-MSA, both scoped to the "limitation of liability" theme and to clauses in force.
- Fetch the right clauses — Art. 11 in ACME, Art. 8.3 in Globex (the numbers differ, hence the importance of searching by meaning, not number).
- Structured, sourced answer:
| | ACME (Art. 11, v2025-03-14) | Globex (Art. 8.3, v2024-11-02) |
| --- | --- | --- |
| Cap | 12 months of fees | Total paid over 24 months |
| Indirect damages | Excluded | Excluded except gross negligence |
| Exceptions to cap | Data breach, IP | None |
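Takeaway: Globex offers the higher cap and keeps indirect damages recoverable in cases of gross negligence, whereas ACME excludes them entirely. ACME, on the other hand, carves data breach and IP claims out of the cap, which Globex does not. Which contract protects us better therefore depends on the type of loss at issue.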
Sources: ACME-MSA · Art. 11 · v2025-03-14 ; Globex-MSA · Art. 8.3 · v2024-11-02.
Every row of the table is traceable to its source clause. The lawyer does not redo the work: they validate an already-sourced comparison. This is where legal RAG truly pays off.
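The per-document retrieval step of this scenario can be sketched as follows. It is a minimal illustration: `compare_clause` and `column_header` are hypothetical helpers, the retriever is passed in as a plain function, and `fake_retrieve` replaces a real filtered vector search with a lookup over placeholder data.

```python
from typing import Callable, Optional

# (document_id, query) -> chunks ranked by relevance, already filtered to
# clauses in force and to the user's permissions (assumed to be done upstream).
Retriever = Callable[[str, str], list]


def compare_clause(retrieve: Retriever, document_ids: list, theme: str) -> dict:
    """Run one scoped retrieval per document and keep the best hit for each."""
    best_hits = {}
    for document_id in document_ids:
        hits = retrieve(document_id, theme)
        best_hits[document_id] = hits[0] if hits else None
    return best_hits


def column_header(document_id: str, chunk: Optional[dict]) -> str:
    """Build a sourced column header; never invent a clause that was not found."""
    if chunk is None:
        return f"{document_id} (no clause found)"
    meta = chunk["metadata"]
    return f"{document_id} (Art. {meta['article_number']}, v{meta['version']})"


# Placeholder index standing in for the real vector store.
FAKE_INDEX = {
    "ACME-MSA": [{"text": "placeholder", "metadata": {"article_number": "11", "version": "2025-03-14"}}],
    "Globex-MSA": [{"text": "placeholder", "metadata": {"article_number": "8.3", "version": "2024-11-02"}}],
}


def fake_retrieve(document_id: str, query: str) -> list:
    return FAKE_INDEX.get(document_id, [])


hits = compare_clause(fake_retrieve, ["ACME-MSA", "Globex-MSA"], "limitation of liability")
headers = [column_header(doc_id, chunk) for doc_id, chunk in hits.items()]
print(headers)
```

Scoping each retrieval to one document keeps a long, keyword-rich contract from crowding the other out of a single global top-k, and an explicit "no clause found" column is preferable to a silently one-sided comparison.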
Essential guardrails
- Refuse rather than invent — if the answer is not in the corpus, say so. A confident hallucination is unacceptable in legal.
- No disguised legal advice — the assistant retrieves and summarizes; it does not replace the lawyer's judgment. Frame this explicitly.
- Version freshness — a repealed policy must never be served as current.
- Confidentiality — contracts are among the company's most sensitive data. Sovereign hosting and strict isolation are mandatory.
A legal assistant is judged on its caution as much as its usefulness: an honest "I can't find it" beats a confidently invented clause.
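One simple way to implement "refuse rather than invent" is a gate in front of generation, sketched below in Python. The threshold value, the `score` field and the `answer_or_refuse` name are all assumptions for illustration; a real system would calibrate the threshold on its own golden set and would likely combine it with other checks.

```python
from typing import Callable

REFUSAL = "I can't find it in the indexed documents."


def answer_or_refuse(
    hits: list,
    generate: Callable[[list], str],
    min_score: float = 0.75,  # assumed threshold, to be calibrated
) -> str:
    """Call the generator only when at least one clause supports an answer."""
    supporting = [hit for hit in hits if hit["score"] >= min_score]
    if not supporting:
        # No supporting clause: return a fixed refusal, never a guess.
        return REFUSAL
    return generate(supporting)


def fake_generate(supporting: list) -> str:
    # Stand-in for the LLM call; it only sees the supporting clauses.
    return f"Answer based on {len(supporting)} clause(s)."


print(answer_or_refuse([{"score": 0.91}, {"score": 0.40}], fake_generate))
print(answer_or_refuse([{"score": 0.40}], fake_generate))
```

The refusal text is deliberately neutral: it does not suggest that an answer might exist in documents the user cannot access.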
Real-world pitfalls
Beyond principles, here is what sinks projects in production:
- Generic chunking — pitfall #1. Chainsaw chunking destroys clause structure and makes every citation suspect. Invest in structure-aware chunking before any other optimization.
- The forgotten amendment — the main contract is indexed, but the addendum amending Article 7 sits in another folder. The assistant serves a stale clause in good faith. Linking addendum ↔ contract must be systematic.
- The un-OCR'd scan — a signed contract is often an image PDF. Without quality OCR it is invisible to search. Worse, poor OCR introduces typos in amounts or deadlines.
- Citing by page number — fragile: pagination shifts between versions. Cite by article and version, never by page alone.
- Filtering at display — already noted, but worth repeating: it is the most common confidentiality leak. Filter at retrieval.
- Corpus language — contracts in French, in English, sometimes bilingual. A multilingual embedding (bge-m3, Voyage, Cohere) is essential; a monolingual model misses half the matches.
- No golden set — without a reference question set verified by a lawyer, you cannot measure whether the assistant truly points to the right clause. Set it up from day one.
Measuring value
- Search time saved per query (before/after).
- Share of sourced answers — must target 100%; an unsourced answer is a defect.
- Relief rate — business requests handled without engaging a lawyer.
- Verified accuracy — on a sample (the golden set), do citations point to the right clause/version? Track context precision and context recall (RAGAS), paired with human review on sensitive cases.
- Relevant-refusal rate — does the assistant say "I can't find it" rather than invent? A too-low refusal rate on out-of-corpus questions is a red flag.
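Several of these indicators can be computed directly from a golden set. The Python sketch below uses simplified, set-based definitions (exact match on document, article and version); it is not the RAGAS implementation of context precision and recall, and the `evaluate` function, the data layout and the sample questions are assumptions made for illustration.

```python
def _ratio(numerator: int, denominator: int) -> float:
    # Returns 0.0 for an empty denominator to keep the sketch simple.
    return numerator / denominator if denominator else 0.0


def evaluate(golden: dict, answers: dict) -> dict:
    """golden: question -> expected citations (empty set = out of corpus).
    answers: question -> {"citations": set, "refused": bool}."""
    cited_total = expected_total = correct_total = 0
    answered = sourced = 0
    out_of_corpus = refused_out_of_corpus = 0
    for question, expected in golden.items():
        answer = answers[question]
        cited = answer["citations"]
        if not expected:
            out_of_corpus += 1
            refused_out_of_corpus += int(answer["refused"])
        if not answer["refused"]:
            answered += 1
            sourced += int(bool(cited))
        cited_total += len(cited)
        expected_total += len(expected)
        correct_total += len(cited & expected)
    return {
        "sourced_share": _ratio(sourced, answered),
        "citation_precision": _ratio(correct_total, cited_total),
        "citation_recall": _ratio(correct_total, expected_total),
        "relevant_refusal_rate": _ratio(refused_out_of_corpus, out_of_corpus),
    }


golden = {
    "q1": {("ACME-MSA", "9.2", "2025-03-14"), ("ACME-MSA", "12.1", "2025-03-14")},
    "q2": {("Globex-MSA", "8.3", "2024-11-02")},
    "q3": set(),  # out-of-corpus question: the right behavior is a refusal
}
answers = {
    "q1": {"citations": {("ACME-MSA", "9.2", "2025-03-14"), ("ACME-MSA", "12.1", "2025-03-14")}, "refused": False},
    "q2": {"citations": {("Globex-MSA", "8.3", "2023-01-10")}, "refused": False},  # stale version cited
    "q3": {"citations": set(), "refused": True},
}
report = evaluate(golden, answers)
print(report)
```

Matching on the full (document, article, version) triple means that citing the right article in a stale version counts as an error, which is exactly the failure the versioning section warns about.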
Conclusion
Legal perfectly illustrates a truth of enterprise RAG: value comes not from the fluency of the answer but from its verifiability. A legal assistant that systematically cites its source, respects versions and permissions, and knows its limits becomes a genuine accelerator — an augmented lawyer, not a replaced one. Traceability, far from slowing usage, is precisely what makes it possible.
Further reading
- GDPR and generative AI: the compliance guide for your RAG projects
- AI sovereignty for the European enterprise: the 2026 strategic guide