5 signs your knowledge base isn't ready for AI
Your AI pilots dazzle in demos and disappoint in production? The culprit is rarely the model. Here are the five signs that reveal a knowledge base not ready for RAG — and how to fix each one.
A RAG amplifies the quality — or mediocrity — of its corpus. The complete method to audit your base before AI: inventory, authority/freshness scoring, contradiction and gap detection, Knowledge Ops, and health KPIs.
Everyone rushes for the model, the pipeline, the reranking. Almost nobody looks at what truly determines the quality of an AI assistant: the corpus you feed it. A RAG is an amplifier. Give it clear, up-to-date, coherent documentation, and it will produce reliable answers. Give it a jumble of contradictory versions, stale documents and blind spots, and it will produce inaccuracies — confidently, and at scale. Before you vectorise anything, you have to audit what you actually know.
This guide lays out a complete method for auditing knowledge: inventory, score, detect contradictions and gaps, govern, and measure. It's the least glamorous work in a RAG project, and by far the most profitable. You can switch embeddings, add a Cohere Rerank 3 reranker, move to hybrid search: none of these levers will fix a corpus that lies, contradicts itself or stays silent. The work described here is not a prerequisite you tick off once; it's a discipline you install and maintain. Plan for two to four weeks for a first serious audit on a single business scope, then a far lighter recurring effort once governance is in place.
An AI assistant does not "understand" your business: it retrieves, rephrases and combines what it finds in your corpus. This mechanism has a direct consequence: the quality of its answers is capped by the quality of that corpus.
The danger is insidious: the assistant gives no signal that it's relying on a stale or contradictory document. It answers, fluent and convincing. It's precisely this fluency that makes a bad base so dangerous: it disguises the corpus's flaws as credible answers. A human consulting a wiki often spots that a page "smells old" — dated colours, mention of a defunct tool, the name of a colleague who left long ago. The RAG, on the other hand, has none of these contextual cues: it reads the text, embeds it, and serves it back as if it were authoritative.
You also have to understand what the pipeline does not fix. Reranking reorders passages by relevance to the question, not by their truth or freshness: a stale but highly relevant document will rise to the top. Hybrid search (BM25 + dense) improves recall, so it even tends to surface more mediocre documents. Contextual retrieval (prefixing each chunk with a short LLM-generated context) improves where information is located, not its intrinsic quality. No engineering brick can tell "the current procedure" from "the 2021 procedure" if nothing in the corpus says so.
Investing in the pipeline before auditing the corpus is like polishing the viewfinder of a camera whose lens is dirty: sharpness will never come from downstream.
You can only audit what you know. The first task is to map your information estate:
Don't start by collecting files, start with the people. Identify 5 to 10 people who produce or consume a lot of knowledge within the scope (support, product, legal, HR, field) and run a 30-minute interview with each: where do they go to find information? which documents do they open ten times a day? which ones do they no longer trust? This interview reveals what no automatic crawl will show — the real hierarchy of usage and tribal knowledge.
Only then should you inventory the locations: Confluence/Notion spaces, SharePoint/Drive folders, Git repos for technical docs, the ticket queue, old PDFs frozen in an "archives" folder that everyone still consults. For each source, fill out a source card.
Source: "Support L1" space (Confluence)
Owner: Support team (lead: J. Martin)
Volume: ~420 pages, ~6 MB of text
Nature: free text + screenshots (low OCR value)
Last overall review: unknown
Sensitivity: low (no personal data), 3 pages with screenshots to blur
Proposed ingestion status: YES ("FAQ" subspace as priority)
Format: HTML export → Markdown
Aggregate these cards into a single table. Example:
| Source | Owner | Volume | Authority | Freshness | Sensitivity | Ingest? |
|---|---|---|---|---|---|---|
| Confluence Support L1 | Support | 420 p. | Official | Mixed | Low | Yes (partial) |
| "HR Procedures" Drive | HR | 1,200 files | Mixed | Variable | High | After triage |
| Product wiki (Notion) | Product | 180 p. | Official | Good | Low | Yes |
| "Archives 2019-2022" folder | none | 3,400 files | Unknown | Stale | Unknown | No (archive) |
| "support@" mailbox | none | ~ unlimited | None | — | High | No |
The deliverable of this step is this source map: what you have, where, in what state, and under whose responsibility. It then serves as the basis for all ingestion decisions. An audit that produces this table has already created value, even before the first vectorisation.
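As a minimal sketch, the source cards can be held as small records and the map rendered from a list of them. The `SourceCard` class, its field names and the `to_markdown` helper are illustrative assumptions, not part of any particular tool; the sample rows reuse the table above.

```python
from dataclasses import dataclass


@dataclass
class SourceCard:
    """One row of the source map (field names are illustrative)."""
    source: str
    owner: str
    volume: str
    authority: str
    freshness: str
    sensitivity: str
    ingest: str


HEADERS = ["Source", "Owner", "Volume", "Authority",
           "Freshness", "Sensitivity", "Ingest?"]


def to_markdown(cards: list[SourceCard]) -> str:
    """Render the cards as a Markdown table, one row per source."""
    lines = ["| " + " | ".join(HEADERS) + " |",
             "|" + "---|" * len(HEADERS)]
    for c in cards:
        cells = [c.source, c.owner, c.volume, c.authority,
                 c.freshness, c.sensitivity, c.ingest]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


cards = [
    SourceCard("Confluence Support L1", "Support", "420 p.",
               "Official", "Mixed", "Low", "Yes (partial)"),
    SourceCard('"Archives 2019-2022" folder', "none", "3,400 files",
               "Unknown", "Stale", "Unknown", "No (archive)"),
]
print(to_markdown(cards))
```

Keeping the cards as structured records rather than free text makes it easier to diff the map between two audits.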
Not all documents are equal. Two dimensions structure the triage:
Authority — is this document authoritative? Is it the official source, or a forgotten working note? A validated policy, or a draft? A RAG that mixes official documents and informal notes produces inconsistent answers.
Freshness — when was it last updated? A procedure document untouched for three years is suspect by default. A date isn't proof of obsolescence, but it's a strong signal.
Score each source (or each family of documents) on two axes from 1 to 5.
Authority (1-5)
- 5 — Official document validated by an identified owner, "approved" status.
- 4 — Widely used reference document, but informal validation.
- 3 — Useful but unvalidated content (team guide, active wiki page).
- 2 — Working note, draft, shared personal content.
- 1 — Unknown source, copy with no origin, unattributable content.
Freshness (1-5)
- 5 — Reviewed/updated less than 6 months ago.
- 4 — 6 to 12 months.
- 3 — 12 to 24 months.
- 2 — 24 to 36 months.
- 1 — More than 36 months, or unknown date.
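The freshness scale can be computed from a last-review date. The sketch below is one possible reading of the buckets: the function name `freshness_score` is hypothetical, and the handling of exact boundaries (12 months scores 3, 36 months scores 2) is an assumption, since the scale leaves them open.

```python
from datetime import date
from typing import Optional


def freshness_score(last_review: Optional[date], today: date) -> int:
    """Map the age of the last review onto the 1-5 freshness scale."""
    if last_review is None:
        return 1  # an unknown date is scored like the oldest bucket
    # Whole months elapsed; a partially elapsed month is not counted.
    months = (today.year - last_review.year) * 12 + (today.month - last_review.month)
    if today.day < last_review.day:
        months -= 1
    if months < 6:
        return 5
    if months < 12:
        return 4
    if months < 24:
        return 3
    if months <= 36:  # assumption: exactly 36 months still scores 2
        return 2
    return 1


today = date(2026, 6, 15)
print(freshness_score(date(2025, 10, 15), today))  # reviewed 8 months ago
print(freshness_score(None, today))                # unknown date
```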
Weight according to the volatility of the topic: pricing or a compliance procedure changes fast (freshness counts double); the definition of a business concept ages slowly (freshness counts little). Then combine into a simple score, for instance score = authority × freshness (from 1 to 25), and set action thresholds.
| Document | Authority | Freshness | Score (A×F) | Decision |
|---|---|---|---|---|
| Refund policy v4 (approved, March 2026) | 5 | 5 | 25 | Ingest, weight high |
| Support FAQ (active, reviewed 8 months ago) | 4 | 4 | 16 | Ingest |
| Onboarding guide (useful, 20 months) | 3 | 3 | 9 | Ingest after review |
| Exported Slack note "refund process" (2022) | 2 | 1 | 2 | Discard / archive |
| "Pricing" PDF with no date | 3 | 1 | 3 | Refresh before ingesting |
Indicative arbitration rule: ≥ 16 goes straight into the served corpus; 8-15 enters after a quick review by the owner; < 8 is discarded or refreshed first. These thresholds are to be calibrated, but the important thing is that they be explicit and traced: every inclusion decision must be explainable.
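The arbitration rule is simple enough to write down as code, which also makes each decision traceable. This is a sketch using the indicative thresholds above; the function name `triage` and the decision labels are assumptions, and it uses the plain product without the volatility weighting.

```python
def triage(authority: int, freshness: int) -> tuple[int, str]:
    """Return (score, decision) using the indicative thresholds."""
    for name, value in (("authority", authority), ("freshness", freshness)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    score = authority * freshness
    if score >= 16:
        decision = "ingest"
    elif score >= 8:
        decision = "ingest after owner review"
    else:
        decision = "discard or refresh first"
    return score, decision


# Rows taken from the scoring table: (document, authority, freshness)
rows = [
    ("Refund policy v4", 5, 5),
    ("Support FAQ", 4, 4),
    ("Onboarding guide", 3, 3),
    ("Exported Slack note", 2, 1),
]
for name, a, f in rows:
    print(name, triage(a, f))
```

Storing the returned pair next to each document gives you the explanation for every inclusion decision at no extra cost.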
Better a restricted corpus of authoritative documents than an exhaustive corpus where the authoritative drowns in the stale. When it comes to knowledge, quantity hurts quality beyond a certain threshold.
This is the silent poison of knowledge bases. When two documents assert incompatible things — an old refund policy and the new one, two divergent procedures — the assistant has no way of knowing which to favour. It may cite one, the other, or worse, blend the two into a single fluent and false answer.
Two documents on refunds: Refund-policy-v4.pdf (March 2026, "14 business days") and a wiki page "Refunds" (undated, "within 30 days"). The RAG, queried about the deadline, may answer "30 days", "14 days", or worse "between 14 and 30 days". The remedy: designate v4 as the source of truth, update or redirect the wiki page, and remove from the served corpus every contradictory version (archived separately, not deleted — traceability has its value).
The general remedy: a single source of truth per topic, and archiving stale versions outside the corpus.
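Some contradictions can be surfaced mechanically before a human arbitrates. The sketch below is a deliberately naive heuristic, not a full detection method: it assumes each document has already been tagged with a topic, it only looks at durations expressed in days, and it treats "business days" and "days" as the same unit. The sample sentences are invented for illustration; only the 14-day and 30-day values come from the refund example.

```python
import re
from collections import defaultdict

# Matches durations such as "14 business days" or "30 days".
DURATION = re.compile(r"(\d+)\s+(?:business\s+)?days?", re.IGNORECASE)


def find_conflicts(docs: list[dict]) -> dict[str, dict[int, list[str]]]:
    """Group stated durations by topic; keep topics with more than one value."""
    by_topic: dict[str, dict[int, list[str]]] = defaultdict(lambda: defaultdict(list))
    for doc in docs:
        for match in DURATION.finditer(doc["text"]):
            by_topic[doc["topic"]][int(match.group(1))].append(doc["name"])
    return {topic: dict(values)
            for topic, values in by_topic.items() if len(values) > 1}


docs = [
    {"name": "Refund-policy-v4.pdf", "topic": "refunds",
     "text": "Refunds are issued within 14 business days."},
    {"name": "Wiki: Refunds", "topic": "refunds",
     "text": "Customers are refunded within 30 days."},
    {"name": "Onboarding guide", "topic": "onboarding",
     "text": "Accounts are created within 2 days."},
]
print(find_conflicts(docs))
```

The output is only a list of candidates: deciding which document is the source of truth remains the owner's call.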
A gap is a topic on which your base has nothing, or nothing satisfactory — while your users will inevitably ask the question. Two moments to detect them:
Before launch, confront the source map with the expected questions. Build a reference set of 50 to 200 questions (a golden set) from three sources: the most frequent tickets and emails of the last 6 months, the recurring questions in meetings, and the intuition of business experts. For each question, check whether an authoritative document covers the point. Questions with no documented answer are your cold gaps. The same golden set is then reused for continuous evaluation (for example RAGAS metrics such as context recall and answer relevancy).
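The cold-gap check can be kept as a simple mapping from question to authoritative document. This is a minimal sketch: the questions, the `cold_gaps` and `coverage` helpers and the idea of recording `None` for an undocumented question are all illustrative assumptions, and the mapping itself is filled in by human reviewers.

```python
from typing import Optional

# Each golden-set question is paired with the authoritative document that
# answers it, or None when reviewers found no such document.
golden_set: dict[str, Optional[str]] = {
    "What is the refund deadline?": "Refund-policy-v4.pdf",
    "How do I reset a customer password?": None,
    "Which plan includes SSO?": None,
}


def cold_gaps(golden: dict[str, Optional[str]]) -> list[str]:
    """Questions for which no authoritative document exists."""
    return [question for question, doc in golden.items() if doc is None]


def coverage(golden: dict[str, Optional[str]]) -> float:
    """Share of golden-set questions that have a documented answer."""
    if not golden:
        return 0.0
    return 1 - len(cold_gaps(golden)) / len(golden)


print(cold_gaps(golden_set))
print(f"coverage: {coverage(golden_set):.0%}")
```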
After launch, tap into real queries: users themselves tell you where the holes are.
Aggregate these signals by theme, rank by frequency, and you get a gap queue prioritised by real demand — far more reliable than any guesswork.
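Ranking by frequency needs nothing more than a counter. In this sketch each signal is a `(theme, kind)` pair; the themes and the signal kinds (`"no_answer"`, `"thumbs_down"`) are hypothetical examples, and assigning a theme to each query is assumed to happen upstream.

```python
from collections import Counter


def gap_queue(signals: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Count signals per theme and return themes ordered by frequency."""
    counts = Counter(theme for theme, _kind in signals)
    return counts.most_common()


signals = [
    ("sso", "no_answer"),
    ("refunds", "thumbs_down"),
    ("sso", "thumbs_down"),
    ("invoicing", "no_answer"),
    ("sso", "no_answer"),
    ("refunds", "no_answer"),
]
for theme, count in gap_queue(signals):
    print(theme, count)
```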
Gaps are not guessed, they are measured. Before launch, by anticipation; afterwards, by listening to real queries. A mature corpus is one whose gaps are filled at the pace of needs.
A one-off audit doesn't hold: the corpus is alive. Governance — what's sometimes called Knowledge Ops — turns the audit into a durable process.
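Set the review frequency according to the volatility of the topic: fast-moving content such as pricing or compliance procedures needs frequent review, while slowly ageing content such as the definition of a business concept can be reviewed far less often.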
Creation/correction (contributor)
│
Knowledge Owner review (authority + freshness)
│
"approved" status + review date + owner
│
ingestion into the served corpus
│
auto-expiry at the deadline ─► back to review
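The auto-expiry step of this workflow comes down to a date comparison. The sketch below assumes each approved document stores its approval date and a review interval in days; the function names and the 90-day interval are illustrative, not a recommendation.

```python
from datetime import date, timedelta


def review_due(approved_on: date, review_every_days: int) -> date:
    """Date at which the document must go back to review."""
    return approved_on + timedelta(days=review_every_days)


def needs_review(approved_on: date, review_every_days: int, today: date) -> bool:
    """True once the review deadline has been reached."""
    return today >= review_due(approved_on, review_every_days)


approved = date(2026, 3, 1)
print(review_due(approved, 90))
print(needs_review(approved, 90, date(2026, 6, 15)))
```

Run on a schedule, a check like this is what sends expired documents back to the Knowledge Owner instead of leaving them in the served corpus indefinitely.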
Write the ingestion rules down in black and white. Example: documents enter the served corpus if they have "approved" status, a score ≥ 16, and no uncontrolled personal data; kept out are drafts, archives, raw mailbox exports, and any sensitive content that has not yet been treated (anonymised or placed under access control). Each source on the map carries an explicit and revisable ingestion status.
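An explicit rule like this can be expressed as a check that returns the reasons for a rejection, which keeps every decision explainable. The `Document` record, its field names and the "draft" status given to the sample note are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Document:
    name: str
    status: str                       # e.g. "approved", "draft", "archived"
    score: int                        # authority x freshness, from 1 to 25
    uncontrolled_personal_data: bool


def rejection_reasons(doc: Document, min_score: int = 16) -> list[str]:
    """Return why a document is kept out; an empty list means it is admitted."""
    reasons = []
    if doc.status != "approved":
        reasons.append(f"status is '{doc.status}', not 'approved'")
    if doc.score < min_score:
        reasons.append(f"score {doc.score} is below {min_score}")
    if doc.uncontrolled_personal_data:
        reasons.append("contains uncontrolled personal data")
    return reasons


policy = Document("Refund policy v4", "approved", 25, False)
slack_note = Document("Exported Slack note", "draft", 2, False)
print(rejection_reasons(policy))
print(rejection_reasons(slack_note))
```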
Sources ─► audit (inventory, scoring, contradictions, gaps)
│
served corpus (authoritative, fresh, coherent)
│
real usage ─► gap/contradiction detection
│
prioritised queue ─► creation/correction ─► (loop)
What isn't measured isn't improved. Define a few corpus health indicators, with indicative targets calibrated against your maturity.
Also track the median age of served documents and the number of documents in "to review" status: two early warning signals of degradation. These metrics make corpus quality a steerable object, whose progress can be justified to leadership — and not a fuzzy intuition.
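Both early-warning signals are cheap to compute from document metadata. This sketch assumes each served document exposes a last-review date and a status string; the helper names and the sample values are illustrative.

```python
from datetime import date
from statistics import median


def median_age_days(review_dates: list[date], today: date) -> float:
    """Median number of days since the last review of served documents."""
    return median((today - d).days for d in review_dates)


def count_to_review(statuses: list[str]) -> int:
    """Number of documents currently waiting in 'to review' status."""
    return sum(1 for status in statuses if status == "to review")


today = date(2026, 6, 30)
review_dates = [date(2026, 6, 20), date(2026, 5, 31), date(2025, 6, 30)]
statuses = ["approved", "to review", "approved", "to review"]

print(median_age_days(review_dates, today))
print(count_to_review(statuses))
```

Tracked over time, a rising median age or a growing review backlog shows that governance is slipping before answer quality visibly degrades.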
A European SaaS SME deploys an internal assistant for its L1 support. How the audit unfolded:
Result: a smaller corpus (240 vs 1,100 documents) but a healthier one, and an assistant that answers correctly and with sources. Quality didn't come from a better model — it came from a better corpus.
Knowledge auditing is the invisible work that decides the visible success of your RAG. Before optimising the slightest retrieval parameter, take the time to inventory, score, purge contradictions and map gaps — then install governance that maintains this quality over time. An AI assistant will never be better than the knowledge entrusted to it. The good news is that this effort benefits the whole organisation, far beyond AI: a healthy knowledge base is an asset in its own right. AI is merely its most merciless revealer.