
Audit your knowledge base before AI: the complete method

The RagNight team · 15 min read · March 24, 2026

A RAG amplifies the quality — or mediocrity — of its corpus. The complete method to audit your base before AI: inventory, authority/freshness scoring, contradiction and gap detection, Knowledge Ops, and health KPIs.

Everyone rushes for the model, the pipeline, the reranking. Almost nobody looks at what truly determines the quality of an AI assistant: the corpus you feed it. A RAG is an amplifier. Give it clear, up-to-date, coherent documentation, and it will produce reliable answers. Give it a jumble of contradictory versions, stale documents and blind spots, and it will produce inaccuracies — confidently, and at scale. Before you vectorise anything, you have to audit what you actually know.

This guide lays out a complete method for auditing knowledge: inventory, score, detect contradictions and gaps, govern, and measure. It's the least glamorous work in a RAG project, and by far the most profitable. You can switch embeddings, add a Cohere Rerank 3 reranker, move to hybrid search: none of these levers will fix a corpus that lies, contradicts itself or stays silent. The work described here is not a prerequisite you tick off once; it's a discipline you install and maintain. Plan for two to four weeks for a first serious audit on a single business scope, then a far lighter recurring effort once governance is in place.

Why RAG amplifies quality — or mediocrity

An AI assistant does not "understand" your business: it retrieves, rephrases and combines what it finds in your corpus. This mechanism has a direct consequence:

  • excellent documentation produces an excellent assistant;
  • mediocre documentation produces an assistant that answers confidently… with errors.

The danger is insidious: the assistant gives no signal that it's relying on a stale or contradictory document. It answers, fluent and convincing. It's precisely this fluency that makes a bad base so dangerous: it disguises the corpus's flaws as credible answers. A human consulting a wiki often spots that a page "smells old" — dated colours, mention of a defunct tool, the name of a colleague who left long ago. The RAG, on the other hand, has none of these contextual cues: it reads the text, embeds it, and serves it back as if it were authoritative.

You also have to understand what the pipeline does not fix. Reranking reorders passages by relevance to the question, not by truth or freshness: a stale but highly relevant document will still rise to the top. Hybrid search (BM25 + dense) improves recall, so it tends to surface more mediocre documents along with the good ones. Contextual retrieval (prefixing each chunk with a short LLM-generated context) makes information easier to locate; it does not improve its intrinsic quality. No engineering component can tell "the current procedure" from "the 2021 procedure" if nothing in the corpus says so.

Investing in the pipeline before auditing the corpus is like upgrading a camera's sensor while the lens is still dirty. Sharpness will never come from downstream.

Step 1 — Inventory and map

You can only audit what you know. The first task is to map your information estate:

  • Which sources? Knowledge bases, wikis, shared drives, emails, tickets, discussion threads, scattered documents.
  • What volume and what nature? Structured documents, free text, scanned PDFs, tables, code.
  • Who produces and who maintains? A source with no owner is a source that ages badly.
  • What is the sensitivity? Personal, confidential, regulated data — to be handled separately.

The concrete method

Don't start by collecting files, start with the people. Identify 5 to 10 people who produce or consume a lot of knowledge within the scope (support, product, legal, HR, field) and run a 30-minute interview with each: where do they go to find information? which documents do they open ten times a day? which ones do they no longer trust? This interview reveals what no automatic crawl will show — the real hierarchy of usage and tribal knowledge.

Only then should you inventory the locations: Confluence/Notion spaces, SharePoint/Drive folders, Git repos for technical docs, the ticket queue, old PDFs frozen in an "archives" folder that everyone still consults. For each source, fill out a source card.

Template for a source card

Source: "Support L1" space (Confluence)
Owner: Support team (lead: J. Martin)
Volume: ~420 pages, ~6 MB of text
Nature: free text + screenshots (low OCR value)
Last overall review: unknown
Sensitivity: low (no personal data), 3 pages with screenshots to blur
Proposed ingestion status: YES ("FAQ" subspace as priority)
Format: HTML export → Markdown

Aggregate these cards into a single table. Example:

| Source | Owner | Volume | Authority | Freshness | Sensitivity | Ingest? |
| --- | --- | --- | --- | --- | --- | --- |
| Confluence Support L1 | Support | 420 p. | Official | Mixed | Low | Yes (partial) |
| "HR Procedures" Drive | HR | 1,200 files | Mixed | Variable | High | After triage |
| Product wiki (Notion) | Product | 180 p. | Official | Good | Low | Yes |
| "Archives 2019-2022" folder | none | 3,400 files | Unknown | Stale | Unknown | No (archive) |
| "support@" mailbox | none | ~ unlimited | None | | High | No |

The deliverable of this step is this source map: what you have, where, in what state, and under whose responsibility. It then serves as the basis for all ingestion decisions. An audit that produces this table has already created value, even before the first vectorisation.
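A source card maps naturally onto a small record type. The sketch below is one possible shape, not a prescribed schema: the class name `SourceCard`, its field names and the helper `unowned` are illustrative assumptions, and the two sample rows are copied from the example table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceCard:
    """One row of the source map (field names are illustrative)."""
    source: str
    owner: Optional[str]  # None means no identified owner
    volume: str
    authority: str
    freshness: str
    sensitivity: str
    ingest: str


def unowned(cards):
    """Return the names of sources that nobody is responsible for."""
    return [card.source for card in cards if card.owner is None]


cards = [
    SourceCard("Confluence Support L1", "Support", "420 p.",
               "Official", "Mixed", "Low", "Yes (partial)"),
    SourceCard('"Archives 2019-2022" folder', None, "3,400 files",
               "Unknown", "Stale", "Unknown", "No (archive)"),
]

print(unowned(cards))
```

Listing ownerless sources first is a deliberate choice here: a source with no owner is the one most likely to age badly.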

Step 2 — Score authority and freshness

Not all documents are equal. Two dimensions structure the triage:

Authority — is this document authoritative? Is it the official source, or a forgotten working note? A validated policy, or a draft? A RAG that mixes official documents and informal notes produces inconsistent answers.

Freshness — when was it last updated? A procedure document untouched for three years is suspect by default. A date isn't proof of obsolescence, but it's a strong signal.

A concrete scoring grid

Score each source (or each family of documents) on two axes from 1 to 5.

Authority (1-5)
- 5 — Official document validated by an identified owner, "approved" status.
- 4 — Widely used reference document, but informal validation.
- 3 — Useful but unvalidated content (team guide, active wiki page).
- 2 — Working note, draft, shared personal content.
- 1 — Unknown source, copy with no origin, unattributable content.

Freshness (1-5)
- 5 — Reviewed/updated less than 6 months ago.
- 4 — 6 to 12 months.
- 3 — 12 to 24 months.
- 2 — 24 to 36 months.
- 1 — More than 36 months, or unknown date.

Weight according to the volatility of the topic: pricing or a compliance procedure changes fast (freshness counts double); the definition of a business concept ages slowly (freshness counts little). Then combine into a simple score, for instance score = authority × freshness (from 1 to 25), and set action thresholds.

A worked example

| Document | Authority | Freshness | Score (A×F) | Decision |
| --- | --- | --- | --- | --- |
| Refund policy v4 (approved, March 2026) | 5 | 5 | 25 | Ingest, weight high |
| Support FAQ (active, reviewed 8 months ago) | 4 | 4 | 16 | Ingest |
| Onboarding guide (useful, 20 months) | 3 | 3 | 9 | Ingest after review |
| Exported Slack note "refund process" (2022) | 2 | 1 | 2 | Discard / archive |
| "Pricing" PDF with no date | 3 | 1 | 3 | Refresh before ingesting |

Indicative arbitration rule: ≥ 16 goes straight into the served corpus; 8-15 enters after a quick review by the owner; < 8 is discarded or refreshed first. These thresholds are to be calibrated, but the important thing is that they be explicit and traced: every inclusion decision must be explainable.
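The grid and the thresholds above can be written down as two small functions. This is a minimal sketch: the function names are hypothetical, the handling of exact boundaries (for example a document reviewed exactly 12 months ago) is an assumption, and the volatility weighting is deliberately left out.

```python
def freshness_from_age(months):
    """Map the age in months since the last review to the 1-5 grid.

    None means the date is unknown, which the grid scores as 1.
    Boundary handling (exactly 6, 12, 24 months) is an assumption.
    """
    if months is None or months > 36:
        return 1
    if months < 6:
        return 5
    if months < 12:
        return 4
    if months < 24:
        return 3
    return 2  # 24 to 36 months


def triage(authority, freshness):
    """Return (score, decision) using the indicative thresholds."""
    score = authority * freshness
    if score >= 16:
        decision = "ingest"
    elif score >= 8:
        decision = "ingest after owner review"
    else:
        decision = "discard or refresh first"
    return score, decision


# Support FAQ from the worked example: authority 4, reviewed 8 months ago.
print(triage(4, freshness_from_age(8)))   # (16, 'ingest')
# Undated "Pricing" PDF: authority 3, unknown date.
print(triage(3, freshness_from_age(None)))  # (3, 'discard or refresh first')
```

Keeping the decision next to the score makes each inclusion traceable: the pair can be stored with the document.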

Better a restricted corpus of authoritative documents than an exhaustive corpus where the authoritative drowns in the stale. When it comes to knowledge, quantity hurts quality beyond a certain threshold.

Step 3 — Detect contradictions

This is the silent poison of knowledge bases. When two documents assert incompatible things — an old refund policy and the new one, two divergent procedures — the assistant has no way of knowing which to favour. It may cite one, the other, or worse, blend the two into a single fluent and false answer.

The method

  1. Group by topic. Use embedding clustering (or a simple sort by title/tag) to gather the documents that talk about the same thing. Two to five documents on the same topic — there's your list of contradiction candidates.
  2. Compare key assertions. For each cluster, extract the "hard" facts: amounts, deadlines, thresholds, conditions, effective dates. A "topic → what each document says" grid makes the gaps jump out.
  3. Ask an LLM to play referee. Give it two excerpts and ask: "Are these two passages compatible? If not, on what point do they diverge?" It's an excellent first detection pass, to be validated afterwards by a human.
  4. Test with trick questions. Probe the assistant (or raw search) on sensitive topics. Inconsistencies surface fast.
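Step 2 of this method, comparing hard facts within a topic, can be roughed out with a regular expression before any LLM is involved. The sketch below assumes the documents have already been grouped by topic and looks only at durations expressed in days; the function name, the sample texts and the topic labels are invented for illustration.

```python
import re
from collections import defaultdict

# Matches "14 days", "14 business days", "1 day", etc.
DAYS = re.compile(r"(\d+)\s+(?:business\s+|calendar\s+)?days?", re.IGNORECASE)


def deadline_conflicts(docs):
    """docs: iterable of (topic, doc_name, text).

    Returns {topic: {doc_name: [day counts]}} for every topic whose
    documents do not all state the same day counts.
    """
    by_topic = defaultdict(dict)
    for topic, name, text in docs:
        found = sorted({int(n) for n in DAYS.findall(text)})
        if found:
            by_topic[topic][name] = found

    conflicts = {}
    for topic, claims in by_topic.items():
        distinct = {tuple(values) for values in claims.values()}
        if len(distinct) > 1:
            conflicts[topic] = claims
    return conflicts


docs = [
    ("refunds", "Refund-policy-v4.pdf", "Refunds are issued within 14 business days."),
    ("refunds", "wiki/Refunds", "You will be refunded within 30 days."),
    ("trial", "pricing-grid", "The trial lasts 14 days."),
]

print(deadline_conflicts(docs))
# {'refunds': {'Refund-policy-v4.pdf': [14], 'wiki/Refunds': [30]}}
```

This only produces candidates. It does not distinguish business days from calendar days, and every hit still needs the human validation described in step 3.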

Examples of trick questions

  • "What is the exact deadline for a refund?" (reveals 14 days vs 30 days depending on the version)
  • "Is the manager's approval needed before or after the request?" (two divergent HR procedures)
  • "Does the Pro plan include SSO?" (obsolete product page vs up-to-date pricing grid)
  • "What is the current remote-work policy?" (2021 note vs 2025 agreement)

A concrete example

Two documents on refunds: Refund-policy-v4.pdf (March 2026, "14 business days") and a wiki page "Refunds" (undated, "within 30 days"). The RAG, queried about the deadline, may answer "30 days", "14 days", or worse "between 14 and 30 days". The remedy: designate v4 as the source of truth, update or redirect the wiki page, and remove from the served corpus every contradictory version (archived separately, not deleted — traceability has its value).

The general remedy: a single source of truth per topic, and archiving stale versions outside the corpus.

Step 4 — Detect gaps

A gap is a topic on which your base has nothing, or nothing satisfactory — while your users will inevitably ask the question. Two moments to detect them:

Cold, before deployment

Check the source map against the questions you expect. Build a reference set of 50 to 200 questions (a golden set) from three sources: the most frequent tickets and emails of the last 6 months, the questions that recur in meetings, and the intuition of business experts. For each question, check whether an authoritative document covers it. Questions with no documented answer are your cold gaps. This golden set will be reused for continuous evaluation (RAGAS: context recall, answer relevancy).
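Once each golden-set question is mapped to the authoritative documents that answer it, the cold gaps fall out of a few lines of code. A minimal sketch, assuming that mapping is maintained by hand; the function name, the questions and the document names are illustrative.

```python
def cold_gaps(golden_set):
    """golden_set: {question: [names of authoritative documents]}.

    Returns (coverage_rate, questions_with_no_authoritative_source).
    """
    uncovered = [q for q, sources in golden_set.items() if not sources]
    coverage = 1 - len(uncovered) / len(golden_set)
    return coverage, uncovered


golden_set = {
    "What is the exact deadline for a refund?": ["Refund-policy-v4.pdf"],
    "Does the Pro plan include SSO?": ["pricing-grid"],
    "What is the current remote-work policy?": ["remote-work-agreement"],
    "How does annual billing work?": [],  # nothing authoritative yet
}

coverage, uncovered = cold_gaps(golden_set)
print(coverage)   # 0.75
print(uncovered)  # ['How does annual billing work?']
```

The list of uncovered questions is the part to act on: each entry is a drafting ticket.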

Hot, in production

Tap into real queries, where users themselves tell you where the holes are:

  • queries with a low retrieval score (no chunk above the similarity threshold);
  • unsourced answers, or ones flagged "information not found";
  • repeated "I don't know" on related themes;
  • successive rephrasings of the same question (a sign the user didn't get their answer).

Aggregate these signals by theme, rank by frequency, and you get a gap queue prioritised by real demand — far more reliable than any guesswork.
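The aggregation itself is a frequency count. The sketch below assumes each logged query has already been tagged with a theme and a signal; the signal names and the sample events are hypothetical, and how themes get assigned (manual tagging, clustering) is out of scope here.

```python
from collections import Counter

# Signals that indicate the user did not get a satisfactory answer.
GAP_SIGNALS = {"low_retrieval_score", "unsourced", "dont_know", "rephrased"}


def gap_queue(events):
    """events: iterable of (theme, signal).

    Returns [(theme, count)] sorted by descending number of gap signals.
    """
    counts = Counter(theme for theme, signal in events if signal in GAP_SIGNALS)
    return counts.most_common()


events = [
    ("annual billing", "low_retrieval_score"),
    ("annual billing", "rephrased"),
    ("gdpr", "dont_know"),
    ("annual billing", "unsourced"),
    ("refunds", "answered"),  # not a gap signal, ignored
]

print(gap_queue(events))
# [('annual billing', 3), ('gdpr', 1)]
```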

Gaps are not guessed, they are measured. Before launch, by anticipation; afterwards, by listening to real queries. A mature corpus is one whose gaps are filled at the pace of needs.

Step 5 — Govern: Knowledge Ops

A one-off audit doesn't hold: the corpus is alive. Governance — what's sometimes called Knowledge Ops — turns the audit into a durable process.

The roles

  • Knowledge Owner (per domain) — responsible for the authority and freshness of their scope (support, product, HR…). It's they who validate that a document is "authoritative".
  • Knowledge Steward / curator — drives the cross-functional process: keeps the source map, prioritises the queue, enforces the ingestion rules.
  • Contributors — anyone who creates or corrects content, within a simple framework.
  • Sponsor (CTO, Head of AI or COO) — carries the trade-offs and indicators at leadership level.

The review cycle

Match the review frequency to the volatility of the topic:

  • critical and volatile documents (pricing, compliance, security): quarterly review;
  • stable reference documents: annual review;
  • past the deadline, the document moves to "to review" status and, if it isn't validated, automatically leaves the served corpus.
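The expiry rule can be automated with a date comparison. This sketch is one possible reading of the cycle above: the names are hypothetical, a quarter is approximated as 91 days, and a document leaves the served corpus as soon as it is overdue. A grace period for the owner's validation would be a policy choice to add on top.

```python
from datetime import date, timedelta

# Review periods by volatility; 91 days approximates a quarter.
REVIEW_PERIOD = {
    "volatile": timedelta(days=91),
    "stable": timedelta(days=365),
}


def review_status(last_review, volatility, today):
    """Return "approved" while the review is on time, else "to review"."""
    due = last_review + REVIEW_PERIOD[volatility]
    return "approved" if today <= due else "to review"


def is_served(status):
    """Only approved documents stay in the served corpus."""
    return status == "approved"


today = date(2026, 3, 24)
pricing = review_status(date(2025, 10, 1), "volatile", today)
glossary = review_status(date(2025, 9, 1), "stable", today)

print(pricing, is_served(pricing))    # to review False
print(glossary, is_served(glossary))  # approved True
```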

The workflow

Creation/correction (contributor)
        │
   Knowledge Owner review (authority + freshness)
        │
   "approved" status + review date + owner
        │
   ingestion into the served corpus
        │
   auto-expiry at the deadline ─► back to review

The ingestion rules

Write them down in black and white. Example: documents enter the served corpus if they have "approved" status, a score ≥ 16, and no uncontrolled personal data; drafts, archives, raw mailbox exports, and any sensitive content that has not yet been treated (anonymised or placed under access control) stay out. Each source on the map carries an explicit and revisable ingestion status.
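Written rules translate directly into a gate that returns its reasons, so that each refusal can be explained. A minimal sketch of the example rule above; the `Doc` fields, the `kind` labels and the function name are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    status: str                       # e.g. "approved", "draft", "to review"
    score: int                        # authority × freshness, 1 to 25
    has_uncontrolled_personal_data: bool
    kind: str                         # e.g. "policy", "draft", "archive"


EXCLUDED_KINDS = {"draft", "archive", "mailbox_export"}


def may_ingest(doc):
    """Return (allowed, reasons); reasons lists every rule that failed."""
    reasons = []
    if doc.status != "approved":
        reasons.append("status is not 'approved'")
    if doc.score < 16:
        reasons.append("score below 16")
    if doc.has_uncontrolled_personal_data:
        reasons.append("uncontrolled personal data")
    if doc.kind in EXCLUDED_KINDS:
        reasons.append(f"excluded kind: {doc.kind}")
    return not reasons, reasons


print(may_ingest(Doc("Refund policy v4", "approved", 25, False, "policy")))
# (True, [])
print(may_ingest(Doc("Slack note", "draft", 2, False, "draft")))
```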

In short, Knowledge Ops rests on five pillars:

  • Clear ownership: each source has an identified owner, accountable for its quality and freshness.
  • A review cycle: critical documents are re-read at a defined interval; obsolete ones are archived.
  • A contribution workflow: creating or correcting a document must be simple, otherwise debt accumulates.
  • A loop with usage: gaps and contradictions detected in production feed a prioritised work queue.
  • Ingestion rules: which documents enter the served corpus, which stay out.

The full loop:

Sources ─► audit (inventory, scoring, contradictions, gaps)
                          │
                  served corpus (authoritative, fresh, coherent)
                          │
              real usage ─► gap/contradiction detection
                          │
              prioritised queue ─► creation/correction ─► (loop)

Step 6 — Measure corpus health

What isn't measured isn't improved. A few corpus health indicators, with indicative targets to calibrate against your maturity:

  • Coverage rate — share of golden-set questions covered by at least one authoritative source. Target: > 90%.
  • Freshness rate — share of critical documents reviewed within the defined period. Target: > 95% on time.
  • Resolved-contradictions rate — topics endowed with a single source of truth vs topics left ambiguous. Target: 100% on the identified sensitive topics.
  • Gap rate — share of production queries with no satisfactory answer (low score, unsourced). Target: a decreasing trend, < 5% at maturity.
  • Sourced-answer rate — the synthesis indicator: share of answers backed by at least one authoritative source. Target: > 85%. If it's high, your corpus is in good health.

Also track the median age of served documents and the number of documents in "to review" status: two early warning signals of degradation. These metrics make corpus quality a steerable object, whose progress can be justified to leadership — and not a fuzzy intuition.
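Several of these indicators are simple ratios over data you already log. The sketch below computes three of them; the function name, the input shapes and the sample figures are all invented for illustration, not measurements.

```python
from statistics import median


def corpus_health(answers_sourced, critical_on_time, served_ages_months):
    """Compute three corpus health indicators.

    answers_sourced: list of bools, True if the answer cited at least
        one authoritative source.
    critical_on_time: list of bools, True if the critical document was
        reviewed within its defined period.
    served_ages_months: ages of the served documents, in months.
    """
    return {
        "sourced_answer_rate": sum(answers_sourced) / len(answers_sourced),
        "freshness_rate": sum(critical_on_time) / len(critical_on_time),
        "median_age_months": median(served_ages_months),
    }


# Invented sample: 100 answers, 20 critical documents, 5 served documents.
report = corpus_health(
    answers_sourced=[True] * 89 + [False] * 11,
    critical_on_time=[True] * 19 + [False],
    served_ages_months=[2, 5, 8, 14, 30],
)
print(report)
# {'sourced_answer_rate': 0.89, 'freshness_rate': 0.95, 'median_age_months': 8}
```

Tracking the same dictionary week after week is what turns these figures into a trend you can show to leadership.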

A mini end-to-end case

A European SaaS SME deploys an internal assistant for its L1 support. How the audit unfolded:

  1. Inventory — 6 sources mapped. Key discovery: 70% of the volume lives in an "archives" Drive that nobody maintains but agents still consult.
  2. Scoring — out of 1,100 documents, only 240 reach a score ≥ 16. The rest is discarded or put in the refresh queue. The initial served corpus goes from "everything" to 240 documents.
  3. Contradictions — clustering reveals 3 topics with double truth (refunds, SLA, trial conditions). A source of truth is designated for each; 9 stale documents are archived.
  4. Gaps (cold) — a golden set of 80 questions reveals 11 uncovered topics (including annual billing and customer-side GDPR). Drafting tickets created.
  5. Knowledge Ops — each source receives an owner; quarterly review for the volatile, annual for the stable; ingestion rules written.
  6. Measurement — at launch: coverage 86%, sourced answers 78%. After two months of listening to real queries (hot gaps filled): coverage 94%, sourced answers 89%.

Result: a smaller corpus (240 vs 1,100 documents) but a healthier one, and an assistant that answers correctly and with sources. Quality didn't come from a better model — it came from a better corpus.

Audit pitfalls

  • Ingesting everything "just in case" — exhaustiveness is the enemy of relevance. Audit to discard, not just to collect.
  • Auditing once, then forgetting — without Knowledge Ops, the corpus re-degrades within a few months.
  • Confusing volume with value — 10,000 mediocre documents are worth less than 500 authoritative ones.
  • Ignoring tribal knowledge — the best answers are sometimes in the experts' heads, never written down. The audit must also reveal what's missing in writing.
  • Neglecting sensitivity — an audit is also an opportunity to spot the personal and confidential data to handle separately (see our GDPR articles).
  • Believing the pipeline will catch everything — reranking, hybrid, contextual retrieval improve retrieval, never the truth of a false document.

Conclusion

Knowledge auditing is the invisible work that decides the visible success of your RAG. Before optimising the slightest retrieval parameter, take the time to inventory, score, purge contradictions and map gaps — then install governance that maintains this quality over time. An AI assistant will never be better than the knowledge entrusted to it. The good news is that this effort benefits the whole organisation, far beyond AI: a healthy knowledge base is an asset in its own right. AI is merely its most merciless revealer.

Further reading

  • Detecting the gaps in your base with user queries
  • RAG-augmented customer support: halving resolution time without losing quality
  • Sovereign, GDPR-compliant RAG: the complete 2026 guide
