Do your AI pilots dazzle in demos and disappoint in production? The culprit is rarely the model. Here are the five signs that reveal a knowledge base not ready for RAG, and how to fix each one.
Most enterprise AI projects fail not because of the model, but because of the data layer. You can plug in the best LLM on the market, add a Cohere Rerank 3 reranker and a perfectly tuned HNSW index: if the knowledge base you feed it is dirty, stale, or riddled with gaps, your agent will confidently answer nonsense. That's the classic RAG-in-production trap — you polish retrieval and generation, and neglect the raw material.
The good news is that the symptoms of a base that "isn't ready" are recognizable. Here are the five most common ones, the ones we find in nearly every audit. For each: why it poisons a RAG, what it looks like in practice, and how to fix it without rebuilding everything.
A RAG doesn't invent quality. It amplifies what you give it. A mediocre base produces a mediocre assistant — only faster and more self-assured.
1. Your critical documents live in Slack
Important decisions, exceptions, operational workarounds — they all live in threads. "We decided to stop charging shipping fees for Gold customers, valid until further notice": that sentence exists nowhere in an official procedure. It's in an #ops channel between a joke and a GIF.
Why it's a problem for a RAG. Knowledge in Slack is unstructured, contextual, and ephemeral. A message depends on the ten preceding messages to make sense; isolated into a chunk, it becomes ambiguous or misleading. Worse: Slack stacks contradictory decisions over time, with no marker indicating which one is authoritative today. Ingest everything and you inject noise; ingest nothing and your agent ignores real business rules your teams apply every day.
Concrete example. A support agent asked about the shipping policy answers "€4.90 regardless of order size," citing a 2022 PDF, while the living rule — waiver for Gold customers — never left a thread. The Gold customer is wrongly charged, the rep trusts the agent, the incident escalates.
How to fix it.
- Set up a promotion process: anything that emerges from a conversation and becomes a rule must be promoted into a documented source (procedure page, decision log). Slack is a place to debate, not a source of truth.
- Ingest Slack selectively and with enrichment: take only the channels with documentary value, and apply contextual retrieval (prefix each chunk with a short LLM-generated context summarizing the thread) so an isolated message stays interpretable.
- Date and trace it: every item ingested from Slack must carry its date and author, to enable arbitration when contradictions arise.
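The last two points can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `SlackMessage`, `Chunk`, and `contextualize_thread` are hypothetical names, and `summarize` stands in for whatever LLM call you use to produce the thread summary.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass
class SlackMessage:  # hypothetical shape of an exported message
    author: str
    posted_at: datetime
    text: str


@dataclass
class Chunk:
    text: str       # what gets embedded: context prefix + message
    author: str     # kept for traceability
    posted_at: str  # ISO 8601, used to arbitrate between contradictory rules
    channel: str


def contextualize_thread(
    channel: str,
    thread: list[SlackMessage],
    summarize: Callable[[str], str],
) -> list[Chunk]:
    """Turn a thread into chunks that stay interpretable in isolation."""
    transcript = "\n".join(f"{m.author}: {m.text}" for m in thread)
    context = summarize(transcript)  # assumption: an LLM call injected by the caller
    return [
        Chunk(
            text=f"[Thread context: {context}]\n{m.author}: {m.text}",
            author=m.author,
            posted_at=m.posted_at.isoformat(),
            channel=channel,
        )
        for m in thread
    ]


thread = [
    SlackMessage("lea", datetime(2024, 3, 4, 9, 30), "Can we drop shipping fees for Gold customers?"),
    SlackMessage("marc", datetime(2024, 3, 4, 9, 42), "Yes, valid until further notice."),
]
# A fixed string replaces the LLM summary here so the example runs offline.
chunks = contextualize_thread("#ops", thread, lambda _: "Decision on Gold shipping fees")
print(chunks[1].text)
```

On its own, "Yes, valid until further notice." is meaningless. With the prefix, the same chunk carries enough context to be retrieved and read correctly, and its date and author travel with it.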
2. Your PDFs have never been audited for freshness
The official version of your refund policy dates from 2021. Nobody knows if it's still valid. It sits in a Drive/Legal/Final/ folder. Your agent will cite it as gospel.
Why it's a problem for a RAG. A retrieval system has no native notion of "stale." A 2021 chunk and a 2025 chunk look identical in vector space; if the obsolete document is better written or semantically closer to the question, it will be retrieved and cited first. Freshness is not a dimension the embedding captures — it's metadata you must manage explicitly.
Concrete example. A company migrates to a new pricing schedule in January. The old grid, in PDF, stays in the base. For six months, the sales agent sometimes cites the old grid and sometimes the new one, depending on how the question is phrased. Quotes go out wrong. Nobody understands why until the audit.
How to fix it.
- Attach lifecycle metadata to each document: creation date, last review date, expiration date, responsible owner.
- Set up a periodic review: any document not reviewed in N months is flagged "freshness not guaranteed" and can be downgraded or excluded from retrieval.
- Use this metadata in the pipeline: filter or penalize stale chunks at retrieval time, and surface the source date in the agent's answer so the user can judge.
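Here is one way the lifecycle metadata could feed retrieval, as a rough sketch. The `DocMeta` fields, the one-year review period, and the 0.25 floor are all assumptions chosen for illustration; the right values depend on how fast your content ages.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DocMeta:  # hypothetical lifecycle metadata attached to each document
    doc_id: str
    last_reviewed: date
    expires: Optional[date] = None


def freshness_weight(meta: DocMeta, today: date, max_age_days: int = 365) -> float:
    """Multiplier in [0, 1]; 0 means the document is excluded from retrieval."""
    if meta.expires is not None and meta.expires < today:
        return 0.0
    age_days = (today - meta.last_reviewed).days
    if age_days <= max_age_days:
        return 1.0
    # Linear decay past the review deadline, floored at 0.25 so that
    # unreviewed documents are downgraded rather than silently dropped.
    overdue = (age_days - max_age_days) / max_age_days
    return max(0.25, 1.0 - overdue)


def rerank_by_freshness(
    hits: list[tuple[str, float]], metas: dict[str, DocMeta], today: date
) -> list[tuple[str, float]]:
    """Rescore (doc_id, similarity) pairs, drop expired docs, sort best first."""
    rescored = [(d, s * freshness_weight(metas[d], today)) for d, s in hits]
    return sorted((h for h in rescored if h[1] > 0.0), key=lambda h: h[1], reverse=True)


today = date(2025, 6, 1)
metas = {
    "pricing_2021": DocMeta("pricing_2021", date(2021, 3, 1)),
    "pricing_2025": DocMeta("pricing_2025", date(2025, 1, 15)),
    "promo_q1": DocMeta("promo_q1", date(2025, 1, 1), expires=date(2025, 3, 1)),
}
hits = [("pricing_2021", 0.90), ("pricing_2025", 0.80), ("promo_q1", 0.95)]
print(rerank_by_freshness(hits, metas, today))
```

In this example the 2021 grid has the higher raw similarity, yet the 2025 grid ends up first and the expired promotion disappears. The similarity scores are made up; only the mechanism matters.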
Semantic relevance says nothing about temporal validity. A RAG that doesn't manage freshness cites the past with the confidence of the present.
3. You have 47 versions of the same document
"HRPolicyv3FINAL.docx", "HRPolicyv3FINALREVISED.docx", "HRPolicyv3FINALOKOK.docx". Which one is the truth? You don't know, and your ingestion system knows even less.
Why it's a problem for a RAG. Near-identical duplication is one of the worst enemies of retrieval. Multiple versions of the same text saturate the top-k: of the five chunks retrieved, three say the same thing with minor variations, and the agent believes it sees strong convergence when it's looking at the same document copied over. Worse, two versions may diverge on a key point (2-month notice vs 3-month notice), and the model, seeing both, hallucinates a compromise or picks at random.
Concrete example. An employee asks the HR assistant about the resignation notice period. The base contains three versions of the policy. One says 2 months, another 3 months, the last mentions nothing. The agent answers "between 2 and 3 months depending on the case" — a fabricated answer found in no document, born from the collision of versions.
How to fix it.
- Establish a single source of truth per topic: one canonical document per policy, per procedure, per template contract. Everything else is archive, out of ingestion scope.
- Set up real versioning (not a file-naming convention): a stable identifier, a version number, a status (draft, published, obsolete). Only documents with the published status enter the index.
- Detect near-duplication at ingestion time (high similarity between chunks) and deduplicate: keep the canonical version, discard the copies.
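Near-duplicate detection does not need embeddings to get started. The sketch below uses word shingles and Jaccard similarity, one common lexical approach swapped in here for illustration; the 0.8 threshold and the function names are assumptions, and the input order is taken to mean "canonical version first".

```python
from __future__ import annotations


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping word n-grams, lowercased."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(chunks: list[str], threshold: float = 0.8) -> list[str]:
    """Keep the first chunk of each near-duplicate group (input order = priority)."""
    kept: list[str] = []
    kept_shingles: list[set] = []
    for chunk in chunks:
        current = shingles(chunk)
        if all(jaccard(current, seen) < threshold for seen in kept_shingles):
            kept.append(chunk)
            kept_shingles.append(current)
    return kept


v3_final = (
    "employees must give written notice to their manager and to human "
    "resources before leaving the company for any reason"
)
v3_final_revised = v3_final + " whatsoever"
expenses = "expense reports are due on the fifth business day of each month"
print(deduplicate([v3_final, v3_final_revised, expenses]))
```

This pairwise comparison is quadratic in the number of chunks, which is fine for an audit but not for a large corpus; techniques such as MinHash with locality-sensitive hashing are the usual next step. Note also that a version differing only on "2 months" versus "3 months" would be flagged as a copy, which is why deciding which version is canonical has to happen before deduplication, not after.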
4. Your sources are scattered across 6 tools
Notion for processes, Drive for contracts, SharePoint for reports, Confluence for tech, Slack for decisions, GitHub for code. Each tool has its own access rights, formats, update cadence. No agent will give you a unified view without prior ingestion work.
Why it's a problem for a RAG. A real business question crosses silos: "What's the onboarding process for an Enterprise customer?" draws on a Notion procedure, a Drive template contract, a Confluence checklist, and a GitHub script. If your RAG indexes only one source, it answers partially, and with full confidence. Fragmentation also creates format inconsistencies (the same concept named differently across tools) that degrade semantic retrieval.
Concrete example. The onboarding agent perfectly describes the steps documented in Confluence but ignores the reversibility clause that lives only in the Drive template contract. The salesperson promises one thing, the contract says another. Fragmentation created an invisible blind spot.
How to fix it.
- Set up unified multi-source ingestion: connectors to each tool (Notion, Drive, SharePoint, Confluence, Slack, GitHub) that normalize content into a common format, with provenance metadata.
- Preserve access rights at ingestion: the knowledge layer must respect each user's permissions, not flatten everything into public access.
- Standardize vocabulary and metadata: a common schema (topic, owner, date, status) across sources, so that hybrid retrieval (BM25 + dense, RRF fusion) works on a coherent corpus rather than six disparate islands.
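For readers unfamiliar with RRF (Reciprocal Rank Fusion), the idea fits in a few lines: each document scores 1 / (k + rank) in every ranking where it appears, and the scores are summed. The sketch below assumes the two rankings (BM25 and dense) have already been computed elsewhere; the document IDs are invented, and k = 60 is the commonly used default rather than a tuned value.

```python
from __future__ import annotations

from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results for "onboarding process for an Enterprise customer".
bm25_ranking = ["notion:onboarding", "drive:contract", "confluence:checklist"]
dense_ranking = ["drive:contract", "github:script", "notion:onboarding"]

for doc_id, score in rrf_fuse([bm25_ranking, dense_ranking]):
    print(f"{doc_id}  {score:.4f}")
```

Because RRF only looks at ranks, it sidesteps the problem of BM25 and cosine scores living on different scales. The source prefix in each ID is the provenance metadata doing its job: the fused list mixes four tools in one ranking.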
An agent is only as good as the reach of what it can access. Six silos means six chances to answer halfway.
5. You've never measured your documentation coverage
What percentage of the questions your team asks have an answer in your knowledge base? If you don't know, your agent won't either — and it will fill the void by inventing.
Why it's a problem for a RAG. A RAG with no coverage measurement is blind to its own holes. When no relevant source exists, a poorly designed system doesn't say "I don't know": it retrieves the least-distant chunks, even mediocre ones, and generates a plausible but baseless answer. The most dangerous hallucinations don't come from a bad model, but from a documentation gap nobody mapped.
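The failure mode described here comes down to always passing the top-k chunks to the model, however weak they are. A minimal guard, sketched below, is to apply a relevance floor and treat an empty result as "I don't know". The function name and the 0.5 threshold are assumptions; the threshold in particular has to be calibrated against your own retriever's score distribution.

```python
from __future__ import annotations


def select_context(
    hits: list[tuple[str, float]], min_score: float = 0.5, top_k: int = 5
) -> list[tuple[str, float]]:
    """Keep at most top_k hits whose score clears the floor, best first."""
    relevant = [hit for hit in hits if hit[1] >= min_score]
    relevant.sort(key=lambda hit: hit[1], reverse=True)
    return relevant[:top_k]


# Hypothetical hits for a question about international refunds:
# the closest chunks are about something else and score poorly.
hits = [("domestic_refunds#3", 0.41), ("shipping_policy#1", 0.22)]
context = select_context(hits)
if not context:
    print("No sufficiently relevant source found: answer 'I don't know'.")
```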
Concrete example. Users regularly ask about the international refund procedure. No document covers it. Rather than admit ignorance, the agent extrapolates from the domestic procedure and invents timelines. Nobody notices until user queries are analyzed.
How to fix it.
- Build a golden set of real questions (from support, user queries, business interviews) and measure the correct-answer rate with a framework like RAGAS (notably context recall and answer relevancy).
- Analyze production user queries: questions with low retrieval scores flag the priority gaps to close with documentation.
- Make coverage a metric tracked over time, not a one-off audit. Every source added, every document removed shifts your coverage — it must be monitored as a quality indicator.
The verdict
None of these five signs is fatal on its own. Taken together, they explain why so many AI pilots dazzle in demos and disappoint in production: the model is good, the data isn't. The right sequence isn't "pick an LLM then plug in the data," but "audit the data, govern it, then plug in an LLM." Governance, single source of truth, versioning, multi-source ingestion, coverage measurement: these are foundational efforts, not hyperparameter tweaks.
Doing this inventory by hand is tedious. That's exactly what Knowledge Pulse automates: it analyzes your base, detects these five families of problems, and returns a maturity score from 0 to 100 with the evaluated dimensions and prioritized recommendations. 60 seconds, free — enough to know where you stand before investing in the AI layer.
Further reading
- Auditing your knowledge base before AI: the complete method
- Detecting gaps in your base with user queries