Vector RAG retrieves isolated passages; GraphRAG links entities via a knowledge graph. Strengths, weaknesses, ingestion cost and a hybrid approach: when the graph genuinely changes the game.
Vector RAG has a blind spot: it retrieves isolated passages without understanding how they connect. As long as your questions are about localized facts ("what does clause X say?"), that is plenty. But as soon as a question requires connecting scattered information ("who worked on projects involving both team A and vendor B?"), vector similarity stalls. That is where GraphRAG comes in.
Here is the concrete difference between the two approaches, and how to decide which your case calls for.
How vector RAG works (and what it misses)
Vector RAG splits documents into passages, turns them into vectors, and retrieves those closest to a query. Fast, robust, and perfect for questions whose answer fits in one or a few passages. Its limit: it treats each passage as an island. It does not know that passage A and passage D refer to the same person, that an entity appears across ten documents, or that two facts complement each other. It retrieves by resemblance, not relationship.
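To make "retrieves by resemblance" concrete, here is a minimal sketch of the retrieval step in plain Python. The three-dimensional vectors and the passage names are made up for illustration; a real system would get its vectors from an embedding model and search them with a vector index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, index, k=2):
    """Return the ids of the k passages whose vectors are closest to the query."""
    scored = sorted(index.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [passage_id for passage_id, _ in scored[:k]]

# Toy "embeddings" (hypothetical values, not produced by a real model).
index = {
    "passage_A": [0.9, 0.1, 0.0],
    "passage_B": [0.1, 0.9, 0.0],
    "passage_D": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

print(top_k(query, index))  # ['passage_A', 'passage_D']
```

Each passage is scored on its own against the query. Nothing in this loop records that passage A and passage D mention the same person, which is the blind spot described above.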
Take an example. You have indexed 800 meeting minutes. Question: "Which vendors were mentioned in meetings where GDPR compliance was also discussed?" Vector RAG looks for the passages closest to that sentence. It might surface a few chunks that mention both vendors and GDPR in the same paragraph — but it will systematically miss the cases where the vendor is cited in the 3 March meeting and the GDPR discussion in the 12 May minutes, even though it is the same meeting series. The connection exists in your data; the vector cannot see it, because it is not written into any single passage.
Vector RAG answers "local" questions well (the answer is in the text). It fails on "global" questions that require linking scattered pieces.
How GraphRAG works
GraphRAG (notably popularized by Microsoft's work, including the open-source implementation of the same name) adds a layer of structure. Instead of storing only vectors, it builds a knowledge graph in four steps:
- Entity and relationship extraction — an LLM scans the corpus chunk by chunk and extracts entities (people, projects, products, organizations) and the links between them, in a structured form (often subject — relation — object triples).
- Graph construction — entities become nodes, relationships become edges. Identical entities mentioned across documents are merged (entity resolution). Scattered information links into a single network.
- Community detection — a graph clustering algorithm (typically Leiden) groups densely connected nodes into thematic communities at several hierarchical levels. Each community is then summarized by the LLM.
- Graph retrieval — answering traverses sub-graphs and community summaries rather than isolated passages. Microsoft distinguishes two modes: global search (queries community summaries for big-picture questions) and local search (starts from an entity and explores its neighborhood for entity-focused questions).
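The graph-construction step can be sketched roughly as follows, assuming the extraction step has already returned triples tagged with the chunk they came from. The `canonical` function is a deliberately naive stand-in for entity resolution (it only folds case and whitespace), and all names here are hypothetical rather than taken from any particular GraphRAG implementation. The sample triples borrow names from the example that follows.

```python
from collections import defaultdict

def canonical(name):
    """Naive entity key: case-folded, whitespace-collapsed.
    Assumption for this sketch; real entity resolution is much harder."""
    return " ".join(name.casefold().split())

def build_graph(extractions):
    """extractions: iterable of (chunk_id, subject, relation, object)."""
    edges = defaultdict(set)     # node -> {(relation, target node)}
    mentions = defaultdict(set)  # node -> ids of chunks that mention it
    for chunk_id, subj, rel, obj in extractions:
        s, o = canonical(subj), canonical(obj)
        edges[s].add((rel, o))
        mentions[s].add(chunk_id)
        mentions[o].add(chunk_id)
    return edges, mentions

# Two mentions of the same person, spelled differently, in different chunks.
extractions = [
    (1, "Marie Lefort", "approved", "Acme Cloud Contract"),
    (3, "marie  lefort", "raised", "GDPR Obligation"),
]
edges, mentions = build_graph(extractions)

print(sorted(mentions["marie lefort"]))  # [1, 3]
```

Because both spellings collapse to one node, facts from chunk 1 and chunk 3 end up attached to the same entity. Community detection and summarization would then run on top of this structure.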
A concrete mini-example
Imagine three chunks from your meeting minutes:
Chunk 1: "Marie Lefort (CIO) approved the contract with Acme Cloud for sovereign hosting."
Chunk 2: "Acme Cloud operates its datacenters in Roubaix and Strasbourg."
Chunk 3: "At the 12 May committee, Marie Lefort flagged the GDPR obligations tied to the data transfer."
The extraction step produces triples of this kind:
(Marie Lefort) --[approved]--> (Acme Cloud Contract)
(Marie Lefort) --[holds_role]--> (CIO)
(Acme Cloud Contract) --[concerns]--> (Acme Cloud)
(Acme Cloud) --[hosted_in]--> (Roubaix)
(Acme Cloud) --[hosted_in]--> (Strasbourg)
(Marie Lefort) --[raised]--> (GDPR Obligation)
The resulting sub-graph now links Acme Cloud, GDPR and Marie Lefort even though no single chunk contains all three together. For the question "Does the hosting vendor approved by the CIO raise GDPR concerns?", a local search starting from the Acme Cloud node follows the edge to Acme Cloud Contract, then to Marie Lefort, then to GDPR Obligation, and reconstructs an answer that vector retrieval would never have assembled. This is exactly the kind of relational query where vector retrieval fails: the answer is in no passage, it is in the structure.
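The traversal can be reproduced with a few lines of Python. This is only a sketch: it loads the six triples above into an adjacency map, ignores edge direction, and uses a plain breadth-first search. A real local search would also rank neighbors and pull in the source text. The function names are illustrative.

```python
from collections import defaultdict, deque

TRIPLES = [
    ("Marie Lefort", "approved", "Acme Cloud Contract"),
    ("Marie Lefort", "holds_role", "CIO"),
    ("Acme Cloud Contract", "concerns", "Acme Cloud"),
    ("Acme Cloud", "hosted_in", "Roubaix"),
    ("Acme Cloud", "hosted_in", "Strasbourg"),
    ("Marie Lefort", "raised", "GDPR Obligation"),
]

def neighbors_map(triples):
    """Adjacency map that lets us walk edges in both directions."""
    adj = defaultdict(set)
    for subj, _relation, obj in triples:
        adj[subj].add(obj)
        adj[obj].add(subj)
    return adj

def shortest_path(triples, start, goal):
    """Breadth-first search; returns the list of nodes or None."""
    adj = neighbors_map(triples)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(adj[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

path = shortest_path(TRIPLES, "Acme Cloud", "GDPR Obligation")
print(" -> ".join(path))
# Acme Cloud -> Acme Cloud Contract -> Marie Lefort -> GDPR Obligation
```

The path crosses facts that came from chunk 1 and chunk 3, which is the connection no single passage states.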
The result: GraphRAG excels where vector stalls — global synthesis, entity connection, "big-picture" questions over a corpus.
Strengths and weaknesses
| Criterion | Vector RAG | GraphRAG |
| Local factual questions | Excellent | Overkill |
| Connection / global synthesis | Weak | Excellent |
| Ingestion cost | Low | High (LLM graph extraction) |
| Query latency | Low | Variable, often higher |
| Maintenance / updates | Simple | Complex (graph must evolve) |
| Maturity / tooling | Very mature | Improving |
GraphRAG is not "a better RAG." It is a tool for a different kind of question, at a markedly higher ingestion cost and complexity.
The real cost: ingestion
This is the point most teams underestimate. Vector RAG, at ingestion, is one embedding call per chunk — a fraction of a cent, and you are done. GraphRAG runs every chunk through a generative LLM to extract entities and relationships, then one or more extra calls to summarize each community.
Let us set prudent orders of magnitude. A corpus of 10,000 chunks of roughly 600 tokens each:
- Vector: 10,000 embedding calls. With text-embedding-3-small, the total cost is in the tens of cents.
- GraphRAG: 10,000 extraction calls on a generative model, each consuming the input (the chunk plus a sometimes long extraction prompt) and producing structured output. Depending on the model, expect on the order of tens to hundreds of euros for this corpus alone, before the community-summarization phase and any refinement passes. Microsoft's implementation exposes settings that trade cost against extraction quality (the number of extra "gleaning" passes, cheaper models for some steps) precisely because the bill climbs fast.
The ingestion-cost gap between vector and GraphRAG can reach two to three orders of magnitude. On a living corpus that must be re-ingested regularly, this is not a one-off expense: it is a recurring cost line.
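A back-of-the-envelope calculation shows where such a gap comes from. Every number below other than the corpus size is an assumption chosen for illustration: the per-million-token prices are placeholders rather than quotes from any vendor, and the prompt and output lengths are guesses. Plug in your own model's prices and measured token counts.

```python
def ingestion_cost(n_chunks, tokens_in_per_chunk, tokens_out_per_chunk,
                   price_in_per_mtok, price_out_per_mtok):
    """Total cost, in the currency of the prices (given per million tokens)."""
    cost_in = n_chunks * tokens_in_per_chunk * price_in_per_mtok / 1_000_000
    cost_out = n_chunks * tokens_out_per_chunk * price_out_per_mtok / 1_000_000
    return cost_in + cost_out

N_CHUNKS = 10_000

# Embedding: 600 tokens in, nothing generated. Placeholder price.
vector = ingestion_cost(N_CHUNKS, 600, 0,
                        price_in_per_mtok=0.02, price_out_per_mtok=0.0)

# Extraction: the chunk plus an assumed 1,400-token prompt in,
# an assumed 500 tokens of structured output. Placeholder prices.
graph = ingestion_cost(N_CHUNKS, 2_000, 500,
                       price_in_per_mtok=2.50, price_out_per_mtok=10.00)

print(f"vector: {vector:.2f}")              # vector: 0.12
print(f"graph extraction: {graph:.2f}")     # graph extraction: 100.00
print(f"ratio: {round(graph / vector)}x")   # ratio: 833x
```

Under these assumptions the extraction pass alone costs several hundred times the embedding pass, and this still leaves out community summarization and any extra extraction passes.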
Add time on top: extracting a graph over a large corpus takes hours, even days, where vectorization is massively parallelizable and nearly instant at scale.
When the graph changes the game
GraphRAG adds real value when your questions are relational ("how is X linked to Y across the corpus?"), you need a big picture ("what are the recurring themes/risks across these 500 reports?"), or your domain is rich in interconnected entities (client files, actor networks, technical dependencies, decision genealogies). Conversely, if users mostly ask one-off factual questions, the graph's cost is not justified.
The pragmatic approach: start vector, add the graph if needed
Most projects should start with vector RAG (hybrid + reranking), which covers most needs cheaply. Introduce GraphRAG only when measurement reveals a class of questions — relational, global — that vector consistently fails. An appealing middle path is to combine both: vector for local questions, a graph to enrich context on key entities.
Question ─► classifier
├─ local / factual ──► vector RAG (hybrid + rerank)
└─ relational / global ──► GraphRAG (sub-graphs, communities)
How the question classifier works
The central link in this hybrid architecture is the router placed upstream. Several implementations are possible, from simplest to most robust:
- Heuristics — detect lexical markers ("compare", "relate", "what are all the…", "overview", "how many… in total") that betray a global or relational intent. Fast and free, but brittle.
- Lightweight LLM classifier — a small model (or a short call on a cheap model) receives the question and returns a local / relational label with a confidence. This is the most common approach in 2026: a few-line prompt, a constrained output, latency on the order of a hundred milliseconds.
- Agentic routing — an agent itself decides whether to query the vector index, the graph, or both, and can chain calls. More powerful, more expensive, harder to make deterministic.
In practice, start with a lightweight LLM classifier plus a fallback: when in doubt, run both retrievals in parallel and let the generation LLM arbitrate over the most relevant context. The marginal cost of a dual retrieval stays modest compared with the error of routing a relational question to vector alone.
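The routing logic with its fallback might look like the sketch below. To keep it runnable without a model, `classify` uses the lexical-marker heuristic as a stand-in for the lightweight LLM call. The marker list, the confidence values and the 0.7 threshold are all arbitrary assumptions, and the function names are hypothetical.

```python
import re

# Lexical markers hinting at a relational or global intent (illustrative list).
RELATIONAL_MARKERS = re.compile(
    r"\b(compare|relate[sd]?|linked|overview|across|what are all|in total)\b",
    re.IGNORECASE,
)

def classify(question):
    """Return (label, confidence). Stand-in for a lightweight LLM classifier."""
    hits = RELATIONAL_MARKERS.findall(question)
    if len(hits) >= 2:
        return "relational", 0.9
    if len(hits) == 1:
        return "relational", 0.6  # a single marker is weak evidence
    return "local", 0.8

def route(question, threshold=0.7):
    """Pick the retrievers to call; when in doubt, call both."""
    label, confidence = classify(question)
    if confidence < threshold:
        return ["vector", "graph"]
    return ["graph"] if label == "relational" else ["vector"]

print(route("What does clause 4 say?"))                     # ['vector']
print(route("Compare the risks across these 500 reports"))  # ['graph']
print(route("How is Acme linked to the GDPR discussion?"))  # ['vector', 'graph']
```

Replacing `classify` with a real model call leaves `route` unchanged, which keeps the fallback policy easy to test on its own.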
GraphRAG pitfalls
- Underestimating ingestion cost — extracting entities and relations over a large corpus is LLM-expensive, as detailed above. Measure on a representative sample before industrializing.
- The aging graph — a living corpus requires updating the graph. And this is not a simple "append": ingesting new documents can create new entities, merge existing nodes, redraw entire communities — hence re-running community detection and summarization. Without an incremental strategy, you re-pay full ingestion at every refresh.
- Extraction quality — a poorly extracted graph propagates false relations, and a false relation is worse than a missing one: it grounds confident but wrong answers. Typical pitfalls: the same entity duplicated under two spellings ("Acme Cloud" vs "ACME"), relations hallucinated by the LLM, relation types inconsistent from chunk to chunk. The extraction model's quality and entity resolution are decisive; plan an evaluation pass on the graph itself, not just on final answers.
- Over-engineering — deploying a graph for questions vector already handled well. This is the most common and most expensive trap: getting excited about the technology before proving the relational question class truly exists in real usage.
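As one example of an evaluation pass on the graph itself, the sketch below flags pairs of entity names that may refer to the same thing, such as "Acme Cloud" and "ACME". The rule used here (one name's lowercase tokens are contained in the other's) is an assumption picked for simplicity. It will produce false positives, so treat its output as candidates for human review, not as automatic merges.

```python
from itertools import combinations

def tokens(name):
    """Lowercased word set of an entity name."""
    return frozenset(name.casefold().split())

def duplicate_candidates(entity_names):
    """Pairs of names where one name's tokens are a subset of the other's."""
    pairs = []
    for a, b in combinations(sorted(set(entity_names)), 2):
        ta, tb = tokens(a), tokens(b)
        if ta <= tb or tb <= ta:
            pairs.append((a, b))
    return pairs

names = ["Acme Cloud", "ACME", "Marie Lefort", "Roubaix"]
print(duplicate_candidates(names))  # [('ACME', 'Acme Cloud')]
```

Running a check like this on a sample of the extracted nodes gives an early read on entity-resolution quality before the whole corpus is ingested.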
Conclusion
Vector RAG and GraphRAG are not opposites: they answer different families of questions. Vector remains the robust, economical default for local questions. GraphRAG becomes relevant when value lies in relationships and the big picture — at a very real cost and complexity. The right decision is made not by trend but by looking at what your users actually ask, measuring where vector fails, and adding the graph only where it genuinely pays back its cost.
Further reading
- Production RAG architecture: from chunking to reranking, the complete guide
- Agentic RAG: when the agent decides what to retrieve (and when to stop)