Research

GraphRAG for Client Domains

A client's institutional knowledge usually lives in three people's heads, a Notion no one keeps current, and a Slack the search can't reach. GraphRAG, built right, turns that into a layer that survives the three people leaving.

The problem GraphRAG solves

Plain retrieval-augmented generation (RAG) does one thing well: it pulls the most semantically similar chunks from a vector store and feeds them to an LLM. It does several things badly. It can't answer questions that require traversing relationships ("what are all the modules that depend on this concept and have been updated since the curriculum review"). It can't enforce ontology ("this term means three different things in three different documents and the model should know which"). It can't reliably distinguish the canonical source from a draft that was supposed to be deleted.

GraphRAG adds an explicit knowledge-graph layer to vector retrieval. Entities and relationships are extracted, stored in a graph database, and queried alongside the vector store. The LLM gets both: the semantic neighbours from the vectors and the structural neighbours from the graph. The answers stop being "approximately right" and start being "right and traceable."

The architecture in three layers

Every GraphRAG build the studio ships has the same three layers. Vendors and storage backends change. The shape doesn't.

| Layer | What lives there | What it answers |
| --- | --- | --- |
| Corpus + provenance | The cleaned, chunked source documents. Every chunk keeps source URL, version, author, date, access permissions. | "Where exactly did this claim come from." |
| Vector store | Embeddings of chunks at multiple granularities: sentence, paragraph, document summary. | "What in the corpus is semantically similar to this question." |
| Knowledge graph | Entities (people, concepts, products, events, modules). Relationships (depends-on, supersedes, taught-by, contradicts). | "How do these things relate to each other, and what's connected to what." |
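
The corpus layer's contract is easiest to see as a record type. A minimal sketch, with illustrative field names rather than a schema the studio ships:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Chunk:
    """One retrievable unit in the corpus layer. Provenance travels
    with the chunk rather than living in a side table."""
    text: str
    source_url: str   # where the claim came from
    version: str      # version of the source document
    author: str
    published: date
    permissions: list[str] = field(default_factory=list)  # access tags
```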

The retrieval pipeline queries the graph first to scope candidates, then the vector store to rank within scope, then the LLM to synthesize. Graph-then-vector is usually faster, more accurate, and cheaper than vector-only against the same questions.
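
A sketch of the graph-first scoping step, with an in-memory adjacency map standing in for a real graph database. The entities, the two-hop default, and the helper name are illustrative; the vector store is then queried only within the returned scope, and the LLM synthesizes from the top-ranked chunks:

```python
from collections import deque

# Stand-in for the graph layer; a real build queries a graph database.
GRAPH: dict[str, list[str]] = {
    "probability": ["statistics", "combinatorics"],
    "statistics": ["probability", "sampling"],
}

def graph_scope(seeds: list[str], max_hops: int = 2) -> set[str]:
    """Breadth-first expansion: every entity within max_hops of the seeds."""
    scope = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for neighbour in GRAPH.get(node, []):
            if neighbour not in scope:
                scope.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return scope

# graph_scope(["probability"]) ->
#   {"probability", "statistics", "combinatorics", "sampling"}
```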

Entity extraction: the part that's actually hard

The technical primitives are well-trodden. LLMs do reasonable entity extraction out of the box. The hard part is the ontology, which is the list of entity types and relationship types the graph is allowed to contain.

Off-the-shelf ontologies don't fit specific client domains. A curriculum company's "Concept" entity has different relationships than a clinical-research company's "Concept" entity, even though both use the same word. Letting the LLM invent the ontology on the fly produces a graph that drifts, where the same real-world thing has fifteen labels and no two queries return consistent results.

The discipline the studio uses:

  1. Draft a candidate ontology with the client. Two hours, three people, one whiteboard. List the entity types that matter and the relationships that matter. Disagree explicitly when the team disagrees.
  2. Run extraction against a representative slice (50–100 documents). Pin the ontology to the agent; don't let it improvise types. (A sketch of a pinned, versioned ontology follows this list.)
  3. Review the result with the client. What did it miss. What did it make up. Where did the ontology turn out wrong.
  4. Iterate the ontology, not the extraction. Two or three passes is usually enough to converge.
  5. Lock the ontology, version it, and run the full extraction. Future drift is now a deliberate ontology change, not silent.
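
What "pin and lock the ontology" can look like in practice, as a minimal sketch: the ontology lives as versioned plain data in the repo, and a validation gate rejects any extracted triple outside it. Type names and the version string are illustrative:

```python
# Versioned ontology as plain data in the repo. Because extraction
# output is validated against a closed set of types, drift becomes
# a reviewable diff instead of a silent accumulation.
ONTOLOGY = {
    "version": "2025-01-14.3",   # bumped only by deliberate change
    "entity_types": {"Person", "Concept", "Module", "Document"},
    "relationship_types": {"depends-on", "supersedes", "taught-by", "contradicts"},
}

def validate_triple(subject_type: str, relation: str, object_type: str) -> None:
    """Reject anything the ontology doesn't allow, loudly."""
    for entity_type in (subject_type, object_type):
        if entity_type not in ONTOLOGY["entity_types"]:
            raise ValueError(f"unknown entity type: {entity_type}")
    if relation not in ONTOLOGY["relationship_types"]:
        raise ValueError(f"unknown relationship type: {relation}")
```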

The graph is only as good as the ontology. The ontology is only as good as the time spent arguing about it with the people who actually understand the domain.

Embedding strategy

Three choices matter more than the rest.

Granularity. Embed at multiple scales: a sentence-level embedding for precise lookups, a paragraph-level embedding for thematic similarity, and a document-summary embedding for "find related documents." Single-granularity vector stores miss half the questions.
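
A sketch of multi-granularity embedding with the sentence-transformers library. The model name is a placeholder to be replaced per the model-choice advice below, and the sentence splitting is deliberately naive:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model

def embed_document(doc_text: str, summary: str) -> list[dict]:
    """One embedding record per granularity, tagged so the retriever
    can pick the right scale for the question."""
    paragraphs = [p.strip() for p in doc_text.split("\n\n") if p.strip()]
    sentences = [s for p in paragraphs for s in p.split(". ") if s]  # naive split
    records = []
    for level, texts in [("sentence", sentences),
                         ("paragraph", paragraphs),
                         ("summary", [summary])]:
        for text, vector in zip(texts, model.encode(texts)):
            records.append({"level": level, "text": text, "vector": vector})
    return records
```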

Model choice. Domain-specific embeddings (legal, clinical, code) usually outperform general-purpose embeddings within their domain. Test against held-out queries before committing. General-purpose is the right default when no domain-tuned model exists for the client's space.

Hybrid retrieval. Combine vector similarity with BM25 keyword search. Almost every real-world corpus has terms (product codes, person names, acronyms) that semantic similarity handles poorly. The hybrid recovers them.
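
One common way to build the hybrid, sketched with the rank_bm25 library and reciprocal rank fusion (RRF). The vector ranking is assumed to come from whatever vector store is in play:

```python
from rank_bm25 import BM25Okapi

def rrf(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Fuse several rankings of doc ids; k=60 is the conventional constant."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

def hybrid_search(query: str, docs: list[str], vector_ranking: list[int]) -> list[int]:
    """Keyword ranking from BM25, fused with the vector store's ranking."""
    bm25 = BM25Okapi([d.lower().split() for d in docs])
    keyword_scores = bm25.get_scores(query.lower().split())
    keyword_ranking = sorted(range(len(docs)),
                             key=lambda i: keyword_scores[i], reverse=True)
    return rrf([keyword_ranking, vector_ranking])
```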

Query patterns that work

Four query patterns cover most client requests against a GraphRAG layer. The two worked examples below show how they play out in client contexts.

Example: educational content at scale

Crash Course is publicly known to publish across more than sixty subjects. That's a large, longitudinal corpus where the same concept appears differently across math, biology, philosophy, and engineering. The studio's general approach for a corpus at that scale: extract a unified Concept entity, link each appearance in each subject as a separate node typed "Treatment," and connect treatments with relationships like "prerequisite-of," "elaboration-of," and "contradicts."

The payoff is queries the team couldn't run before. "Show me every treatment of probability across the catalogue, ordered by prerequisite depth." "Find concepts that appear in three or more subjects but are introduced differently each time." Curriculum decisions stop being guesses and become structured queries against the team's own work.
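
A sketch of what the first of those queries could look like against a Neo4j-style graph, using the illustrative Concept/Treatment schema from above rather than any client's actual ontology:

```python
from neo4j import GraphDatabase

QUERY = """
MATCH (c:Concept {name: $concept})<-[:TREATMENT_OF]-(t:Treatment)
OPTIONAL MATCH p = (root:Treatment)-[:PREREQUISITE_OF*]->(t)
WHERE NOT ()-[:PREREQUISITE_OF]->(root)
RETURN t.subject AS subject, t.title AS title,
       coalesce(max(length(p)), 0) AS prereq_depth
ORDER BY prereq_depth
"""

def treatments_by_depth(uri: str, auth: tuple, concept: str) -> list[dict]:
    """Every treatment of a concept, ordered by prerequisite depth."""
    with GraphDatabase.driver(uri, auth=auth) as driver:
        records, _, _ = driver.execute_query(QUERY, concept=concept)
        return [record.data() for record in records]
```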

Crash Course-related case-study figures on this site are graded REPORTED. The methodology described here is the general pattern the studio applies in educational engagements at that scale.

Example: health domain (general principles)

Tyler's role at BEKIN Health puts the studio close to health and longevity work. We don't share BEKIN-specific architecture publicly. The general principles that carry to any health-tier engagement are worth naming.

Privacy as a first-class graph property. Every node and edge carries an access-classification tag. Queries that would join across classifications get blocked at the query layer, not the application layer. The graph database is the right place to enforce this because it's the only place that sees the joins.
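
A minimal sketch of the pre-flight check, assuming a simple ordered classification lattice; a real build leans on the graph database's own access controls where they exist:

```python
CLASSIFICATION_RANK = {"public": 0, "internal": 1, "clinical": 2}  # illustrative

def check_query_scope(user_clearance: str, touched: set[str]) -> None:
    """Block any query whose joins touch data above the caller's clearance."""
    ceiling = CLASSIFICATION_RANK[user_clearance]
    for tag in touched:
        if CLASSIFICATION_RANK[tag] > ceiling:
            raise PermissionError(
                f"query touches '{tag}' data above clearance '{user_clearance}'")
```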

Provenance is non-negotiable. Every claim in a health context has to be traceable to a source the clinician would accept. The graph stores citations as edges to source nodes, not as free-text fields.
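
Illustratively, a citation becomes an edge with its own properties, written here as a Cypher statement held in Python; the labels and properties are assumptions, not any client's schema:

```python
# MERGE makes the citation idempotent: re-ingesting the same source
# doesn't duplicate the edge, and the retrieval date is set once.
CITE = """
MATCH (claim:Claim {id: $claim_id}), (src:Source {doi: $doi})
MERGE (claim)-[r:CITES]->(src)
ON CREATE SET r.retrieved = date($retrieved)
"""
```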

Ontology requires clinical sign-off. The ontology argument involves people with clinical credentials, not just the engineering team. The graph that gets built will encode a clinical view of the world. Making sure that view is correct is a sign-off step, not an internal call.

Drift is monitored. When the literature changes, the graph has to know, either through a scheduled re-extraction pass or through manual updates with version stamps. A health graph that silently drifts out of date is worse than no graph at all.

What makes it survive turnover

The thing clients actually buy when they buy a GraphRAG build is that the next team can use it. Three properties matter for that.

  1. A versioned ontology in the repo, not in someone's head.
  2. Provenance on every chunk, every entity, every edge, so the next team can audit any claim back to source.
  3. Documented query patterns with worked examples, so the next team doesn't have to rediscover the four queries that answer 80% of business questions.

A GraphRAG layer without those three properties is a research artifact. A GraphRAG layer with them is institutional memory.