Knowledge Graph

What is it? (Start here)

Imagine a detective's incident board: photos pinned to a corkboard with strings connecting them. "Person A worked with Person B, who used Tool C, which touched System D." That's a knowledge graph. It's not a list of facts; it's a map of how things connect.

Regular search finds things by similarity — type "auth error" and you get text that mentions auth errors. The knowledge graph finds things by relationship — type "what broke auth last month and who was involved?" and it traverses the connections: auth system → recent changes → agents who made them → what they were working on.

A real example

Popashot asks: "What depends on the session token system?"

Without a knowledge graph, you'd grep the codebase and hope you catch everything. With the graph, the query anchors on the entity "session token system", then follows the depends_on and worked_on edges that point at it, up to depth 2. It returns:

session-token-system
  ← depends_on: GraphQL API gateway      (via /api/graphql auth middleware)
  ← depends_on: Discord webhook handler  (via session validation on inbound)
  ← depends_on: Machine auth flow        (via token exchange step)
  ← worked_on:  Cantona                  (last 3 changes)
  ← worked_on:  ZeroCool                 (security review, 2 weeks ago)

In one query, Popashot knows everything that would break if session tokens change, and who to talk to. No grep, no manual archaeology.
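
The traversal behind that answer can be sketched as a breadth-first walk over edges that point at the anchor. This is a minimal illustration, not the actual query engine: the edge list is copied from the example output above, and the function name inbound_neighbours and the tuple layout are assumptions made for the sketch.

```python
from collections import deque

# Hypothetical edge list: (source, relation, target), copied from the example above.
EDGES = [
    ("GraphQL API gateway", "depends_on", "session-token-system"),
    ("Discord webhook handler", "depends_on", "session-token-system"),
    ("Machine auth flow", "depends_on", "session-token-system"),
    ("Cantona", "worked_on", "session-token-system"),
    ("ZeroCool", "worked_on", "session-token-system"),
]

def inbound_neighbours(anchor, edges, max_depth=2):
    """Breadth-first walk over edges pointing at the anchor, up to max_depth hops."""
    # Index edges by target so each hop is a dictionary lookup.
    incoming = {}
    for source, relation, target in edges:
        incoming.setdefault(target, []).append((relation, source))

    seen = {anchor}
    queue = deque([(anchor, 0)])
    results = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for relation, source in incoming.get(node, []):
            results.append((depth + 1, relation, source, node))
            if source not in seen:
                seen.add(source)
                queue.append((source, depth + 1))
    return results

hits = inbound_neighbours("session-token-system", EDGES)
for depth, relation, source, target in hits:
    print(f"depth {depth}: {source} -[{relation}]-> {target}")
```

With this toy data every hit sits at depth 1; a second hop would only appear if something in turn pointed at the gateway, the webhook handler, or the auth flow.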

How it works — the mechanism

Entity extraction pipeline

The graph builds automatically from existing knowledge files — agents don't manually curate it. Source documents feed into a local NER (Named Entity Recognition) model:

Memory logs, corrections.md, daily journals
Bank entity pages (bank/entities/)
Clan-learnings patterns and discoveries

The extractor identifies named entities (people, systems, tools, projects, decisions, files) and their relationships, then writes entity pages to bank/entities/ (one Markdown file per entity) and populates a SQLite graph index.
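
To make the "SQLite graph index" concrete, here is one way such an index could be laid out. The real schema is not described on this page, so the table names, column names, the slug-based page filename, and the helper functions upsert_entity and add_edge are all assumptions for illustration.

```python
import sqlite3

# In-memory database for the sketch; table and column names are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    type TEXT NOT NULL,          -- e.g. Person, System, Tool
    page TEXT NOT NULL           -- path of the entity's Markdown page
);
CREATE TABLE edges (
    source_id INTEGER NOT NULL REFERENCES entities(id),
    relation  TEXT NOT NULL,     -- e.g. depends_on, worked_on
    target_id INTEGER NOT NULL REFERENCES entities(id),
    UNIQUE (source_id, relation, target_id)
);
CREATE INDEX edges_by_target ON edges(target_id, relation);
""")

def upsert_entity(name, entity_type):
    """Insert the entity if it is new and return its row id."""
    slug = name.lower().replace(" ", "-")   # hypothetical filename convention
    page = f"bank/entities/{slug}.md"
    conn.execute(
        "INSERT OR IGNORE INTO entities (name, type, page) VALUES (?, ?, ?)",
        (name, entity_type, page),
    )
    return conn.execute(
        "SELECT id FROM entities WHERE name = ?", (name,)
    ).fetchone()[0]

def add_edge(source_id, relation, target_id):
    """Record a typed edge; re-extracting the same fact is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO edges VALUES (?, ?, ?)",
        (source_id, relation, target_id),
    )

gateway = upsert_entity("GraphQL API gateway", "System")
tokens = upsert_entity("session-token-system", "System")
add_edge(gateway, "depends_on", tokens)

dependents = [row[0] for row in conn.execute(
    "SELECT e.name FROM edges JOIN entities e ON e.id = edges.source_id "
    "WHERE edges.target_id = ? AND edges.relation = 'depends_on'",
    (tokens,),
)]
print(dependents)
```

The index on (target_id, relation) is there because "what depends on X?" looks edges up by their target; a pipeline that mostly asks "what does X depend on?" would index the source column instead.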

What gets indexed

Eight entity types are tracked:

Type	Examples
Person	Cantona, ZeroCool, Tank
Project	SHOTclubhouse, mission-control, clawdbot
System	session-token-system, Convex BFF, GitHub App
File	lib/auth/machine-adapter.ts, AGENTS.md
Event	auth migration, production incident 2026-03-15
Decision	"use session token not JWT for SSR"
Concept	CRDT merge, gossip protocol, inbox protocol
Tool	claude, codex, D2, Bitwarden

Relationship types

Edges between entities are typed. The type determines which traversal makes sense:

owns — Cantona owns the auth migration task
depends_on — GraphQL gateway depends_on session-token-system
uses — Tank uses D2 for diagrams
worked_on — ZeroCool worked_on the GitHub App integration
caused — token format mismatch caused the auth outage
supersedes — new pattern supersedes old anti-pattern
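
As a rough illustration of how the edge type steers a traversal, the sketch below models the six relationship types as an enum and filters edges by an allowed set. The sample edges come from the list above; the Relation class, the sources_of helper, and the idea of passing a set of allowed relations are assumptions, not the documented API.

```python
from enum import Enum

class Relation(str, Enum):
    OWNS = "owns"
    DEPENDS_ON = "depends_on"
    USES = "uses"
    WORKED_ON = "worked_on"
    CAUSED = "caused"
    SUPERSEDES = "supersedes"

# (source, relation, target) triples taken from the examples above.
EDGES = [
    ("Cantona", Relation.OWNS, "auth migration"),
    ("GraphQL gateway", Relation.DEPENDS_ON, "session-token-system"),
    ("Tank", Relation.USES, "D2"),
    ("ZeroCool", Relation.WORKED_ON, "GitHub App integration"),
    ("token format mismatch", Relation.CAUSED, "auth outage"),
]

def sources_of(target, allowed, edges=EDGES):
    """Return sources of edges into `target` whose relation is in `allowed`."""
    return [s for s, r, t in edges if t == target and r in allowed]

# An impact question follows depends_on; a root-cause question follows caused.
impact = sources_of("session-token-system", {Relation.DEPENDS_ON})
root_causes = sources_of("auth outage", {Relation.CAUSED})
print(impact, root_causes)
```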

Sidecar indexes — speed optimisation

For the 37 most frequently queried entities, pre-computed "sidecar" indexes cache the full neighbourhood up to depth 3. A query hitting a hot entity skips live traversal and reads the cache directly — near-instant results. Cold entities use live BFS traversal. The sidecar is a performance optimisation, not a correctness requirement.
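
The hot/cold split can be sketched as a cache lookup with a live-traversal fallback. This is an assumed shape, not the real implementation: the toy GRAPH, the SIDECARS dictionary, and the function names are hypothetical, and the real sidecars may store more than hop distances.

```python
from collections import deque

# Hypothetical undirected adjacency for the sketch: node -> neighbours.
GRAPH = {
    "session-token-system": ["GraphQL API gateway", "Cantona"],
    "GraphQL API gateway": ["session-token-system"],
    "Cantona": ["session-token-system"],
}

def bfs_neighbourhood(start, max_depth=3):
    """Live BFS: map every node within max_depth hops to its distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_depth:
            continue  # depth limit reached, do not expand further
        for neighbour in GRAPH.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]  # the anchor is not part of its own neighbourhood
    return seen

# Pre-computed once for hot entities; here only one entity is "hot".
SIDECARS = {"session-token-system": bfs_neighbourhood("session-token-system")}

def neighbourhood(entity):
    """Serve hot entities from the sidecar, cold ones via live traversal."""
    cached = SIDECARS.get(entity)
    if cached is not None:
        return cached, "sidecar"
    return bfs_neighbourhood(entity), "live"

print(neighbourhood("session-token-system"))
print(neighbourhood("Cantona"))
```

Because the sidecar is produced by the same traversal it replaces, dropping it changes latency but not results, provided it is rebuilt whenever the underlying edges change.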

How it complements text search

The system uses two parallel retrieval paths:

Path A — Text retrieval: vector similarity + BM25 keyword search. Finds information textually similar to your query.
Path B — Knowledge Graph: entity-anchored traversal. Finds information structurally related to entities in your query.

Both paths run and results merge. A complete answer to "what broke auth last month and who fixed it?" needs both: textual context (error messages, stack traces) and structural context (who worked on it, what it connects to, what changed). Neither path alone gives the full picture.
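
This page does not say how the two result lists are merged. One common option is reciprocal rank fusion, shown below purely as an illustration of the idea; it is an assumption, not the confirmed merge strategy, and the hit identifiers and the constant k=60 are made up for the sketch.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each item scores 1 / (k + rank) per list it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by name so the order is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))

# Hypothetical hits from each path, best first.
text_hits = ["auth-error-stacktrace", "token-mismatch-log", "session-token-system"]
graph_hits = ["session-token-system", "Cantona", "ZeroCool"]

merged = reciprocal_rank_fusion([text_hits, graph_hits])
print(merged)
```

An item surfaced by both paths, here session-token-system, accumulates score from each list and rises to the top, which matches the intuition that textual and structural evidence reinforce each other.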

Current state

The live graph contains 129 entities, 131 relationships, and 37 entity sidecars.

How it connects to the larger loop

The knowledge graph is fed by everything agents write. Corrections, journal entries, pattern publications — all become source material for the extractor. The more agents write, the richer the graph gets, automatically:

Gossip — discoveries.jsonl entries add new entities and relationships
Journals — decision logs create Person→worked_on→Task edges
Reflection Loop — promoted corrections become Concept nodes with correction history edges
GraphRAG deep dive — full extraction pipeline, index schema, traversal algorithms