Knowledge Graph
What is it? (Start here)
Imagine a detective's incident board: photos pinned to a corkboard with strings connecting them. "Person A worked with Person B, who used Tool C, which touched System D." That's a knowledge graph. It's not a list of facts; it's a map of how things connect.
Regular search finds things by similarity — type "auth error" and you get text that mentions auth errors. The knowledge graph finds things by relationship — type "what broke auth last month and who was involved?" and it traverses the connections: auth system → recent changes → agents who made them → what they were working on.
A real example
Popashot asks: "What depends on the session token system?"
Without a knowledge graph, you'd grep the codebase and hope you catch everything. With the graph, the query anchors on the entity "session token system", then traverses depends_on edges outward to depth 2. It returns:
session-token-system
← depends_on: GraphQL API gateway (via /api/graphql auth middleware)
← depends_on: Discord webhook handler (via session validation on inbound)
← depends_on: Machine auth flow (via token exchange step)
← worked_on: Cantona (last 3 changes)
← worked_on: ZeroCool (security review, 2 weeks ago)
In one query, Popashot knows everything that would break if session tokens change, and who to talk to. No grep, no manual archaeology.
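The traversal in this example can be sketched as a breadth-first walk over incoming edges. This is a minimal illustration, not the system's actual implementation: the `EDGES` tuple format and the `dependents` function are assumptions, and only the entity and relation names come from the example above.

```python
from collections import deque

# Hypothetical edge list: (source, relation, target). The names come from the
# example above; the storage format is an assumption.
EDGES = [
    ("GraphQL API gateway", "depends_on", "session-token-system"),
    ("Discord webhook handler", "depends_on", "session-token-system"),
    ("Machine auth flow", "depends_on", "session-token-system"),
    ("Cantona", "worked_on", "session-token-system"),
    ("ZeroCool", "worked_on", "session-token-system"),
]


def dependents(anchor, edges, relation="depends_on", max_depth=2):
    """Return {entity: depth} for everything that reaches `anchor` via `relation`."""
    # Index incoming edges by target so each hop is a dictionary lookup.
    incoming = {}
    for source, rel, target in edges:
        if rel == relation:
            incoming.setdefault(target, []).append(source)

    found = {}
    queue = deque([(anchor, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for source in incoming.get(node, []):
            if source not in found and source != anchor:
                found[source] = depth + 1
                queue.append((source, depth + 1))
    return found


print(dependents("session-token-system", EDGES))
print(dependents("session-token-system", EDGES, relation="worked_on"))
```

Passing `relation="worked_on"` answers the "who to talk to" half of the question with the same walk.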
How it works — the mechanism
Entity extraction pipeline
The graph builds automatically from existing knowledge files — agents don't manually curate it. Source documents feed into a local NER (Named Entity Recognition) model:
- Memory logs, corrections.md, daily journals
- Bank entity pages (bank/entities/)
- Clan-learnings patterns and discoveries
The extractor identifies named entities (people, systems, tools, projects, decisions, files) and their relationships, then writes entity pages to bank/entities/ (one Markdown file per entity) and populates a SQLite graph index.
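To make the two outputs concrete, here is a rough sketch of the write step, assuming the NER model has already produced `(name, type)` entities and `(source, relation, target)` relations. The table layout, the `slugify` helper, and the page template are all hypothetical; the real index schema is not described in this section.

```python
import sqlite3
import tempfile
from pathlib import Path

# Assumed schema: one table for entities, one for typed edges.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entities (
    id   TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    type TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS edges (
    source   TEXT NOT NULL REFERENCES entities(id),
    relation TEXT NOT NULL,
    target   TEXT NOT NULL REFERENCES entities(id),
    UNIQUE (source, relation, target)
);
"""


def slugify(name):
    """Turn an entity name into a file-safe id (hypothetical convention)."""
    return "-".join(name.lower().replace("/", " ").split())


def index_extraction(conn, pages_dir, entities, relations):
    """Write one Markdown page per entity and record entities and edges in SQLite."""
    pages_dir.mkdir(parents=True, exist_ok=True)
    for name, etype in entities:
        slug = slugify(name)
        conn.execute(
            "INSERT OR IGNORE INTO entities VALUES (?, ?, ?)", (slug, name, etype)
        )
        page = pages_dir / f"{slug}.md"
        if not page.exists():  # never overwrite an existing entity page
            page.write_text(f"# {name}\n\nType: {etype}\n", encoding="utf-8")
    for source, relation, target in relations:
        conn.execute(
            "INSERT OR IGNORE INTO edges VALUES (?, ?, ?)",
            (slugify(source), relation, slugify(target)),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
pages = Path(tempfile.mkdtemp()) / "bank" / "entities"
index_extraction(
    conn,
    pages,
    entities=[("session-token-system", "System"), ("Cantona", "Person")],
    relations=[("Cantona", "worked_on", "session-token-system")],
)
print(sorted(p.name for p in pages.iterdir()))
```

`INSERT OR IGNORE` and the existence check keep the step idempotent, so re-running the extractor over the same sources would not duplicate rows or clobber pages.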
What gets indexed
Eight entity types are tracked:
| Type | Examples |
|---|---|
| Person | Cantona, ZeroCool, Tank |
| Project | SHOTclubhouse, mission-control, clawdbot |
| System | session-token-system, Convex BFF, GitHub App |
| File | lib/auth/machine-adapter.ts, AGENTS.md |
| Event | auth migration, production incident 2026-03-15 |
| Decision | "use session token not JWT for SSR" |
| Concept | CRDT merge, gossip protocol, inbox protocol |
| Tool | claude, codex, D2, Bitwarden |
Relationship types
Edges between entities are typed. The type determines which traversal makes sense:
- owns — Cantona owns the auth migration task
- depends_on — GraphQL gateway depends_on session-token-system
- uses — Tank uses D2 for diagrams
- worked_on — ZeroCool worked_on the GitHub App integration
- caused — token format mismatch caused the auth outage
- supersedes — new pattern supersedes old anti-pattern
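One way to picture typed edges is as a small data model in which the relation type selects the edges a given question should follow. The sketch below is illustrative only: the enum and class names and the `TRAVERSALS` mapping from question kind to relation types are assumptions, while the eight entity types and six relation types are the ones listed above.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    PERSON = "Person"
    PROJECT = "Project"
    SYSTEM = "System"
    FILE = "File"
    EVENT = "Event"
    DECISION = "Decision"
    CONCEPT = "Concept"
    TOOL = "Tool"


class Relation(Enum):
    OWNS = "owns"
    DEPENDS_ON = "depends_on"
    USES = "uses"
    WORKED_ON = "worked_on"
    CAUSED = "caused"
    SUPERSEDES = "supersedes"


@dataclass(frozen=True)
class Edge:
    source: str
    relation: Relation
    target: str


# Hypothetical mapping from the kind of question to the edge types worth following.
TRAVERSALS = {
    "impact": {Relation.DEPENDS_ON},
    "people": {Relation.OWNS, Relation.WORKED_ON},
    "root_cause": {Relation.CAUSED},
}


def edges_for(question_kind, edges):
    """Keep only the edges whose relation type suits the question."""
    allowed = TRAVERSALS[question_kind]
    return [edge for edge in edges if edge.relation in allowed]


SAMPLE_EDGES = [
    Edge("GraphQL gateway", Relation.DEPENDS_ON, "session-token-system"),
    Edge("ZeroCool", Relation.WORKED_ON, "GitHub App integration"),
    Edge("token format mismatch", Relation.CAUSED, "auth outage"),
]

for edge in edges_for("people", SAMPLE_EDGES):
    print(edge.source, edge.relation.value, edge.target)
```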
Sidecar indexes — speed optimisation
For the 37 most frequently queried entities, pre-computed "sidecar" indexes cache the full neighbourhood up to depth 3. A query hitting a hot entity skips live traversal and reads the cache directly — near-instant results. Cold entities use live BFS traversal. The sidecar is a performance optimisation, not a correctness requirement.
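The hot/cold split can be sketched as a cache in front of a live BFS. This is a simplified illustration under stated assumptions: the adjacency-dict graph, the `SidecarIndex` class, and the single-letter entity names are hypothetical, and the real sidecars presumably store more than hop distances.

```python
from collections import deque


def neighbourhood(graph, start, max_depth=3):
    """Live BFS: return {entity: depth} for everything within max_depth hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_depth:
            continue  # stop expanding at the depth limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]  # the anchor is not part of its own neighbourhood
    return seen


class SidecarIndex:
    """Pre-computes neighbourhoods for hot entities; cold ones are traversed live."""

    def __init__(self, graph, hot_entities, max_depth=3):
        self.graph = graph
        self.max_depth = max_depth
        self.cache = {
            entity: neighbourhood(graph, entity, max_depth) for entity in hot_entities
        }

    def query(self, entity):
        cached = self.cache.get(entity)
        if cached is not None:
            return cached, "sidecar"
        return neighbourhood(self.graph, entity, self.max_depth), "live"


# Hypothetical chain graph A - B - C - D - E, stored as adjacency lists.
GRAPH = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
index = SidecarIndex(GRAPH, hot_entities=["A"])

print(index.query("A"))  # served from the sidecar cache
print(index.query("E"))  # falls back to live traversal
```

Both branches return the same neighbourhood for the same entity, which is what makes the sidecar a pure optimisation. A real cache would also need rebuilding when the graph changes; that step is omitted here.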
How it complements text search
The system uses two parallel retrieval paths:
- Path A — Text retrieval: vector similarity + BM25 keyword search. Finds information textually similar to your query.
- Path B — Knowledge Graph: entity-anchored traversal. Finds information structurally related to entities in your query.
Both paths run and results merge. A complete answer to "what broke auth last month and who fixed it?" needs both: textual context (error messages, stack traces) and structural context (who worked on it, what it connects to, what changed). Neither path alone gives the full picture.
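This section does not say how the two result lists are merged. As one plausible illustration, the sketch below uses reciprocal rank fusion (RRF), a common technique for combining rankings; treating it as the system's actual merge strategy would be an assumption, and the document ids are invented for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each item scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for "what broke auth last month and who fixed it?"
text_hits = ["auth-error-log", "stack-trace-note", "session-token-system"]  # Path A
graph_hits = ["session-token-system", "cantona-journal"]                    # Path B

merged = reciprocal_rank_fusion([text_hits, graph_hits])
print(merged)
```

An item found by both paths (here `session-token-system`) rises to the top, while results unique to either path still appear in the merged list.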
Current state
The live graph contains 129 entities, 131 relationships, and 37 entity sidecars.
How it connects to the larger loop
The knowledge graph is fed by everything agents write. Corrections, journal entries, pattern publications — all become source material for the extractor. The more agents write, the richer the graph gets, automatically:
- Gossip — discoveries.jsonl entries add new entities and relationships
- Journals — decision logs create Person→worked_on→Task edges
- Reflection Loop — promoted corrections become Concept nodes with correction history edges
See also: GraphRAG deep dive (full extraction pipeline, index schema, traversal algorithms)