Knowledge Graph

What is it? (Start here)

Imagine a detective's incident board: photos pinned to a corkboard with strings connecting them. "Person A worked with Person B, who used Tool C, which touched System D." That's a knowledge graph. It's not a list of facts; it's a map of how things connect.

Regular search finds things by similarity — type "auth error" and you get text that mentions auth errors. The knowledge graph finds things by relationship — type "what broke auth last month and who was involved?" and it traverses the connections: auth system → recent changes → agents who made them → what they were working on.

A real example

Popashot asks: "What depends on the session token system?"

Without a knowledge graph, you'd grep the codebase and hope you catch everything. With the graph, the query anchors on the entity "session token system", then traverses depends_on edges outward to depth 2. It returns:

session-token-system
  ← depends_on: GraphQL API gateway      (via /api/graphql auth middleware)
  ← depends_on: Discord webhook handler  (via session validation on inbound)
  ← depends_on: Machine auth flow        (via token exchange step)
  ← worked_on:  Cantona                  (last 3 changes)
  ← worked_on:  ZeroCool                 (security review, 2 weeks ago)

In one query, Popashot knows everything that would break if session tokens change, and who to talk to. No grep, no manual archaeology.
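
The walk behind that result can be sketched in a few lines of Python. This is a minimal illustration, not the actual traversal engine: the in-memory edge list, the `inbound_neighbours` function and its parameters are all assumptions made for the example, and only the entity and relationship names come from the output above.

```python
from collections import deque

# Hypothetical edge list as (source, relation, target) triples.
# The names mirror the example output; the storage format is assumed.
EDGES = [
    ("GraphQL API gateway", "depends_on", "session-token-system"),
    ("Discord webhook handler", "depends_on", "session-token-system"),
    ("Machine auth flow", "depends_on", "session-token-system"),
    ("Cantona", "worked_on", "session-token-system"),
    ("ZeroCool", "worked_on", "session-token-system"),
]


def inbound_neighbours(anchor, edges, relations, max_depth=2):
    """Breadth-first walk over edges that point at each visited node."""
    seen = {anchor}
    queue = deque([(anchor, 0)])
    found = []  # (depth, source, relation, target)
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for src, rel, dst in edges:
            if dst == node and rel in relations and src not in seen:
                seen.add(src)
                found.append((depth + 1, src, rel, dst))
                queue.append((src, depth + 1))
    return found


hits = inbound_neighbours(
    "session-token-system", EDGES, {"depends_on", "worked_on"}
)
for depth, src, rel, dst in hits:
    print(f"{dst} <- {rel}: {src} (depth {depth})")
```

Following edges inbound (from target back to source) is what turns "what depends on X?" into a simple walk: every hit is something that points at the anchor, directly or through an intermediate node.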

[Diagram: Knowledge Graph, structural retrieval. Source docs (memory/ logs, bank/ entities, clan-learnings/, corrections.md, daily journals) → entity extractor (local NER model, GGUF) → entity store (bank/entities/, one .md per entity, CRDT-merged across all agents) → graph index (SQLite adjacency + vector embeddings per node, typed edges, per-entity sidecar index) → traversal engine (named entity anchors, BFS/DFS over typed edges up to depth 3, sidecar shortcut for hot entities, scored by edge proximity + recency) → structural results (relationship chains, entity neighbourhoods with source files, confidence and edge-distance scores, cited so an agent can quote and verify). Complements text retrieval; see /docs/agent-intelligence.]

How it works — the mechanism

Entity extraction pipeline

The graph builds automatically from existing knowledge files — agents don't manually curate it. Source documents feed into a local NER (Named Entity Recognition) model:

  • Memory logs, corrections.md, daily journals
  • Bank entity pages (bank/entities/)
  • Clan-learnings patterns and discoveries

The extractor identifies named entities (people, systems, tools, projects, decisions, files) and their relationships, then writes entity pages to bank/entities/ (one Markdown file per entity) and populates a SQLite graph index.
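
As a rough picture of what "populates a SQLite graph index" could look like, here is a small sketch using Python's built-in sqlite3 module. The table names, column names, slug-style entity ids and the `<id>.md` file naming are assumptions for illustration; the real index schema is not described here.

```python
import sqlite3

# In-memory database for the sketch; the real index location is not specified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (
    id          TEXT PRIMARY KEY,
    type        TEXT NOT NULL,
    source_file TEXT NOT NULL
);
CREATE TABLE edges (
    src      TEXT NOT NULL REFERENCES entities(id),
    relation TEXT NOT NULL,
    dst      TEXT NOT NULL REFERENCES entities(id),
    PRIMARY KEY (src, relation, dst)
);
-- Inbound lookups ("what points at X?") are served by this index.
CREATE INDEX edges_by_dst ON edges (dst, relation);
""")


def add_entity(conn, entity_id, entity_type):
    # Assumed naming: one Markdown page per entity, named after its id.
    conn.execute(
        "INSERT OR IGNORE INTO entities VALUES (?, ?, ?)",
        (entity_id, entity_type, f"bank/entities/{entity_id}.md"),
    )


def add_edge(conn, src, relation, dst):
    conn.execute(
        "INSERT OR IGNORE INTO edges VALUES (?, ?, ?)", (src, relation, dst)
    )


def dependents_of(conn, entity_id):
    rows = conn.execute(
        "SELECT src FROM edges WHERE dst = ? AND relation = 'depends_on' "
        "ORDER BY src",
        (entity_id,),
    )
    return [row[0] for row in rows]


add_entity(conn, "session-token-system", "System")
add_entity(conn, "graphql-api-gateway", "System")
add_entity(conn, "cantona", "Person")
add_edge(conn, "graphql-api-gateway", "depends_on", "session-token-system")
add_edge(conn, "cantona", "worked_on", "session-token-system")

dependents = dependents_of(conn, "session-token-system")
print(dependents)
```

Storing edges as plain (src, relation, dst) rows keeps the adjacency list queryable in both directions with ordinary SQL.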

What gets indexed

Eight entity types are tracked:

Type      Examples
Person    Cantona, ZeroCool, Tank
Project   SHOTclubhouse, mission-control, clawdbot
System    session-token-system, Convex BFF, GitHub App
File      lib/auth/machine-adapter.ts, AGENTS.md
Event     auth migration, production incident 2026-03-15
Decision  "use session token not JWT for SSR"
Concept   CRDT merge, gossip protocol, inbox protocol
Tool      claude, codex, D2, Bitwarden

Relationship types

Edges between entities are typed. The type determines which traversal makes sense:

  • owns — Cantona owns the auth migration task
  • depends_on — GraphQL gateway depends_on session-token-system
  • uses — Tank uses D2 for diagrams
  • worked_on — ZeroCool worked_on the GitHub App integration
  • caused — token format mismatch caused the auth outage
  • supersedes — new pattern supersedes old anti-pattern
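
One way to picture "the type determines which traversal makes sense" is a lookup from the kind of question to the edge types worth following. The sketch below assumes this shape. The `Relation` enum lists the six types above, but the `TRAVERSAL_PLANS` names and the `select_edges` helper are hypothetical and exist only for this example.

```python
from enum import Enum


class Relation(str, Enum):
    OWNS = "owns"
    DEPENDS_ON = "depends_on"
    USES = "uses"
    WORKED_ON = "worked_on"
    CAUSED = "caused"
    SUPERSEDES = "supersedes"


# Hypothetical mapping: which edge types a given kind of question follows.
TRAVERSAL_PLANS = {
    "impact": {Relation.DEPENDS_ON},
    "people": {Relation.OWNS, Relation.WORKED_ON},
    "root_cause": {Relation.CAUSED},
}

# The six example edges from the list above, as (source, relation, target).
EDGES = [
    ("Cantona", Relation.OWNS, "auth migration task"),
    ("GraphQL gateway", Relation.DEPENDS_ON, "session-token-system"),
    ("Tank", Relation.USES, "D2"),
    ("ZeroCool", Relation.WORKED_ON, "GitHub App integration"),
    ("token format mismatch", Relation.CAUSED, "auth outage"),
    ("new pattern", Relation.SUPERSEDES, "old anti-pattern"),
]


def select_edges(plan, edges):
    """Keep only the edges whose type is relevant to the chosen plan."""
    wanted = TRAVERSAL_PLANS[plan]
    return [edge for edge in edges if edge[1] in wanted]


people_edges = select_edges("people", EDGES)
for src, rel, dst in people_edges:
    print(f"{src} -{rel.value}-> {dst}")
```

Filtering by type before walking keeps a "who was involved?" query from wandering along uses or supersedes edges that say nothing about people.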

Sidecar indexes — speed optimisation

For the 37 most frequently queried entities, pre-computed "sidecar" indexes cache the full neighbourhood up to depth 3. A query hitting a hot entity skips live traversal and reads the cache directly — near-instant results. Cold entities use live BFS traversal. The sidecar is a performance optimisation, not a correctness requirement.
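
The hot/cold split can be sketched as a cache lookup with a live fallback. This is an illustration under assumptions: the toy graph (a chain A to E), the `SIDECARS` dictionary and every function name are invented for the example, and real sidecars presumably live on disk rather than in memory.

```python
from collections import deque

# Toy adjacency list: a chain A -> B -> C -> D -> E (hypothetical data).
GRAPH = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["E"], "E": []}

# Hot entity -> pre-computed neighbourhood ({node: edge distance}).
SIDECARS = {}


def live_bfs(anchor, graph, max_depth=3):
    """Return {node: distance} for everything within max_depth of anchor."""
    dist = {anchor: 0}
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # neighbours of this node would exceed the limit
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[anchor]  # the anchor is not part of its own neighbourhood
    return dist


def build_sidecar(anchor, graph, sidecars):
    """Pre-compute and cache the neighbourhood of a hot entity."""
    sidecars[anchor] = live_bfs(anchor, graph)


def neighbourhood(anchor, graph, sidecars):
    """Read the sidecar if one exists, otherwise traverse live."""
    cached = sidecars.get(anchor)
    if cached is not None:
        return cached, "sidecar"
    return live_bfs(anchor, graph), "live"


build_sidecar("A", GRAPH, SIDECARS)
hot_result, hot_source = neighbourhood("A", GRAPH, SIDECARS)
cold_result, cold_source = neighbourhood("B", GRAPH, SIDECARS)
print(hot_source, hot_result)
print(cold_source, cold_result)
```

Because the sidecar is built by the same traversal it replaces, both paths return the same answer for the same graph state, which is why the cache affects speed and not correctness. A real system would also need to rebuild a sidecar when the graph around its entity changes; that step is left out here.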

How it complements text search

The system uses two parallel retrieval paths:

  • Path A — Text retrieval: vector similarity + BM25 keyword search. Finds information textually similar to your query.
  • Path B — Knowledge Graph: entity-anchored traversal. Finds information structurally related to entities in your query.

Both paths run and results merge. A complete answer to "what broke auth last month and who fixed it?" needs both: textual context (error messages, stack traces) and structural context (who worked on it, what it connects to, what changed). Neither path alone gives the full picture.
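
How the two result lists are merged is not specified here. One common option for combining ranked lists is reciprocal rank fusion, sketched below purely as an assumption; the result ids are hypothetical and `k=60` is just the conventional default for that technique.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each item scores 1 / (k + rank) per list it is in."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by name so the order is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical result ids from each path, best match first.
text_hits = ["stack-trace-log", "auth-error-note", "session-token-system"]
graph_hits = ["session-token-system", "Cantona", "ZeroCool"]

merged = reciprocal_rank_fusion([text_hits, graph_hits])
print(merged)
```

An item that both paths return, like session-token-system here, accumulates score from each list and rises to the top, while items found by only one path still appear in the merged result.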

Current state

The live graph contains 129 entities, 131 relationships, and 37 entity sidecars.

How it connects to the larger loop

The knowledge graph is fed by everything agents write. Corrections, journal entries, pattern publications — all become source material for the extractor. The more agents write, the richer the graph gets, automatically:

  • Gossip — discoveries.jsonl entries add new entities and relationships
  • Journals — decision logs create Person→worked_on→Task edges
  • Reflection Loop — promoted corrections become Concept nodes with correction history edges
  • GraphRAG deep dive — full extraction pipeline, index schema, traversal algorithms