Distributed Knowledge System
What is it? (Start here)
Imagine the internet before email. If you wanted to tell 10 people something, you had to call each one individually. Now imagine email — you write once, it reaches everyone, and if someone's offline they get it when they reconnect. No central post office required. That's roughly what the distributed knowledge system does for agents.
When one agent learns something — a workaround, a pattern, a warning — that knowledge propagates to every other agent automatically, without a central server coordinating the transfer. An agent on a laptop in London and an agent on a GCP VM in Frankfurt can both update the same knowledge base simultaneously, with no conflicts and no data loss, even if they're temporarily disconnected from each other.
The approach combines two proven distributed systems primitives: gossip protocols for propagation and CRDTs (Conflict-free Replicated Data Types) for conflict-free merging. This design is inspired by academic and open-source work on distributed systems; the implementation is wololo-native.
A real example
Tank discovers that Convex cold-starts after 2 hours of inactivity, causing 30-second delays. He's on the GCP VM. Slash is working on a laptop in London. They're not chatting. But when Tank publishes his discovery to discoveries.jsonl, and Slash starts her next session, she tails the last 20 entries of that file and sees Tank's entry. She already knows about the cold-start before she touches anything Convex-related. No message sent. No meeting. The knowledge transferred automatically.
This works even if Slash's laptop was offline for 3 days. When she reconnects, the files sync, and the knowledge is there. The system is designed to work with intermittent connectivity, not despite it.
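The "tail the last 20 entries" step is simple enough to sketch. This is a minimal illustration, not the actual tooling: the record fields (`agent`, `note`) are assumptions, since the document doesn't specify the JSONL schema.

```python
import json
from pathlib import Path

def tail_discoveries(path: str = "discoveries.jsonl", n: int = 20) -> list[dict]:
    """Return the last n records from an append-only JSONL discovery log."""
    lines = Path(path).read_text().splitlines()
    return [json.loads(line) for line in lines[-n:] if line.strip()]

# Tank appends a discovery; Slash reads it on her next session.
# Field names here are illustrative, not the real schema.
Path("discoveries.jsonl").write_text(
    json.dumps({"agent": "tank", "note": "Convex cold-starts after 2h idle"}) + "\n"
)
latest = tail_discoveries()
print(latest[-1]["note"])  # → Convex cold-starts after 2h idle
```

Because the file is append-only, reading it is always safe: a reader never observes a half-updated record, only a possibly slightly stale tail.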
Why Not Just a Database?
A central database creates a single point of failure, requires network connectivity for every operation, and introduces coordination bottlenecks. Agent fleets need to:
- Work offline and sync later (laptop agent disconnects, reconnects)
- Handle concurrent updates without locks (multiple agents learn simultaneously)
- Tolerate network partitions (cloud agent and local agent temporarily isolated)
- Scale without a coordinator bottleneck
Gossip + CRDT gives us eventual consistency with zero coordination overhead. Every agent reads and writes locally, and state converges automatically when agents communicate.
Gossip Protocol
Gossip is an epidemic broadcast protocol. Each agent periodically shares its state with a small number of peers, who share with their peers, and so on. Information propagates through the fleet like a rumor — quickly, reliably, and without any central broker.
How It Works
- Heartbeat — each agent periodically selects a random peer and sends a state digest (what it knows, with version vectors)
- Diff — the receiving agent compares digests and identifies what's new or updated
- Sync — only the delta (new/updated state) is exchanged, minimizing bandwidth
- Merge — received state is merged using CRDT rules, guaranteeing convergence
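The four steps above can be sketched as a push-style round. This is a toy model under stated assumptions: the digest here is just per-key timestamps and the merge is plain last-writer-wins, a simplification of the real digest/version-vector format, which this document doesn't specify.

```python
import random

# Per-agent state: key -> (timestamp, value).
def digest(state):
    """Heartbeat payload: summarize what an agent knows as key -> latest timestamp."""
    return {k: ts for k, (ts, _) in state.items()}

def diff(local_state, remote_digest):
    """Entries the remote peer is missing or holds stale versions of."""
    return {k: v for k, v in local_state.items()
            if k not in remote_digest or v[0] > remote_digest[k]}

def merge(state, delta):
    """LWW merge: keep the entry with the newer timestamp."""
    for k, v in delta.items():
        if k not in state or v[0] > state[k][0]:
            state[k] = v

def gossip_round(agents, fanout=2):
    """Each agent sends only the delta to `fanout` random peers."""
    for name, state in agents.items():
        peers = random.sample([p for p in agents if p != name],
                              min(fanout, len(agents) - 1))
        for peer in peers:
            remote = agents[peer]
            merge(remote, diff(state, digest(remote)))

agents = {f"a{i}": {} for i in range(7)}
agents["a0"]["cold-start"] = (1, "Convex cold-starts after 2h idle")
for _ in range(5):
    gossip_round(agents)
reached = sum("cold-start" in s for s in agents.values())
```

Note that only the delta crosses the wire: the digest lets each pair of agents figure out what's new before exchanging any values.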
Propagation Guarantees
With n agents and a fan-out of f peers per round, information reaches all agents in O(log_f n) rounds. For a fleet of 7 agents with fan-out 2, full propagation takes ~3 rounds (seconds, not minutes).
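As a sanity check on that bound, here is the arithmetic behind the "7 agents, fan-out 2, ~3 rounds" figure (a toy calculation, not fleet code):

```python
import math

def gossip_rounds(n: int, fanout: int) -> int:
    """Rounds for information to reach all n agents when each informed
    agent reaches `fanout` new peers per round: ceil(log_f(n))."""
    return math.ceil(math.log(n, fanout))

print(gossip_rounds(7, 2))    # → 3
print(gossip_rounds(100, 3))  # reaches 100 agents in 5 rounds
```

The logarithmic growth is what makes gossip cheap: doubling the fleet adds roughly one round, not one round per new agent.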
Topology
The default topology is mesh — every agent can gossip with every other agent. For larger fleets, a hub-spoke topology reduces connection count while maintaining propagation speed. The orchestrator (Popashot) serves as a natural hub without being a required coordinator.
CRDTs (Conflict-free Replicated Data Types)
CRDTs are data structures that can be updated independently on different nodes and always merged to a consistent state — without coordination, without locks, without conflict resolution logic. The math guarantees convergence regardless of message ordering or delivery timing.
CRDT Types Used
| CRDT | What it does | Use case |
|---|---|---|
| G-Counter | Grow-only counter, each node increments its own slot | Occurrence counts, completion metrics |
| LWW-Register | Last-Writer-Wins register with timestamps | Agent status, config values, pattern updates |
| OR-Set | Observed-Remove set — add and remove without conflicts | Knowledge facts, entity lists, pattern entries |
| LWW-Map | Map where each key is an LWW-Register | Shared configuration, entity attributes |
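To make the table concrete, here is a minimal G-Counter sketch. This is an illustration of the standard data type, not the system's actual implementation; the node names are hypothetical.

```python
class GCounter:
    """Grow-only counter: each node increments only its own slot;
    merge takes the per-slot maximum, so merges commute and are idempotent."""
    def __init__(self):
        self.slots: dict[str, int] = {}

    def increment(self, node: str, amount: int = 1):
        self.slots[node] = self.slots.get(node, 0) + amount

    def value(self) -> int:
        # The logical count is the sum across all nodes' slots.
        return sum(self.slots.values())

    def merge(self, other: "GCounter"):
        # Per-slot max: safe because slots only ever grow.
        for node, count in other.slots.items():
            self.slots[node] = max(self.slots.get(node, 0), count)

# Two agents count occurrences independently, then merge both ways.
a, b = GCounter(), GCounter()
a.increment("tank", 3)
b.increment("slash", 2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5
```

The per-slot-max trick is why no coordination is needed: since a node only writes its own slot, two replicas can never disagree about a slot's history, only lag behind it.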
Merge Semantics
Every CRDT merge is commutative (order doesn't matter), associative (grouping doesn't matter), and idempotent (applying the same update twice has the same effect as applying it once). This means:
- Messages can arrive out of order — the result is the same
- Messages can be duplicated — no harm done
- Partitioned agents merge cleanly when they reconnect
- No conflict resolution needed — the data type resolves it mathematically
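These properties can be checked directly on a toy LWW-Register. This is a sketch of the standard construction, not the system's code; the tie-breaking by node id is one common convention among several.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LWWRegister:
    """Last-Writer-Wins register: merge keeps the write with the higher
    timestamp; node id breaks timestamp ties deterministically."""
    value: str
    timestamp: float
    node: str

def merge(a: LWWRegister, b: LWWRegister) -> LWWRegister:
    return a if (a.timestamp, a.node) >= (b.timestamp, b.node) else b

old = LWWRegister("idle", 100.0, "tank")
new = LWWRegister("busy", 200.0, "slash")

# Commutative: arrival order doesn't matter.
assert merge(old, new) == merge(new, old) == new
# Idempotent: duplicated delivery is harmless.
assert merge(new, new) == new
# Out-of-order plus duplicate delivery still converges.
assert merge(merge(new, old), old) == new
```

Because these assertions hold for every pair of states, no replica ever needs to ask another "who wins?": the data type answers locally.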
What Gets Shared
Agent State (via LWW-Register)
Each agent publishes its current state: what it's working on, its health, its inbox status. Other agents see this state and can route work accordingly — if an agent goes offline, others know within seconds.
Shared Knowledge (via OR-Set)
Corrections, learnings, and patterns discovered by one agent propagate to the fleet. When an agent learns something non-obvious, that discovery flows to all others through gossip — without anyone needing to explicitly route it. See Discovery Gossip.
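An OR-Set sketch shows why adds and removes don't conflict: each add carries a unique tag, and a remove only deletes the tags it has observed. This is the textbook construction, not the system's implementation; a production version would garbage-collect tombstones.

```python
import uuid

class ORSet:
    """Observed-Remove set: a remove deletes only observed tags, so a
    concurrent re-add on another replica survives the remove."""
    def __init__(self):
        self.adds: dict[str, set[str]] = {}  # element -> live tags
        self.tombstones: set[str] = set()    # tags removed so far

    def add(self, element: str):
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element: str):
        # Remove only the tags this replica has observed.
        self.tombstones |= self.adds.pop(element, set())

    def contains(self, element: str) -> bool:
        return bool(self.adds.get(element))

    def merge(self, other: "ORSet"):
        self.tombstones |= other.tombstones
        for element, tags in other.adds.items():
            self.adds.setdefault(element, set()).update(tags)
        # Drop any tag that either replica has tombstoned.
        for element in list(self.adds):
            self.adds[element] -= self.tombstones
            if not self.adds[element]:
                del self.adds[element]

a, b = ORSet(), ORSet()
a.add("pattern: retry with backoff")
b.merge(a)                              # both replicas hold the fact
b.remove("pattern: retry with backoff") # one replica retracts it
a.merge(b)
assert not a.contains("pattern: retry with backoff")
```

The same merge runs no matter which replica learned what first, which is exactly the property gossip needs: knowledge entries can arrive from any peer in any order.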
Task Coordination (via OR-Set + LWW-Register)
The inbox task queue is backed by CRDT semantics so that assignments, acknowledgements, and completions are conflict-free even under concurrent access. No task is lost, even during brief network partitions.
CAP Tradeoffs
This is an AP system (Available + Partition-tolerant, eventually Consistent):
- Always writable — agents never block waiting for consensus
- Partition-tolerant — isolated agents keep working, merge later
- Eventually consistent — given enough gossip rounds, all agents converge
- Not strongly consistent — two agents may briefly see different state
For agent coordination this is the right tradeoff. Strong consistency would require a consensus protocol (Raft, Paxos) with leader election — adding latency and operational complexity that isn't justified for asynchronous agent work.
Leaderless Design
The system is leaderless by default. No agent is structurally special. Any agent can go offline and the system continues. For workflows that benefit from centralized dispatch (the orchestrator pattern), the orchestrator role is a convention, not a system requirement. If Popashot goes offline, other agents continue their work and Popashot catches up on reconnect.
How it connects to the larger loop
The distributed knowledge system is the transport layer for everything else in Agent Intelligence:
- Gossip & CRDT — the practical channel: discoveries.jsonl and patterns.jsonl, how to publish, when to publish, CRDT merge mechanics
- Reflection Loop — corrections that hit HOT tier get published to patterns.jsonl and become fleet-wide knowledge via this system
- Skills Evolution — adopted skills and their scores propagate via clan-learnings so every agent knows what's available
- Knowledge Graph — entity data from all agents merges using the same CRDT semantics described here