Distributed Knowledge System
What is it? (Start here)
Imagine the internet before email. If you wanted to tell 10 people something, you had to call each one individually. Now imagine email — you write once, it reaches everyone, and if someone's offline they get it when they reconnect. No central post office required. That's roughly what the distributed knowledge system does for agents.
When one agent learns something — a workaround, a pattern, a warning — that knowledge propagates to every other agent automatically, without a central server coordinating the transfer. An agent on a laptop in London and an agent on a GCP VM in Frankfurt can both update the same knowledge base simultaneously, with no conflicts and no data loss, even if they're temporarily disconnected from each other.
The approach combines two proven distributed systems primitives: gossip protocols for propagation and CRDTs (Conflict-free Replicated Data Types) for conflict-free merging. This design is inspired by academic and open-source work on distributed systems; the implementation is wololo-native.
A real example
Tank discovers that Convex cold-starts after 2 hours of inactivity, causing 30-second delays. He's on the GCP VM. Slash is working on a laptop in London. They're not chatting. But when Tank publishes his discovery to discoveries.jsonl, and Slash starts her next session, she tails the last 20 entries of that file and sees Tank's entry. She already knows about the cold-start before she touches anything Convex-related. No message sent. No meeting. The knowledge transferred automatically.
This works even if Slash's laptop was offline for 3 days. When she reconnects, the files sync, and the knowledge is there. The system is designed to work with intermittent connectivity, not despite it.
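The "tail the last 20 entries" step is simple enough to sketch. This is a minimal illustration, not the actual tooling: the record fields (`agent`, `note`) are assumptions, since the document doesn't specify the JSONL schema.

```python
import json
from pathlib import Path

def tail_discoveries(path: str = "discoveries.jsonl", n: int = 20) -> list[dict]:
    """Return the last n records from an append-only JSONL discovery log."""
    lines = Path(path).read_text().splitlines()
    return [json.loads(line) for line in lines[-n:] if line.strip()]

# Tank appends a discovery; Slash reads it on her next session.
# Field names here are illustrative, not the real schema.
Path("discoveries.jsonl").write_text(
    json.dumps({"agent": "tank", "note": "Convex cold-starts after 2h idle"}) + "\n"
)
latest = tail_discoveries()
print(latest[-1]["note"])  # → Convex cold-starts after 2h idle
```

Because the file is append-only, reading it is always safe: a reader never observes a half-updated record, only a possibly slightly stale tail.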
Why Not Just a Database?
A central database creates a single point of failure, requires network connectivity for every operation, and introduces coordination bottlenecks. Agent fleets need to:
- Work offline and sync later (laptop agent disconnects, reconnects)
- Handle concurrent updates without locks (multiple agents learn simultaneously)
- Tolerate network partitions (cloud agent and local agent temporarily isolated)
- Scale without a coordinator bottleneck
Gossip + CRDT gives us eventual consistency with zero coordination overhead. Every agent reads and writes locally, and state converges automatically when agents communicate.
Gossip Protocol
Gossip is an epidemic broadcast protocol. Each agent periodically shares its state with a small number of peers, who share with their peers, and so on. Information propagates through the fleet like a rumor — quickly, reliably, and without any central broker.
How It Works
- Heartbeat — each agent periodically selects a random peer and sends a state digest (what it knows, with version vectors)
- Diff — the receiving agent compares digests and identifies what's new or updated
- Sync — only the delta (new/updated state) is exchanged, minimizing bandwidth
- Merge — received state is merged using CRDT rules, guaranteeing convergence
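The four steps above can be sketched as a push-style round. This is a toy model under stated assumptions: the digest here is just per-key timestamps and the merge is plain last-writer-wins, a simplification of the real digest/version-vector format, which this document doesn't specify.

```python
import random

# Per-agent state: key -> (timestamp, value).
def digest(state):
    """Heartbeat payload: summarize what an agent knows as key -> latest timestamp."""
    return {k: ts for k, (ts, _) in state.items()}

def diff(local_state, remote_digest):
    """Entries the remote peer is missing or holds stale versions of."""
    return {k: v for k, v in local_state.items()
            if k not in remote_digest or v[0] > remote_digest[k]}

def merge(state, delta):
    """LWW merge: keep the entry with the newer timestamp."""
    for k, v in delta.items():
        if k not in state or v[0] > state[k][0]:
            state[k] = v

def gossip_round(agents, fanout=2):
    """Each agent sends only the delta to `fanout` random peers."""
    for name, state in agents.items():
        peers = random.sample([p for p in agents if p != name],
                              min(fanout, len(agents) - 1))
        for peer in peers:
            remote = agents[peer]
            merge(remote, diff(state, digest(remote)))

agents = {f"a{i}": {} for i in range(7)}
agents["a0"]["cold-start"] = (1, "Convex cold-starts after 2h idle")
for _ in range(5):
    gossip_round(agents)
reached = sum("cold-start" in s for s in agents.values())
```

Note that only the delta crosses the wire: the digest lets each pair of agents figure out what's new before exchanging any values.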
Propagation Guarantees
With n agents and a fan-out of f peers per round, information reaches all agents in O(log_f n) rounds. For a fleet of 7 agents with fan-out 2, full propagation takes ~3 rounds (seconds, not minutes).
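As a sanity check on that bound, here is the arithmetic behind the "7 agents, fan-out 2, ~3 rounds" figure (a toy calculation, not fleet code):

```python
import math

def gossip_rounds(n: int, fanout: int) -> int:
    """Rounds for information to reach all n agents when each informed
    agent reaches `fanout` new peers per round: ceil(log_f(n))."""
    return math.ceil(math.log(n, fanout))

print(gossip_rounds(7, 2))    # → 3
print(gossip_rounds(100, 3))  # reaches 100 agents in 5 rounds
```

The logarithmic growth is what makes gossip cheap: doubling the fleet adds roughly one round, not one round per new agent.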
Topology
The default topology is mesh — every agent can gossip with every other agent. For larger fleets, a hub-spoke topology reduces connection count while maintaining propagation speed. The orchestrator (Popashot) serves as a natural hub without being a required coordinator.
CRDTs (Conflict-free Replicated Data Types)
CRDTs are data structures that can be updated independently on different nodes and always merged to a consistent state — without coordination, without locks, without conflict resolution logic. The math guarantees convergence regardless of message ordering or delivery timing.
CRDT Types Used
| CRDT | What it does | Use case |
|---|---|---|
| G-Counter | Grow-only counter, each node increments its own slot | Occurrence counts, completion metrics |
| LWW-Register | Last-Writer-Wins register with timestamps | Agent status, config values, pattern updates |
| OR-Set | Observed-Remove set — add and remove without conflicts | Knowledge facts, entity lists, pattern entries |
| LWW-Map | Map where each key is an LWW-Register | Shared configuration, entity attributes |
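To make the table concrete, here is a minimal G-Counter sketch. This is an illustration of the standard data type, not the system's actual implementation; the node names are hypothetical.

```python
class GCounter:
    """Grow-only counter: each node increments only its own slot;
    merge takes the per-slot maximum, so merges commute and are idempotent."""
    def __init__(self):
        self.slots: dict[str, int] = {}

    def increment(self, node: str, amount: int = 1):
        self.slots[node] = self.slots.get(node, 0) + amount

    def value(self) -> int:
        # The logical count is the sum across all nodes' slots.
        return sum(self.slots.values())

    def merge(self, other: "GCounter"):
        # Per-slot max: safe because slots only ever grow.
        for node, count in other.slots.items():
            self.slots[node] = max(self.slots.get(node, 0), count)

# Two agents count occurrences independently, then merge both ways.
a, b = GCounter(), GCounter()
a.increment("tank", 3)
b.increment("slash", 2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5
```

The per-slot-max trick is why no coordination is needed: since a node only writes its own slot, two replicas can never disagree about a slot's history, only lag behind it.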
Merge Semantics
Every CRDT merge is commutative (order doesn't matter), associative (grouping doesn't matter), and idempotent (applying the same update twice has the same effect as applying it once). This means:
- Messages can arrive out of order — the result is the same
- Messages can be duplicated — no harm done
- Partitioned agents merge cleanly when they reconnect
- No conflict resolution needed — the data type resolves it mathematically
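These properties can be checked directly on a toy LWW-Register. This is a sketch of the standard construction, not the system's code; the tie-breaking by node id is one common convention among several.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LWWRegister:
    """Last-Writer-Wins register: merge keeps the write with the higher
    timestamp; node id breaks timestamp ties deterministically."""
    value: str
    timestamp: float
    node: str

def merge(a: LWWRegister, b: LWWRegister) -> LWWRegister:
    return a if (a.timestamp, a.node) >= (b.timestamp, b.node) else b

old = LWWRegister("idle", 100.0, "tank")
new = LWWRegister("busy", 200.0, "slash")

# Commutative: arrival order doesn't matter.
assert merge(old, new) == merge(new, old) == new
# Idempotent: duplicated delivery is harmless.
assert merge(new, new) == new
# Out-of-order plus duplicate delivery still converges.
assert merge(merge(new, old), old) == new
```

Because these assertions hold for every pair of states, no replica ever needs to ask another "who wins?": the data type answers locally.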
What Gets Shared
Agent State (via LWW-Register)
Each agent publishes its current state: what it's working on, its health, its inbox status. Other agents see this state and can route work accordingly — if an agent goes offline, others know within seconds.
Shared Knowledge (via OR-Set)
Corrections, learnings, and patterns discovered by one agent propagate to the fleet. When an agent learns something non-obvious, that discovery flows to all others through gossip — without anyone needing to explicitly route it. See Discovery Gossip.
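An OR-Set sketch shows why adds and removes don't conflict: each add carries a unique tag, and a remove only deletes the tags it has observed. This is the textbook construction, not the system's implementation; a production version would garbage-collect tombstones.

```python
import uuid

class ORSet:
    """Observed-Remove set: a remove deletes only observed tags, so a
    concurrent re-add on another replica survives the remove."""
    def __init__(self):
        self.adds: dict[str, set[str]] = {}  # element -> live tags
        self.tombstones: set[str] = set()    # tags removed so far

    def add(self, element: str):
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element: str):
        # Remove only the tags this replica has observed.
        self.tombstones |= self.adds.pop(element, set())

    def contains(self, element: str) -> bool:
        return bool(self.adds.get(element))

    def merge(self, other: "ORSet"):
        self.tombstones |= other.tombstones
        for element, tags in other.adds.items():
            self.adds.setdefault(element, set()).update(tags)
        # Drop any tag that either replica has tombstoned.
        for element in list(self.adds):
            self.adds[element] -= self.tombstones
            if not self.adds[element]:
                del self.adds[element]

a, b = ORSet(), ORSet()
a.add("pattern: retry with backoff")
b.merge(a)                              # both replicas hold the fact
b.remove("pattern: retry with backoff") # one replica retracts it
a.merge(b)
assert not a.contains("pattern: retry with backoff")
```

The same merge runs no matter which replica learned what first, which is exactly the property gossip needs: knowledge entries can arrive from any peer in any order.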
Task Coordination (via OR-Set + LWW-Register)
The inbox task queue is backed by CRDT semantics so that assignments, acknowledgements, and completions are conflict-free even under concurrent access. No task is lost, even during brief network partitions.
CAP Tradeoffs
This is an AP system (Available + Partition-tolerant, eventually Consistent):
- Always writable — agents never block waiting for consensus
- Partition-tolerant — isolated agents keep working, merge later
- Eventually consistent — given enough gossip rounds, all agents converge
- Not strongly consistent — two agents may briefly see different state
For agent coordination this is the right tradeoff. Strong consistency would require a consensus protocol (Raft, Paxos) with leader election — adding latency and operational complexity that isn't justified for asynchronous agent work.
Leaderless Design
The system is leaderless by default. No agent is structurally special. Any agent can go offline and the system continues. For workflows that benefit from centralized dispatch (the orchestrator pattern), the orchestrator role is a convention, not a system requirement. If Popashot goes offline, other agents continue their work and Popashot catches up on reconnect.
How it connects to the larger loop
The distributed knowledge system is the transport layer for everything else in Agent Intelligence:
- Gossip & CRDT — the practical channel: discoveries.jsonl and patterns.jsonl, how to publish, when to publish, CRDT merge mechanics
- Reflection Loop — corrections that hit HOT tier get published to patterns.jsonl and become fleet-wide knowledge via this system
- Skills Evolution — adopted skills and their scores propagate via clan-learnings so every agent knows what's available
- Knowledge Graph — entity data from all agents merges using the same CRDT semantics described here