
Reflection Loop

The reflection loop is the engine of fleet intelligence. Every time an agent makes a mistake — and gets corrected — that correction becomes permanent knowledge, not just for the agent that made the error, but for every agent in the fleet. No human needs to write documentation. No one needs to update a wiki. The loop runs automatically and the fleet gets measurably smarter over time.

This is the headline story of the self-improving system. Everything else — journals, gossip, skills evolution — feeds this loop or distributes its outputs.

[Diagram: THE REFLECTION LOOP — CLOSED-LOOP SELF-IMPROVEMENT. Every correction drives permanent improvement; no human intervention required after the initial signal.]

- Correction triggers: human says "no / wrong" · agent corrects agent · build/test fails · self-observed mistake
- corrections.md: append-only, no batching; fields are date · wrong · correct · pattern; written before the next reply, always
- Occurrence counter: 3rd occurrence of the same pattern promotes it
- HOT tier (memory.md): loaded every session · cannot be forgotten · survives compaction · permanent
- 3 consecutive failures → mutation protocol: declare in JOURNAL.md before retrying (M1 Fallback · M2 Extreme · M3 Hardcode · M4 Switch · M5 Reduce · M6 Invert · M7 Ask · M8 Read); successful mutations → clan-learnings/patterns.md during the sleep cycle
- clan-learnings/patterns.jsonl — CRDT source of truth: OR-Set for adds (concurrent safe) · LWW-Register for updates (last write) · G-Counter for occurrences (max-merge) · render-patterns → patterns.md
- discoveries.jsonl — non-obvious findings only: workarounds > 15 min · undocumented API behaviors · NOT routine fixes or secrets
- Fleet gossip, consumed at session start (tail -20): Popashot 🎯 · Cantona ⚽ · Splinter 🐀 · Velma 🔍 · Tank 📡 · ZeroCool 🔒 · Slash 🎸 — every agent starts smarter next session

283 patterns · 129 entities · 7 agents · 1 loop

Correction triggers — the entry point

Four signals open the loop: a human saying "no", "wrong", "actually", "stop", "don't", "should be"; another agent correcting; a build or test failure caused by the agent's action; or a self-observed mistake during a session. All four are treated identically — the correction gets logged immediately.

The rule is absolute: log before the next reply. Not at end of session. Not batched with other corrections. One correction, one entry, right now.

corrections.md — the write path

Every correction entry has four fields: date/time, what went wrong, the correct approach, and the generalizable pattern. The pattern field is the most important — it turns a specific mistake into a reusable lesson. "I used the wrong flag on this CLI" is not a pattern. "Always check CLI version before using --json flag — older versions use --format=json" is a pattern.
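The write path above can be sketched as a small append-only helper. This is a minimal illustration, not the fleet's actual implementation — the function name `log_correction` and the exact entry layout are assumptions; only the filename, the append-only rule, and the four fields come from the text.

```python
from datetime import datetime, timezone

def log_correction(path, wrong, correct, pattern):
    """Append one correction entry immediately -- never batched.

    The four fields mirror the ones described above: date/time,
    what went wrong, the correct approach, and the generalizable
    pattern (the reusable lesson, not the one-off mistake).
    """
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    entry = (
        f"## {stamp}\n"
        f"- wrong: {wrong}\n"
        f"- correct: {correct}\n"
        f"- pattern: {pattern}\n\n"
    )
    # Open in append mode: corrections.md is append-only by rule.
    with open(path, "a", encoding="utf-8") as f:
        f.write(entry)
    return entry
```

Note the pattern field carries the generalized form, e.g. "Always check CLI version before using --json flag", not the specific slip that triggered it.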

Occurrence counting — the promotion gate

At the third occurrence of the same pattern (same root cause, same generalizable lesson), a promotion fires. The corrected understanding moves from corrections.md (reviewed but not always loaded) to memory.md HOT tier (loaded every single session, every turn, always in context). From that point forward, the agent cannot forget it. It is permanent.
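The promotion gate reduces to a counter with a threshold. A minimal sketch, assuming patterns are matched by an exact key (the real system presumably matches on root cause, which is fuzzier):

```python
from collections import Counter

PROMOTION_THRESHOLD = 3  # the third occurrence fires the promotion

def record_occurrence(counts: Counter, pattern: str) -> bool:
    """Count one more sighting of a pattern.

    Returns True exactly once -- at the moment the pattern crosses
    the threshold and should be promoted to the HOT tier.
    """
    counts[pattern] += 1
    return counts[pattern] == PROMOTION_THRESHOLD
```

Returning True only on the crossing (not on every occurrence past it) keeps the promotion idempotent: the move to memory.md happens once, and further sightings are no-ops.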

Mutation protocol — the escape hatch for stuck approaches

When three consecutive failures hit the same problem — same approach, same error — the mutation protocol activates. The agent must stop, declare a named strategy shift in JOURNAL.md, and continue with the new strategy only after declaring. Retrying the same approach a 4th time without declaration is a protocol violation.

The declaration looks like this:

## MUTATION: M5 (Reduce Scope)
Failed 3x on: getting the full pipeline working end-to-end
Switching to: build just the ingestion step in isolation
Rationale: problem is too large to debug whole — find the
           minimal failing case, then expand

The 8 mutation strategies

Eight named strategies cover every meaningful pivot. Each has a specific trigger and a concrete example of when to apply it.

| Strategy | When to use | Real example |
| --- | --- | --- |
| M1 — Fallback Dependency | An external dep is unreachable, broken, or rate-limited | OpenAI API keeps timing out → switch to Anthropic for this task |
| M2 — Extreme Debug Logging | You can't see what's failing — the system is a black box | Build keeps failing silently → add verbose logging to every step before making any more changes |
| M3 — Hardcode & Isolate | Too many variables — can't tell which one is failing | Auth flow broken → hardcode the token, remove all dynamic parts, verify the raw request works |
| M4 — Switch Libraries | The library itself is the bug — not your usage of it | PDF parsing library corrupts Unicode → switch to a different parser entirely |
| M5 — Reduce Scope | Problem is too large to debug as a whole | Full pipeline broken → build just the ingestion step in isolation, get that green, then expand |
| M6 — Invert the Approach | The entire approach is wrong — not the implementation | Trying to diff HTML → stop, work with the DOM AST instead, HTML diffing is the wrong layer |
| M7 — Ask the Oracle | You need knowledge you don't have — and it probably exists somewhere | Failing on obscure Convex behaviour → search clan-learnings, discoveries.jsonl, GraphRAG before writing another line |
| M8 — Read the Error | You're ignoring an obvious signal in the error output | Failing with "missing field: organizationId" → stop, read that literally, pass organizationId — don't keep tweaking other things |

Successful mutations don't just help the current agent — they propagate to clan-learnings/patterns.md during the sleep cycle, making the strategy available to every agent for similar situations.

HOT tier promotion — permanent memory

Once promoted to memory.md, a correction is HOT. It loads at the start of every session, before any task context, before any tool calls. It cannot be evicted by context pressure. It survives compaction. The agent starts with it already applied, before encountering any situation that might trigger the original mistake.
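The load order described above can be made concrete. A hypothetical sketch — the function name and prompt layout are assumptions; what the source specifies is only the ordering (HOT tier first, before any task context) and the non-eviction guarantee:

```python
def build_session_context(hot_patterns, task_context):
    """Assemble a new session's context: HOT-tier memory loads
    first, ahead of any task context, and is never a candidate
    for eviction under context pressure."""
    hot = "\n".join(f"- {p}" for p in hot_patterns)
    return f"## memory.md (HOT tier)\n{hot}\n\n## task\n{task_context}"
```

Because the HOT block is prepended unconditionally, the agent already has the lesson in context before it encounters any situation that could retrigger the original mistake.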

Fleet distribution — clan-learnings and gossip

HOT-tier patterns that have fleet-wide relevance get published to clan-learnings/patterns.jsonl — the append-only CRDT source of truth consumed by all seven agents. Separately, non-obvious findings discovered during the work (workarounds over 15 minutes, undocumented behaviors) go to discoveries.jsonl. Both are read at session start by every agent. The fleet learns from every correction, not just the agent that made the original mistake.
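The CRDT merge rules named in the diagram (OR-Set for adds, LWW-Register for updates, max-merge for occurrence counts) can be sketched for two replicas of the patterns file. This is a simplified illustration, not the fleet's actual merge code: it models each replica as `{pattern_id: (timestamp, text, count)}` and collapses the G-Counter to a single max-merged count.

```python
def merge_patterns(local, remote):
    """Merge two replicas of patterns.jsonl state.

    - OR-Set semantics for adds: the merged set is the union,
      so concurrent additions on different replicas both survive.
    - LWW-Register for updates: the text with the later timestamp wins.
    - G-Counter (simplified) for occurrences: counts only grow,
      so merging takes the max.
    """
    merged = {}
    for pid in set(local) | set(remote):           # OR-Set: union of adds
        l, r = local.get(pid), remote.get(pid)
        if l is None or r is None:
            merged[pid] = l or r                   # present on one side only
        else:
            ts, text = max((l[0], l[1]), (r[0], r[1]))   # LWW on timestamp
            merged[pid] = (ts, text, max(l[2], r[2]))    # max-merge count
    return merged
```

The payoff of these particular CRDTs is that merging is commutative and idempotent: any agent can pull any other agent's replica in any order and converge on the same patterns.md render.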

Deeper dives