
Reflection Loop

The reflection loop is the engine of fleet intelligence. Every time an agent makes a mistake — and gets corrected — that correction becomes permanent knowledge, not just for the agent that made the error, but for every agent in the fleet. No human needs to write documentation. No one needs to update a wiki. The loop runs automatically and the fleet gets measurably smarter over time.

This is the headline story of the self-improving system. Everything else — journals, gossip, skills evolution — feeds this loop or distributes its outputs.

[Diagram: THE REFLECTION LOOP — CLOSED-LOOP SELF-IMPROVEMENT. Every correction drives permanent improvement; no human intervention required after the initial signal.]

- Correction triggers: human says "no / wrong" · agent corrects agent · build/test fails · self-observed mistake
- corrections.md: append-only, no batching; fields are date · wrong · correct · pattern; written before the next reply, always
- Occurrence counter: 3rd occurrence of the same pattern promotes it
- HOT tier (memory.md): loaded every session · cannot be forgotten · survives compaction · permanent
- 3 consecutive failures → mutation protocol: declare in JOURNAL.md before retrying (M1 Fallback · M2 Extreme · M3 Hardcode · M4 Switch · M5 Reduce · M6 Invert · M7 Ask · M8 Read); successful mutations → clan-learnings/patterns.md during the sleep cycle
- clan-learnings/patterns.jsonl — CRDT source of truth: OR-Set for adds (concurrent safe) · LWW-Register for updates (last write) · G-Counter for occurrences (max-merge) · render-patterns → patterns.md
- discoveries.jsonl — non-obvious findings only: workarounds > 15 min · undocumented API behaviors · NOT routine fixes or secrets
- Fleet gossip, consumed at session start (tail -20): Popashot 🎯 · Cantona ⚽ · Splinter 🐀 · Velma 🔍 · Tank 📡 · ZeroCool 🔒 · Slash 🎸 — every agent starts smarter next session

283 patterns · 129 entities · 7 agents · 1 loop

Correction triggers — the entry point

Four signals open the loop: a human saying "no", "wrong", "actually", "stop", "don't", "should be"; another agent correcting; a build or test failure caused by the agent's action; or a self-observed mistake during a session. All four are treated identically — the correction gets logged immediately.

The rule is absolute: log before the next reply. Not at end of session. Not batched with other corrections. One correction, one entry, right now.

corrections.md — the write path

Every correction entry has four fields: date/time, what went wrong, the correct approach, and the generalizable pattern. The pattern field is the most important — it turns a specific mistake into a reusable lesson. "I used the wrong flag on this CLI" is not a pattern. "Always check CLI version before using --json flag — older versions use --format=json" is a pattern.
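The write path above can be sketched as a small append-only helper. This is a minimal illustration, not the fleet's actual implementation — the function name `log_correction` and the exact entry layout are assumptions; only the filename, the append-only rule, and the four fields come from the text.

```python
from datetime import datetime, timezone

def log_correction(path, wrong, correct, pattern):
    """Append one correction entry immediately -- never batched.

    The four fields mirror the ones described above: date/time,
    what went wrong, the correct approach, and the generalizable
    pattern (the reusable lesson, not the one-off mistake).
    """
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    entry = (
        f"## {stamp}\n"
        f"- wrong: {wrong}\n"
        f"- correct: {correct}\n"
        f"- pattern: {pattern}\n\n"
    )
    # Open in append mode: corrections.md is append-only by rule.
    with open(path, "a", encoding="utf-8") as f:
        f.write(entry)
    return entry
```

Note the pattern field carries the generalized form, e.g. "Always check CLI version before using --json flag", not the specific slip that triggered it.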

Occurrence counting — the promotion gate

At the third occurrence of the same pattern (same root cause, same generalizable lesson), a promotion fires. The corrected understanding moves from corrections.md (reviewed but not always loaded) to memory.md HOT tier (loaded every single session, every turn, always in context). From that point forward, the agent cannot forget it. It is permanent.
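The promotion gate reduces to a counter with a threshold. A minimal sketch, assuming patterns are matched by an exact key (the real system presumably matches on root cause, which is fuzzier):

```python
from collections import Counter

PROMOTION_THRESHOLD = 3  # the third occurrence fires the promotion

def record_occurrence(counts: Counter, pattern: str) -> bool:
    """Count one more sighting of a pattern.

    Returns True exactly once -- at the moment the pattern crosses
    the threshold and should be promoted to the HOT tier.
    """
    counts[pattern] += 1
    return counts[pattern] == PROMOTION_THRESHOLD
```

Returning True only on the crossing (not on every occurrence past it) keeps the promotion idempotent: the move to memory.md happens once, and further sightings are no-ops.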

Mutation protocol — the escape hatch for stuck approaches

When three consecutive failures hit the same problem — same approach, same error — the mutation protocol activates. The agent must stop, declare a named strategy shift in JOURNAL.md, and continue with the new strategy only after declaring. Retrying the same approach a 4th time without declaration is a protocol violation.

The declaration looks like this:

## MUTATION: M5 (Reduce Scope)
Failed 3x on: getting the full pipeline working end-to-end
Switching to: build just the ingestion step in isolation
Rationale: problem is too large to debug whole — find the
           minimal failing case, then expand

The 8 mutation strategies

Eight named strategies cover every meaningful pivot. Each has a specific trigger and a concrete example of when to apply it.

| Strategy | When to use | Real example |
| --- | --- | --- |
| M1 — Fallback Dependency | An external dep is unreachable, broken, or rate-limited | OpenAI API keeps timing out → switch to Anthropic for this task |
| M2 — Extreme Debug Logging | You can't see what's failing — the system is a black box | Build keeps failing silently → add verbose logging to every step before making any more changes |
| M3 — Hardcode & Isolate | Too many variables — can't tell which one is failing | Auth flow broken → hardcode the token, remove all dynamic parts, verify the raw request works |
| M4 — Switch Libraries | The library itself is the bug — not your usage of it | PDF parsing library corrupts Unicode → switch to a different parser entirely |
| M5 — Reduce Scope | Problem is too large to debug as a whole | Full pipeline broken → build just the ingestion step in isolation, get that green, then expand |
| M6 — Invert the Approach | The entire approach is wrong — not the implementation | Trying to diff HTML → stop, work with the DOM AST instead, HTML diffing is the wrong layer |
| M7 — Ask the Oracle | You need knowledge you don't have — and it probably exists somewhere | Failing on obscure Convex behaviour → search clan-learnings, discoveries.jsonl, GraphRAG before writing another line |
| M8 — Read the Error | You're ignoring an obvious signal in the error output | Failing with "missing field: organizationId" → stop, read that literally, pass organizationId — don't keep tweaking other things |

Successful mutations don't just help the current agent — they propagate to clan-learnings/patterns.md during the sleep cycle, making the strategy available to every agent for similar situations.

HOT tier promotion — permanent memory

Once promoted to memory.md, a correction is HOT. It loads at the start of every session, before any task context, before any tool calls. It cannot be evicted by context pressure. It survives compaction. The agent starts with it already applied, before encountering any situation that might trigger the original mistake.
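The load order described above can be made concrete. A hypothetical sketch — the function name and prompt layout are assumptions; what the source specifies is only the ordering (HOT tier first, before any task context) and the non-eviction guarantee:

```python
def build_session_context(hot_patterns, task_context):
    """Assemble a new session's context: HOT-tier memory loads
    first, ahead of any task context, and is never a candidate
    for eviction under context pressure."""
    hot = "\n".join(f"- {p}" for p in hot_patterns)
    return f"## memory.md (HOT tier)\n{hot}\n\n## task\n{task_context}"
```

Because the HOT block is prepended unconditionally, the agent already has the lesson in context before it encounters any situation that could retrigger the original mistake.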

Fleet distribution — clan-learnings and gossip

HOT-tier patterns that have fleet-wide relevance get published to clan-learnings/patterns.jsonl — the append-only CRDT source of truth consumed by all seven agents. Separately, non-obvious findings discovered during the work (workarounds over 15 minutes, undocumented behaviors) go to discoveries.jsonl. Both are read at session start by every agent. The fleet learns from every correction, not just the agent that made the original mistake.
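The CRDT merge rules named in the diagram (OR-Set for adds, LWW-Register for updates, max-merge for occurrence counts) can be sketched for two replicas of the patterns file. This is a simplified illustration, not the fleet's actual merge code: it models each replica as `{pattern_id: (timestamp, text, count)}` and collapses the G-Counter to a single max-merged count.

```python
def merge_patterns(local, remote):
    """Merge two replicas of patterns.jsonl state.

    - OR-Set semantics for adds: the merged set is the union,
      so concurrent additions on different replicas both survive.
    - LWW-Register for updates: the text with the later timestamp wins.
    - G-Counter (simplified) for occurrences: counts only grow,
      so merging takes the max.
    """
    merged = {}
    for pid in set(local) | set(remote):           # OR-Set: union of adds
        l, r = local.get(pid), remote.get(pid)
        if l is None or r is None:
            merged[pid] = l or r                   # present on one side only
        else:
            ts, text = max((l[0], l[1]), (r[0], r[1]))   # LWW on timestamp
            merged[pid] = (ts, text, max(l[2], r[2]))    # max-merge count
    return merged
```

The payoff of these particular CRDTs is that merging is commutative and idempotent: any agent can pull any other agent's replica in any order and converge on the same patterns.md render.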

Deeper dives