Closed-Loop Learning
The pipeline doesn't just ship code — it learns from every cycle. Corrections, failures, and successful patterns are captured, classified, and fed back into agent behavior. This is what makes the pipeline continuously improving rather than just continuously running.
The Learning Loop
Correction Logging
When an agent gets corrected — by a human, by another agent, or by a build failure — it logs the correction immediately, before doing anything else.
Correction signals
- Human says "no", "wrong", "actually", "stop", "don't", "should be", "instead"
- Another agent corrects the approach
- A build or test fails because of something the agent did
- The agent realizes it made a mistake
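The human-correction signals above can be caught with a simple keyword match. A minimal sketch — the signal list comes from the bullets above, but the function name and matching logic are illustrative, not part of the pipeline:

```python
import re

# Correction keywords drawn from the signal list above
CORRECTION_SIGNALS = ["no", "wrong", "actually", "stop", "don't", "should be", "instead"]

def is_correction(message: str) -> bool:
    """Return True if a human message looks like a correction."""
    lowered = message.lower()
    return any(re.search(rf"\b{re.escape(signal)}\b", lowered)
               for signal in CORRECTION_SIGNALS)
```

In practice a classifier this naive will over-trigger ("no problem!" matches "no"); the point is only that detection happens before the agent does anything else, so the correction is never lost.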
What gets logged
```markdown
## 2026-03-15 14:30 — Used sed instead of awk for template substitution
**What I got wrong:** Used sed to substitute issue titles into PR templates,
which broke on titles containing special characters (/, &, etc.)
**Correct approach:** Use awk for template substitution — it handles
special characters without escaping
**Source:** Build failure in pipeline-dispatch.sh
**Pattern:** Always use awk over sed for user-provided input substitution
```

Pattern Promotion
Not every correction becomes a permanent lesson. The system uses a tiered approach:
| Tier | Where | When | Loaded |
|---|---|---|---|
| Raw corrections | corrections.md | Every correction, immediately | On review |
| HOT memory | memory.md | 3+ similar corrections on the same pattern | Every session start |
| Cross-agent | patterns.md (shared) | During consolidation cycles | Before starting work on a new topic |
Why 3 corrections?
One correction could be a fluke. Two is a coincidence. Three is a pattern. Only patterns that repeat get promoted to HOT memory, which is loaded at every session start. This keeps the always-loaded context small and high-signal.
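The promotion rule can be sketched in a few lines — assuming each logged correction carries a pattern tag (the `**Pattern:**` line in the log format above); the function name and threshold constant are illustrative:

```python
from collections import Counter

PROMOTION_THRESHOLD = 3  # one is a fluke, two a coincidence, three a pattern

def patterns_to_promote(correction_tags: list[str]) -> list[str]:
    """Given the pattern tag of each logged correction, return the
    patterns repeated often enough to promote to HOT memory."""
    counts = Counter(correction_tags)
    return [pattern for pattern, n in counts.items()
            if n >= PROMOTION_THRESHOLD]
```

Everything below the threshold stays in corrections.md, which is only loaded on review — so the always-loaded HOT memory stays small and high-signal.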
Mutation Protocol
When an agent fails 3 consecutive times on the same problem, it must stop and change strategy. This is the mutation protocol — a forced pivot to prevent repeated failure.
Mutation strategies
| Strategy | When to Use |
|---|---|
| Reduce scope | Task is too large — break it into smaller pieces |
| Change tool | Current tool isn't working — switch to fallback |
| Simplify approach | Over-engineering — do the minimum viable fix |
| Ask for help | Missing context — escalate to human or specialist agent |
| Reframe the problem | Solving the wrong thing — re-read the issue |
| Work around | Direct fix isn't possible — find an alternative path |
Cross-Agent Learning
Agents don't learn in isolation. A shared patterns file captures learnings that apply across the entire agent fleet. Before starting work on a new topic, agents search shared learnings:
```shell
# Before working on authentication:
grep -A5 "auth" ~/learnings/patterns.md

# Before working on database migrations:
grep -A5 "migration" ~/learnings/patterns.md
```

This means when one agent learns that "transcrypt needs re-initialization in worktrees," every agent benefits from that knowledge on their next relevant task.
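Writing to the shared file is the other half of the loop. A minimal sketch — the file path matches the grep examples above, but the entry format and function name are assumptions modeled on the corrections.md log format:

```python
from datetime import date
from pathlib import Path

def share_pattern(topic: str, lesson: str,
                  patterns_file: Path = Path.home() / "learnings" / "patterns.md") -> None:
    """Append a learning to the shared patterns file so other
    agents can find it with a grep before starting related work."""
    patterns_file.parent.mkdir(parents=True, exist_ok=True)
    entry = f"\n## {date.today()} — {topic}\n{lesson}\n"
    with patterns_file.open("a", encoding="utf-8") as f:
        f.write(entry)
```

Keeping the topic in the entry heading is what makes the `grep -A5` lookup work: the match lands on the heading and the context lines pull in the lesson.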
Consolidation Cycles
During scheduled maintenance windows (sleep cycles), agents review their corrections and perform housekeeping:
- Review corrections.md — look for patterns (3+ similar → promote to HOT)
- Scan recent sessions for self-observed improvements
- Check memory.md size — if >100 lines, archive least-used patterns
- Promote successful mutation strategies to shared patterns
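The memory.md size check can be sketched as a rank-and-cut over usage counts. This assumes one pattern per line and that usage counts are tracked somewhere — both simplifications; the function name and tuple shape are illustrative:

```python
MAX_HOT_LINES = 100  # threshold from the consolidation checklist above

def consolidate(hot_patterns: list[tuple[str, int]]) -> tuple[list[str], list[str]]:
    """Given (pattern, use_count) pairs from HOT memory, keep the
    most-used patterns within the line budget and archive the rest."""
    ranked = sorted(hot_patterns, key=lambda p: p[1], reverse=True)
    keep = [text for text, _ in ranked[:MAX_HOT_LINES]]
    archive = [text for text, _ in ranked[MAX_HOT_LINES:]]
    return keep, archive
```

Archiving rather than deleting matters: a pattern that falls out of HOT memory can still be found on review, and re-promoted if the corrections recur.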
Pipeline-Specific Learnings
The pipeline itself generates learnings at every stage:
| Stage | What Gets Learned |
|---|---|
| Research | Which issues are agent-solvable vs need humans — improves classification accuracy |
| Build | Common build failures, worktree setup issues, test patterns |
| Review | Which review comments are valid vs noise — improves fix prioritization |
| Fix | Fix patterns that work vs fail — reduces fix cycle count |
| E2E | Test coverage gaps, browser automation patterns |
Measuring Improvement
The pipeline tracks metrics that indicate whether the learning loop is working:
- Fix cycle count — average number of fix attempts before comments are resolved. Should decrease over time.
- Agent-stuck rate — percentage of pipelines that hit the circuit breaker. Should decrease.
- Research accuracy — percentage of agent-classified issues that complete without human intervention.
- Time to PR — elapsed time from issue creation to PR opened.
- Pattern count — number of patterns in HOT memory. Growth means the system is learning; plateau means it's stabilizing.
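A simple way to read the first metric is to compare a recent window against the long-run average — a sketch with an illustrative function name and window size, not part of the pipeline's tooling:

```python
def fix_cycle_trend(cycles_per_pr: list[int], window: int = 10) -> float:
    """Compare the recent average fix-cycle count against the overall
    average; a value below 1.0 suggests the learning loop is working."""
    if len(cycles_per_pr) < window:
        raise ValueError("not enough data points")
    recent = sum(cycles_per_pr[-window:]) / window
    overall = sum(cycles_per_pr) / len(cycles_per_pr)
    return recent / overall
```

The same recent-vs-overall comparison applies to the agent-stuck rate; pattern count is the one metric where a plateau is healthy rather than a warning sign.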