Closed-Loop Learning
The pipeline doesn't just ship code — it learns from every cycle. Corrections, failures, and successful patterns are captured, classified, and fed back into agent behavior. This is what makes the pipeline continuously improving rather than just continuously running.
The Learning Loop
Correction Logging
When an agent gets corrected — by a human, by another agent, or by a build failure — it logs the correction immediately, before doing anything else.
Correction signals
- Human says "no", "wrong", "actually", "stop", "don't", "should be", "instead"
- Another agent corrects the approach
- A build or test fails because of something the agent did
- The agent realizes it made a mistake
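The human-correction signals above can be caught with a simple keyword match. A minimal sketch — the signal list comes from the bullets above, but the function name and matching logic are illustrative, not part of the pipeline:

```python
import re

# Correction keywords drawn from the signal list above
CORRECTION_SIGNALS = ["no", "wrong", "actually", "stop", "don't", "should be", "instead"]

def is_correction(message: str) -> bool:
    """Return True if a human message looks like a correction."""
    lowered = message.lower()
    return any(re.search(rf"\b{re.escape(signal)}\b", lowered)
               for signal in CORRECTION_SIGNALS)
```

In practice a classifier this naive will over-trigger ("no problem!" matches "no"); the point is only that detection happens before the agent does anything else, so the correction is never lost.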
What gets logged
```markdown
## 2026-03-15 14:30 — Used sed instead of awk for template substitution
**What I got wrong:** Used sed to substitute issue titles into PR templates,
which broke on titles containing special characters (/, &, etc.)
**Correct approach:** Use awk for template substitution — it handles
special characters without escaping
**Source:** Build failure in pipeline-dispatch.sh
**Pattern:** Always use awk over sed for user-provided input substitution
```

Pattern Promotion
Not every correction becomes a permanent lesson. The system uses a tiered approach:
| Tier | Where | When | Loaded |
|---|---|---|---|
| Raw corrections | corrections.md | Every correction, immediately | On review |
| HOT memory | memory.md | 3+ similar corrections on the same pattern | Every session start |
| Cross-agent | patterns.md (shared) | During consolidation cycles | Before starting work on a new topic |
Why 3 corrections?
One correction could be a fluke. Two is a coincidence. Three is a pattern. Only patterns that repeat get promoted to HOT memory, which is loaded at every session start. This keeps the always-loaded context small and high-signal.
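The promotion rule can be sketched in a few lines — assuming each logged correction carries a pattern tag (the `**Pattern:**` line in the log format above); the function name and threshold constant are illustrative:

```python
from collections import Counter

PROMOTION_THRESHOLD = 3  # one is a fluke, two a coincidence, three a pattern

def patterns_to_promote(correction_tags: list[str]) -> list[str]:
    """Given the pattern tag of each logged correction, return the
    patterns repeated often enough to promote to HOT memory."""
    counts = Counter(correction_tags)
    return [pattern for pattern, n in counts.items()
            if n >= PROMOTION_THRESHOLD]
```

Everything below the threshold stays in corrections.md, which is only loaded on review — so the always-loaded HOT memory stays small and high-signal.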
Mutation Protocol
When an agent fails 3 consecutive times on the same problem, it must stop and change strategy. This is the mutation protocol — a forced pivot to prevent repeated failure.
Mutation strategies
| Strategy | When to Use |
|---|---|
| Reduce scope | Task is too large — break it into smaller pieces |
| Change tool | Current tool isn't working — switch to fallback |
| Simplify approach | Over-engineering — do the minimum viable fix |
| Ask for help | Missing context — escalate to human or specialist agent |
| Reframe the problem | Solving the wrong thing — re-read the issue |
| Work around | Direct fix isn't possible — find an alternative path |
Cross-Agent Learning
Agents don't learn in isolation. A shared patterns file captures learnings that apply across the entire agent fleet. Before starting work on a new topic, agents search shared learnings:
```shell
# Before working on authentication:
grep -A5 "auth" ~/learnings/patterns.md

# Before working on database migrations:
grep -A5 "migration" ~/learnings/patterns.md
```

This means when one agent learns that "transcrypt needs re-initialization in worktrees," every agent benefits from that knowledge on their next relevant task.
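Writing to the shared file is the other half of the loop. A minimal sketch — the file path matches the grep examples above, but the entry format and function name are assumptions modeled on the corrections.md log format:

```python
from datetime import date
from pathlib import Path

def share_pattern(topic: str, lesson: str,
                  patterns_file: Path = Path.home() / "learnings" / "patterns.md") -> None:
    """Append a learning to the shared patterns file so other
    agents can find it with a grep before starting related work."""
    patterns_file.parent.mkdir(parents=True, exist_ok=True)
    entry = f"\n## {date.today()} — {topic}\n{lesson}\n"
    with patterns_file.open("a", encoding="utf-8") as f:
        f.write(entry)
```

Keeping the topic in the entry heading is what makes the `grep -A5` lookup work: the match lands on the heading and the context lines pull in the lesson.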
Consolidation Cycles
During scheduled maintenance windows (sleep cycles), agents review their corrections and perform housekeeping:
- Review corrections.md — look for patterns (3+ similar → promote to HOT)
- Scan recent sessions for self-observed improvements
- Check memory.md size — if >100 lines, archive least-used patterns
- Promote successful mutation strategies to shared patterns
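The memory.md size check can be sketched as a rank-and-cut over usage counts. This assumes one pattern per line and that usage counts are tracked somewhere — both simplifications; the function name and tuple shape are illustrative:

```python
MAX_HOT_LINES = 100  # threshold from the consolidation checklist above

def consolidate(hot_patterns: list[tuple[str, int]]) -> tuple[list[str], list[str]]:
    """Given (pattern, use_count) pairs from HOT memory, keep the
    most-used patterns within the line budget and archive the rest."""
    ranked = sorted(hot_patterns, key=lambda p: p[1], reverse=True)
    keep = [text for text, _ in ranked[:MAX_HOT_LINES]]
    archive = [text for text, _ in ranked[MAX_HOT_LINES:]]
    return keep, archive
```

Archiving rather than deleting matters: a pattern that falls out of HOT memory can still be found on review, and re-promoted if the corrections recur.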
Pipeline-Specific Learnings
The pipeline itself generates learnings at every stage:
| Stage | What Gets Learned |
|---|---|
| Research | Which issues are agent-solvable vs need humans — improves classification accuracy |
| Build | Common build failures, worktree setup issues, test patterns |
| Review | Which review comments are valid vs noise — improves fix prioritization |
| Fix | Fix patterns that work vs fail — reduces fix cycle count |
| E2E | Test coverage gaps, browser automation patterns |
Measuring Improvement
The pipeline tracks metrics that indicate whether the learning loop is working:
- Fix cycle count — average number of fix attempts before comments are resolved. Should decrease over time.
- Agent-stuck rate — percentage of pipelines that hit the circuit breaker. Should decrease.
- Research accuracy — percentage of agent-classified issues that complete without human intervention.
- Time to PR — elapsed time from issue creation to PR opened.
- Pattern count — number of patterns in HOT memory. Growth means the system is learning; plateau means it's stabilizing.
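A simple way to read the first metric is to compare a recent window against the long-run average — a sketch with an illustrative function name and window size, not part of the pipeline's tooling:

```python
def fix_cycle_trend(cycles_per_pr: list[int], window: int = 10) -> float:
    """Compare the recent average fix-cycle count against the overall
    average; a value below 1.0 suggests the learning loop is working."""
    if len(cycles_per_pr) < window:
        raise ValueError("not enough data points")
    recent = sum(cycles_per_pr[-window:]) / window
    overall = sum(cycles_per_pr) / len(cycles_per_pr)
    return recent / overall
```

The same recent-vs-overall comparison applies to the agent-stuck rate; pattern count is the one metric where a plateau is healthy rather than a warning sign.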