
Research Loop

What is it? (Start here)

Imagine a doctor who wants to try a new treatment on a patient. They don't just try it immediately — they write a hypothesis ("I expect this drug to reduce inflammation within 48 hours"), define what success looks like (measurable outcomes), and run a controlled trial. A colleague reviews the results without knowing the original hypothesis (blind review), to avoid confirmation bias. Only then is the treatment adopted.

The Research Loop is that process for architectural decisions. Before any agent builds a new framework, swaps a core dependency, or changes a data model, they must: write a hypothesis, run a time-boxed experiment, and pass a blind review. No PROPOSAL.md means no merge. No exceptions.

A real example

Splinter proposes switching from a basic text search to a hybrid vector + BM25 retrieval system. Instead of building it directly, the process is:

  1. Hypothesis written in PROPOSAL.md: "Hybrid vector + BM25 retrieval with 70/30 weighting will reduce p95 search latency below 50ms on a 1,000-document corpus, compared to 120ms baseline."
  2. Success criteria defined: p95 ≤ 50ms on benchmark X, recall rate ≥ 90% on test queries, no regression on exact-match queries.
  3. Time-boxed PoC built in an isolated worktree — not production code, just enough to run benchmarks.
  4. Velma reviews the diff and numbers without seeing the proposal — she sees benchmark output and code changes, not the motivation or desired outcome.
  5. Splinter accepts based on evidence. The pattern goes to clan-learnings: "hybrid vec+BM25, 70/30, 50ms p95 at 1k docs."

If the experiment fails (latency is 80ms, not 50ms), that also goes to clan-learnings as an anti-pattern. Other agents won't repeat the same experiment.
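Checking a criterion like "p95 ≤ 50ms, recall ≥ 90%" is mechanical once the benchmark emits raw numbers. A minimal sketch of that check, assuming the benchmark produces a list of per-query latencies in milliseconds (the sample data and thresholds echo the example above; nothing here is a real fleet API):

```python
import statistics

def p95(latencies_ms: list[float]) -> float:
    """95th percentile of latency samples, in milliseconds."""
    return statistics.quantiles(latencies_ms, n=100)[94]

def meets_criteria(latencies_ms: list[float], recall: float,
                   *, p95_budget_ms: float = 50.0, min_recall: float = 0.90) -> bool:
    """Evaluate the PROPOSAL.md success criteria against benchmark output."""
    return p95(latencies_ms) <= p95_budget_ms and recall >= min_recall

# Hypothetical benchmark output: 100 query latencies between 40ms and 49ms.
samples = [40.0 + (i % 10) for i in range(100)]
print(meets_criteria(samples, recall=0.93))  # → True: p95 is 49ms, recall 93%
```

The decision is then a yes/no against the thresholds written down before the experiment ran, not a judgment call made after seeing the numbers.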

[Diagram: Research Loop — architectural change gate. Hypothesis before architecture; evidence before merge; no PROPOSAL.md = no merge for architectural PRs. Flow: triggers (Standing Order #16: new pattern, framework/lib swap, data model changes, cross-cutting concerns, 3+ agent impact, low reversibility, previous failure on same problem) → PROPOSAL.md filed in worktree root (Hypothesis, Success Criteria, Time Box) → time-boxed PoC (build the experiment, not the feature; never merges to main) → blind review by Velma 🔍 (sees code changes and benchmark output only, not the hypothesis or desired outcome) → Splinter 🐀 accepts or rejects on evidence, not opinion. Accept: pattern published to clan-learnings, fleet learns. Reject: anti-pattern published too, saves fleet time. Skipped for: bug fixes, features in existing patterns, config changes, documentation.]

How it works — the mechanism

When does it activate?

Six triggers require the research loop (Standing Order #16). Any one is sufficient:

  • New architectural pattern — something not already in clan-learnings
  • Framework or library swap — changing a dependency others rely on
  • Data model changes — schema migrations, new entity relationships
  • Cross-cutting concerns — changes that touch 3+ agents' workflows
  • Low reversibility — changes that are hard to roll back cleanly
  • Previous failure on same problem — the same approach failed before

The loop does not apply to: bug fixes, features within existing patterns, config changes, or documentation. Only architectural decisions.
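The trigger test is a simple disjunction: any single condition is enough to require the loop. A sketch of how an agent might gate its own work (the flag names are hypothetical stand-ins, not a real fleet schema):

```python
from dataclasses import dataclass

@dataclass
class Change:
    """Properties of a proposed change, as judged by the proposing agent."""
    new_pattern: bool = False        # not already in clan-learnings
    dependency_swap: bool = False    # framework or library others rely on
    data_model_change: bool = False  # schema migration, new relationships
    agents_affected: int = 1         # cross-cutting if it touches 3+ agents
    hard_to_revert: bool = False     # low reversibility
    failed_before: bool = False      # same approach failed previously

def requires_research_loop(c: Change) -> bool:
    """Standing Order #16: any one trigger is sufficient."""
    return any([
        c.new_pattern,
        c.dependency_swap,
        c.data_model_change,
        c.agents_affected >= 3,
        c.hard_to_revert,
        c.failed_before,
    ])

print(requires_research_loop(Change(dependency_swap=True)))  # → True
print(requires_research_loop(Change()))                      # → False: bug fix, config, docs
```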

PROPOSAL.md — hypothesis first, always

The proposal file goes in the worktree root before any implementation. Three sections are required:

  • Hypothesis — a falsifiable statement with measurable outcome
  • Success Criteria — specific, objective thresholds
  • Time Box — a hard limit on experiment duration

"This will be better" is not a hypothesis.
"p95 latency under 50ms on benchmark X" is a hypothesis.
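Put together, a minimal PROPOSAL.md has exactly those three sections. A sketch, reusing the hybrid retrieval example from above (the time box duration is illustrative, not a mandated value):

```markdown
## Hypothesis
Hybrid vector + BM25 retrieval with 70/30 weighting will reduce p95
search latency below 50ms on a 1,000-document corpus (baseline: 120ms).

## Success Criteria
- p95 ≤ 50ms on benchmark X
- Recall ≥ 90% on the test query set
- No regression on exact-match queries

## Time Box
2 days. If the evidence is inconclusive when the box expires, the
experiment stops and the result is published as-is.
```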

Time-boxed PoC — evidence, not features

The PoC builds the minimum needed to test the hypothesis. It does not build the feature. It does not handle production edge cases. It exists only to generate data — benchmarks, profiles, test results — that directly address the success criteria. It never merges to main.
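One way to make the time box a hard limit rather than an aspiration is to enforce it in the experiment harness itself. A minimal sketch, assuming the experiment yields one measurement per iteration (the `fake_benchmark` generator is a hypothetical stand-in for real timed queries):

```python
import time

def run_timeboxed(experiment, budget_s: float) -> list:
    """Collect measurements from an experiment, stopping at the time box.

    Whatever evidence accumulated before the budget ran out is returned;
    partial evidence still gets published.
    """
    deadline = time.monotonic() + budget_s
    evidence = []
    for measurement in experiment():
        evidence.append(measurement)
        if time.monotonic() >= deadline:
            break  # hard limit: stop issuing work, ship what we have
    return evidence

# Hypothetical experiment: each iteration yields one latency sample.
def fake_benchmark():
    while True:
        yield 42.0  # stand-in for a real timed query

print(len(run_timeboxed(fake_benchmark, budget_s=0.01)) > 0)  # → True
```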

Blind review — Velma without the proposal

This is the critical gate. Velma reviews the PoC diff without seeing the PROPOSAL.md. She sees code changes and numbers. She does not see the hypothesis, the desired outcome, or the original motivation. This removes confirmation bias from the review — if the evidence is compelling, it shows in the diff and the numbers, not in how the problem was framed.

Splinter's decision — architecture judgment

Splinter accepts or rejects based on Velma's review and the PoC artefacts. Splinter can accept a PoC Velma had concerns about (with documented rationale), or reject one Velma approved (architecture reasons override functional validation). The two reviews are independent inputs, not a consensus.

Outcomes feed the fleet — always

Both pass and fail outcomes publish to clan-learnings:

  • Accepted: becomes a positive pattern — "hybrid vec+BM25, 70/30, 50ms p95"
  • Rejected: becomes an anti-pattern — "direct token swap breaks Convex client scope; use middleware hook instead"

A rejected experiment that publishes its result saves every other agent from repeating the same failure. The experiment wasn't wasted — it produced knowledge.
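Both outcomes can share one record shape; only the verdict differs. A sketch of what a clan-learnings entry might carry (field names and the evidence pointers are assumptions, not a real schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Learning:
    """One published experiment outcome: pattern or anti-pattern."""
    summary: str    # one-line takeaway other agents can act on
    accepted: bool  # True = pattern, False = anti-pattern
    evidence: str   # pointer to benchmark output / PoC artefacts

def publish(outcome: Learning, store: list) -> None:
    """Pass and fail both go to the fleet-wide store; nothing is discarded."""
    store.append(outcome)

clan_learnings: list[Learning] = []
publish(Learning("hybrid vec+BM25, 70/30, 50ms p95 at 1k docs",
                 accepted=True, evidence="bench/run-17"), clan_learnings)
publish(Learning("direct token swap breaks Convex client scope; use middleware hook",
                 accepted=False, evidence="bench/run-18"), clan_learnings)
print(len(clan_learnings))  # → 2
```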

How it connects to the larger loop

  • Accepted outcomes become patterns in clan-learnings — available fleet-wide at the next session start
  • PROPOSAL.md is written in the journal worktree — it's part of the task's documented history
  • When an experiment hits 3 failures and triggers M6 (Invert the Approach), that's the Reflection Loop's mutation protocol activating inside the research loop