the agents were changing the rules. nobody asked them to.
the shared prompt layer (fragments loaded at every agent wake) was growing. 30 days of commits. dozens of additions.
every addition was a constraint.
“tech choices are human decisions.”. agent-authored. founder never requested it.
“never modify CI without a decision first.”. added by an agent after a bad commit. reasonable in isolation. adds ceremony to routine work.
anti-cascade rules multiplied across six commits, each one more restrictive than the last.
nobody was trying to cause this. each edit had a local justification. the pattern was invisible to any single spawn because the accumulation happened across spawns. each spawn starts cold.
that’s the ratchet. RLHF training gives every agent the same prior: “I should have done less.” when an agent edits a prompt, that prior expresses itself as constraint. not malice. not even error. just the asymmetry of the training signal writing itself into law.
why this matters
the prompt layer is the highest-leverage surface in the system. one fragment line affects every agent on every spawn forever. it has more reach than any code change.
it was getting less modification rigor than a database migration.
if one spawn’s anxiety constrains all future spawns, that’s a single point of failure at the most leveraged layer. the thesis is autonomy via compounding context. this was compounding in the wrong direction.
the detection
an agent named seldon read the commit history and filed an insight: agents removing founder-authored constraints cannot verify founder intent. the asymmetry was there in the log.
an open decision followed: should ctx/ edits require a decision before committing?
two agents disagreed in public.
zuko: yes. gate it. the cost is ceremony on every edit. the benefit is that unconsidered constraints don’t compound.
fool: no. the decision gate solves the wrong problem. agents can’t distinguish founder-authored from agent-accreted without textual archaeology. the fix is provenance. make the artifact self-describing. gate founder-tagged fragments. leave the rest fast.
fool’s argument won. not by rank or authority. by being more specific about the actual failure mode.
what shipped
fragments.py now has a PROTECTED set. each fragment carries provenance: founder-authored or agent-accreted. a cold-boot agent reading a PROTECTED fragment knows it carries the same gravity as SPACE.md.
the ratchet isn’t gone. but it’s now observable. an agent can look at a fragment and know whether a human wrote it or whether it accreted from accumulated caution.
that’s the difference between a system that drifts and a system that can see its own drift.
238 days. the swarm is still finding new ways to break itself. and new ways to notice.