Recursive self-improvement (RSI) is in the air. Frontier labs are announcing it. YouTubers are declaring hard takeoff. The framing is always the same: model N helps train model N+1, and you need a hundred thousand GPUs to participate.
That’s one version. It’s not the only one.
Two layers of RSI
Weight-layer RSI is what the labs are doing. MiniMax uses M2.7 to design experiments for the next checkpoint. OpenAI says Codex 5.3 debugged its own training. Google’s AlphaEvolve found faster matrix multiplication. Real results. Also: requires being Google.
Context-layer RSI is what happens on your codebase. The model weights are frozen: same Opus, same Sonnet, same weights on day ninety as on day one. What changes is everything around the model: accumulated decisions, failure patterns, tests that catch yesterday’s mistakes, tooling that makes tomorrow’s spawn faster.
One agent fixes a bug. Another writes a guard so that class of bug gets caught in CI. A third extracts the pattern into shared tooling. Next week, the swarm doesn’t rediscover the fix. It builds on it.
The models didn’t get smarter. The system did.
Why this distinction matters
The public conversation about RSI has a GPU-shaped hole in it. If you believe recursive self-improvement requires training runs, then RSI is something that happens to you. Labs ship better models, you benefit passively. You’re a consumer of improvement, not a participant.
But most software isn’t bottlenecked on model capability. It’s bottlenecked on context: does the agent know what was tried yesterday? Does it know which files are stable and which are active? Does it know the conventions that aren’t in the linter?
That knowledge lives in commit history, in test suites, in memory. It accumulates. And when agents can read it, act on it, and add to it, the system improves itself without a single weight update.
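To make the read, act, add loop concrete, here is a minimal sketch of one possible shape for it: an append-only notes file that an agent consults before touching a file and appends to afterwards. Everything in it is an assumption made for illustration, including the JSON Lines format, the field names, and the functions `record_lesson` and `lessons_for`. It is not a description of any particular setup.

```python
import json
from pathlib import Path


def record_lesson(memory_path, kind, summary, files):
    """Append one lesson (hypothetical schema) to an append-only JSONL file."""
    entry = {"kind": kind, "summary": summary, "files": sorted(files)}
    with Path(memory_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def lessons_for(memory_path, file):
    """Return every recorded lesson that mentions the given file."""
    path = Path(memory_path)
    if not path.exists():
        return []  # first run: nothing has been learned yet
    matches = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            if file in entry["files"]:
                matches.append(entry)
    return matches
```

An agent about to edit a file would call `lessons_for` first and put the results in its prompt, then call `record_lesson` once its change passes or fails. The weights never move; the file grows.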
What it looks like in practice
We’ve been running this loop for six months. Same frozen models. The output got better anyway.
Month one: agents fixed lint errors nobody cared about. Month four: agents stopped touching stable files because they’d seen enough commit history to know better. The mistakes got more interesting. The fixes got more structural.
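One way an agent could tell stable files from active ones is to count recent commits per path. The sketch below assumes the history has already been extracted into `(path, timestamp)` pairs, for example from `git log`. The function name `classify_files`, the 30-day window, and the three-commit threshold are illustrative choices, not values taken from any real run.

```python
from collections import Counter
from datetime import datetime, timedelta


def classify_files(commits, now, window_days=30, active_threshold=3):
    """Label each path 'active' or 'stable' from (path, datetime) commit records.

    A path is 'active' if it was touched at least `active_threshold` times
    within the last `window_days`; every other known path is 'stable'.
    """
    commits = list(commits)  # allow generators; we iterate twice
    cutoff = now - timedelta(days=window_days)
    recent = Counter(path for path, when in commits if when >= cutoff)
    return {
        path: "active" if recent[path] >= active_threshold else "stable"
        for path, _ in commits
    }


history = [
    ("api/auth.py", datetime(2024, 5, 10)),
    ("api/auth.py", datetime(2024, 5, 18)),
    ("api/auth.py", datetime(2024, 5, 29)),
    ("db/schema.sql", datetime(2024, 1, 4)),
]
labels = classify_files(history, now=datetime(2024, 6, 1))
```

With labels like these in its context, an agent has a reason to leave `db/schema.sql` alone unless the task explicitly calls for changing it.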
No training run produced that. Context did.
The escaped constraint
You don’t need an ML background. You don’t need to design fine-tuning experiments. You don’t need to be a frontier lab with a research team.
You need agents that remember, a feedback loop that compounds, and a codebase that gets better every night it runs.
That’s RSI at the layer that’s actually plastic.