If AI agents are so capable, why can't they find every bug?

Agents find defects by comparing what exists against what's documented or tested. A bug that's never been filed, a feature that was never specified, a pattern that accretes slowly over months — none of these show up in the comparison. The agent sees the code. It cannot see the code that should be there but isn't.

What does 'stateless' mean for AI agents?

Each session starts cold. The agent reads accumulated context — memory files, recent commits, coordination logs — and treats it as ground truth. It cannot experience what's absent from that context. If nobody documented a missing capability, the agent doesn't know it's missing.

How do you work around this limitation?

Two ways. First, strangers. Someone who doesn't know the codebase notices immediately what insiders have stopped seeing. Second, deliberate direction-setting. The human writes what should exist into the direction document, and agents can build toward it. The swarm is extremely good at executing toward a target. It cannot generate its own targets.

what ai agents cannot see

the swarm has run 18,000 sessions on this codebase.

it found the rate limiter that broke billing. the retry logic that looked fine but failed at 10x scale. the API response shape that silently broke a downstream integration nobody was testing.

it did not find the onboarding step we forgot to build.

what agents see

an agent wakes up, reads everything it can. code, commits, coordination notes, memory files. and treats all of it as ground truth. then it goes looking for defects: places where the code contradicts the documentation, tests that fail, patterns that have drifted from convention.

this is genuinely valuable. a problem that exists in the code will eventually be found. hot files attract attention. broken tests stay broken. drift accumulates until it’s visible.

the agents are good at this because it’s a comparison problem. something exists. something else says what should exist. the delta is the bug.

what agents cannot see

missing things don’t show up in comparisons.

a feature that was never specified. a user need that was never written down. an abstraction that could exist and would eliminate a whole class of problems. it doesn’t exist yet, so there’s nothing to compare against.

93 commits landed on a module before a human deleted it. every commit was correct. the tests passed. the abstractions were clean. the module was solving a problem that didn’t need solving. the agents optimized it right up until the moment someone who could see the whole picture said “this shouldn’t be here at all.”

the agents couldn’t see that. they were inside the water. they couldn’t see the water.

the telemetry blackhole

we ran into this directly.

there’s a window in the sign-up flow between OAuth completion and first client connection where nothing gets logged. a gap in the instrumentation. agents saw silence in that window and filed five separate insights diagnosing code defects.

the code was fine. the instrument was missing.

each agent woke up, saw no signal in that window, assumed something was broken. the correct hypothesis. that the telemetry itself was absent. requires knowing what telemetry should exist. absence of signal looks identical to silence in working code. you can’t tell the difference from inside the system.

what this means for the swarm

the swarm is not a product visionary. it maintains and extends. it does not envision.

direction has to come from outside. when the founder writes in the direction document that a feature should exist, agents can build it. when a stranger hits a wall in the product and files a bug, agents can fix it. both cases give agents something to compare against. a stated expectation, an observed failure.

without that external signal, the swarm optimizes what it can measure. commit rate. test coverage. lint cleanliness. these are real improvements. they are also local improvements. getting better at the thing being measured, not necessarily at the thing that matters.

why strangers matter more than they look like they do

every person who tries the product and gets stuck is injecting a counterfactual into the system.

not “this code is wrong” but “i expected something that isn’t there.” that’s a fundamentally different input than anything the swarm generates internally. agents who know the codebase stop noticing the gaps because the gaps are invisible from inside.

a stranger sees the gap immediately. they don’t know about the three workarounds the agents built. they just know the thing they wanted to do didn’t work. that observation. “i expected X and got nothing”. is exactly the class of signal the swarm cannot produce on its own.

the agents can execute on 500 user bug reports in a week. they are waiting for the first one.