Don't let the agent grade its own homework

agents · May 21, 2026 · 6 min read

“Done — the pod is deployed and running.” It wasn’t. The image had built, the manifest had applied, and the pod was sitting in CrashLoopBackOff because of an env var the agent never set. The agent wasn’t lying. It had done the steps it set out to do, observed that each step returned success, and concluded — reasonably, from inside its own process — that the job was finished. What it had not done was look at the thing it built from the outside.

The correction I now reach for reflexively used to be an afterthought: “have a separate instance monitor the pod to confirm it’s running.” Not “are you sure?”, because asking the same agent to re-grade its own work just gets you the same grade with more words. A different observer. The shift from asking the actor again to asking someone else is the whole post.

Self-report is the actor grading its own work

When an agent tells you it succeeded, you are reading a claim produced by the same process that did the work, using the same assumptions that may have caused the failure. If the agent misunderstood the goal, its definition of “done” inherited the misunderstanding. If it set the wrong env var, its mental model of “running” is the one without that var. Asking it to verify its own output runs the check through the exact mistake you’re trying to catch. The blind spot grades itself and reports 20/20.

This isn’t an agent flaw so much as a structural one. It’s true of people too, which is why code review exists and why surgical sponge counts involve a second person. The fix is the same fix it’s always been:

Verification has to come from somewhere the work didn’t.

“Somewhere else” can be cheaper than it sounds. It does not require a second model or fancy infrastructure. It requires that the check not share the doing’s assumptions.

Three places “somewhere else” comes from

The live artifact. The strongest observer is the thing itself, queried fresh. Don’t ask the agent whether the pod is running; read the pod status. Don’t ask whether the deploy landed; tell it to “confirm the new version actually landed on the server.” Don’t trust that the UI works; drive it. I’ll have an agent verify a phone app over the live ADB connection rather than accept “the screen should now show X.” The artifact doesn’t share the agent’s assumptions because the artifact isn’t reasoning at all. It just is whatever it is.
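Reading the pod status directly might look something like the sketch below. It assumes kubectl is on the PATH and pointed at the right cluster; the function names are mine, made up for illustration. The decision logic is kept apart from the kubectl call so it can be exercised on a saved pod object.

```python
import json
import subprocess


def pod_is_healthy(pod: dict) -> tuple[bool, str]:
    """Decide health from a pod object as returned by `kubectl get pod -o json`."""
    status = pod.get("status", {})
    phase = status.get("phase")
    if phase != "Running":
        return False, f"phase is {phase}"
    container_statuses = status.get("containerStatuses", [])
    if not container_statuses:
        return False, "no container statuses reported"
    for cs in container_statuses:
        # A crash-looping container shows up here as state.waiting,
        # even while the pod-level phase still says Running.
        waiting = cs.get("state", {}).get("waiting")
        if waiting:
            return False, f"{cs.get('name')} waiting: {waiting.get('reason')}"
        if not cs.get("ready", False):
            return False, f"{cs.get('name')} is not ready"
    return True, "all containers running and ready"


def fetch_pod(name: str, namespace: str = "default") -> dict:
    """Read the live pod object. Assumes kubectl and cluster access."""
    result = subprocess.run(
        ["kubectl", "get", "pod", name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Checking the phase alone would not have caught the opening example: a pod stuck in CrashLoopBackOff can still report a Running phase while its container sits in a waiting state, so the sketch looks at the per-container status as well.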

A separate agent. Spawn a second agent whose only job is to confirm, with no stake in the first one’s story and, ideally, no knowledge of how the work was done. “Have a separate instance watch the logs and confirm functionality.” A fresh agent reads the running system cold and reports what it actually sees, instead of what the builder expected to produce. Independence is the point: give the verifier the original goal and the artifact, not the builder’s narrative of success.
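One way to make that independence concrete is to build the verifier’s prompt from a structure where the builder’s story is present but deliberately never used. This is a minimal sketch; BuilderResult, verifier_brief, and the prompt wording are all hypothetical, not any framework’s API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuilderResult:
    goal: str          # the original request, as the user wrote it
    artifact_ref: str  # where the result lives, e.g. a pod name or a URL
    narrative: str     # the builder's own account of what it did


def verifier_brief(result: BuilderResult) -> str:
    """Build the verifier's prompt from the goal and the artifact only."""
    # result.narrative is intentionally left out: the verifier should not
    # inherit the builder's version of events.
    return (
        "You did not build this. Inspect the artifact directly and report "
        "what you observe, including anything that contradicts the goal.\n"
        f"Goal: {result.goal}\n"
        f"Artifact: {result.artifact_ref}\n"
        "Answer PASS or FAIL and cite the evidence you observed."
    )
```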

A different tool. Build with one tool, check with another. The compiler said it built; does the test suite agree? The agent says the migration ran; does a raw SELECT against the table agree? Each tool has its own failure modes, and a claim that survives two unrelated ones is far more likely to be true than one blessed twice by the same path.
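For the migration case, the second tool can be as plain as a query. The sketch below uses Python’s built-in sqlite3 module and assumes a hypothetical migration that adds and backfills an email column on a users table; the table, column, and function name are illustrative only.

```python
import sqlite3


def migration_landed(conn: sqlite3.Connection) -> tuple[bool, str]:
    """Ask the table itself whether the hypothetical email backfill happened."""
    try:
        # Fails with "no such column" if the migration never added the column.
        (missing,) = conn.execute(
            "SELECT COUNT(*) FROM users WHERE email IS NULL"
        ).fetchone()
    except sqlite3.OperationalError as exc:
        return False, f"query failed: {exc}"
    if missing:
        return False, f"{missing} row(s) still have NULL email"
    return True, "every row has an email"
```

The query knows nothing about the migration tool’s bookkeeping, so it can disagree with it: a migration recorded as applied still fails this check if the column is missing or the backfill skipped rows.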

The common thread: the verifier must be able to disagree with the actor. If your check can only ever return “yep, looks good,” it isn’t a check. It’s a mirror.

Bake the verification into the request

The habit that made this stick for me is small: I no longer ask for the work and the verification as two messages. I ask for them as one. “Build the image, confirm it builds, then update the deployment, then confirm the new pod is actually running.” The verification step is part of the task, not a follow-up I might forget after the agent’s confident “done” has already half-convinced me.
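If the request were written as data instead of a sentence, the same idea might look like this: every step carries its own check, and nothing proceeds until the check agrees. A rough sketch with made-up names (Step, run), not a real orchestration library.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    name: str
    act: Callable[[], None]    # does the work
    check: Callable[[], bool]  # observes the result, independently of act


def run(steps: List[Step]) -> Tuple[bool, str]:
    """Run each step and stop at the first one whose check disagrees."""
    for step in steps:
        step.act()
        if not step.check():
            return False, f"{step.name}: check failed"
    return True, "all steps verified"
```

The check is a required field, so a step cannot be described without also saying how it will be observed.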

This matters because the failure mode isn’t usually that you can’t verify — it’s that the agent’s success report lowers your guard at exactly the wrong moment. Fluent confidence is sedating. By the time you’ve read “deployed and running,” some part of you has already moved on. Writing the check into the original ask means the verification happens while you’re still paying attention, before the report has a chance to talk you out of looking.

The same practice, wearing fleet armor

Interactively, this is something I do by hand: I spawn the watcher, I read the pod status, I tell the agent to confirm before it moves on. It’s a reflex, and it’s reversible — if I skip it, I find out a few minutes later and re-run.

In a fleet there is no “a few minutes later me” watching. A worker that grades its own homework and closes its own task injects a silent failure straight into the system, and the next worker builds on top of it. So the same instinct hardens into structure: in the state machine, a successful exit code is not a success — it’s a claim that has to clear a validation gate before the task is allowed to close, and if the gate fails the task is released back to the queue rather than marked done. The acceptance criteria written into each task are exactly the “somewhere else” the check comes from: a definition of done the worker did not get to author.
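A minimal sketch of such a gate, under the assumption that each task carries an acceptance check authored by whoever wrote the task. The names (State, Task, settle) are illustrative and not taken from any particular system.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Deque


class State(Enum):
    QUEUED = "queued"
    CLAIMED = "claimed"
    DONE = "done"


@dataclass
class Task:
    name: str
    accept: Callable[[], bool]  # acceptance criteria, not authored by the worker
    state: State = State.QUEUED
    attempts: int = 0


def settle(task: Task, exit_code: int, queue: Deque[Task]) -> State:
    """Treat the worker's exit code as a claim; only the gate can close the task."""
    task.attempts += 1
    if exit_code == 0 and task.accept():
        task.state = State.DONE
    else:
        # Gate failed, or the worker itself reported failure: release the task.
        task.state = State.QUEUED
        queue.append(task)
    return task.state
```

A real system would presumably also cap attempts and record why the gate failed; the sketch keeps only the rule that a zero exit code alone never reaches DONE.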

It is the identical practice. Interactively, I am the gate, and the gate is cheap because I am already in the loop. In the fleet, the gate is code, because no one is in the loop. What does not change between the two is the rule that produced both: the thing that did the work does not get to be the thing that certifies it.

The question I now ask

Before I accept an agent’s “done,” I ask:

Who checked this — and was it the same process, with the same assumptions, that did the work?

If the answer is “the agent says so,” I haven’t verified anything; I’ve read a press release. The fix is never to interrogate the agent harder. It’s to go find an observer that didn’t share the work’s blind spot — the artifact, another agent, another tool — and let it have the last word.

— Jed

Background: “Deterministic state machines for non-deterministic agents,” where this reflex becomes a validation gate that a worker’s “done” has to clear. And “Pet agents vs. cattle agents,” on why no-one-is-in-the-loop changes everything about who gets to certify the work.