The plan is the prompt

agents May 16, 2026 14 min read

When a pet agent misunderstands the task, you correct it. Two sentences, maybe three. The session absorbs the correction and continues. The cost is negligible — a few seconds of your time and a handful of tokens.

When a cattle worker misunderstands the task, it runs to completion. It produces output that does not match what you wanted, the orchestrator classifies the outcome as failure, and the task goes back on the queue. Another worker picks it up, starts cold, misunderstands in a slightly different direction, and fails again. If you have twenty workers running the same workspace, the same misunderstanding plays out twenty different ways at twenty different costs before you notice the pattern and fix the input.

The input is the plan.

What you are actually paying for in a pet session

Pet sessions accumulate context through dialogue. You describe what you want, the agent responds, you correct, it adjusts. The eventual output reflects not just your initial description but everything you added along the way: the clarifications, the course corrections, the “no, I meant the other thing.” By the time the pet agent produces something useful, it has received substantially more information than what was in your first message.

This is not a flaw — it is how conversations work. But it conceals the actual input cost. The information the agent needs to do the job was never written down in one place. It accumulated, across turns, in a format that is impossible to hand directly to a stateless worker.

When you move to cattle, that cost comes due. The worker starts cold. It gets exactly what you wrote, nothing more. The turns of dialogue that would have filled in the blanks do not exist — so the blanks have to be filled before dispatch. The plan is how you fill them.

The compression problem

A running system has enormous context attached to it: commit history, bead history, comments in the code, decisions made and reversed over weeks of work. A worker technically has access to all of this. The question is whether it can find the relevant signal in time to use it.

Commit history is retrospective and fragmented. You learn what changed but rarely why the tradeoff landed where it did. Bead bodies describe individual tasks, not the coherent shape of the whole. Code captures implementation, not intent. None of these is dense enough.

The plan document is different. It is written to be read by a worker who knows nothing, which means it has to contain everything necessary in as few tokens as possible. A good plan is not comprehensive in the way a specification is comprehensive. It is compressed in a specific way: it records the decisions that were made without recording the deliberations that led to them. It says “this tradeoff resolved in this direction” rather than “here are both sides of the argument.”

The compression is the point. A worker does not need to re-litigate the decisions. It needs to know what was decided and work within that. A plan that explains why every decision was made is longer than a plan needs to be and often less useful, because the why is background and the worker needs foreground.

The hierarchy that makes cold starts work

NEEDLE’s task structure has three layers:

A genesis bead sits at the root of any significant project. It exists to tie phases together and track overall progress. Its body references the plan document — that reference is the load-bearing connection. The genesis bead does not contain the plan; it points to it.

Phase beads derive from the plan’s phasing section. Each phase bead describes a coherent unit of work — a vertical slice, a capability, a milestone — with its own acceptance criteria and its own list of child tasks. Phase beads block the genesis bead; when all phases close, the genesis closes.

Task beads are the atomic units a worker actually executes. A task bead is scoped to a single piece of work a worker can complete in one run: one file, one function, one test suite, one configuration change.
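The blocking rule above (tasks block their phase, phases block the genesis) can be sketched in a few lines. This is a minimal illustration, not the real beads_rust schema: the `Bead` class, its fields, and both function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Bead:
    """Hypothetical stand-in for a bead; not the real beads_rust schema."""
    title: str
    closed: bool = False
    children: list["Bead"] = field(default_factory=list)


def can_close(bead: Bead) -> bool:
    """A parent may close only when every child that blocks it is closed."""
    return all(child.closed for child in bead.children)


def close_if_ready(bead: Bead) -> bool:
    """Walk the tree bottom-up, closing any parent whose children are all closed.

    Leaf beads (tasks) are never closed here; in this sketch only a worker
    closes a task, by setting `closed = True`.
    """
    for child in bead.children:
        if not child.closed:
            close_if_ready(child)
    if bead.children and can_close(bead):
        bead.closed = True
    return bead.closed


tasks = [Bead("write parser", closed=True), Bead("write parser tests")]
phase = Bead("phase 1: parsing", children=tasks)
genesis = Bead("genesis", children=[phase])

closed_early = close_if_ready(genesis)  # one task is still open, so nothing closes
tasks[1].closed = True                  # a worker finishes the last task
closed_late = close_if_ready(genesis)   # the phase closes, then the genesis
```

Closure flows upward only. Nothing in the sketch lets a parent close over an open child, which is the property the hierarchy depends on.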

A worker assigned a task bead reads: the task bead body, the parent phase bead, and the plan document at the genesis bead’s reference path. In that order, smallest to largest scope, most specific to least specific. By the time it has read all three, it knows exactly what it is doing, how it fits into the phase, and what the overall project is trying to accomplish.
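As a rough sketch of that reading order, assuming a bead carries a body, a parent link, and (on the genesis bead only) a path to the plan: the `Bead` fields, the `worker_context` function, and the `docs/plan.md` path are all hypothetical names for illustration, not NEEDLE’s actual interface.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Bead:
    """Hypothetical bead shape used only for this illustration."""
    kind: str                          # "genesis", "phase", or "task"
    body: str
    parent: Optional["Bead"] = None
    plan_path: Optional[str] = None    # assumed to be set on the genesis bead only


def worker_context(task: Bead, read_plan: Callable[[str], str]) -> list[str]:
    """Return the cold-start reading list, most specific scope first."""
    phase = task.parent
    if phase is None or phase.parent is None:
        raise ValueError("a task bead needs a phase parent and a genesis grandparent")
    genesis = phase.parent
    if genesis.plan_path is None:
        raise ValueError("the genesis bead must reference the plan document")
    return [task.body, phase.body, read_plan(genesis.plan_path)]


# A dict stands in for the filesystem so the example runs anywhere.
fake_files = {"docs/plan.md": "PLAN: scope, criteria, phases, constraints"}

genesis = Bead("genesis", "Track the upload service rewrite", plan_path="docs/plan.md")
phase = Bead("phase", "Phase 1: parsing, done when all fixtures parse", parent=genesis)
task = Bead("task", "Implement the header parser in one module", parent=phase)

context = worker_context(task, fake_files.__getitem__)
```

The function fails loudly when the genesis bead has no plan reference, because a worker that silently starts without the plan is the failure this structure exists to prevent.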

This only works if the plan document is coherent enough to anchor the hierarchy. A plan that is vague at the project level produces phase beads that are vague at the phase level, which produce task beads that leave workers guessing. The failure propagates down; the fix has to start at the top.

What has to be in the plan

Six categories of content make a plan useful to a cold-start worker. Missing any one of them degrades the plan.

Scope lock. What the system does, stated precisely enough that a worker can determine whether a given change is in scope or out of scope without asking. This is harder to write than it sounds. The failure mode is a scope statement that is technically accurate but vague enough to be consistent with many different implementations — which means every worker is free to pick a different implementation.

Acceptance criteria. The conditions under which the project is done. Not aspirations (“the system should be fast”) but testable criteria (“p99 response time under 200ms with N concurrent users”). Acceptance criteria are what let the orchestrator evaluate whether a worker’s output counts as success. If the criteria are absent or vague, the orchestrator cannot classify the outcome reliably, and the outcome table loses a row.

Phase boundaries. Where one phase ends and the next begins, stated as conditions rather than calendar dates. A phase boundary is a checkpoint: the system is in this state before this phase, and in this state after. If the boundary is defined as a date, it is almost certainly wrong the moment implementation starts. If it is defined as a condition, it stays true regardless of how long the phase takes.

Known unknowns. The things you do not know yet, stated explicitly. A plan that does not acknowledge its own uncertainty is pretending to more confidence than it has — and workers will act on that pretended confidence. A plan that says “we do not yet know how to handle the edge case of X; this will be resolved in phase 3” gives workers license to defer the question cleanly rather than improvise an answer that may conflict with what phase 3 eventually decides.

Constraint inventory. The fixed points that eliminate solution space: existing interfaces you cannot change, performance budgets you cannot exceed, security requirements you cannot trade away. Constraints are more useful than requirements because they narrow the space of valid implementations without prescribing a specific one. A worker that knows the constraints can make autonomous design decisions within them; a worker that does not know the constraints makes design decisions that may violate them invisibly.

Rollback plan. What happens if phase N fails. Not “we will figure it out” but the actual fallback: which state the system can be safely returned to, which changes are reversible and which are not, what the recovery path is. Workers do not need this to execute normally — they need it to handle the abnormal cases that the orchestrator surfaces.
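Two of the categories above, testable acceptance criteria and condition-based phase boundaries, reduce to predicates an orchestrator can evaluate. The sketch below assumes latency samples have already been collected under the required load (generating that load is out of scope here) and uses the nearest-rank definition of p99. The function names and the example conditions are hypothetical.

```python
def p99(samples_ms: list[float]) -> float:
    """Nearest-rank 99th percentile of the samples."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # ceil(0.99 * n) computed in integer arithmetic to avoid float rounding
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_latency_criterion(samples_ms: list[float], limit_ms: float = 200.0) -> bool:
    """Testable form of 'p99 response time under 200ms'."""
    return p99(samples_ms) < limit_ms


def phase_complete(state: dict[str, bool], exit_conditions: list[str]) -> bool:
    """A phase boundary stated as conditions: all must hold, no dates involved."""
    return all(state.get(name, False) for name in exit_conditions)


passing = [120.0] * 99 + [450.0]       # one slow outlier in 100 requests
failing = [120.0] * 98 + [450.0] * 2   # two slow outliers in 100 requests

passes = meets_latency_criterion(passing)
fails = meets_latency_criterion(failing)

state = {"schema migrated": True, "old endpoint removed": False}
phase_1_done = phase_complete(state, ["schema migrated", "old endpoint removed"])
```

“The system should be fast” cannot be written as either function. That is a reasonable test of whether a criterion or a boundary is concrete enough to put in the plan.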

The plan is how you avoid pivoting mid-flight

The most expensive thing that can happen to a cattle fleet is not a worker crashing. It is a worker succeeding at the wrong thing — because the plan did not make “the right thing” unambiguous.

A mid-flight pivot in a pet session costs a turn of conversation. A mid-flight pivot in a cattle fleet costs every task derived from the misunderstanding: the work already done, the retries queued, the downstream tasks that were built on the wrong foundation. The later in the implementation the pivot happens, the more work has to be undone. This is why the plan-review gate exists.

There is also an asymmetry in how expensive the pivot is depending on what changed. If the plan needed adjustment, the code artifacts can be wholesale deleted and workers restarted from the revised plan. Code is cheap. A well-scoped implementation takes hours for a fleet of workers, not days. The correct response to discovering your plan was wrong is not to patch the existing code — it is to fix the plan, clear the code, and run again. The plan is the expensive artifact; the code is the output.

This is the rule everything above resolves to: the plan is the source of truth. When the plan and the artifacts disagree, the plan is right by definition, and the artifacts are what gets amended. The direction is not negotiable. Editing the plan to match whatever the code drifted into feels like keeping the document current, but it is laundering a mistake into the source of truth — the next cold-start worker reads the retrofitted plan and treats the drift as intent. You conform the artifacts to the plan, never the plan to the artifacts. The only thing that legitimately changes a plan is a changed decision, made deliberately — and that change leads the code rather than trailing it.

This changes what you invest in. You spend the effort on the plan. You spend comparatively little worrying about whether any individual implementation is precious, because it is not — it is reproducible from the plan in a matter of hours.

The plan-review gate

Before any worker touches the code, the plan goes through /plan-review.

The skill checks 80+ structural patterns across scope, acceptance criteria, architecture, preflight safety, phasing, testing, security, performance, operations, API design, and risk. It was developed from analysis of high-quality planning documents by Jeffrey Emanuel, whose methodology for writing plans that survive contact with implementation influenced how I think about this. The patterns were extracted from what those plans had in common — and, more usefully, from what the plans that failed mid-implementation were missing.

The most common failure patterns cluster around the same four gaps: no acceptance criteria (workers cannot self-evaluate output), no phase gates (workers do not know when a phase is complete), no rollback plan (failures have no recovery path), and no constraint inventory (workers make design decisions in an unconstrained space and produce incompatible implementations). Plan-review checks for all four explicitly, along with everything else.

The output of a review is a scorecard with PRESENT / PARTIAL / MISSING ratings for each item, followed by an offer to draft the missing sections. The offer is worth taking. A plan that passes review at 90% is close enough to deploy; a plan at 60% has gaps that will compound across a fleet.
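To make the scorecard concrete, here is one way a percentage could be derived from the three ratings. The weighting is my assumption for illustration (PRESENT counts fully, PARTIAL counts half, MISSING counts nothing); it is not a description of how the skill computes its score, and the names `Rating`, `score`, and `gaps` are hypothetical.

```python
from enum import Enum


class Rating(Enum):
    PRESENT = 1.0
    PARTIAL = 0.5   # assumption: partial coverage earns half credit
    MISSING = 0.0


def score(card: dict[str, Rating]) -> float:
    """Percentage of available credit earned across all checked items."""
    if not card:
        raise ValueError("scorecard is empty")
    return 100.0 * sum(rating.value for rating in card.values()) / len(card)


def gaps(card: dict[str, Rating]) -> list[str]:
    """Items that still need drafting or tightening, in scorecard order."""
    return [item for item, rating in card.items() if rating is not Rating.PRESENT]


# Only the four common gaps are shown; a real review covers far more items.
card = {
    "acceptance criteria": Rating.PRESENT,
    "phase gates": Rating.PARTIAL,
    "rollback plan": Rating.MISSING,
    "constraint inventory": Rating.PRESENT,
}

percent = score(card)
todo = gaps(card)
```

Under this weighting the example card lands at 62.5%, in the range where the gaps are worth closing before dispatch.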

The skill is available at jedarden/jeds-curated-skills.

The cost multiplier

The math for why plan quality matters more in cattle than in pets is straightforward.

In a pet session, a plan gap costs one correction: a few seconds of your time and a few tokens, and the session absorbs the fix. The cost is O(1).

In a cattle fleet with N workers, a plan gap costs N failed executions before you notice the pattern. Each failed execution burns its full budget — time, tokens, whatever the worker spent before the orchestrator classified the outcome as failure. The cost is O(N × execution budget). At twenty workers with a per-task budget of 100K tokens, one plan gap that takes two failed iterations to surface costs four million tokens in wasted execution before you see it in the outcome distribution and trace it back to the plan.
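The arithmetic behind that figure, written out. The function name is hypothetical; the numbers are the ones in the paragraph above.

```python
def wasted_tokens(workers: int, budget_per_task: int, failed_iterations: int) -> int:
    """Tokens burned before a plan gap is noticed: every worker spends its
    full per-task budget on every failed iteration."""
    return workers * budget_per_task * failed_iterations


# 20 workers x 100K tokens per task x 2 failed iterations
fleet_cost = wasted_tokens(workers=20, budget_per_task=100_000, failed_iterations=2)
```

The cost is linear in each factor, so doubling the fleet or taking one more failed iteration to spot the pattern scales the waste directly.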

This is not hypothetical. It is the most expensive class of bug in a cattle system — more expensive than a bad prompt, more expensive than a misconfigured model, more expensive than a network issue. Network issues affect individual calls; plan gaps affect every worker on every task derived from the plan.

The fix is to treat the plan as a first-class artifact with a quality gate, not as a rough sketch you clarify in conversation.

What a plan is not

A plan is not a specification. A specification describes every detail of the implementation. A plan describes the decisions that constrain the implementation without prescribing every detail of it. Workers fill in the details; the plan tells them which details are fixed and which are theirs to choose.

A plan is not a design document. A design document explains how the system will be built. A plan records what the system will do and what success looks like. The how is the worker’s job; the what and the done are the plan’s job.

A plan is not a changelog. It records the current state of decisions, not the history of how those decisions evolved. A plan that accumulates commentary about why things changed over time is a plan that is getting harder to read with each revision. Keep the plan current and put the history in commit messages.

What I’d change

Two things.

Plans should be versioned with the code — and the plan leads. The plan lives next to the code it describes, and the two travel together. But “together” has a direction. When the code diverges from the plan — which it always does, in small ways — that divergence is a defect in the code, not a fact to be transcribed into the plan. You amend the artifacts to match the plan in the same commit that surfaces the drift. The plan itself changes only when the decision changes, and then it changes first: revise the plan, then regenerate the code from it. The habit I want to kill is the reflexive one — editing the plan to describe whatever the implementation drifted into, because that keeps the document looking current while quietly demoting it from source of truth to changelog. The convention is not “plan changes follow code changes.” It is “code changes follow plan changes.”

Staleness should be explicit. A plan written at the start of a project is not the same as that plan after six weeks of implementation. When a later phase revises an earlier decision, every section that rested on the old decision is now wrong — and there is a window between making the new decision and propagating it through the document. Today there is no marker for that window: nothing in the plan says “§Architecture still describes the phase-1 decision; phase 3 superseded it, rewrite pending.” There should be. A worker that reads an un-updated section and acts on it produces work that has to be undone. The marker is a stopgap, not a resting state — staleness is a defect in the plan to be closed, not a permanent admission that the code has outrun the document. The goal is always a plan with no stale sections, because the plan is what every worker trusts.
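No such marker exists today, so the following is purely a sketch of what one could look like. It assumes a plan whose sections start with `## ` headings and a hypothetical convention where a stale section carries a line beginning with `STALE:`. Both the convention and the `stale_sections` function are my invention for illustration.

```python
import re

# Hypothetical marker: a line of the form "STALE: <reason>" inside a section.
STALE = re.compile(r"^STALE:\s*(.+)$")


def stale_sections(plan_text: str) -> dict[str, str]:
    """Map each section heading to its staleness note, for sections that carry one."""
    found: dict[str, str] = {}
    heading = "(preamble)"
    for line in plan_text.splitlines():
        if line.startswith("## "):
            heading = line[3:].strip()
            continue
        match = STALE.match(line.strip())
        if match:
            found[heading] = match.group(1)
    return found


plan = """\
## Scope
Parse and validate uploads.

## Architecture
STALE: describes the phase-1 decision; phase 3 superseded it, rewrite pending
Single-process worker pool.
"""

stale = stale_sections(plan)
```

An orchestrator could hold back any task whose phase depends on a section in `stale` until the rewrite lands, which keeps the marker a short-lived stopgap rather than a resting state.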

The question I now ask

Before I commit a plan and start creating beads from it:

Could a worker who has never spoken to me, reading only this document and the task bead it has been assigned, produce something I would accept on the first try?

Both parts matter. The first — never spoken to me — rules out plans that rely on context you have accumulated in conversation. The second — on the first try — rules out plans where success requires multiple iterations to clarify. If the answer is no, the plan is not done. I work on the plan, not the code.

The workers are ready. The question is whether the inputs are.

— Jed

Plan methodology: derived from Jeffrey Emanuel’s (@dicklesworthstone) approach to high-quality planning documents. Plan-review skill: jedarden/jeds-curated-skills. The orchestration layer this sits on: NEEDLE. The task structure (genesis beads, phase beads): beads_rust.