/build-plan

Synced verbatim from the repo on every site build. Edit the source file on GitHub (link in the page footer); do not edit the rendered copy here.

Description

Executes the plan generated by the refine-specs command. Each agent phase is launched as an isolated subagent via the Agent tool — this keeps each phase’s context minimal and prevents token accumulation across phases.

For full-stack features, Backend Developer and Frontend Developer run in parallel because they both work against the same API contract defined in the spec.


Step 0 — Spec sign-off (mandatory, before spawning any agent)

  1. Read the spec file, plan file, and task file. Lazy-load lessons-learned: do NOT read every *.md under the project’s lessons-learned directory at sign-off time. Instead, run `ls {project-docs}/lessons-learned/*.md` to discover which files exist. The lessons-learned directory path is declared in {project-docs}/workspace.md under the lessons-learned: key (typically {project-name}-docs/lessons-learned/; resolve {project-docs} from ai-standards/.workspace-config-path). Specific files are read later, only when constructing each subagent’s prompt: back.md for Backend phases, front.md for Frontend phases, infra.md for DevOps, general.md for Tester. This trims orchestrator-side reading from all files (a number that grows with the project) to roughly one file per spawn.
  2. Display a summary to the developer:
    • Feature name and affected services
    • Phases that will run and in what order
    • Key API contracts defined in the spec (endpoints, request/response shapes)
    • Which standards each agent will read (from the plan’s Standards Scope section)
  3. Ask the developer:

    “Does this spec look complete and correct? Any gaps or ambiguous edge cases before I start? Confirm to proceed, or describe what needs to change and run /refine-specs first.”

  4. Do not spawn any agent until the developer explicitly confirms.
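The lazy-load discovery in item 1 of Step 0 can be sketched as a small script. Everything below runs against a throwaway directory; the real path comes from {project-docs}/workspace.md, and the role-to-file mapping mirrors the one stated above.

```shell
#!/bin/sh
# Sketch of the lazy-load lessons-learned discovery (illustrative paths only).
set -eu

docs=$(mktemp -d)
mkdir -p "$docs/lessons-learned"
touch "$docs/lessons-learned/back.md" "$docs/lessons-learned/infra.md"

# Sign-off time: discover which per-category files exist -- do NOT read bodies.
available=$(ls "$docs/lessons-learned"/*.md | xargs -n1 basename)
echo "available: $available"

# Spawn time: map the phase's role to its single file.
role_file() {
  case "$1" in
    backend)  echo back.md ;;
    frontend) echo front.md ;;
    devops)   echo infra.md ;;
    tester)   echo general.md ;;
  esac
}

f=$(role_file backend)
# Read the file only if discovery found it -- one file per spawn, not all files.
if echo "$available" | grep -qx "$f"; then
  echo "would read: $f"
fi
```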

Step 0.5 — Generate per-phase bundles (after sign-off, before agents)

Before spawning any subagent, generate two bundle files that distill the standards into only the rules relevant to this feature. This prevents every subagent from re-reading 4-5 full standards files (~1,000 lines each time).

Why two bundles, not one: the Developer needs the full implementation surface (Domain Service patterns, controller wiring, services.yaml examples, scaffold details). The Tester does not — the Tester does not modify src/, only tests/. Loading 200+ lines of implementation rules into every Tester run is duplicate context the Tester will never act on. Splitting the bundle keeps the Developer’s surface intact and trims the Tester’s by 30-40%.

Order matters: write each bundle MOST-STATIC FIRST, MOST-DYNAMIC LAST. Anthropic’s prompt cache (5-minute TTL) keys on the prefix of the prompt — content that is identical across the subagents spawned in this session reuses the cache; content that changes between subagents (or between iterations) invalidates the cache from that byte forward. Static-first ordering means the Developer’s first call warms the cache for every later call.

Anti-duplication rule for both bundles

The spec is already in the subagent prompt’s reading order (step 3 of the Developer / Tester / DevOps prompt template; step 4 of the Reviewer template). Do NOT reproduce spec content inside the bundle. Reproducing 200-300 lines of spec inside a bundle that every subagent ALSO reads the spec for is duplicate context billed once per spawn.

The spec section in each bundle below is a pointer + a 5-10 line digest, NEVER a copy:

```
## Spec digest
See `{spec_path}` § Technical Details (and § Definition of Done for the tester bundle).
Key shape (for routing decisions only — read the spec for full requirements):
- {1 line: aggregate(s) touched}
- {1 line: write or read; sync or async; HTTP or message handler}
- {1 line: external dependencies (LLM / payments / signature / file / geo)}
- {1 line: surfaces a UI — yes/no}
- {1 line: anything unusual the subagent should know to scope its reading}
```

If the spec digest of the dev bundle exceeds ~15 lines, the orchestrator wrote too much. Trim and re-emit. Same for the tester bundle’s digest.

Cheap-extraction protocol for standards (load-bearing)

Before writing the bundles, the orchestrator must extract sections from individual standards files without reading their full bodies. Empirical baseline: the bundle generator was costing ~111k Sonnet tokens per /build-plan because it read every in-scope standard’s full body to decide which sections to copy. Roughly 95% of that cost was reading-to-decide, not writing.

Use this protocol when building section 3 of the dev bundle and sections 3-5 of the tester bundle (both listed below):

  1. Index first. For each standard listed in the plan’s Standards Scope, run grep -nE "^##+ " standards/<name>.md ONCE to get the section index (line numbers + heading text). Roughly 50-100 tokens per file vs. ~3-5k for a full read.
  2. Match feature type to sections. The plan’s Standards Scope already names the relevant sections (e.g. §Authorization → AZ-001..AZ-012). Match heading text to the named sections; do NOT re-derive relevance from prose.
  3. Read targeted ranges. For each matched section, use Read with offset + limit against the line range from step 1. Reading ~150 lines × N matched sections costs far less than reading 8-12 standards × ~300 lines each.
  4. Full-file read permitted ONLY when 4+ sections of the SAME standard are matched. At that point the offset+limit overhead exceeds the saving and a single full-file read is cheaper.
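Steps 1 and 3 can be sketched against a stand-in standards file — the headings and rule IDs below are illustrative, not the real standards content:

```shell
#!/bin/sh
# Sketch of the cheap-extraction protocol: index headings, then read one range.
set -eu

std=$(mktemp)
cat > "$std" <<'EOF'
# backend.md
intro prose
## Authorization
AZ-001 rule text
AZ-002 rule text
## Logging
LOG-001 rule text
## Controllers
CTRL-001 rule text
EOF

# Step 1: index -- heading line numbers + text, without reading the body.
index=$(grep -nE '^##+ ' "$std")
echo "$index"

# Step 3: read only the matched section's range (here: Authorization,
# from its heading up to the line before the next heading).
start=$(echo "$index" | grep -F '## Authorization' | cut -d: -f1)
next=$(echo "$index" | awk -F: -v s="$start" '$1 > s {print $1; exit}')
sed -n "${start},$((next - 1))p" "$std"
```

The index costs a few dozen tokens; the targeted `sed`/Read range replaces a full-file read.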

Same shape as the Reviewer’s coverage-aware loading (PR #102) but applied to the bundle-generator’s read pattern. Estimated savings: 60-90k Sonnet tokens per /build-plan (bundle phase 111k → ~20-50k).

Sections in the dev bundle

Path: {workspace_root}/handoffs/{feature-name}/dev-bundle.md. Consumers: Developer, Dev+Tester, DevOps. Target size: 200-400 lines.

  1. Invariants (invariants.md, full — non-negotiable, identical across every project + feature)
  2. Naming conventions + git rules (from CLAUDE.md — identical across projects)
  3. Selected standards sections (from the plan’s Standards Scope, extracted per feature type via the cheap-extraction protocol above — skip frontend rules for a backend-only feature, skip backend rules for a frontend-only CSS feature). Include the full implementation surface: layering rules (Domain / Application / Infrastructure), service patterns, controller patterns, services.yaml wiring examples, console-command details, scaffold details. Do NOT read full standards bodies to decide what to include — index by grep -nE "^##+ " first, then Read with offset + limit.
  4. decisions.md entries that overlap with this feature’s aggregates or services (project-level; same across the agents working on this feature)
  5. design-decisions.md entries when the feature has a frontend component (all entries — short, all relevant to visual consistency)
  6. Spec digest pointer (5-10 lines, see anti-duplication rule above) — pointer to {spec_path} § Technical Details + a one-line-per-bullet routing summary. NOT a reproduction. Subagents read the spec separately at step 3 of their reading order.

Sections in the tester bundle

Path: {workspace_root}/handoffs/{feature-name}/tester-bundle.md. Consumer: Tester. Target size: 150-200 lines.

  1. Invariants (invariants.md, full — same prefix as the dev bundle so cache reuse works across roles)
  2. Naming + git rules (from CLAUDE.md, compacted — keep the canonical names + branch naming, drop the prose around the convention table)
  3. Logging + redaction rules — the Tester writes assertions about which fields appear in which log lines. Apply the cheap-extraction protocol above: index logging.md headings, include only sections matching the spec’s logging assertions (typically §Sensitive fields redaction + §JSON shape); full-file read only when 4+ sections match.
  4. GDPR / PII rules — the Tester writes assertions that PII is not persisted/logged where it shouldn’t be. Apply the cheap-extraction protocol: include only the sections referenced by the feature’s pii-inventory.md rows (typically §Classification + §Encryption at rest when sensitive fields exist). Full-file read only when 4+ sections match.
  5. Attack-surface-hardening rules (when the project is internet-reachable) — the Tester writes assertions for CSRF, lockout, rate-limit, redirect-allowlist, header presence. Apply the cheap-extraction protocol: include only sections matching the spec’s attack-surface assertions (typically §CSRF + §Headers + §Rate limiting for an auth feature, §Lockout for a login feature). Full-file read only when 4+ sections match.
  6. Spec digest pointer (5-10 lines, see anti-duplication rule above) — pointer to {spec_path} § Technical Details AND § Definition of Done + a one-line-per-bullet routing summary. NOT a reproduction. The Tester reads the spec separately.
  7. Lessons-learned filtered to test design — entries marked [Tester] or otherwise about test patterns / fixture design / flaky-test workarounds. Drop entries exclusive to Domain Service / Controller / Application Service implementation
  8. DROP from this bundle: layering rules (Domain / Application / Infrastructure), service patterns, controller patterns, services.yaml wiring examples, console-command details, scaffold details. The Tester does not modify src/; those rules do not apply.

The dev bundle replaces the individual standards file reads in Developer / Dev+Tester / DevOps subagent prompts. The tester bundle replaces them in the Tester subagent prompt. Agents still read their own agent definition file (which is short and role-specific).

Reviewer agents do NOT receive any bundle. They receive the static review checklist (backend-review-checklist.md or frontend-review-checklist.md) plus the previous handoff’s file list. Checklists are derived from the standards and contain only verifiable rules — see “Reviewer prompt template” below.

DoD-checker does NOT receive any bundle. It reads only the task file’s ## Definition of Done and the developer handoff’s ## DoD coverage section — see agents/dod-checker-agent.md.


How agent phases work

Each phase is spawned with Agent(subagent_type: "general-purpose", model: "{tier}"). The subagent starts with a clean context — it does not inherit this conversation’s history.

The pipeline includes a DoD-checker phase between the Developer and the Reviewer in the standard and complex flows. It is a Haiku-tier mechanical gate that confirms every ## Definition of Done checkbox has an artefact on disk. Its only verdict is APPROVED or BLOCKED:

  • APPROVED → orchestrator invokes the Reviewer normally.
  • BLOCKED → orchestrator returns to the Developer with the gap list. This iteration does NOT count against the Reviewer’s max-3 loop budget — the DoD-checker is not a Reviewer.

The simple flow does NOT have a DoD-checker — the Dev+Tester agent runs the DoD verification gate internally and writes the same ## DoD coverage section, which is consumed downstream (by update-specs and human review) without an intermediate verifier. See agents/dod-checker-agent.md for the agent definition.

model is required — never omit it. Before each spawn, read the ## Model section of the phase’s agent definition file (e.g. agents/backend-developer-agent.md) and pass its tier (opus or sonnet) as the model argument. The workspace settings.json contains a PreToolUse hook that rejects any Agent invocation without an explicit model — if you see a tool error saying the model is missing, re-read the agent definition and retry. See also CLAUDE.md → “Agent model tiering” for the classification rules.
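The tier lookup before each spawn might look like this sketch — the agent file below is a stand-in, and the `Agent(...)` call in the comment follows the invocation shape described above:

```shell
#!/bin/sh
# Sketch: read the `## Model` section of an agent definition before spawning.
set -eu

agent=$(mktemp)
cat > "$agent" <<'EOF'
# Backend Developer agent
## Model
opus
## Responsibilities
...
EOF

# Take the first line after the `## Model` heading as the tier.
tier=$(awk '/^## Model/ {getline; print; exit}' "$agent")
echo "tier: $tier"

# The orchestrator then passes this as the mandatory model argument, e.g.
#   Agent(subagent_type: "general-purpose", model: "opus", prompt: ...)
case "$tier" in
  opus|sonnet|haiku) : ;;  # accepted tiers
  *) echo "missing or unknown tier -- re-read the agent definition" >&2; exit 1 ;;
esac
```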

The prompt passed to each subagent must be self-contained and include:

  • The absolute path to the agent definition file (which tells the agent what other files to read)
  • The absolute path to the appropriate bundle for the agent’s role:
    • Developer / Dev+Tester / DevOps → dev-bundle.md
    • Tester → tester-bundle.md
    • Reviewer → no bundle (uses static checklist)
    • DoD-checker → no bundle (uses task DoD + dev handoff ## DoD coverage only)
  • The absolute path to the full spec file (for reference if needed)
  • The absolute path to the task file
  • The absolute path to the previous handoff (if any)
  • The working directory for the relevant service
  • One clear instruction

Execution flows by complexity

The plan file includes a ## Complexity classification (simple, standard, or complex). Use the matching flow:

Simple flow (Complexity: simple)

[Sign-off]
→ Generate per-phase bundles (dev + tester)
→ Developer + Tester (single agent) ← implements, writes tests, runs tests, runs DoD gate internally
→ Reviewer (optional — only if flagged) ← lightweight review, 1 iteration max
→ update-specs → Done
  • One agent does implementation AND testing in a single session. The prompt instructs it to implement the feature, then write and run tests, then run linters, then run the DoD verification gate internally.
  • No DoD-checker phase. The Dev+Tester agent runs the DoD verification gate inside its own session (see agents/{role}-developer-agent.md § “Definition-of-Done verification gate”) and writes the ## DoD coverage section directly. Adding an external Haiku verifier on top would not pay back for simple complexity — the gate already ran inline.
  • Reviewer is optional. Only spawn if the developer handoff includes an ## Open Questions section with items. If no open questions, skip directly to update-specs.
  • This flow typically uses 1-2 subagents instead of 3-4.

Standard flow (Complexity: standard)

[Sign-off]
→ Generate per-phase bundles (dev + tester)
→ Developer
→ DoD-checker ← Haiku, mechanical task-DoD verification
→ Reviewer (loop if needed, max 3)
→ Tester [unit + integration + run]
→ update-specs → Done

For single-service features. Use the appropriate Developer/Reviewer type (Backend or Frontend) based on the affected service.

If the DoD-checker returns BLOCKED, the orchestrator routes back to the Developer with the gap list and re-spawns the Developer for another iteration. The Reviewer is NOT invoked. The DoD-checker bounce does NOT count against the Reviewer’s max-3 loop.

Complex flow (Complexity: complex)

[Sign-off]
→ Generate per-phase bundles (dev + tester)
→ DevOps (only if new infra is needed) ← sequential: both devs may depend on it
→ Backend Developer ‖ Frontend Developer ← PARALLEL: both read the same spec
→ DoD-checker (backend) ‖ DoD-checker (frontend) ← PARALLEL: one per side, mechanical
→ Backend Reviewer ‖ Frontend Reviewer ← PARALLEL: independent codebases
→ Tester [unit + integration + run] ← sequential: needs both sides complete
→ update-specs → Done

Feedback loops (per side, independent):

  • DoD-checker (backend) BLOCKED → Backend Developer → DoD-checker (backend) again
  • Backend Reviewer → Backend Developer → Backend Reviewer (max 3 iterations)
  • Same pair of loops on the frontend side, independent.

If max iterations reached without approval: stop that side, report the final review report to the developer, and wait for a decision (see Failure Handling).

The DoD-checker on each side runs against the matching developer handoff (backend-dev-handoff.md / frontend-dev-handoff.md) and writes its own handoff (backend-dod-checker-handoff.md / frontend-dod-checker-handoff.md). A BLOCKED verdict on one side does NOT block the other side — the sides remain independent through the loop.


Steps

  1. Sign-off — show summary, wait for developer confirmation (see Step 0)

  2. Read the plan file and task file. Read the ## Complexity line from the plan to determine the execution flow.

  3. Pre-flight branch check — for every affected repository, verify HEAD is on master. The set of affected repositories includes:

    • Every service repo touched by the plan.
    • The docs repo ({project-docs}/) — included automatically because every feature touches INDEX.md (status flip) and the spec’s ## As-built notes (Step 9), and most features also touch pii-inventory.md and lessons-learned/*.md.

    For each repo NOT on master:

    • Show the developer: repo name, current branch, commits ahead of master.
    • Ask:

      “Repo {repo} is on {branch} (not master). Choose: (a) merge {branch} into master here and continue from a clean master, (b) keep working on {branch} for this feature (no new branch will be created), (c) abort /build-plan.”

    • Do not proceed until every affected repo has been resolved. Never silently branch from a non-master HEAD.
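The per-repo check in step 3 can be sketched as follows, run here against a throwaway repository (branch names are illustrative):

```shell
#!/bin/sh
# Sketch of the pre-flight branch check against a disposable repo.
set -eu

repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" -c user.email=ci@example.invalid -c user.name=ci \
  commit -q --allow-empty -m "init"
git -C "$repo" checkout -q -b feature/search/add-index
git -C "$repo" -c user.email=ci@example.invalid -c user.name=ci \
  commit -q --allow-empty -m "wip"

# The actual check: branch name + commits ahead of master.
branch=$(git -C "$repo" symbolic-ref --short HEAD)
if [ "$branch" != "master" ]; then
  ahead=$(git -C "$repo" rev-list --count master..HEAD)
  echo "repo on $branch, $ahead commit(s) ahead of master -- ask the developer"
fi
```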
  4. Create feature branch — from master, create feature/{aggregate}/{feature-name} in every affected repository (services + {project-docs}/). If the branch already exists, check it out. Do not proceed until the branch is created in all affected repos.

  5. Generate per-phase bundles (see Step 0.5) — write {workspace_root}/handoffs/{feature-name}/dev-bundle.md (for Developer / Dev+Tester / DevOps) and {workspace_root}/handoffs/{feature-name}/tester-bundle.md (for the Tester)

  6. Execute each phase using the subagent prompt template below, following the flow matching the plan’s complexity:

    • Before each spawn: read the ## Model line from the agent definition file and pass its tier as the model argument of Agent. This is mandatory — the workspace settings.json hook rejects Agent invocations without model.
    • Sequential phases: spawn and wait for the result before proceeding
    • Parallel phases: spawn the first with run_in_background: true, immediately spawn the second (foreground), then process both results before continuing
  7. Handle feedback loops per side — each loop reruns only the affected side (skip for simple — no review loop)

  8. After all agents are done and tests pass, check the final handoff for a ## Lessons Learned section. If it contains new entries, append each one to the appropriate per-category file under the project’s lessons-learned directory (path from {project-docs}/workspace.md lessons-learned: key). Use this mapping to pick the file:

    | Agent role of the entry | Default file |
    | --- | --- |
    | Backend Developer / Backend Reviewer | back.md |
    | Frontend Developer / Frontend Reviewer | front.md |
    | DevOps | infra.md |
    | Tester | general.md — or the file that matches the subsystem that failed (e.g. a Tester entry about a Docker desync goes to infra.md) |

    Reclassify when the content clearly belongs in a different file — e.g. a DevOps entry about Symfony CORS config goes to back.md, not infra.md. If a lesson is about the framework itself (agent prompts, checklist design, command flow) and would recur across projects, promote it to the matching standard/command/agent/checklist file in ai-standards/ in the same commit — never keep a framework-level lessons-learned registry, and never promote a lesson that only surfaced once. If any project lesson duplicates an existing standard, promote it there and do not add it to lessons-learned.

  9. Run update-specs — this is the only automatic invocation of /update-specs in the workflow. It distills the plan + task into an ## As-built notes section in the spec and retires the -plan.md / -task.md files per the retention table in commands/update-specs-command.md (delete on simple/standard, archive to specs/_archive/{feature-name}/ on complex). Developers do not need to call /update-specs manually after a successful /build-plan.

  10. Delete the entire {workspace_root}/handoffs/{feature-name}/ directory

  11. Verify Docker services — check if the feature introduced changes that require a Docker rebuild or restart in affected services. Apply the appropriate action:

    | Change detected | Action |
    | --- | --- |
    | New or modified database migration (current backend stack: Phinx) | `docker compose down && docker compose up -d` (restart runs migrations on boot) |
    | Backend or frontend dependency manifest modified (current examples: composer.json, package.json) | `docker compose down && docker compose build && docker compose up -d` |
    | Dockerfile modified | `docker compose down && docker compose build && docker compose up -d` |
    | New service added | Full build: `docker compose build && docker compose up -d` |
    | Only application source files changed (mounted by volume — current stack: PHP-FPM reloads classes, Vite HMR reloads modules) | `docker compose restart` |
    | No infrastructure-related changes | No action needed |

    After the action, verify each affected service responds correctly (e.g. curl -s -o /dev/null -w "%{http_code}" http://localhost:{port}/api/... returns a non-502 status). If a service fails to start, investigate logs (docker compose logs) and fix before committing.
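The decision table above can be sketched as a classifier over changed paths. The path patterns are assumptions about the current stack's layout, not canonical rules:

```shell
#!/bin/sh
# Sketch: map a changed file path to the required Docker action.
set -eu

docker_action() {
  case "$1" in
    */migrations/*)            echo "down + up (migrations run on boot)" ;;
    */composer.json|*/package.json|*/Dockerfile)
                               echo "down + build + up" ;;
    */docker-compose.yml)      echo "full build + up" ;;
    */src/*|*/assets/*)        echo "restart (sources are volume-mounted)" ;;
    *)                         echo "no action" ;;
  esac
}

docker_action "backend/db/migrations/20240101_add_users.php"
docker_action "backend/composer.json"
docker_action "backend/src/Controller/UserController.php"
docker_action "README.md"
```

In the real step the orchestrator classifies the whole changed-file set and applies the strongest matching action once.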

  12. Commit all changes — stage and commit in every affected repo with a descriptive message. Do not push yet. The merge prompt in Step 14 handles push and master integration.

    Affected repos include {project-docs}/ (typically trades-docs/, red-profesionales-docs/, …) whenever the feature touched any of: the spec file, INDEX.md, pii-inventory.md, lessons-learned/*.md, or any other workspace doc. Treat the docs repo with the same pre-flight, branch, commit, push, PR rules as a service repo. The pre-flight check in Step 3 must include it; the branch created in Step 4 is also feature/{aggregate}/{feature-name}.

  13. Verify all Definition of Done conditions are met

  14. Post-feature merge prompt — once every affected repo is committed locally and DoD is met, ask the developer:

    “Feature {feature-name} is committed on feature/{aggregate}/{feature-name} in: {repo list}. Open PRs and merge them now? Choose: (a) yes — for each affected repo: push the feature branch, open a PR via gh pr create, then merge with gh pr merge --merge (or --auto if branch protection requires checks); (b) no — leave branches as-is for manual review.”

    • If (a) — execute the PR-first flow in this exact order:

      1. Push the feature branch in every affected repo: git push -u origin feature/{aggregate}/{feature-name}. Direct push to master is forbidden by sandbox and by invariants.md; the branch always lands on origin first as a feature branch.
      2. Open one PR per repo with gh pr create --base master --head feature/{aggregate}/{feature-name} --title {conventional-commit-prefix}: {feature summary} --body {body}. The body MUST include:
        • A ## Summary section with a 3–6 bullet recap of what changed, citing rule IDs / ADRs that gated the design.
        • A ## Cross-service merge ordering section when more than one service repo is in the PR set, naming the producer-first → consumer-second sequence (per data-migrations.md). Cite the companion PR numbers (e.g. “merge identity-service#4 BEFORE comms-service#3”).
        • A ## Test plan checklist enumerating the gates that already passed locally (test-runner count, static-analyser level, formatter, migration apply, infra smoke — current backend-stack examples: PHPUnit count, PHPStan level, PHP-CS-Fixer, Phinx apply, Mailpit smoke).
        • A trailing 🤖 Generated with [Claude Code](https://claude.com/claude-code) line.
      3. Merge the PRs with gh pr merge {N} --merge (or --auto if the repo’s branch protection requires CI checks). Honour the cross-service ordering: producer first, consumer second, docs last (or in parallel with either, since docs PRs almost never have CI gates that depend on service PRs).
        • Docs repo ({project-docs}/): typically has no GitHub Actions; merge directly with gh pr merge {N} --merge.
        • Service repos with CI: prefer gh pr merge {N} --merge --auto so the merge fires once the quality gates report SUCCESS. The orchestrator does NOT poll — --auto is fire-and-forget. If branch protection has been disabled, the merge is immediate.
      4. Sync local master in every affected repo: git checkout master && git pull origin master && git branch -d feature/{aggregate}/{feature-name}. Verify HEAD is on master and the working tree is clean. Only fail-loud if the local feature branch refuses to delete (it shouldn’t — the PR merge should have integrated every commit).

      If a PR cannot be merged because checks are still running and --auto is rejected by the repo’s settings, do NOT poll inside the orchestrator. Report the open PR URLs to the developer, mark the local branches as “merge pending CI”, and stop. The developer merges manually when checks pass.

      Report any merge conflict immediately and stop — do not auto-resolve. The developer rebases the feature branch by hand.

    • If (b): skip to Step 15 with a reminder of the unmerged branches and the open PR URLs (or the un-pushed local branches if the developer also declined the push).

  15. Report final status to the developer, including:

    • Per repo: final branch (master if merged, feature/... if not), last commit hash, and the PR URL (so the developer can revisit the merge later).
    • If unmerged: reminder of which branches need attention.

Why PR-first, not local-merge-and-push: the workspace’s sandbox rejects git push origin master directly (and invariants.md forbids it explicitly). GitHub branch-protection rules on the typical service repo also reject non-PR pushes. A PR-first flow respects both layers — branches land on origin, the PR carries the test plan and cross-service ordering, and gh pr merge --auto honours any CI gate the repo declares without the orchestrator polling. Earlier versions of this command described a git push origin master flow that worked locally but always failed in practice the first time CI was wired up; the PR-first flow is the version that survives contact with branch protection.


Subagent prompt template

There are two prompt templates: one for Developer / Tester / DevOps (uses the appropriate bundle — dev-bundle.md or tester-bundle.md) and one for Reviewer (static checklist, no bundle). The DoD-checker has its own minimal prompt — see its agent definition. Replace placeholders with absolute paths.

Developer / Tester / DevOps prompt template

The order below is cache-friendly: most-static reads first (agent definition, bundle), feature-stable reads next (spec, task, references), and the dynamic-per-iteration previous_handoff_path last. The trailing instruction is always dynamic so it sits at the end.

The {bundle_path} placeholder resolves to:

  • dev-bundle.md for Developer / Dev+Tester / DevOps
  • tester-bundle.md for the Tester

```
You are the {Agent Role} agent for the {Project Name} project.
Read these files in order before doing anything else:
1. {agent_definition_path} ← most static (per role, across features)
2. {bundle_path} ← dev-bundle.md or tester-bundle.md depending on role; stable across this feature's subagents
3. {spec_path} ← stable across this feature
4. {task_path} ← stable across this feature
{conditional: 5. {reference_files — only if Standards Scope says so}}
6. {previous_handoff_path — if any} ← dynamic across iterations
{instruction}
Working directory: {service_path}
{conditional: Warnings from past features: {relevant_lessons — entries from the project's lessons-learned file matching this agent's role (back.md / front.md / infra.md / general.md)}}
When done, write your handoff to: {handoff_path}.
The handoff MUST start with a `## Status` block per `templates/feature-handoff-template.md` — value `complete` (finished cleanly), `blocked` (hit ambiguity, populated `## Open Questions`, no destructive change), `failed` (unrecoverable error, populate `## Status reason`), or `incomplete` (hit budget, populate `## Status reason`). The orchestrator parses this block to decide whether to advance — absent or unrecognised value is treated as `failed`.
```
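One way the orchestrator might extract and gate on the `## Status` value — a sketch against a stand-in handoff file:

```shell
#!/bin/sh
# Sketch: parse the handoff's `## Status` block and apply the failure default.
set -eu

handoff=$(mktemp)
cat > "$handoff" <<'EOF'
## Status
complete

## Summary
Implemented the endpoint and its tests.
EOF

# First non-empty line after the `## Status` heading is the verdict.
status=$(awk '/^## Status/ {found=1; next} found && NF {print; exit}' "$handoff")

case "$status" in
  complete|blocked|failed|incomplete) echo "status: $status" ;;
  *) status=failed; echo "status: failed (absent or unrecognised value)" ;;
esac
```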

Reviewer prompt template (Backend Reviewer / Frontend Reviewer)

Same cache-friendly ordering rule: most-static first, dynamic-per-iteration last. The reviewer’s previous_developer_handoff_path and previous_reviewer_handoff_path change every iteration of the loop — placing them after the static reads keeps the cache warm for iterations 2 and 3.

The full review checklist is NOT pre-loaded. Reviewers follow the coverage-aware protocol in their agent definition: PRIMARY paths first, SECONDARY only on coverage gap, then per-section reads of the checklist with gap citation. Pre-loading backend-review-checklist.md / frontend-review-checklist.md here would defeat the optimisation by burning ~30-50k Sonnet tokens per spawn before step 1 of the protocol can decide what is actually needed.

On iter ≥ 2, the orchestrator passes the previous Reviewer handoff so the reviewer is state-aware across iterations and follows the anti-reflag protocol declared in its agent definition (lock loaded sections to iter-1 union, only load NEW sections that match NEW diff lines, fast-path findings the Developer’s ## Iteration response reports as fixed). Skipping that handoff path is what produces “moving goalposts” — see agents/backend-reviewer-agent.md § “Iteration ≥ 2 anti-reflag protocol”.

```
You are the {Backend|Frontend} Reviewer agent for the {Project Name} project.
Read these files in order before doing anything else:
1. {agent_definition_path} ← most static (per role, across features); declares the 7-step coverage-aware protocol AND the iter ≥ 2 anti-reflag protocol
2. {critical_path_files} ← critical-paths/{kind}.md matching the diff (load every path with a matching PRIMARY trigger; add SECONDARY only on coverage gap)
3. {task_path} ← stable across this feature
{conditional: 4. design-decisions.md ← only for Frontend Reviewer when the diff touches UI}
{conditional: 5. {previous_reviewer_handoff_path} ← only on iter ≥ 2; read FIRST among the dynamic files so the anti-reflag protocol applies}
6. {previous_developer_handoff_path} ← dynamic across iterations — read ONLY the files listed in this handoff. On iter ≥ 2, walk `## Iteration response` entry by entry per the anti-reflag protocol
Do NOT pre-load the full review-checklist.md, the dev/tester bundle, individual standards files, the spec, or any source file outside the developer's handoff list. Follow the 7-step coverage-aware protocol declared in your agent definition: identify PRIMARY paths from the diff, add SECONDARY only on coverage gap, compute the union of each path's `## Coverage map vs full checklist`, then load checklist SECTIONS in the gap only via `Read` `offset` + `limit`. Reading the full checklist file in one go is permitted ONLY when 3+ different sections are needed. Every section load MUST cite the gap that triggered it in your handoff — a checklist load without citation is rejected as defensive overhead.
On iter ≥ 2 ALSO apply the anti-reflag protocol: lock loaded sections to the iter-1 union (read the previous Reviewer handoff's `## Reading scope used` to recover that union), MAY add NEW sections only when iter-2 diff lines that were not present in iter 1 trigger them (cite the NEW diff line), and walk the Developer's `## Iteration response` to fast-path `fixed` findings (one spot-check tool call per row, no re-load).
For each violation, report severity (critical/major/minor), file:line, and the rule ID that was violated. If you find a violation NOT covered by any loaded path AND NOT in the sections you loaded, report it as `minor` and flag it for inclusion in a future critical path / checklist update.
This is review iteration {N} of max 3.
When done, write your handoff to: {handoff_path}.
The handoff MUST start with a `## Status` block per `templates/feature-handoff-template.md` — value `complete` (review finished, verdict APPROVED or REQUEST_CHANGES), `blocked` (could not review — unreadable handoff, missing files cited by Dev), `failed` (the gate tooling crashed and you could not work around it), or `incomplete` (hit turn budget). The orchestrator gates on this block before routing the verdict.
On iter ≥ 2, the handoff MUST also include a `## Iteration response` section per `templates/feature-handoff-template.md` mapping every iter-1 finding to a `closed | regression | disputed_accepted | disputed_rejected | new` verdict. The orchestrator parses this section to detect `[late-discovery]` findings (no cited NEW diff line) and route them to `## Follow-ups` instead of re-spawning the Developer.
```

Why reviewers do not get a bundle: the bundles (dev + tester) are for implementation and test design (rules + examples + design context). Review is verification — a closed list of checks against a diff. The critical paths plus per-section checklist reads are shorter, denser, and unambiguous. Re-deriving rules from prose every iteration wastes tokens and produces inconsistent reviews.
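The coverage-gap computation the protocol describes can be sketched in a few lines. This is an illustration, not the canonical implementation: the `covers` field is a hypothetical stand-in for each critical path's `## Coverage map vs full checklist`.

```python
def checklist_gap(primary_paths, checklist_sections):
    """Return the checklist sections NOT covered by the loaded paths.

    Only the sections in the gap are loaded (via Read offset + limit);
    everything the critical paths already cover is skipped.
    """
    covered = set()
    for path in primary_paths:
        covered |= path["covers"]  # union of each path's coverage map
    return sorted(s for s in checklist_sections if s not in covered)
```

An empty gap means the reviewer loads no checklist sections at all, which is the common case when the PRIMARY paths match the diff well.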

Files per phase

| Phase | Agent Definition | Model | Context bundle | Handoff reads | Handoff writes |
|---|---|---|---|---|---|
| DevOps | agents/devops-agent.md | opus | Dev bundle | plan file | devops-handoff.md |
| Backend Dev | agents/backend-developer-agent.md | opus | Dev bundle | devops-handoff.md (if exists) | backend-dev-handoff.md |
| Frontend Dev | agents/frontend-developer-agent.md | opus | Dev bundle | devops-handoff.md (if exists) | frontend-dev-handoff.md |
| DoD-checker (backend) | agents/dod-checker-agent.md | haiku | No — only task DoD + dev handoff ## DoD coverage | backend-dev-handoff.md (only ## DoD coverage) | backend-dod-checker-handoff.md |
| DoD-checker (frontend) | agents/dod-checker-agent.md | haiku | No — only task DoD + dev handoff ## DoD coverage | frontend-dev-handoff.md (only ## DoD coverage) | frontend-dod-checker-handoff.md |
| Backend Reviewer | agents/backend-reviewer-agent.md | sonnet | No — uses critical-paths/*.md (PRIMARY-matched) + per-section reads of standards/backend-review-checklist.md (gap-cited) | backend-dev-handoff.md (after DoD-checker APPROVED); on iter ≥ 2 also backend-reviewer-handoff.md from iter N-1 | backend-reviewer-handoff.md |
| Frontend Reviewer | agents/frontend-reviewer-agent.md | sonnet | No — uses critical-paths/*.md (PRIMARY-matched) + per-section reads of standards/frontend-review-checklist.md (gap-cited) (+ design-decisions.md if UI diff) | frontend-dev-handoff.md (after DoD-checker APPROVED); on iter ≥ 2 also frontend-reviewer-handoff.md from iter N-1 | frontend-reviewer-handoff.md |
| Tester | agents/tester-agent.md | sonnet | Tester bundle | backend-reviewer-handoff.md, frontend-reviewer-handoff.md | tester-handoff.md |
| Dev+Tester (simple) | agents/{role}-developer-agent.md | opus | Dev bundle | devops-handoff.md (if exists) | dev-tester-handoff.md |

The Model column is a quick reference — the source of truth is always the ## Model line in the agent definition file. If they diverge, the agent file wins and this table needs an update.

Conditional reference files — include when the plan’s Standards Scope indicates:

| Condition | Add to prompt |
|---|---|
| Feature uses async messaging | backend-reference.md (RabbitMQ sections) |
| Feature scaffolds a new service | backend-reference.md + new-service-checklist.md |
| First controller/subscriber in a service | backend-reference.md (AppController, ApiExceptionSubscriber sections) |
| First composable/store/page pattern | frontend-reference.md |

Instruction per phase

| Phase | Instruction |
|---|---|
| DevOps | Configure infrastructure as described in the plan. Verify `docker build .` succeeds. |
| Backend Dev | Implement the backend for the {feature} feature as described in the spec. You do NOT write unit/integration tests — see standards/test-ownership.md for the contract (every test row is ### Tester scope). After implementation, run the existing test suite for sibling-regression detection per standards/docker-test-execution.md, then run every gate declared in standards/quality-gates.md § Backend (current backend-stack tools: PHPStan, PHP-CS-Fixer, PHPUnit, composer audit). All must pass before writing the handoff. Then run the Definition-of-Done verification gate (see agent definition § “Definition-of-Done verification gate”) — mark every ### Tester scope row as ⚠️ Tester scope, never ✅. Block writing the handoff if any non-Tester-scope DoD row is ❌. The handoff MUST include ## Quality-Gate Results (one line per gate with the tool’s verbatim summary) and ## DoD coverage (verbatim task DoD with ✅/❌/⚠️ Tester scope/⚠️ marks). |
| Frontend Dev | Implement the frontend for the {feature} feature as described in the spec. You do NOT write composable/page tests and do NOT invoke Playwright — see standards/test-ownership.md for the contract (every test row and every visual/interactive item is ### Tester scope). After implementation, run every gate declared in standards/quality-gates.md § Frontend (current frontend-stack tools: type checker, linter, formatter, test runner, dependency audit — the existing test suite covers sibling-regression detection). All must pass before writing the handoff. Injecting a Dev-side browser smoke or new test files wastes tokens by duplicating work the Tester will redo. Then run the Definition-of-Done verification gate (see agent definition § “Definition-of-Done verification gate”) — mark every ### Tester scope row as ⚠️ Tester scope, never ✅. Block writing the handoff if any non-Tester-scope DoD row is ❌. The handoff MUST include ## Quality-Gate Results (one line per gate with the tool’s verbatim summary) and ## DoD coverage (verbatim task DoD with ✅/❌/⚠️ Tester scope/⚠️ marks). |
| DoD-checker | Verify every ## Definition of Done checkbox in the task file against the Developer’s ## DoD coverage section in {dev_handoff_path}. For each row, run a single grep/ls/Read spot-check at the cited path and downgrade to ❌ if the artefact is missing. After the DoD sweep, run the hard-blocker pre-reviewer scan declared in agents/dod-checker-agent.md § “Hard-blocker pre-reviewer scan” (7 stable greps for SSL-verify-off / CORS-wildcard / sensitive log fields / TS escape hatch / raw HTML / client-bundled secrets / token-in-storage). Output verdict APPROVED (zero ❌ AND zero scan hits) or BLOCKED (one or more ❌ OR one or more scan hits) with the gap list AND the hard-blocker hit list. Do NOT read the spec, the bundle, the reviewer checklists, the critical paths, or any source file outside the developer’s ## Files Created / ## Files Modified lookup table. |
| Backend Reviewer | Review the backend code listed in the handoff. This is review iteration {N} of max 3. The DoD-checker has already verified that every ## Definition of Done checkbox has an artefact on disk — do NOT re-walk the DoD; focus on rule compliance against the loaded critical paths. |
| Frontend Reviewer | Review the frontend code listed in the handoff. This is review iteration {N} of max 3. The DoD-checker has already verified that every ## Definition of Done checkbox has an artefact on disk — do NOT re-walk the DoD; focus on rule compliance against the loaded critical paths. |
| Tester | Ensure all Docker containers are running for each backend service. Run all test suites and linters. Browser-level verification is the Tester’s sole scope in the pipeline (no Dev-side browser smoke). Decide whether to drive Playwright MCP based on the task file’s DoD: if the DoD lists visual or interactive items (gradients, rendered error copy, light/dark parity, viewport-size checks, end-to-end regression scenarios that cannot be faithfully simulated in jsdom), run Playwright — resize to every target viewport, toggle dark mode, drive forms, and save screenshots to {workspace_root}/handoffs/{feature}/screenshots/. If the DoD has no such items (pure logic changes fully covered by jsdom unit tests), skip Playwright entirely. Fall back to “requires human verification” ONLY if the Playwright MCP is unavailable in-session, and state the reason explicitly. If any test fails, identify which developer needs to fix it. |
| Dev+Tester (simple) | Implement the {feature} feature as described in the spec. After implementation, write unit tests as specified in the task file (this is the only pipeline mode where the Developer also writes tests; see standards/test-ownership.md § “Simple-flow carve-out” for the rationale). For backend, follow standards/docker-test-execution.md to run the test suite inside the service container; for frontend, run the test runner declared in standards/quality-gates.md § Frontend. Run every gate declared in the matching § of standards/quality-gates.md. All must pass before writing the handoff. After implementation and tests, run the Definition-of-Done verification gate (see the developer agent definition § “Definition-of-Done verification gate”) — block writing the handoff if any DoD checkbox is ❌. The handoff MUST include ## Quality-Gate Results and ## DoD coverage (verbatim task DoD with ✅/❌/⚠️ marks). When the DoD lists visual or interactive items, switch into Tester mode and follow the Tester row’s Playwright guidance — do not skip and do not run a second Dev-phase browser smoke. |

Docker pre-flight for subagent prompts

When spawning a backend subagent, the orchestrator must include in the prompt a one-line directive pointing to the canonical protocol:

DOCKER: Run backend tests per `standards/docker-test-execution.md` (start the service container first, never skip test execution, never stop sibling services' containers when running as a parallel subagent).

The full protocol — exact docker compose invocations, the parallel-subagent rule, and the targeted-execution flags — lives in standards/docker-test-execution.md, so a single edit to that file keeps every consumer (Backend Dev, Tester, this orchestrator) in lockstep.


Handoff reading protocol (orchestrator-side, load-bearing)

After every subagent spawn, the orchestrator reads the handoff. The orchestrator is NOT the next agent — the next agent receives the handoff path via its own prompt and reads it independently in its isolated context. The orchestrator’s read is purely for ITS OWN routing decisions. So the orchestrator must NOT eagerly read the full handoff (200-800 lines = 30-120k tokens accumulated across 6-8 phases per /build-plan); it reads selectively per the rules below.

Always read (3 sections, ~10-30 lines):

  1. ## Status — gate decision per the Failure Handling section below.
  2. ## Status reason — surfaced verbatim to human when Status ≠ complete.
  3. ## Abstract — the five structured fields (outcome, verdict, files, next_phase, open_questions). This is the routing index.

That is the orchestrator’s default read budget per phase transition. ~1-3k tokens, not 30-120k.
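A minimal sketch of the selective read, assuming the template puts the three routing sections at the top of the file; the exact section layout comes from templates/feature-handoff-template.md, not from this sketch:

```python
import re

def read_routing_fields(handoff_text, max_lines=40):
    """Parse only ## Status, ## Status reason and ## Abstract from the
    top of a handoff, leaving the rest of the file unread."""
    head = "\n".join(handoff_text.splitlines()[:max_lines])
    fields = {}
    for key in ("Status", "Status reason", "Abstract"):
        # Capture everything between this heading and the next `## ` heading.
        m = re.search(rf"^## {key}\n(.*?)(?=^## |\Z)", head, re.M | re.S)
        fields[key.lower().replace(" ", "_")] = m.group(1).strip() if m else None
    return fields
```

Capping the read at the first ~40 lines is what keeps the per-transition cost at ~1-3k tokens even when the handoff runs to 800 lines.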

Conditional deep-reads (only when the Abstract triggers them):

| Trigger in Abstract | Orchestrator additionally reads | Why |
|---|---|---|
| Status: blocked AND open_questions > 0 | ## Open Questions section verbatim | Surface to human; pipeline stops |
| verdict: REQUEST_CHANGES (Reviewer) | ## Findings / ## Change requests / review report sections | Construct upstream Dev’s next-iteration prompt with the verbatim findings |
| verdict: REQUEST_CHANGES_TRIVIAL (Reviewer) | ## Findings filtered to trivial entries | Construct upstream Dev’s prompt with the trivial-fix list. Do NOT increment the Reviewer iter counter — the loop budget is reserved for rule-compliance reasoning |
| verdict: BLOCKED (DoD-checker) | ## Gaps section | Construct upstream Dev’s re-spawn prompt with the gap list |
| End-of-feature commit step (after Tester completes) | ## Files Created + ## Files Modified + ## Key Decisions (per affected handoff) | Build the commit message body with files touched + design rationale |
| Sanity check fails (e.g. Abstract.files count mismatches ## Files Modified line count) | The mismatched section, to investigate | Fail-loud — corrupt handoff |

What the orchestrator NEVER deep-reads as part of routing:

  • ## Quality-Gate Results — the Tester reads these directly when applying its quality-gate trust policy. The orchestrator only needs to know that gates ran, which is captured in outcome.
  • ## DoD coverage — the DoD-checker validated this; the orchestrator only needs the resulting verdict field.
  • ## Iteration — the orchestrator already tracks the iteration counter independently (max 3 for Reviewer loop).
  • ## For the Next Agent — this is for the next agent, not the orchestrator. The next agent reads the full handoff via path.

This protocol drops orchestrator-side reading from ~30-120k tokens per /build-plan to ~10-40k. The 50-90k savings is the largest single contributor to the orchestrator-overhead reduction in the pass-5 audit.

Tester-handoff gate (load-bearing)

After reading the Tester handoff (per the protocol above), and before advancing to update-specs, scan the handoff’s ## Files Modified section. The Tester is read-only over production code (see agents/tester-agent.md § “Production code is read-only for the Tester, period” and standards/test-ownership.md § “Tester scope is tests-only”); any path that does NOT match the test-path allowlist declared in standards/test-ownership.md § “Test path allowlist” is a contract violation:

^(tests/|.*/__tests__/|.*\.(spec|test)\.[tj]sx?$|.*Test\.php$|cypress/|playwright/|e2e/)

If every path in the Tester’s ## Files Modified matches the regex, the gate passes and the orchestrator advances to update-specs normally.

If one or more paths do NOT match, the orchestrator:

  1. Overwrites the Tester handoff ## Status to blocked. The Tester’s self-reported status is no longer authoritative — the gate has detected a contract violation the Tester either failed to flag or flagged wrongly while still committing the change.
  2. Constructs a synthetic Developer handoff for the matching side (Backend or Frontend, picked from the path’s location — src/ of a backend service routes to Backend, src/ of web-front or any frontend repo routes to Frontend). The synthetic handoff carries:
    • The Tester’s diff for the offending paths (verbatim) under ## Open Questions so the Developer reviews the production change as if reviewing a colleague’s branch.
    • The root-cause file:line the Tester identified (extracted from the Tester’s prose; if absent, surfaced as “root cause unidentified — Developer to triage”).
    • The suspected fix as DESCRIPTION ONLY per the bounce protocol (NOT the Tester’s actual diff promoted to a patch — the Developer re-implements after triage).
  3. Routes back to the matching Developer phase with iter ≥ 2 per the anti-reflag protocol. The Reviewer iter counter advances normally — the Tester-side production drift counts as a Dev-side gap, not as a fresh feature.
  4. Notifies the human FYI with the offending paths, the synthetic handoff path, and the next-phase plan. This is informational; no blocking approval is required because the Developer phase is the gate that re-runs after triage.

This is defense in depth alongside the Tester’s own bounce protocol. The Tester is supposed to bounce when a gap requires production code (per agents/tester-agent.md); this gate catches the case where the Tester applied a fix anyway, intentionally or not. A Reviewer’s tester-side production-drift rule (next layer — standards/backend-review-checklist.md and standards/frontend-review-checklist.md) is the third and final fallback.

The path-classification heuristic (Backend vs Frontend routing) lives here because the orchestrator already knows which side each repository belongs to from Step 3’s pre-flight check; for monorepos with mixed paths, route to whichever side has more matching paths and surface ambiguity to the human in step 4.
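The majority-vote heuristic can be sketched as follows. This is illustrative only: `repo_sides` is a hypothetical stand-in for the Step 3 pre-flight knowledge, not a real config file.

```python
from collections import Counter

def classify_side(offending_paths, repo_sides):
    """Route Backend vs Frontend by majority of matching path prefixes.

    Returns the winning side, or None when no prefix matches
    (ambiguity is then surfaced to the human per step 4).
    """
    votes = Counter()
    for path in offending_paths:
        for prefix, side in repo_sides.items():
            if path.startswith(prefix):
                votes[side] += 1
                break
    return votes.most_common(1)[0][0] if votes else None
```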

Progress reporting style (orchestrator-side narration)

To keep orchestrator narration cheap (~10-20k savings per /build-plan vs verbose prose):

  • Between phases — emit ONE line per phase transition with this shape:
    [Phase {N}/{total}] {role} {iter}: {status}, {tokens_if_known}, {duration} → {next_phase_or_outcome}
    Example: [Phase 4/7] Backend Reviewer iter 1: complete, 84,434 tokens, 5m19s → REQUEST_CHANGES, re-spawning Backend Developer.
  • Sign-off (Step 0): keep the full summary — this is the human-gate, signal density matters.
  • Blocker / failure surfacing: keep verbatim — quoting the agent’s ## Status reason + ## Open Questions is exactly the value the human needs.
  • End-of-feature merge prompt + Token Usage Report: keep full — this is the audit trail for the run.
  • Tool-result narration during routine ops (git status, Read of an Abstract): suppress unless the operation surfaces a problem.
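The one-line phase-transition shape above amounts to a trivial formatter (a sketch; parameter names are illustrative):

```python
def phase_line(n, total, role, iter_n, status, tokens, duration, next_step):
    """Emit the single-line phase-transition narration."""
    return (f"[Phase {n}/{total}] {role} iter {iter_n}: {status}, "
            f"{tokens:,} tokens, {duration} → {next_step}")
```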

Verdict to action (load-bearing)

The Reviewer’s verdict field in ## Abstract is not authoritative for routing on its own — the orchestrator maps verdict + finding severities to a routing decision. Without this mapping, every minor finding could (and historically did) re-spawn the Developer for an iteration the framework cannot afford. Empirical baseline: half of the iter-2 / iter-3 spawns observed in /build-plan runs were minor-only, no blocker present.

Approval threshold: verdict: APPROVED is enforced iff the review report contains zero critical AND zero major findings. The Reviewer agent’s “approve when every checklist item passes” prose is a hard floor, not a complete rule — minor findings do not block.

| Verdict | Meaning | Iter slot consumed | Routing |
|---|---|---|---|
| APPROVED | Zero critical AND zero major findings. minor findings allowed but moved to ## Follow-ups for tracking | n/a — verdict means done | Advance to Tester (or Reviewer’s normal next phase) |
| REQUEST_CHANGES | At least one critical OR major finding | Yes — counts against the max-3 Reviewer loop | Re-spawn upstream Dev with verbatim findings; Dev populates ## Iteration response next iteration |
| REQUEST_CHANGES_TRIVIAL | Only minor findings, but the Reviewer judges them important enough to fix in this PR (e.g. typos, doc gaps, naming nits, unused imports). NOT for architectural / security / contract issues | No — does NOT count against the max-3 budget. Mirrors the DoD-checker BLOCKED carve-out | Re-spawn Dev with the trivial-fix list. Iter counter stays at the same N. If the Dev’s iteration response addresses every trivial finding, the next Reviewer spawn returns APPROVED |
| BLOCKED (Reviewer-level) | Reviewer could not run (unreadable handoff, missing files cited by Dev, gate tooling crash). Distinct from DoD-checker’s BLOCKED | Yes (treat as failed iter) | Stop pipeline, surface to human |

Severity definitions (apply to every Reviewer agent):

| Severity | Examples | Verdict implication |
|---|---|---|
| critical | Security violation (CORS `*`, secret in logs, SSL verify off), authorization bypass, data corruption risk, hard-blocker rule from the checklist (BE-001..DM-001 / FE-001..SE-003) | Forces REQUEST_CHANGES |
| major | Architectural violation (Domain depends on Infrastructure, Service with two public methods, Repository throws), missing audit log on protected action, contract break (API shape change), test absent on a gated path | Forces REQUEST_CHANGES |
| minor | Naming nit, missing comment, unused import, doc gap, refactor opportunity, perf optimisation that is not a budget violation | Goes to ## Follow-ups (with APPROVED) OR triggers REQUEST_CHANGES_TRIVIAL if the Reviewer judges them important to fix in this PR |

Why REQUEST_CHANGES_TRIVIAL does not count: the iter budget exists to bound iterations on rule-compliance reasoning, the same way DoD-checker’s BLOCKED bounce does not count for unfinished-work gates. A trivial fix has no architectural reasoning surface — re-spawning the Dev for “rename userlist to userList and fix two doc typos” is bookkeeping cost, not an architectural review iteration. Capping it at the same 3-iter ceiling caused real “we hit max-3 because the Reviewer found typos in iter 2” escalations in pre-this-version runs.

Hard rule: the Reviewer NEVER emits REQUEST_CHANGES_TRIVIAL when at least one critical or major finding exists. The orchestrator treats such a verdict as malformed and downgrades it to REQUEST_CHANGES.
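Combining the approval threshold, the trivial carve-out, and the malformed-verdict downgrade, the routing can be sketched as follows (verdict strings are from the tables in this section; the dict shape for findings is illustrative):

```python
def route_verdict(verdict, findings):
    """Return (effective_verdict, consumes_iter_slot)."""
    blocker = any(f["severity"] in ("critical", "major") for f in findings)
    if blocker and verdict in ("APPROVED", "REQUEST_CHANGES_TRIVIAL"):
        # A verdict that ignores a critical/major finding is malformed:
        # the orchestrator downgrades it at routing time.
        verdict = "REQUEST_CHANGES"
    consumes = verdict in ("REQUEST_CHANGES", "BLOCKED")
    return verdict, consumes
```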

The Reviewer agent’s ## Verdict mapping section (declared in agents/backend-reviewer-agent.md and agents/frontend-reviewer-agent.md) is the single source of truth for these severities; the orchestrator enforces them at routing time.

Failure Handling

Status gate (mandatory before advancing to the next phase):

Every handoff produced by a subagent MUST start with a ## Status block per templates/feature-handoff-template.md. The orchestrator parses this block before reading anything else and decides:

| Status value | Orchestrator behaviour |
|---|---|
| complete | Apply ## Verdict to action mapping for Reviewer / DoD-checker handoffs (APPROVED → advance; REQUEST_CHANGES → consume iter, re-spawn Dev; REQUEST_CHANGES_TRIVIAL → no iter consumed, re-spawn Dev with trivial-fix list; BLOCKED at Reviewer-level → stop). For non-Reviewer handoffs, advance to the next phase as planned |
| blocked | Stop pipeline. Read ## Status reason and ## Open Questions from the handoff. Surface to the human verbatim. Do NOT re-spawn the same agent — the ambiguity needs human input |
| failed | Stop pipeline. Read ## Status reason. Report to human with the reason and the failing handoff path |
| incomplete | Stop pipeline. Read ## Status reason (which budget was hit). Ask human: retry with extended budget, accept-with-gaps, or abort |
| Absent / unrecognised | Treat as failed — fail-loud safe default. Never advance on missing signal |
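The status gate is a small total function, sketched here for clarity (action strings are illustrative labels, not literal orchestrator commands):

```python
def route_on_status(status):
    """Map a handoff ## Status value to the orchestrator's next action."""
    actions = {
        "complete": "apply verdict mapping / advance to next phase",
        "blocked": "stop; surface ## Status reason + ## Open Questions verbatim",
        "failed": "stop; report reason and failing handoff path",
        "incomplete": "stop; ask human: retry / accept-with-gaps / abort",
    }
    # Absent or unrecognised values are treated as failed (fail-loud default).
    return actions.get(status, actions["failed"])
```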

The DoD-checker’s existing APPROVED / BLOCKED verdict is independent of Status — APPROVED is a semantic verification of the developer’s work; Status is the agent’s own self-report of run health. A DoD-checker handoff with Status: complete + verdict BLOCKED means the agent ran cleanly AND verified the developer’s work has gaps; both signals are needed.

Subagent returns a tool-level error (Agent tool itself fails):

  • Stop immediately — do not continue to the next phase
  • Report the failure to the developer with the error message verbatim
  • Wait for the developer to decide: retry, skip, or abort

DoD-checker returns BLOCKED:

  • Do NOT invoke the Reviewer for this iteration
  • Re-spawn the Developer (same side) with a prompt that includes the DoD-checker handoff’s ## Gaps section verbatim, instructing the Developer to address each row and re-run the DoD verification gate before producing the next handoff
  • The DoD-checker bounce does NOT count against the Reviewer’s max-3 loop budget — the loop budget is for rule-compliance iterations, not for unfinished-work gates
  • If the SAME rows appear in three consecutive Developer iterations, stop and report to the human developer — at that point the gap is not an oversight, it is a misunderstanding the human needs to resolve

Max review iterations reached without approval:

  • Stop the affected side (backend or frontend)
  • Report the full list of unresolved issues from the final review
  • Ask the developer:
    • (a) Accept with known issues — document them in the task file and continue
    • (b) Fix manually — developer fixes, then re-run the reviewer once more
    • (c) Abort — stop the entire feature

Do not continue to the Tester until both sides are either approved or explicitly accepted.


Output

  • All code implemented, reviewed and tested
  • Updated task file with all Definition of Done conditions marked
  • Updated spec file (via update-specs)
  • Final status report to the developer
  • Token usage estimate (see below)

Token Usage Report

After the final status report, display an actual token usage summary using the figures the runtime returns from each Agent call. Each subagent’s final message includes a <usage> block with total_tokens, tool_uses and duration_ms — these are authoritative; do not estimate from line counts.

Steps:

  1. Collect the total_tokens, tool_uses and duration_ms from every <usage> block returned by the subagents this run (one per phase, including bundle-generation, retries and reviewer-loop iterations).

  2. Sum total_tokens across phases. Note: total_tokens already accumulates across each subagent’s tool uses (each tool call replays the prefix), so it is the right field to sum — do NOT also add per-tool-use estimates on top.

  3. Display a markdown table with columns: Phase | Model | Tool uses | Total tokens | Duration. Add a Total subagents row.

  4. Orchestrator overhead instrumentation — the orchestrator’s own context (this conversation) is not captured in subagent <usage> blocks. Three tiers of instrumentation, in order of accuracy:

    Tier A (most accurate, when supported by the runtime) — if your runtime exposes session-total tokens via a slash command (e.g. Claude Code’s /cost), prompt the developer to capture it after the report:

    Orchestrator overhead — capture exact figure now:
    Run `/cost` in this session. Subtract the subagent total ({sum}) from the
    reported session total to get orchestrator overhead. Append below for
    audit trail.
    Session total (run `/cost`): _____ tokens
    Orchestrator overhead = session_total − {sum} = _____ tokens
    Orchestrator share = orchestrator / session_total = _____%

    Tier B (heuristic baseline if Tier A is unavailable) — emit:

    Orchestrator overhead (estimated): ~25-35% of subagent total ≈ {0.25 * sum}–{0.35 * sum} tokens.
    Empirical baseline (red-profesionales, N=2): orchestrator share fell in the 30-40% range.
    This estimate is a heuristic — capture exact via `/cost` for trend tracking.

    Tier C (per-flow ballparks for sanity check) — if neither Tier A nor B applies, document simple: ~50-150k overhead, standard: ~200-400k, complex: ~300-500k.

  5. Append the run to the project’s token-baseline log (when the orchestrator can write to {project-docs}/):

    File: {project-docs}/token-baseline.md (append-only)
    ## {YYYY-MM-DD} — {feature-name} ({complexity})
    Subagents:
    {markdown table from step 3}
    Subagent total: {sum} tokens across {N} phases.
    Orchestrator overhead: {Tier-A exact | Tier-B estimate range}.
    Orchestrator share: {Tier-A %| Tier-B 25-35%}.
    Notes: {anything anomalous, e.g. "Tester ran 2 iters; bundle generator ran twice due to retry"}.

    This log is append-only; future audits / scheduled agents can compute medians, detect regressions, and gate scorecard bumps on empirical data instead of estimates. If {project-docs}/token-baseline.md does not yet exist, create it with a brief header explaining its purpose.

  6. Optional cost note (skip if unsure): a rough rule of thumb is opus ≈ $15/Mtok, sonnet ≈ $3/Mtok, haiku ≈ $0.80/Mtok for input; output is ~5× the input rate but tiny in agent flows. State a $-range only if the per-tier subtotals are clearly separable from the table.

Why this replaces the old lines × 8 formula: that estimate undercounted by ~30-50% because it treated each file as read once, while subagents replay the prefix on every tool call. With the runtime’s total_tokens field readily available there is no reason to estimate. The orchestrator-overhead piece is genuinely unmeasurable from inside the orchestrator’s own session (the SDK reports per-subagent, not session-self) — Tier A makes it exact via the runtime’s own slash command; the append-only log makes the data trendable across features.
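Steps 1-2 and the Tier-B heuristic reduce to a few lines (a sketch; the dict keys mirror the <usage> block fields named above):

```python
def summarize_usage(phases):
    """Sum subagent usage and compute the Tier-B overhead estimate.

    `phases` is a list of dicts with total_tokens, tool_uses, duration_ms.
    total_tokens already accumulates prefix replay across tool calls, so a
    plain sum is correct; never add per-tool-use estimates on top of it.
    """
    total = sum(p["total_tokens"] for p in phases)
    # Tier-B heuristic: orchestrator overhead ~25-35% of the subagent total.
    low, high = int(0.25 * total), int(0.35 * total)
    return {"subagent_total": total, "overhead_estimate": (low, high)}
```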

Context Checkpoint

After completing this command, evaluate whether there are more features to build (e.g. the developer asked to build multiple plans, or there are other refined specs pending). If so, suggest to the developer:

“Feature {name} is complete. The conversation context is heavy after {N} subagent phases. To keep the orchestrator context fresh for the next feature, I recommend opening a new session and running: /build-plan for {next-plan-file-path}”

This matters because the orchestrator accumulates context from each subagent’s handoff, spec reads, and test results. A fresh session for the next build-plan means:

  • Smaller orchestrator context → faster responses, lower cost
  • Subagents are always isolated (they don’t inherit context), so they are unaffected
  • The handoff files and specs on disk carry all state — nothing is lost by starting a new session