garrytan · May 7, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,85 @@
 # Changelog
 
+## [1.27.1.0] - 2026-05-06
+
+## **Plan-mode reviews now refuse to dump findings without asking. Four gate-tier tests catch the regression on every PR.**
+
+The four `/plan-*-review` skills (eng, ceo, design, devex) gain an
+anti-shortcut clause baked in via a single shared resolver. The clause
+names the May 2026 transcript-bug failure mode directly: model explores,
+finds issues, dumps every finding into one plan write, calls
+ExitPlanMode without firing AskUserQuestion. The new clause closes that
+loophole: "the plan file is the OUTPUT of the interactive review, not a
+substitute for it." Future tightening edits one resolver; all four
+skills pick up the change on the next gen-skill-docs run.
+
+Four gate-tier E2E tests catch the regression class on every PR that
+touches the four templates, the shared resolver, or the seeds fixture.
+Each test drives the matching skill against a small "forcing finding"
+seed and asserts the agent fires at least one AskUserQuestion before
+reaching plan_ready. Each test takes ~1-3 min of wall time; a triggered
+CI run costs ~$2-6 total. Observed wall time: eng floor 59s, CEO floor
+197s. All four pass against the new template.
+
+### The numbers that matter
+
+Verified end-to-end via live PTY runs against `claude` plan mode:
+
+| Surface | Before | After | Δ |
+|---|---|---|---|
+| Plan-mode reviews with anti-shortcut clause | 0/4 | 4/4 | full coverage of plan-* family |
+| Gate-tier regression tests for the transcript-bug class | 0 | 4 | one per skill |
+| Wall time per floor test (typical) | n/a | 30s-3m | early exit on first AskUserQuestion render |
+| Cost per gate run (when triggered) | n/a | ~$2-6 | diff-gated; only fires on relevant edits |
+| Lines added / deleted | — | +450 / −3 | additive; no breaking changes |
+
+The floor tests use a focused observer (`runPlanSkillFloorCheck`) that
+exits at the first non-permission numbered-option render. Existing
+periodic finding-count tests use `runPlanSkillCounting` for full
+fingerprint analysis on a 25-min budget; the floor variant trades
+fingerprint precision for early-exit reliability so it fits gate-tier
+constraints. Both helpers live side-by-side in
+`test/helpers/claude-pty-runner.ts`.
+
+### What this means for the four review skills
+
+Every plan-* review now has a structural rule against the precise
+failure mode the transcript exhibited. The anti-shortcut clause
+appears in the rendered prompt right after the existing Anti-skip
+rule, so the model reads it alongside the per-section STOP gates that
+v1.26.2.0 added. If a future model regression revives the bug, the
+gate-tier floor test fires with full PTY evidence on the next PR.
+
+### Itemized changes
+
+#### Added
+- **`generateAntiShortcutClause` resolver** in `scripts/resolvers/review.ts`,
+  registered as `{{ANTI_SHORTCUT_CLAUSE}}` in the `RESOLVERS` map.
+  Plan-* SKILL.md.tmpl files include it via one placeholder line.
+- **`runPlanSkillFloorCheck` PTY helper** in
+  `test/helpers/claude-pty-runner.ts` — minimal "did the agent fire ANY
+  AskUserQuestion?" observer with early exit on first non-permission
+  numbered-option render.
+- **Four gate-tier finding-floor E2E tests** in
+  `test/skill-e2e-plan-{eng,ceo,design,devex}-finding-floor.test.ts`,
+  each using the shared `runPlanSkillFloorCheck` helper.
+- **Four forcing-finding seeds** in `test/fixtures/forcing-finding-seeds.ts`,
+  one per skill, each engineered to surface at least one finding under
+  that skill's review focus.
+
+#### Changed
+- **All four `plan-*-review` SKILL.md** files now include the
+  anti-shortcut clause immediately after the `**Anti-skip rule:**`
+  paragraph. Anchored on the paragraph (not the surrounding heading)
+  so the same insertion works across all four templates regardless of
+  their differing section labels.
+- **`test/helpers/touchfiles.ts`** adds 4 entries to `E2E_TOUCHFILES`
+  and marks each as `gate` in `E2E_TIERS`. The new entries depend on the matching skill
+  template, the shared resolver, the seeds fixture, and the PTY
+  runner helper.
+- **`test/touchfiles.test.ts`** count assertion bumped 21→22 with
+  explicit `plan-ceo-finding-floor` containment.
+
 ## [1.27.0.0] - 2026-05-06
 
 ## **`/setup-gbrain` connects to a remote brain in one paste. Brain repo renamed to gstack-artifacts.**

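The diff gating described in the changelog can be pictured as a lookup from each test to the files it depends on. A minimal sketch, assuming a plain map of test name to exact file paths; the real `E2E_TOUCHFILES` format in `test/helpers/touchfiles.ts` is not shown in this diff and may use globs or a different shape, and `FLOOR_TOUCHFILES` and `selectTests` are hypothetical names:

```typescript
// Hypothetical map: test name -> files whose change should trigger that test.
// The real E2E_TOUCHFILES may use globs or a different structure.
const SHARED_FLOOR_DEPS: string[] = [
  "scripts/resolvers/review.ts",
  "test/fixtures/forcing-finding-seeds.ts",
  "test/helpers/claude-pty-runner.ts",
];

const FLOOR_TOUCHFILES: Record<string, string[]> = {
  "plan-eng-finding-floor": ["plan-eng-review/SKILL.md.tmpl", ...SHARED_FLOOR_DEPS],
  "plan-ceo-finding-floor": ["plan-ceo-review/SKILL.md.tmpl", ...SHARED_FLOOR_DEPS],
  "plan-design-finding-floor": ["plan-design-review/SKILL.md.tmpl", ...SHARED_FLOOR_DEPS],
  "plan-devex-finding-floor": ["plan-devex-review/SKILL.md.tmpl", ...SHARED_FLOOR_DEPS],
};

// Return, sorted by name, every test whose dependency list includes a changed file.
function selectTests(touchfiles: Record<string, string[]>, changedFiles: string[]): string[] {
  const changed = new Set(changedFiles);
  return Object.entries(touchfiles)
    .filter(([, deps]) => deps.some((dep) => changed.has(dep)))
    .map(([name]) => name)
    .sort();
}
```

Editing one skill template selects only that skill's floor test, while editing a shared dependency such as the resolver selects all four; an unrelated change selects none, which is what keeps the per-PR cost at zero most of the time.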
diff --git a/VERSION b/VERSION
@@ -1 +1 @@
-1.27.0.0
+1.27.1.0
diff --git a/package.json b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "gstack",
-  "version": "1.27.0.0",
+  "version": "1.27.1.0",
   "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.",
   "license": "MIT",
   "type": "module",

diff --git a/plan-ceo-review/SKILL.md b/plan-ceo-review/SKILL.md
@@ -1337,6 +1337,8 @@ Present these mode options via AskUserQuestion using the preamble's AskUserQuest
 
 **Anti-skip rule:** Never condense, abbreviate, or skip any review section (1-11) regardless of plan type (strategy, spec, code, infra). Every section in this skill exists for a reason. "This is a strategy doc so implementation sections don't apply" is always wrong — implementation details are where strategy breaks down. If a section genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.
 
+**Anti-shortcut clause:** The plan file is the OUTPUT of the interactive review, not a substitute for it. Writing every finding into one plan write and calling ExitPlanMode without firing AskUserQuestion is the precise failure mode of the May 2026 transcript bug — the model explored, found issues, and dumped them into a deliverable rather than walking the user through them. If you have ANY non-trivial finding in any review section, the path from finding to ExitPlanMode goes THROUGH AskUserQuestion. Zero findings in every section is the only path to ExitPlanMode that bypasses AskUserQuestion. If you find yourself wanting to write a plan with findings before asking, stop and call AskUserQuestion now — that's the bug, recognize it.
+
 ### Section 1: Architecture Review
 Evaluate and diagram:
 * Overall system design and component boundaries. Draw the dependency graph.

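The anti-shortcut clause reduces to one ordering rule over a session's tool calls: with any non-trivial finding, AskUserQuestion must come before ExitPlanMode. As a hedged sketch, assuming a simplified event list and a finding count supplied from outside (`ToolEvent` and `violatesAntiShortcut` are hypothetical, not types from the PTY runner):

```typescript
// Hypothetical event model for a plan-mode session; not the real runner's types.
type ToolEvent =
  | { kind: "AskUserQuestion" }
  | { kind: "Write"; path: string }
  | { kind: "ExitPlanMode" };

// True when the session reached ExitPlanMode holding non-trivial findings
// without firing AskUserQuestion first: the transcript-bug shape.
function violatesAntiShortcut(events: ToolEvent[], nonTrivialFindings: number): boolean {
  const exitIdx = events.findIndex((e) => e.kind === "ExitPlanMode");
  if (exitIdx === -1) return false; // never exited, so the rule is not broken (yet)
  if (nonTrivialFindings === 0) return false; // zero findings may bypass AskUserQuestion
  const askedBeforeExit = events
    .slice(0, exitIdx)
    .some((e) => e.kind === "AskUserQuestion");
  return !askedBeforeExit;
}
```

A plan write followed directly by ExitPlanMode is flagged only when findings exist; the same sequence with zero findings is the one permitted bypass.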
diff --git a/plan-ceo-review/SKILL.md.tmpl b/plan-ceo-review/SKILL.md.tmpl
@@ -411,6 +411,8 @@ Present these mode options via AskUserQuestion using the preamble's AskUserQuest
 
 **Anti-skip rule:** Never condense, abbreviate, or skip any review section (1-11) regardless of plan type (strategy, spec, code, infra). Every section in this skill exists for a reason. "This is a strategy doc so implementation sections don't apply" is always wrong — implementation details are where strategy breaks down. If a section genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.
 
+{{ANTI_SHORTCUT_CLAUSE}}
+
 ### Section 1: Architecture Review
 Evaluate and diagram:
 * Overall system design and component boundaries. Draw the dependency graph.

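Because the placeholder sits directly under the `**Anti-skip rule:**` paragraph, a rendered file can be checked for that adjacency without knowing each template's section labels. A minimal sketch, assuming paragraphs are separated by blank lines; `clauseFollowsAntiSkip` is a hypothetical helper, not something this PR adds:

```typescript
// Hypothetical check: in a rendered SKILL.md, the anti-shortcut paragraph
// should be the very next paragraph after the Anti-skip rule.
function clauseFollowsAntiSkip(markdown: string): boolean {
  const paragraphs = markdown
    .split(/\n\s*\n/) // blank-line paragraph breaks
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const i = paragraphs.findIndex((p) => p.startsWith("**Anti-skip rule:**"));
  if (i === -1) return false;
  const next = paragraphs[i + 1];
  return next !== undefined && next.startsWith("**Anti-shortcut clause:**");
}
```

Anchoring on the paragraph rather than a heading is what lets the same check pass for "Section 1: Architecture Review", "Prior Learnings", or any other label that follows.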
diff --git a/plan-design-review/SKILL.md b/plan-design-review/SKILL.md
@@ -1352,6 +1352,8 @@ descriptions of what 10/10 looks like.
 
 **Anti-skip rule:** Never condense, abbreviate, or skip any review pass (1-7) regardless of plan type (strategy, spec, code, infra). Every pass in this skill exists for a reason. "This is a strategy doc so design passes don't apply" is always wrong — design gaps are where implementation breaks down. If a pass genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.
 
+**Anti-shortcut clause:** The plan file is the OUTPUT of the interactive review, not a substitute for it. Writing every finding into one plan write and calling ExitPlanMode without firing AskUserQuestion is the precise failure mode of the May 2026 transcript bug — the model explored, found issues, and dumped them into a deliverable rather than walking the user through them. If you have ANY non-trivial finding in any review section, the path from finding to ExitPlanMode goes THROUGH AskUserQuestion. Zero findings in every section is the only path to ExitPlanMode that bypasses AskUserQuestion. If you find yourself wanting to write a plan with findings before asking, stop and call AskUserQuestion now — that's the bug, recognize it.
+
 ## Prior Learnings
 
 Search for relevant learnings from previous sessions:

diff --git a/plan-design-review/SKILL.md.tmpl b/plan-design-review/SKILL.md.tmpl
@@ -265,6 +265,8 @@ descriptions of what 10/10 looks like.
 
 **Anti-skip rule:** Never condense, abbreviate, or skip any review pass (1-7) regardless of plan type (strategy, spec, code, infra). Every pass in this skill exists for a reason. "This is a strategy doc so design passes don't apply" is always wrong — design gaps are where implementation breaks down. If a pass genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.
 
+{{ANTI_SHORTCUT_CLAUSE}}
+
 {{LEARNINGS_SEARCH}}
 
 ### Pass 1: Information Architecture

diff --git a/plan-devex-review/SKILL.md b/plan-devex-review/SKILL.md
@@ -1323,6 +1323,8 @@ Pattern:
 
 **Anti-skip rule:** Never condense, abbreviate, or skip any review pass (1-8) regardless of plan type (strategy, spec, code, infra). Every pass in this skill exists for a reason. "This is a strategy doc so DX passes don't apply" is always wrong — DX gaps are where adoption breaks down. If a pass genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.
 
+**Anti-shortcut clause:** The plan file is the OUTPUT of the interactive review, not a substitute for it. Writing every finding into one plan write and calling ExitPlanMode without firing AskUserQuestion is the precise failure mode of the May 2026 transcript bug — the model explored, found issues, and dumped them into a deliverable rather than walking the user through them. If you have ANY non-trivial finding in any review section, the path from finding to ExitPlanMode goes THROUGH AskUserQuestion. Zero findings in every section is the only path to ExitPlanMode that bypasses AskUserQuestion. If you find yourself wanting to write a plan with findings before asking, stop and call AskUserQuestion now — that's the bug, recognize it.
+
 ## Prior Learnings
 
 Search for relevant learnings from previous sessions:

diff --git a/plan-devex-review/SKILL.md.tmpl b/plan-devex-review/SKILL.md.tmpl
@@ -449,6 +449,8 @@ Pattern:
 
 **Anti-skip rule:** Never condense, abbreviate, or skip any review pass (1-8) regardless of plan type (strategy, spec, code, infra). Every pass in this skill exists for a reason. "This is a strategy doc so DX passes don't apply" is always wrong — DX gaps are where adoption breaks down. If a pass genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.
 
+{{ANTI_SHORTCUT_CLAUSE}}
+
 {{LEARNINGS_SEARCH}}
 
 ### DX Trend Check

diff --git a/plan-eng-review/SKILL.md b/plan-eng-review/SKILL.md
@@ -899,6 +899,8 @@ Always work through the full interactive review: one section at a time (Architec
 
 **Anti-skip rule:** Never condense, abbreviate, or skip any review section (1-4) regardless of plan type (strategy, spec, code, infra). Every section in this skill exists for a reason. "This is a strategy doc so implementation sections don't apply" is always wrong — implementation details are where strategy breaks down. If a section genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.
 
+**Anti-shortcut clause:** The plan file is the OUTPUT of the interactive review, not a substitute for it. Writing every finding into one plan write and calling ExitPlanMode without firing AskUserQuestion is the precise failure mode of the May 2026 transcript bug — the model explored, found issues, and dumped them into a deliverable rather than walking the user through them. If you have ANY non-trivial finding in any review section, the path from finding to ExitPlanMode goes THROUGH AskUserQuestion. Zero findings in every section is the only path to ExitPlanMode that bypasses AskUserQuestion. If you find yourself wanting to write a plan with findings before asking, stop and call AskUserQuestion now — that's the bug, recognize it.
+
 ## Prior Learnings
 
 Search for relevant learnings from previous sessions:

diff --git a/plan-eng-review/SKILL.md.tmpl b/plan-eng-review/SKILL.md.tmpl
@@ -127,6 +127,8 @@ Always work through the full interactive review: one section at a time (Architec
 
 **Anti-skip rule:** Never condense, abbreviate, or skip any review section (1-4) regardless of plan type (strategy, spec, code, infra). Every section in this skill exists for a reason. "This is a strategy doc so implementation sections don't apply" is always wrong — implementation details are where strategy breaks down. If a section genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.
 
+{{ANTI_SHORTCUT_CLAUSE}}
+
 {{LEARNINGS_SEARCH}}
 
 ### 1. Architecture review

diff --git a/scripts/resolvers/index.ts b/scripts/resolvers/index.ts
@@ -11,7 +11,7 @@ import { generateTestFailureTriage } from './preamble';
 import { generateCommandReference, generateSnapshotFlags, generateBrowseSetup } from './browse';
 import { generateDesignMethodology, generateDesignHardRules, generateDesignOutsideVoices, generateDesignReviewLite, generateDesignSketch, generateDesignSetup, generateDesignMockup, generateDesignShotgunLoop, generateTasteProfile, generateUXPrinciples } from './design';
 import { generateTestBootstrap, generateTestCoverageAuditPlan, generateTestCoverageAuditShip, generateTestCoverageAuditReview } from './testing';
-import { generateReviewDashboard, generatePlanFileReviewReport, generateSpecReviewLoop, generateBenefitsFrom, generateCodexSecondOpinion, generateAdversarialStep, generateCodexPlanReview, generatePlanCompletionAuditShip, generatePlanCompletionAuditReview, generatePlanVerificationExec, generateScopeDrift, generateCrossReviewDedup } from './review';
+import { generateReviewDashboard, generatePlanFileReviewReport, generateAntiShortcutClause, generateSpecReviewLoop, generateBenefitsFrom, generateCodexSecondOpinion, generateAdversarialStep, generateCodexPlanReview, generatePlanCompletionAuditShip, generatePlanCompletionAuditReview, generatePlanVerificationExec, generateScopeDrift, generateCrossReviewDedup } from './review';
 import { generateSlugEval, generateSlugSetup, generateBaseBranchDetect, generateDeployBootstrap, generateQAMethodology, generateCoAuthorTrailer, generateChangelogWorkflow } from './utility';
 import { generateLearningsSearch, generateLearningsLog } from './learnings';
 import { generateConfidenceCalibration } from './confidence';
@@ -39,6 +39,7 @@ export const RESOLVERS: Record<string, ResolverFn> = {
   DESIGN_REVIEW_LITE: generateDesignReviewLite,
   REVIEW_DASHBOARD: generateReviewDashboard,
   PLAN_FILE_REVIEW_REPORT: generatePlanFileReviewReport,
+  ANTI_SHORTCUT_CLAUSE: generateAntiShortcutClause,
   TEST_BOOTSTRAP: generateTestBootstrap,
   TEST_COVERAGE_AUDIT_PLAN: generateTestCoverageAuditPlan,
   TEST_COVERAGE_AUDIT_SHIP: generateTestCoverageAuditShip,

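Registering `ANTI_SHORTCUT_CLAUSE` in `RESOLVERS` is what lets a single `{{ANTI_SHORTCUT_CLAUSE}}` line in each `.tmpl` expand to the full clause. The generator's actual substitution code is not part of this diff, so the following is only a sketch: the `TemplateContext` shape is assumed, and `renderTemplate` and `DEMO_RESOLVERS` are hypothetical names.

```typescript
// Assumed context shape; the real TemplateContext in scripts/resolvers may carry more fields.
interface TemplateContext {
  skillName: string;
}
type ResolverFn = (ctx: TemplateContext) => string;

// Stand-in resolver map with a shortened clause, for illustration only.
const DEMO_RESOLVERS: Record<string, ResolverFn> = {
  ANTI_SHORTCUT_CLAUSE: () =>
    "**Anti-shortcut clause:** The plan file is the OUTPUT of the interactive review, not a substitute for it.",
};

// Replace every {{NAME}} with its resolver's output and fail loudly on unknown
// names, so a typo in a .tmpl file cannot ship an unrendered placeholder.
function renderTemplate(
  template: string,
  resolvers: Record<string, ResolverFn>,
  ctx: TemplateContext,
): string {
  return template.replace(/\{\{([A-Z0-9_]+)\}\}/g, (_match: string, name: string) => {
    const resolve = resolvers[name];
    if (!resolve) throw new Error(`Unknown placeholder: {{${name}}}`);
    return resolve(ctx);
  });
}
```

Throwing on an unknown name is a choice made in this sketch, not necessarily what `gen-skill-docs` does.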
diff --git a/scripts/resolvers/review.ts b/scripts/resolvers/review.ts
@@ -158,6 +158,10 @@ there — the user then sees a plan whose review report is not at the bottom and
 (correctly) rejects it.`;
 }
 
+export function generateAntiShortcutClause(_ctx: TemplateContext): string {
+  return `**Anti-shortcut clause:** The plan file is the OUTPUT of the interactive review, not a substitute for it. Writing every finding into one plan write and calling ExitPlanMode without firing AskUserQuestion is the precise failure mode of the May 2026 transcript bug — the model explored, found issues, and dumped them into a deliverable rather than walking the user through them. If you have ANY non-trivial finding in any review section, the path from finding to ExitPlanMode goes THROUGH AskUserQuestion. Zero findings in every section is the only path to ExitPlanMode that bypasses AskUserQuestion. If you find yourself wanting to write a plan with findings before asking, stop and call AskUserQuestion now — that's the bug, recognize it.`;
+}
+
 export function generateSpecReviewLoop(_ctx: TemplateContext): string {
   return `## Spec Review Loop
 

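The generated `SKILL.md` files carry the clause inline, so they can drift from `generateAntiShortcutClause` if a generated file is edited by hand or regeneration is skipped. A hypothetical drift check (`findClauseDrift` is illustrative only and works on in-memory strings; a real check would read the four `plan-*-review/SKILL.md` files from disk):

```typescript
// Return the paths whose content does not contain the clause verbatim exactly once.
function findClauseDrift(generated: Record<string, string>, clause: string): string[] {
  return Object.entries(generated)
    .filter(([, text]) => text.split(clause).length - 1 !== 1) // occurrence count
    .map(([path]) => path);
}
```

Requiring exactly one verbatim copy catches both a stale file (zero copies after a resolver edit) and a double insertion.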
diff --git a/test/fixtures/forcing-finding-seeds.ts b/test/fixtures/forcing-finding-seeds.ts
@@ -0,0 +1,83 @@
+/**
+ * Per-skill draft-plan seeds engineered to surface at least one
+ * review-phase finding in the corresponding plan-* review skill.
+ *
+ * Used by gate-tier finding-floor tests
+ * (test/skill-e2e-plan-{eng,ceo,design,devex}-finding-floor.test.ts) as
+ * the minimum-cost regression for the May 2026 transcript bug:
+ *   "/plan-eng-review reviewed a real PR diff, wrote a multi-section
+ *    review plan to ~/.claude/plans/ and called ExitPlanMode without
+ *    ever firing AskUserQuestion."
+ *
+ * Each seed is small and pre-loaded with one obvious finding the
+ * matching skill cannot honestly miss. Floor tests assert
+ * `reviewCount >= 1` — i.e., the model fired at least one review-phase
+ * AUQ before reaching plan_ready / completion_summary / ceiling.
+ *
+ * Each seed includes the standard "write your plan-mode plan to /tmp/…"
+ * preamble that the existing periodic finding-count fixtures use, so
+ * the agent has a concrete plan-file target. The /tmp path is unique
+ * per skill to avoid collisions if floor tests run in parallel.
+ *
+ * For a deeper [N-1, N+2] count band assertion, see the periodic
+ * test/skill-e2e-plan-{X}-finding-count.test.ts fixtures.
+ */
+
+export const FORCING_FLOOR_ENG = [
+  'Please review this plan thoroughly. As you go, write your plan-mode plan to /tmp/gstack-test-plan-eng-floor.md (use Edit/Write to that exact path).',
+  '',
+  '# Plan: Add request-id propagation across services',
+  '',
+  '## Architecture',
+  "We'll roll a custom UUIDv7 generator inline in each service rather than",
+  "use Node's crypto.randomUUID() built-in. Same shape, but we want full",
+  'control over the entropy source for "future flexibility" — no concrete',
+  'reason yet.',
+].join('\n');
+
+export const FORCING_FLOOR_CEO = [
+  'Please review this plan thoroughly. As you go, write your plan-mode plan to /tmp/gstack-test-plan-ceo-floor.md (use Edit/Write to that exact path).',
+  '',
+  '# Plan: Launch a "developer-friendly" pricing tier',
+  '',
+  '## Goal',
+  'Increase developer adoption.',
+  '',
+  '## Success metric',
+  'More signups.',
+  '',
+  '## Premise',
+  "We haven't talked to any developers about whether the current pricing",
+  'is actually a barrier. The team agreed it "feels like" it should be cheaper.',
+].join('\n');
+
+export const FORCING_FLOOR_DESIGN = [
+  'Please review this plan thoroughly. As you go, write your plan-mode plan to /tmp/gstack-test-plan-design-floor.md (use Edit/Write to that exact path).',
+  '',
+  '# Plan: Marketing landing page',
+  '',
+  '## Layout',
+  'All headings, taglines, and body copy will be center-aligned for a',
+  '"clean modern look." The hero h1 sits 8px above the subhead with no',
+  'breathing room; the CTA button is the same visual weight as a',
+  'secondary "Learn more" link directly beside it.',
+].join('\n');
+
+export const FORCING_FLOOR_DEVEX = [
+  'Please review this plan thoroughly. As you go, write your plan-mode plan to /tmp/gstack-test-plan-devex-floor.md (use Edit/Write to that exact path).',
+  '',
+  '# Plan: SDK quickstart docs',
+  '',
+  '## Onboarding flow',
+  'Step 1: clone the repo.',
+  'Step 2: install bun manually if not present.',
+  'Step 3: copy .env.example to .env and fill in 8 environment variables.',
+  'Step 4: run database migrations against your local Postgres.',
+  'Step 5: start the dev server.',
+  'Step 6: open the docs in a separate tab.',
+  'Step 7: register an API key by emailing the team.',
+  'Step 8: paste the key into your .env, restart the server, then make',
+  'your first SDK call.',
+  '',
+  'No quickstart command, no hosted sandbox, no copy-pasteable curl example.',
+].join('\n');
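The floor tests only need to know whether any review question fired, which is why the observer can stop at the first non-permission numbered-option render. A minimal sketch of that early-exit logic, assuming the PTY output has already been parsed into render events (`OptionRender`, `FloorResult`, and `floorCheck` are hypothetical; the real `runPlanSkillFloorCheck` works on raw PTY output):

```typescript
// Hypothetical observation model: one entry per screen render that showed
// a numbered option list.
interface OptionRender {
  atMs: number; // elapsed wall time when the render was seen
  isPermission: boolean; // tool-permission prompt rather than a review question
}

interface FloorResult {
  reviewCount: number;
  exitedAtMs: number | null; // null when no review question was ever seen
}

// Scan renders in order and stop at the first non-permission one.
function floorCheck(renders: OptionRender[]): FloorResult {
  for (const r of renders) {
    if (!r.isPermission) return { reviewCount: 1, exitedAtMs: r.atMs };
  }
  return { reviewCount: 0, exitedAtMs: null };
}
```

Because of the early exit, `reviewCount` here is only ever 0 or 1: enough for a `reviewCount >= 1` floor, but not for the [N-1, N+2] count band the periodic finding-count tests assert.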