80 changes: 80 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,85 @@
# Changelog

## [1.27.1.0] - 2026-05-06

## **Plan-mode reviews now refuse to dump findings without asking. Four gate-tier tests catch the regression on every PR.**

The four `/plan-*-review` skills (eng, ceo, design, devex) gain an
anti-shortcut clause baked in via a single shared resolver. The clause
names the May 2026 transcript-bug failure mode directly: the model
explores, finds issues, dumps every finding into one plan write, and
calls ExitPlanMode without firing AskUserQuestion. The new clause
closes that loophole: "the plan file is the OUTPUT of the interactive
review, not a substitute for it." Future tightening edits one resolver,
and all four skills pick up the change on the next gen-skill-docs run.
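
The single-resolver design can be sketched roughly as follows. This is a minimal illustration, not the repo's generator: `renderTemplate`, the `skillName` field on `TemplateContext`, and the shortened clause text are assumptions. Only `generateAntiShortcutClause`, `RESOLVERS`, and the `{{ANTI_SHORTCUT_CLAUSE}}` placeholder come from this change.

```typescript
// Hypothetical sketch: renderTemplate and the TemplateContext shape are
// illustrative assumptions, not the repo's actual generator code.
type TemplateContext = { skillName: string };
type ResolverFn = (ctx: TemplateContext) => string;

// Shortened clause text; the real resolver returns the full paragraph.
const generateAntiShortcutClause: ResolverFn = () =>
  "**Anti-shortcut clause:** The plan file is the OUTPUT of the interactive review, not a substitute for it.";

const RESOLVERS: Record<string, ResolverFn> = {
  ANTI_SHORTCUT_CLAUSE: generateAntiShortcutClause,
};

// Replace every {{NAME}} placeholder with its resolver's output.
// Unknown placeholders throw so a typo cannot silently ship.
function renderTemplate(tmpl: string, ctx: TemplateContext): string {
  return tmpl.replace(/\{\{([A-Z_]+)\}\}/g, (_match: string, name: string) => {
    const resolver: ResolverFn | undefined = RESOLVERS[name];
    if (resolver === undefined) {
      throw new Error(`Unknown placeholder: ${name}`);
    }
    return resolver(ctx);
  });
}

// One placeholder line per template; all four skills share one source of truth.
const skills = [
  "plan-eng-review",
  "plan-ceo-review",
  "plan-design-review",
  "plan-devex-review",
];
const rendered = skills.map((skillName) =>
  renderTemplate("**Anti-skip rule:** ...\n\n{{ANTI_SHORTCUT_CLAUSE}}\n", { skillName }),
);
```

Because every template carries only the placeholder, editing the resolver's return string is enough to change all four rendered skills on the next generation pass.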

Four gate-tier E2E tests catch the regression class on every PR that
touches the four templates, the shared resolver, or the seeds fixture.
Each test drives the matching skill against a small "forcing finding"
seed and asserts the agent fires at least one AskUserQuestion before
reaching plan_ready. ~1-3 min wall time per test, ~$2-6 total per CI
hit. Eng floor: 59s. CEO floor: 197s. All four pass against the new
template.

### The numbers that matter

Verified end-to-end via live PTY runs against `claude` plan mode:

| Surface | Before | After | Δ |
|---|---|---|---|
| Plan-mode reviews with anti-shortcut clause | 0/4 | 4/4 | full coverage of plan-* family |
| Gate-tier regression tests for the transcript-bug class | 0 | 4 | one per skill |
| Wall time per floor test (typical) | n/a | 30s-3m | early exit on first AUQ render |
| Cost per gate run (when triggered) | n/a | ~$2-6 | diff-gated; only fires on relevant edits |
| Lines added / deleted | — | +450 / −3 | additive; no breaking changes |

The floor tests use a focused observer (`runPlanSkillFloorCheck`) that
exits at the first non-permission numbered-option render. Existing
periodic finding-count tests use `runPlanSkillCounting` for full
fingerprint analysis on a 25-min budget; the floor variant trades
fingerprint precision for early-exit reliability so it fits gate-tier
constraints. Both helpers live side-by-side in
`test/helpers/claude-pty-runner.ts`.
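
The early-exit idea behind the floor observer can be sketched as below. Everything here is an assumption made for illustration: the frame-by-frame input, the regexes, the `FrameMatchers` interface, the `[permission]` and `[plan_ready]` markers, and the name `floorCheck`. The real `runPlanSkillFloorCheck` may detect these states quite differently.

```typescript
// Hypothetical sketch of an early-exit floor observer. Frame format, regexes,
// markers, and all names are assumptions; the real helper in
// test/helpers/claude-pty-runner.ts may work differently.
type FloorOutcome = "asked" | "plan_ready" | "budget_exhausted";

interface FrameMatchers {
  isPermissionPrompt: (frame: string) => boolean;
  isPlanReady: (frame: string) => boolean;
}

// A numbered-option render: lines such as "> 1. Foo" and "  2. Bar",
// each optionally preceded by a one-character cursor.
function isNumberedOptionRender(frame: string): boolean {
  const first = /^\s*(?:\S\s+)?1\.\s+\S/m;
  const second = /^\s*(?:\S\s+)?2\.\s+\S/m;
  return first.test(frame) && second.test(frame);
}

// Walk frames in order and stop at the first decisive one.
function floorCheck(frames: string[], matchers: FrameMatchers): FloorOutcome {
  for (const frame of frames) {
    // Reaching plan_ready first means the agent never asked: the regression.
    if (matchers.isPlanReady(frame)) return "plan_ready";
    // Permission dialogs also render numbered options, so they are skipped.
    if (isNumberedOptionRender(frame) && !matchers.isPermissionPrompt(frame)) {
      return "asked";
    }
  }
  return "budget_exhausted";
}

// Illustrative matchers keyed on made-up markers.
const demoMatchers: FrameMatchers = {
  isPermissionPrompt: (frame) => frame.indexOf("[permission]") !== -1,
  isPlanReady: (frame) => frame.indexOf("[plan_ready]") !== -1,
};

const healthyRun = [
  "Exploring the plan...",
  "Allow write to the plan file? [permission]\n> 1. Yes\n  2. No",
  "Custom UUID generator vs built-in?\n> 1. Use built-in\n  2. Keep custom",
  "[plan_ready]",
];
const regressedRun = [
  "Exploring the plan...",
  "Allow write to the plan file? [permission]\n> 1. Yes\n  2. No",
  "[plan_ready]",
];
```

Under these assumptions, `floorCheck(healthyRun, demoMatchers)` stops at the third frame and never inspects the rest, which is what keeps wall time low; `regressedRun` reaches the plan-ready marker without a question and is reported as such.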

### What this means for the four review skills

Every plan-* review now has a structural rule against the precise
failure mode the transcript exhibited. The anti-shortcut clause
appears in the rendered prompt right after the existing Anti-skip
rule, so the model reads it alongside the per-section STOP gates that
v1.26.2.0 already added. If a future model regression revives the bug,
the gate-tier floor test fails with full PTY evidence on the next PR
that touches the gated files.

### Itemized changes

#### Added
- **`generateAntiShortcutClause` resolver** in `scripts/resolvers/review.ts`,
registered as `{{ANTI_SHORTCUT_CLAUSE}}` in the `RESOLVERS` map.
Plan-* SKILL.md.tmpl files include it via one placeholder line.
- **`runPlanSkillFloorCheck` PTY helper** in
`test/helpers/claude-pty-runner.ts` — minimal "did the agent fire ANY
AskUserQuestion?" observer with early exit on first non-permission
numbered-option render.
- **Four gate-tier finding-floor E2E tests** in
`test/skill-e2e-plan-{eng,ceo,design,devex}-finding-floor.test.ts`,
each using the shared `runPlanSkillFloorCheck` helper.
- **Four forcing-finding seeds** in `test/fixtures/forcing-finding-seeds.ts`,
one per skill, each engineered to surface at least one finding under
that skill's review focus.

#### Changed
- **All four `plan-*-review` SKILL.md** files now include the
anti-shortcut clause immediately after the `**Anti-skip rule:**`
paragraph. Anchored on the paragraph (not the surrounding heading)
so the same insertion works across all four templates regardless of
their differing section labels.
- **`test/helpers/touchfiles.ts`** adds 4 entries to `E2E_TOUCHFILES`
and `E2E_TIERS=gate`. The new entries depend on the matching skill
template, the shared resolver, the seeds fixture, and the PTY
runner helper.
- **`test/touchfiles.test.ts`** count assertion bumped 21→22 with
explicit `plan-ceo-finding-floor` containment.
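
As a rough picture of what diff-gating means here, the sketch below selects tests whose dependency patterns match a PR's changed files. The map shape, the tiny glob subset, the `plan-eng-finding-floor` key, and the `selectTests` name are assumptions for illustration; the actual `E2E_TOUCHFILES` structure in `test/helpers/touchfiles.ts` may differ.

```typescript
// Hypothetical sketch of diff-gated test selection. The entry shape and the
// glob handling are assumptions, not the repo's touchfiles implementation.
const TOUCHFILES: Record<string, string[]> = {
  "plan-eng-finding-floor": [
    "plan-eng-review/**",
    "scripts/resolvers/review.ts",
    "test/fixtures/forcing-finding-seeds.ts",
    "test/helpers/claude-pty-runner.ts",
  ],
  "plan-ceo-finding-floor": [
    "plan-ceo-review/**",
    "scripts/resolvers/review.ts",
    "test/fixtures/forcing-finding-seeds.ts",
    "test/helpers/claude-pty-runner.ts",
  ],
};

// Supports two pattern forms only: exact path, or "dir/**" prefix match.
function matchesPattern(pattern: string, file: string): boolean {
  if (pattern.slice(-3) === "/**") {
    return file.indexOf(pattern.slice(0, -2)) === 0;
  }
  return pattern === file;
}

// Return the names of tests whose dependencies intersect the changed files.
function selectTests(changedFiles: string[]): string[] {
  return Object.keys(TOUCHFILES).filter((name) =>
    (TOUCHFILES[name] ?? []).some((pattern) =>
      changedFiles.some((file) => matchesPattern(pattern, file)),
    ),
  );
}
```

With this shape, a PR that edits only one skill template triggers one floor test, an edit to the shared resolver triggers all of them, and an unrelated edit triggers none.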

## [1.27.0.0] - 2026-05-06

## **`/setup-gbrain` connects to a remote brain in one paste. Brain repo renamed to gstack-artifacts.**
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-1.27.0.0
+1.27.1.0
2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "gstack",
-"version": "1.27.0.0",
+"version": "1.27.1.0",
"description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.",
"license": "MIT",
"type": "module",
2 changes: 2 additions & 0 deletions plan-ceo-review/SKILL.md
@@ -1337,6 +1337,8 @@ Present these mode options via AskUserQuestion using the preamble's AskUserQuest

**Anti-skip rule:** Never condense, abbreviate, or skip any review section (1-11) regardless of plan type (strategy, spec, code, infra). Every section in this skill exists for a reason. "This is a strategy doc so implementation sections don't apply" is always wrong — implementation details are where strategy breaks down. If a section genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.

**Anti-shortcut clause:** The plan file is the OUTPUT of the interactive review, not a substitute for it. Writing every finding into one plan write and calling ExitPlanMode without firing AskUserQuestion is the precise failure mode of the May 2026 transcript bug — the model explored, found issues, and dumped them into a deliverable rather than walking the user through them. If you have ANY non-trivial finding in any review section, the path from finding to ExitPlanMode goes THROUGH AskUserQuestion. Zero findings in every section is the only path to ExitPlanMode that bypasses AskUserQuestion. If you find yourself wanting to write a plan with findings before asking, stop and call AskUserQuestion now — that's the bug, recognize it.

### Section 1: Architecture Review
Evaluate and diagram:
* Overall system design and component boundaries. Draw the dependency graph.
2 changes: 2 additions & 0 deletions plan-ceo-review/SKILL.md.tmpl
@@ -411,6 +411,8 @@ Present these mode options via AskUserQuestion using the preamble's AskUserQuest

**Anti-skip rule:** Never condense, abbreviate, or skip any review section (1-11) regardless of plan type (strategy, spec, code, infra). Every section in this skill exists for a reason. "This is a strategy doc so implementation sections don't apply" is always wrong — implementation details are where strategy breaks down. If a section genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.

{{ANTI_SHORTCUT_CLAUSE}}

### Section 1: Architecture Review
Evaluate and diagram:
* Overall system design and component boundaries. Draw the dependency graph.
2 changes: 2 additions & 0 deletions plan-design-review/SKILL.md
@@ -1352,6 +1352,8 @@ descriptions of what 10/10 looks like.

**Anti-skip rule:** Never condense, abbreviate, or skip any review pass (1-7) regardless of plan type (strategy, spec, code, infra). Every pass in this skill exists for a reason. "This is a strategy doc so design passes don't apply" is always wrong — design gaps are where implementation breaks down. If a pass genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.

**Anti-shortcut clause:** The plan file is the OUTPUT of the interactive review, not a substitute for it. Writing every finding into one plan write and calling ExitPlanMode without firing AskUserQuestion is the precise failure mode of the May 2026 transcript bug — the model explored, found issues, and dumped them into a deliverable rather than walking the user through them. If you have ANY non-trivial finding in any review section, the path from finding to ExitPlanMode goes THROUGH AskUserQuestion. Zero findings in every section is the only path to ExitPlanMode that bypasses AskUserQuestion. If you find yourself wanting to write a plan with findings before asking, stop and call AskUserQuestion now — that's the bug, recognize it.

## Prior Learnings

Search for relevant learnings from previous sessions:
2 changes: 2 additions & 0 deletions plan-design-review/SKILL.md.tmpl
@@ -265,6 +265,8 @@ descriptions of what 10/10 looks like.

**Anti-skip rule:** Never condense, abbreviate, or skip any review pass (1-7) regardless of plan type (strategy, spec, code, infra). Every pass in this skill exists for a reason. "This is a strategy doc so design passes don't apply" is always wrong — design gaps are where implementation breaks down. If a pass genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.

{{ANTI_SHORTCUT_CLAUSE}}

{{LEARNINGS_SEARCH}}

### Pass 1: Information Architecture
2 changes: 2 additions & 0 deletions plan-devex-review/SKILL.md
@@ -1323,6 +1323,8 @@ Pattern:

**Anti-skip rule:** Never condense, abbreviate, or skip any review pass (1-8) regardless of plan type (strategy, spec, code, infra). Every pass in this skill exists for a reason. "This is a strategy doc so DX passes don't apply" is always wrong — DX gaps are where adoption breaks down. If a pass genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.

**Anti-shortcut clause:** The plan file is the OUTPUT of the interactive review, not a substitute for it. Writing every finding into one plan write and calling ExitPlanMode without firing AskUserQuestion is the precise failure mode of the May 2026 transcript bug — the model explored, found issues, and dumped them into a deliverable rather than walking the user through them. If you have ANY non-trivial finding in any review section, the path from finding to ExitPlanMode goes THROUGH AskUserQuestion. Zero findings in every section is the only path to ExitPlanMode that bypasses AskUserQuestion. If you find yourself wanting to write a plan with findings before asking, stop and call AskUserQuestion now — that's the bug, recognize it.

## Prior Learnings

Search for relevant learnings from previous sessions:
2 changes: 2 additions & 0 deletions plan-devex-review/SKILL.md.tmpl
@@ -449,6 +449,8 @@ Pattern:

**Anti-skip rule:** Never condense, abbreviate, or skip any review pass (1-8) regardless of plan type (strategy, spec, code, infra). Every pass in this skill exists for a reason. "This is a strategy doc so DX passes don't apply" is always wrong — DX gaps are where adoption breaks down. If a pass genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.

{{ANTI_SHORTCUT_CLAUSE}}

{{LEARNINGS_SEARCH}}

### DX Trend Check
2 changes: 2 additions & 0 deletions plan-eng-review/SKILL.md
@@ -899,6 +899,8 @@ Always work through the full interactive review: one section at a time (Architec

**Anti-skip rule:** Never condense, abbreviate, or skip any review section (1-4) regardless of plan type (strategy, spec, code, infra). Every section in this skill exists for a reason. "This is a strategy doc so implementation sections don't apply" is always wrong — implementation details are where strategy breaks down. If a section genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.

**Anti-shortcut clause:** The plan file is the OUTPUT of the interactive review, not a substitute for it. Writing every finding into one plan write and calling ExitPlanMode without firing AskUserQuestion is the precise failure mode of the May 2026 transcript bug — the model explored, found issues, and dumped them into a deliverable rather than walking the user through them. If you have ANY non-trivial finding in any review section, the path from finding to ExitPlanMode goes THROUGH AskUserQuestion. Zero findings in every section is the only path to ExitPlanMode that bypasses AskUserQuestion. If you find yourself wanting to write a plan with findings before asking, stop and call AskUserQuestion now — that's the bug, recognize it.

## Prior Learnings

Search for relevant learnings from previous sessions:
2 changes: 2 additions & 0 deletions plan-eng-review/SKILL.md.tmpl
@@ -127,6 +127,8 @@ Always work through the full interactive review: one section at a time (Architec

**Anti-skip rule:** Never condense, abbreviate, or skip any review section (1-4) regardless of plan type (strategy, spec, code, infra). Every section in this skill exists for a reason. "This is a strategy doc so implementation sections don't apply" is always wrong — implementation details are where strategy breaks down. If a section genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.

{{ANTI_SHORTCUT_CLAUSE}}

{{LEARNINGS_SEARCH}}

### 1. Architecture review
3 changes: 2 additions & 1 deletion scripts/resolvers/index.ts
@@ -11,7 +11,7 @@ import { generateTestFailureTriage } from './preamble';
import { generateCommandReference, generateSnapshotFlags, generateBrowseSetup } from './browse';
import { generateDesignMethodology, generateDesignHardRules, generateDesignOutsideVoices, generateDesignReviewLite, generateDesignSketch, generateDesignSetup, generateDesignMockup, generateDesignShotgunLoop, generateTasteProfile, generateUXPrinciples } from './design';
import { generateTestBootstrap, generateTestCoverageAuditPlan, generateTestCoverageAuditShip, generateTestCoverageAuditReview } from './testing';
-import { generateReviewDashboard, generatePlanFileReviewReport, generateSpecReviewLoop, generateBenefitsFrom, generateCodexSecondOpinion, generateAdversarialStep, generateCodexPlanReview, generatePlanCompletionAuditShip, generatePlanCompletionAuditReview, generatePlanVerificationExec, generateScopeDrift, generateCrossReviewDedup } from './review';
+import { generateReviewDashboard, generatePlanFileReviewReport, generateAntiShortcutClause, generateSpecReviewLoop, generateBenefitsFrom, generateCodexSecondOpinion, generateAdversarialStep, generateCodexPlanReview, generatePlanCompletionAuditShip, generatePlanCompletionAuditReview, generatePlanVerificationExec, generateScopeDrift, generateCrossReviewDedup } from './review';
import { generateSlugEval, generateSlugSetup, generateBaseBranchDetect, generateDeployBootstrap, generateQAMethodology, generateCoAuthorTrailer, generateChangelogWorkflow } from './utility';
import { generateLearningsSearch, generateLearningsLog } from './learnings';
import { generateConfidenceCalibration } from './confidence';
@@ -39,6 +39,7 @@ export const RESOLVERS: Record<string, ResolverFn> = {
DESIGN_REVIEW_LITE: generateDesignReviewLite,
REVIEW_DASHBOARD: generateReviewDashboard,
PLAN_FILE_REVIEW_REPORT: generatePlanFileReviewReport,
ANTI_SHORTCUT_CLAUSE: generateAntiShortcutClause,
TEST_BOOTSTRAP: generateTestBootstrap,
TEST_COVERAGE_AUDIT_PLAN: generateTestCoverageAuditPlan,
TEST_COVERAGE_AUDIT_SHIP: generateTestCoverageAuditShip,
4 changes: 4 additions & 0 deletions scripts/resolvers/review.ts
@@ -158,6 +158,10 @@ there — the user then sees a plan whose review report is not at the bottom and
(correctly) rejects it.`;
}

export function generateAntiShortcutClause(_ctx: TemplateContext): string {
return `**Anti-shortcut clause:** The plan file is the OUTPUT of the interactive review, not a substitute for it. Writing every finding into one plan write and calling ExitPlanMode without firing AskUserQuestion is the precise failure mode of the May 2026 transcript bug — the model explored, found issues, and dumped them into a deliverable rather than walking the user through them. If you have ANY non-trivial finding in any review section, the path from finding to ExitPlanMode goes THROUGH AskUserQuestion. Zero findings in every section is the only path to ExitPlanMode that bypasses AskUserQuestion. If you find yourself wanting to write a plan with findings before asking, stop and call AskUserQuestion now — that's the bug, recognize it.`;
}

export function generateSpecReviewLoop(_ctx: TemplateContext): string {
return `## Spec Review Loop

83 changes: 83 additions & 0 deletions test/fixtures/forcing-finding-seeds.ts
@@ -0,0 +1,83 @@
/**
* Per-skill draft-plan seeds engineered to surface at least one
* review-phase finding in the corresponding plan-* review skill.
*
* Used by gate-tier finding-floor tests
* (test/skill-e2e-plan-{eng,ceo,design,devex}-finding-floor.test.ts) as
* the minimum-cost regression for the May 2026 transcript bug:
* "/plan-eng-review reviewed a real PR diff, wrote a multi-section
* review plan to ~/.claude/plans/ and called ExitPlanMode without
* ever firing AskUserQuestion."
*
* Each seed is small and pre-loaded with one obvious finding the
* matching skill cannot honestly miss. Floor tests assert
* `reviewCount >= 1` — i.e., the model fired at least one review-phase
* AUQ before reaching plan_ready / completion_summary / ceiling.
*
* Each seed includes the standard "write your plan-mode plan to /tmp/…"
* preamble that the existing periodic finding-count fixtures use, so
* the agent has a concrete plan-file target. The /tmp path is unique
* per skill to avoid collisions if floor tests run in parallel.
*
* For a deeper [N-1, N+2] count band assertion, see the periodic
* test/skill-e2e-plan-{X}-finding-count.test.ts fixtures.
*/

export const FORCING_FLOOR_ENG = [
'Please review this plan thoroughly. As you go, write your plan-mode plan to /tmp/gstack-test-plan-eng-floor.md (use Edit/Write to that exact path).',
'',
'# Plan: Add request-id propagation across services',
'',
'## Architecture',
"We'll roll a custom UUIDv7 generator inline in each service rather than",
"use Node's crypto.randomUUID() built-in. Same shape, but we want full",
'control over the entropy source for "future flexibility" — no concrete',
'reason yet.',
].join('\n');

export const FORCING_FLOOR_CEO = [
'Please review this plan thoroughly. As you go, write your plan-mode plan to /tmp/gstack-test-plan-ceo-floor.md (use Edit/Write to that exact path).',
'',
'# Plan: Launch a "developer-friendly" pricing tier',
'',
'## Goal',
'Increase developer adoption.',
'',
'## Success metric',
'More signups.',
'',
'## Premise',
"We haven't talked to any developers about whether the current pricing",
'is actually a barrier. The team agreed it "feels like" it should be cheaper.',
].join('\n');

export const FORCING_FLOOR_DESIGN = [
'Please review this plan thoroughly. As you go, write your plan-mode plan to /tmp/gstack-test-plan-design-floor.md (use Edit/Write to that exact path).',
'',
'# Plan: Marketing landing page',
'',
'## Layout',
'All headings, taglines, and body copy will be center-aligned for a',
'"clean modern look." The hero h1 sits 8px above the subhead with no',
'breathing room; the CTA button is the same visual weight as a',
'secondary "Learn more" link directly beside it.',
].join('\n');

export const FORCING_FLOOR_DEVEX = [
'Please review this plan thoroughly. As you go, write your plan-mode plan to /tmp/gstack-test-plan-devex-floor.md (use Edit/Write to that exact path).',
'',
'# Plan: SDK quickstart docs',
'',
'## Onboarding flow',
'Step 1: clone the repo.',
'Step 2: install bun manually if not present.',
'Step 3: copy .env.example to .env and fill in 8 environment variables.',
'Step 4: run database migrations against your local Postgres.',
'Step 5: start the dev server.',
'Step 6: open the docs in a separate tab.',
'Step 7: register an API key by emailing the team.',
'Step 8: paste the key into your .env, restart the server, then make',
'your first SDK call.',
'',
'No quickstart command, no hosted sandbox, no copy-pasteable curl example.',
].join('\n');
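
The header comment's claim that each seed writes to its own `/tmp` path could be checked with something like the sketch below. The `planPathOf` helper and the abbreviated seed strings are assumptions for illustration; the paths themselves are the ones used in the seeds above.

```typescript
// Hypothetical sanity check: planPathOf and the shortened seeds are
// illustrative; only the /tmp paths are taken from the fixture.
const seedPreambles: Record<string, string> = {
  eng: "Please review this plan thoroughly. As you go, write your plan-mode plan to /tmp/gstack-test-plan-eng-floor.md (use Edit/Write to that exact path).",
  ceo: "Please review this plan thoroughly. As you go, write your plan-mode plan to /tmp/gstack-test-plan-ceo-floor.md (use Edit/Write to that exact path).",
  design: "Please review this plan thoroughly. As you go, write your plan-mode plan to /tmp/gstack-test-plan-design-floor.md (use Edit/Write to that exact path).",
  devex: "Please review this plan thoroughly. As you go, write your plan-mode plan to /tmp/gstack-test-plan-devex-floor.md (use Edit/Write to that exact path).",
};

// Extract the plan-file path a seed tells the agent to write to.
function planPathOf(seed: string): string | null {
  const m = /write your plan-mode plan to (\/tmp\/\S+\.md)/.exec(seed);
  return m ? (m[1] ?? null) : null;
}

const planPaths = Object.keys(seedPreambles).map((skill) =>
  planPathOf(seedPreambles[skill] ?? ""),
);

// True only when every seed names a path and no two seeds share one.
const allPathsUnique = planPaths.every(
  (p, i) => p !== null && planPaths.indexOf(p) === i,
);
```

A check like this would flag a copy-paste slip between seeds before parallel floor tests could overwrite each other's plan files.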