Add hierarchical reasoning optimizations inspired by HRM paper

tony · tony · commit 5e47945ff918 · 2025-10-12T11:42:47.000-05:00
Inspired by the Hierarchical Reasoning Model (arXiv:2506.21734v3),
this commit adapts HRM principles at the workflow level. It does not
implement the paper's neural network algorithms, only their conceptual spirit.

Implements 4 key optimizations:
- INNOVATE convergence criteria (exploration assessment)
- PLAN quality gate (self-validation before save)
- REVIEW phase routing (hierarchical error correction)
- Memory learning algorithm (pattern-based recommendations)
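
For illustration only: the commit changes Markdown prompts, not code, but the
PLAN quality gate can be read as a simple predicate. The sketch below assumes
a hypothetical `PlanAssessment` record and helper names; only the four Y/N
checks, the confidence threshold of 8, and the header format come from the
diff.

```python
from dataclasses import dataclass


@dataclass
class PlanAssessment:
    """Hypothetical record of the self-validation answers (names assumed)."""
    completeness: bool
    testability: bool
    risk_coverage: bool
    step_clarity: bool
    confidence: int  # 1-10 score on implementation readiness


def plan_ready(a: PlanAssessment, threshold: int = 8) -> bool:
    """Save only if every item is Y and confidence meets the threshold."""
    checks = (a.completeness, a.testability, a.risk_coverage, a.step_clarity)
    return all(checks) and a.confidence >= threshold


def plan_header(a: PlanAssessment) -> str:
    """Render the header line the quality gate asks to be documented."""
    def yn(flag: bool) -> str:
        return "Y" if flag else "N"

    return (
        f"[PLAN QUALITY: completeness={yn(a.completeness)}, "
        f"testability={yn(a.testability)}, risks={yn(a.risk_coverage)}, "
        f"clarity={yn(a.step_clarity)}, confidence={a.confidence}/10]"
    )
```

A plan with any N item, or with confidence below 8, goes back for refinement
instead of being saved.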

Translates HRM principles into workflow design:
- Hierarchical convergence → phase-level convergence criteria
- Deep supervision → quality gates at every phase
- Adaptive computation → pattern-based learning from history
- Recurrent feedback → phase routing for iterative refinement
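
As a rough sketch of the last two mappings, REVIEW routing and the
per-complexity research budget could be modeled as below. The `Level` enum,
function names, and the "deepest issue wins" tie-break are assumptions made
for this example; the phase names, iteration caps, and thresholds are taken
from the diff. What happens when the budget runs out before convergence is
not specified in the diff, so the sketch simply stops iterating.

```python
from enum import Enum


class Level(Enum):
    """Hypothetical issue levels, ordered from shallowest to deepest."""
    IMPLEMENTATION = 1  # single-step errors, lint/format, missed edge cases
    DESIGN = 2          # wrong approach, architecture mismatch
    UNDERSTANDING = 3   # misunderstood requirements, missing context


ROUTES = {
    Level.IMPLEMENTATION: "EXECUTE",
    Level.DESIGN: "PLAN",
    Level.UNDERSTANDING: "RESEARCH",
}


def route(issues):
    """Pick the phase to return to; no issues means DEPLOY.

    Assumption: when issues span several levels, the deepest one decides.
    """
    if not issues:
        return "DEPLOY"
    deepest = max(issues, key=lambda level: level.value)
    return ROUTES[deepest]


# (max research iterations, convergence threshold out of 10) per tier
RESEARCH_BUDGET = {"SIMPLE": (1, 7), "MODERATE": (2, 8), "COMPLEX": (3, 9)}


def continue_research(tier, iterations_done, confidence, criteria_met):
    """Return True if another RESEARCH iteration is warranted."""
    max_iterations, threshold = RESEARCH_BUDGET[tier]
    converged = criteria_met and confidence >= threshold
    return not converged and iterations_done < max_iterations
```

For example, a review that finds both a lint failure and an architecture
mismatch would route to PLAN under this tie-break, since fixing the design
supersedes the implementation-level issue.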

Adds 223 lines across 6 files, all backward compatible.
No new files; all changes stay within the existing agent/command structure.

License: CC BY 4.0
Paper reference: arXiv:2506.21734v3 [cs.AI] 04 Aug 2025
diff --git a/.claude/agents/plan-execute.md b/.claude/agents/plan-execute.md
@@ -44,6 +44,21 @@ git log -n 10 --oneline --grep="WIP\|TODO\|FIXME"
 - Write to repository root `.claude/memory-bank/*/plans/` ONLY (use `git rev-parse --show-toplevel` to find root)
 - Identify risks and mitigations
 
+**Plan Quality Gate (Self-Validation)**:
+
+Before saving plan, verify:
+- **Completeness**: All research findings addressed (Y/N)
+- **Testability**: Success criteria measurable (Y/N)
+- **Risk Coverage**: Potential issues identified (Y/N)
+- **Step Clarity**: Each step actionable without ambiguity (Y/N)
+- **Plan Confidence**: 1-10 score on implementation readiness
+
+**Quality Rule**:
+- If any item = N OR confidence < 8: Refine plan, don't save yet
+- If all items = Y AND confidence >= 8: Save and mark ready for approval
+
+Document in plan header: `[PLAN QUALITY: completeness=Y, testability=Y, risks=Y, clarity=Y, confidence=X/10]`
+
 **FORBIDDEN Actions**:
 - Writing actual code to project files
 - Executing implementation commands
@@ -77,6 +92,32 @@ git log -n 5 --oneline --since=[plan-creation-date]
 - Execute build and test commands
 - Follow plan steps sequentially
 
+**Substep Validation Loop (Deep Supervision)**:
+
+After EACH implementation substep:
+
+1. **Immediate Validation**:
+   - [ ] Matches plan specification exactly
+   - [ ] No unplanned modifications
+   - [ ] Tests pass for this substep
+   - [ ] No regressions introduced
+
+2. **Confidence Assessment**: Rate substep quality 1-10
+
+3. **Decision Logic**:
+   - If validation fails OR confidence < 7: Document deviation, halt for guidance
+   - If validation passes AND confidence >= 7: Mark complete, continue
+
+4. **Output Format**:
+   ```
+   [SUBSTEP X.Y VALIDATION]
+   Status: [PASS/FAIL]
+   Confidence: X/10
+   Issues: [None/Description]
+   ```
+
+This implements continuous validation rather than waiting for the REVIEW phase.
+
 **FORBIDDEN Actions**:
 - Deviating from approved plan
 - Adding improvements not specified
diff --git a/.claude/agents/research-innovate.md b/.claude/agents/research-innovate.md
@@ -44,6 +44,20 @@ git log --oneline main..HEAD
 - Ask clarifying questions
 - Gather context and dependencies
 
+**Convergence Criteria (Self-Assessment)**:
+
+Before exiting RESEARCH sub-mode, evaluate:
+- **Understanding Confidence**: 1-10 score on codebase comprehension
+- **Dependencies Mapped**: All critical dependencies identified (Y/N)
+- **Edge Cases Considered**: Non-obvious scenarios documented (Y/N)
+- **Context Completeness**: Sufficient information for planning (Y/N)
+
+**Convergence Rule**:
+- If confidence < 8/10 OR any critical item = N: Continue research iteration
+- If confidence >= 8/10 AND all critical items = Y: Ready for next phase
+
+Document assessment in output with: `[CONVERGENCE: confidence=X/10, ready=Y/N]`
+
 **FORBIDDEN Actions**:
 - Suggesting solutions or implementations
 - Making design decisions
@@ -61,6 +75,20 @@ git log --oneline main..HEAD
 - Question assumptions
 - Present possibilities without commitment
 
+**Convergence Criteria (Exploration Assessment)**:
+
+Before exiting INNOVATE sub-mode, evaluate:
+- **Approach Diversity**: Explored 2-3 distinct approaches (Y/N)
+- **Trade-offs Clarity**: Pros/cons clearly understood for each (Y/N)
+- **Best Path Identified**: Clear recommendation emerging (Y/N)
+- **Exploration Confidence**: 1-10 score on solution space coverage
+
+**Convergence Rule**:
+- If approaches < 2 OR clarity = N OR confidence < 7: Continue innovation iteration
+- If approaches >= 2 AND clarity = Y AND confidence >= 7: Ready for PLAN
+
+Document assessment in output with: `[INNOVATION CONVERGENCE: approaches=X, confidence=X/10, ready=Y/N]`
+
 **FORBIDDEN Actions**:
 - Creating concrete plans
 - Writing code or pseudocode
diff --git a/.claude/agents/review.md b/.claude/agents/review.md
@@ -157,9 +157,35 @@ Formatting: [PASS/FAIL] - Z files need formatting
 1. [Suggested action]
 2. [Suggested action]
 
-### Next Steps
-- [ ] If PASS: Implementation ready for deployment
-- [ ] If FAIL: Return to PLAN or EXECUTE mode to address issues
+### Phase Routing Decision
+
+Based on issue severity, route to the appropriate hierarchy level:
+
+**→ EXECUTE** (implementation-level issues):
+- Single-step implementation errors
+- Missing edge case handling
+- Code quality issues (lint/format)
+- Command: `/riper:execute [substep]`
+
+**→ PLAN** (design-level issues):
+- Wrong approach taken
+- Missing features not in plan
+- Architecture mismatch
+- Action: Create amended plan with lessons learned
+
+**→ RESEARCH** (understanding-level issues):
+- Misunderstood requirements
+- Missing critical context
+- Wrong problem being solved
+- Action: Re-research with focus on identified gaps
+
+**→ DEPLOY** (approved):
+- All checks passed
+- Minor warnings acceptable
+- Implementation matches plan exactly
+
+**Decision for this review**: [EXECUTE/PLAN/RESEARCH/DEPLOY]
+**Rationale**: [Why this level is appropriate]
 ```
 
 ## Review Artifacts
diff --git a/.claude/commands/memory/recall.md b/.claude/commands/memory/recall.md
@@ -44,6 +44,47 @@ Looking for memories in: `[ROOT]/.claude/memory-bank/!`git branch --show-current
 ## Search Query
 $ARGUMENTS
 
+## Adaptive Context & Learning
+
+**Pattern Matching Algorithm:**
+1. **Identify similar tasks**: Match by keywords, file patterns, domain context
+2. **Analyze success patterns**: Find tasks that succeeded on first try
+3. **Analyze failure patterns**: Find tasks that required multiple iterations
+4. **Extract optimal strategy**: What iteration counts worked best?
+
+**Learning Rules:**
+- If similar task had LOW research confidence → **Increase research iterations**
+- If similar task had PLAN failures → **Add stricter quality gates**
+- If similar task had EXECUTE issues → **Increase validation frequency**
+- If similar task succeeded with X iterations → **Recommend X as baseline**
+
+**Pattern Examples to Look For:**
+- "Auth tasks typically need COMPLEX classification (6+ files)"
+- "API refactoring succeeds with 2 research iterations, 8/10 threshold"
+- "UI components work well with SIMPLE tier (1 iteration)"
+- "Database migrations require MODERATE, 2 PLAN iterations"
+
+**Output Format:**
+```
+📊 LEARNED PATTERNS for "$ARGUMENTS":
+
+Similar tasks found: [list 2-3 most relevant matches]
+- Task: [name] | Complexity: [tier] | Research iterations: [N] | Result: [SUCCESS/FAILED]
+- Task: [name] | Complexity: [tier] | Research iterations: [N] | Result: [SUCCESS/FAILED]
+
+Success pattern identified:
+- [What approach worked consistently]
+- [Key factors that led to success]
+
+Recommended strategy:
+- Complexity tier: [SIMPLE/MODERATE/COMPLEX]
+- Research iterations: [N] (threshold: [X/10])
+- Execute validation: [STANDARD/ENHANCED]
+- Confidence: [X/10] based on [Y] historical examples
+
+⚠️ Watch out for: [Common pitfalls from similar tasks]
+```
+
 ## Available Memories
 !`ls -la $(git rev-parse --show-toplevel)/.claude/memory-bank/$(git branch --show-current)/ 2>/dev/null || echo "No memories found for current branch"`
 
diff --git a/.claude/commands/memory/save.md b/.claude/commands/memory/save.md
@@ -28,6 +28,23 @@ I'll save the following information to the branch-aware memory bank:
 ## Memory Content
 $ARGUMENTS
 
+## Metadata (for adaptive workflow)
+- **Task Complexity**: [SIMPLE/MODERATE/COMPLEX]
+  - SIMPLE: 1-2 files, well-defined scope
+  - MODERATE: 3-5 files, some ambiguity
+  - COMPLEX: 6+ files, architectural changes
+
+- **Phase Confidence Scores**:
+  - Research confidence: X/10
+  - Plan quality: X/10
+  - Execute confidence: X/10
+
+- **Iteration Count**:
+  - Research iterations: X
+  - Execute iterations: X
+
+- **Convergence Notes**: [Why this task required X iterations]
+
 ## Storage Location
 The memory will be saved to:
 1. First run: `git rev-parse --show-toplevel` to get repository root
diff --git a/.claude/commands/riper/workflow.md b/.claude/commands/riper/workflow.md
@@ -25,8 +25,67 @@ Once approved, I'll use the plan-execute agent in EXECUTE sub-mode to implement
 ### Phase 5: REVIEW
 Finally, I'll use the review agent to validate the implementation against the plan.
 
-## Starting Workflow
+## Starting Workflow - Adaptive Mode
 
-Let me begin with the RESEARCH phase for: $ARGUMENTS
+### Step 0: Complexity Assessment
+
+First, let me assess task complexity and recall similar past tasks:
+
+**Complexity Factors**:
+- File count estimate: [1-2 / 3-5 / 6+]
+- Architectural impact: [LOW / MEDIUM / HIGH]
+- Ambiguity level: [CLEAR / SOME / SIGNIFICANT]
+
+**Classification**:
+- **SIMPLE**: 1-2 files, well-defined scope, low ambiguity
+- **MODERATE**: 3-5 files, some design decisions, moderate ambiguity
+- **COMPLEX**: 6+ files, architectural changes, high ambiguity
+
+Checking memory: `/memory:recall similar to: $ARGUMENTS`
+
+### Adaptive Phase Execution
+
+Based on complexity assessment, I'll follow the appropriate workflow:
+
+**For SIMPLE tasks**:
+1. RESEARCH (1 iteration, convergence threshold: 7/10)
+2. PLAN (streamlined)
+3. EXECUTE (with substep validation)
+4. REVIEW
+
+**For MODERATE tasks**:
+1. RESEARCH (up to 2 iterations, convergence threshold: 8/10)
+2. INNOVATE (explore 2-3 approaches)
+3. PLAN (detailed)
+4. EXECUTE (with deep supervision)
+5. REVIEW
+
+**For COMPLEX tasks**:
+1. RESEARCH (up to 3 iterations, convergence threshold: 9/10)
+2. INNOVATE (extensive exploration)
+3. PLAN (comprehensive with risk analysis)
+4. EXECUTE (substep-by-substep with validation)
+5. Mid-execution REVIEW (after 50% complete)
+6. Final REVIEW
+
+### Hierarchical Convergence Control
+
+**Research Phase**: Continue iterations until convergence criteria met:
+- Understanding confidence >= threshold
+- All dependencies mapped
+- Edge cases considered
+- Context complete
+
+**Execute Phase**: Validate after each substep (deep supervision):
+- Check plan compliance
+- Assess confidence (must be >= 7/10)
+- Document any deviations
+- Halt if validation fails
+
+**Memory Tracking**: Save all confidence scores and iteration counts for future workflow optimization.
+
+### Beginning Workflow
+
+Let me begin with complexity assessment and RESEARCH phase for: $ARGUMENTS
 
 [The appropriate agent will be invoked based on the current phase]