gearbox ⚙

npx-bootstrappable AI agent harness packager. Installs a self-improving AI agent harness into any repo.

```
npx @mhingston5/gearbox
```

The core value is a session eval flywheel — hook scripts capture what happens during every AI coding session, turn that into portable runtime records and eval summaries, and give the agent better repo-specific context over time. No cloud service, no subscription — just files checked into your repo.

What you get

🪝 Hook runtime assets — capture session events, run policy guards, synthesise markdown evals, and ship portable prompts
🛠 10 user-facing harness utilities and helper CLIs — health scoring, docs drift detection, convention drift gate, event log, config sync, skill validation, and portable path helpers
🎓 33 portable skills — domain-agnostic agent skills that work with any codebase
⚡ 7 agentic workflows — GitHub Actions for ongoing repo health automation
🔌 Platform adapters — correct config files for 6 AI coding platforms
📝 Durable memory contract — AGENTS.md plus a learning guide and durable memory reference docs

Requirements

Node.js ≥ 20.12.1
Git repository
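
As a rough illustration of the Node.js requirement, the check below compares the running Node.js version with the minimum listed above. It is a minimal sketch and not gearbox's own preflight logic; the helper name `meetsMinimum` is made up for this example.

```javascript
// Minimum Node.js version, taken from the Requirements list above.
const MINIMUM_NODE = "20.12.1";

// Hypothetical helper: true when `version` is at least `minimum`.
// Compares major, minor, and patch numerically; pre-release tags are ignored.
function meetsMinimum(version, minimum) {
  const a = version.replace(/^v/, "").split(".").map(Number);
  const b = minimum.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    const x = a[i] || 0;
    const y = b[i] || 0;
    if (x !== y) return x > y;
  }
  return true; // all three components are equal
}

if (meetsMinimum(process.versions.node, MINIMUM_NODE)) {
  console.log(`Node.js ${process.versions.node} satisfies >= ${MINIMUM_NODE}`);
} else {
  console.error(`Node.js ${process.versions.node} is older than ${MINIMUM_NODE}`);
}
```

The comparison is numeric per component, so `20.9.0` is correctly treated as older than `20.12.1`, which a plain string comparison would get wrong.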

Installation

```
npx @mhingston5/gearbox                                    # Interactive wizard (recommended)
npx @mhingston5/gearbox --yes                              # Non-interactive: accept all defaults
npx @mhingston5/gearbox --dry-run                          # Preview files that would be written, no changes
npx @mhingston5/gearbox --platforms copilot,claude         # Select specific platforms
npx @mhingston5/gearbox --skills-dir .agents/skills        # Set custom skills directory (default: .agents/skills)
npx @mhingston5/gearbox --help                             # Show all options
```

The wizard asks which platforms you use and where to put skills, then writes everything in one shot.

After installation

Check harness health:
```
npm run gearbox:health
```

Commit the generated files — everything under .gearbox/, .agents/skills/, AGENTS.md, .github/agents/, docs/agents/, platform config files, and any generated symlinks belongs in version control:

```
git add .gearbox/ AGENTS.md .github/agents/ docs/agents/ .github/copilot/ .claude/ .agents/
git add CLAUDE.md .github/copilot-instructions.md GEMINI.md  # if present
git add .claude/skills/ .github/skills/  # if present
git commit -m "chore: install gearbox agent harness"
```

Configure gitleaks (optional but recommended): gearbox's pre-tool-use hook uses gitleaks to scan for secrets before every file write. Install gitleaks and add a .gitleaks.toml to your repo root. Without gitleaks, the hook fails open: the scan is skipped rather than blocking the write.

Compile the agentic workflows, if you have the gh CLI with the gh-aw extension:
```
gh aw compile .github/workflows/gearbox-*.md
```
This turns the Markdown workflow definitions into runnable GitHub Actions.
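
The fail-open behaviour described for the gitleaks step can be pictured with the sketch below. It is only an illustration under stated assumptions: `allowWrite` and the `scan` callback are hypothetical names, and whether the real hook blocks on findings or also fails open on scanner errors is assumed here rather than taken from the source.

```javascript
// Hypothetical guard: decide whether a file write may proceed.
// `scan` is an assumed callback returning { available, findings }.
function allowWrite(scan, content) {
  let result;
  try {
    result = scan(content);
  } catch {
    // Assumption: a crashing scanner is treated like a missing one.
    return { allowed: true, reason: "scanner error (fail-open)" };
  }
  if (!result.available) {
    // Fail-open: no scanner installed, so the write is not blocked.
    return { allowed: true, reason: "scanner not installed (fail-open)" };
  }
  if (result.findings.length > 0) {
    return { allowed: false, reason: `${result.findings.length} potential secret(s) found` };
  }
  return { allowed: true, reason: "no findings" };
}

// Stand-ins for a missing and an installed scanner (both hypothetical).
const missingScanner = () => ({ available: false, findings: [] });
const stubScanner = (text) => ({
  available: true,
  findings: text.includes("SECRET=") ? ["hard-coded secret"] : [],
});

console.log(allowWrite(missingScanner, "SECRET=abc").allowed); // true
console.log(allowWrite(stubScanner, "SECRET=abc").allowed); // false
```

The trade-off of fail-open is convenience over strictness: a repo without gitleaks keeps working, but it also gets no secret scanning until the tool is installed.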

What gets installed

`.gearbox/hooks/` — shared hook runtime assets

These run at key moments during AI coding sessions. Each platform maps its native hook system to this canonical event set, then invokes the backing runtime asset shown below.

| Hook event | Backing asset | What it does |
| --- | --- | --- |
| `sessionStart` | `self-learning.mjs sessionStart` | Loads prior context and prepares runtime state |
| `userPromptSubmitted` | `self-learning.mjs userPromptSubmitted` | Captures the evolving task/goal for later compaction and evaluation |
| `preToolUse` | `gitleaks-check.sh` + `policy-guard.mjs` | Secret scanning via gitleaks, policy guard |
| `postToolUse` | `self-learning.mjs postToolUse` | Logs tool usage and updates runtime artefacts |
| `errorOccurred` | `self-learning.mjs errorOccurred` | Logs error context for post-session analysis |
| `preCompact` | `context-compact.mjs` | Writes compact context before a session compaction step |
| `sessionEnd` | `self-learning.mjs sessionEnd` | Flushes the event log, writes a session record, and triggers markdown eval |

The installed asset set also includes shared implementations such as self-learning.mjs, markdown-eval.mjs, context-compact.mjs, helper shell scripts, and portable prompt files under .gearbox/hooks/prompts/.
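
To make the mapping idea concrete, the sketch below resolves a platform's native hook name to a canonical event and then to its backing assets. The canonical events and assets come from the table above; the native event names (`before-tool` and so on) and the function `resolveAssets` are invented for illustration and do not correspond to any real platform adapter.

```javascript
// Canonical hook events and their backing assets, copied from the table above.
const HOOK_ASSETS = {
  sessionStart: ["self-learning.mjs sessionStart"],
  userPromptSubmitted: ["self-learning.mjs userPromptSubmitted"],
  preToolUse: ["gitleaks-check.sh", "policy-guard.mjs"],
  postToolUse: ["self-learning.mjs postToolUse"],
  errorOccurred: ["self-learning.mjs errorOccurred"],
  preCompact: ["context-compact.mjs"],
  sessionEnd: ["self-learning.mjs sessionEnd"],
};

// Hypothetical native event names for an imaginary platform.
const EXAMPLE_NATIVE_TO_CANONICAL = {
  "before-tool": "preToolUse",
  "after-tool": "postToolUse",
  "session-close": "sessionEnd",
};

// Resolve a native event to the assets that would run for it.
// Events the platform does not map resolve to an empty list.
function resolveAssets(nativeEvent, mapping = EXAMPLE_NATIVE_TO_CANONICAL) {
  if (!Object.hasOwn(mapping, nativeEvent)) return [];
  return HOOK_ASSETS[mapping[nativeEvent]];
}

console.log(resolveAssets("before-tool"));
console.log(resolveAssets("on-error"));
```

In this imaginary adapter `errorOccurred` has no native counterpart, so nothing runs for it. That is one way to picture a platform with partial hook coverage.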

`.gearbox/scripts/` — 10 user-facing harness utilities and helper CLIs

| Script | What it does |
| --- | --- |
| `harness-audit.mjs` | Health scoring (0–100) across 5 install dimensions, plus an advisory 5-subsystem harness rubric; also runs preflight checks before harness operations |
| `docs-drift-check.mjs` | Detects documentation that has gone stale relative to code changes |
| `convention-drift-gate.mjs` | Enforces coding conventions; runs as a post-merge CI gate |
| `event-log.mjs` | Append-only structured event log used by hooks for session tracing |
| `harness-config.mjs` | Reads `harness-config.json`; single source of truth for retry limits, budget settings, and hook tuning |
| `sync-agent-config.mjs` | Manages cross-platform symlinks (CLAUDE.md, GEMINI.md, skills dirs) for instruction compatibility |
| `validate-skill.mjs` | Validates bundled `SKILL.md` files for frontmatter, structure, and basic safety issues |
| `paths.mjs` | Resolves portable config/worktree roots from repo guidance, local folders, or platform defaults |
| `normalize-error.mjs` | Normalizes unstable error text values (temp paths, GUIDs, dates, positions) for loop detection |
| `tmpdir.mjs` | Creates or previews scoped helper temp directories under the host temp root |

common.mjs is also installed in .gearbox/scripts/ as a shared support module for these CLIs, but it is not intended to be called directly.
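
The idea behind error normalization for loop detection is that two failures which differ only in volatile details should compare as equal. The sketch below shows one way this could work; the regular expressions, placeholder tokens, and the name `normalizeError` are assumptions for illustration and are not taken from `normalize-error.mjs`.

```javascript
// Hypothetical normalizer: replace unstable fragments with fixed placeholders.
function normalizeError(message) {
  return message
    // GUIDs / UUIDs
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, "<guid>")
    // ISO-style dates, optionally followed by a time
    .replace(/\d{4}-\d{2}-\d{2}(?:[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?)?/g, "<date>")
    // POSIX-style temp paths (Windows temp paths are not handled in this sketch)
    .replace(/(?:\/tmp|\/var\/folders)\/[^\s:'")]+/g, "<tmp>")
    // line:column positions
    .replace(/:\d+:\d+/g, ":<pos>");
}

const first = "ENOENT: /tmp/build-abc123/out.js:10:5 at 2024-05-01T10:00:00Z id 123e4567-e89b-12d3-a456-426614174000";
const second = "ENOENT: /tmp/build-zzz999/out.js:42:17 at 2024-06-02T11:30:00Z id 00000000-0000-4000-8000-000000000000";

console.log(normalizeError(first));
console.log(normalizeError(first) === normalizeError(second)); // true
```

Once normalized, a loop detector can count repeated identical strings instead of treating every retry as a new error.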

`{skillsDir}/` — 33 portable skills

Skills are Markdown files (SKILL.md) that the agent reads when triggered. They encode workflows, checklists, and domain knowledge that the agent applies consistently across sessions. See Skills reference below.
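
A skill file can be checked for basic shape along the lines of the sketch below. This is an illustration only: the required frontmatter keys (`name`, `description`) and the function `checkSkill` are assumptions, and the checks performed by `validate-skill.mjs` may differ.

```javascript
// Assumed required frontmatter keys; the README does not list them.
const REQUIRED_KEYS = ["name", "description"];

// Hypothetical check: returns a list of problems, empty when the file looks fine.
function checkSkill(markdown) {
  const match = markdown.match(/^---\r?\n([\s\S]*?)\r?\n---\r?\n?/);
  if (!match) return ["missing frontmatter block"];

  // Collect top-level keys that have a non-empty value.
  const keys = new Set();
  for (const line of match[1].split(/\r?\n/)) {
    const m = line.match(/^([A-Za-z0-9_-]+):\s*(.*)$/);
    if (m && m[2].trim() !== "") keys.add(m[1]);
  }

  const problems = REQUIRED_KEYS
    .filter((key) => !keys.has(key))
    .map((key) => `missing "${key}" in frontmatter`);
  if (markdown.slice(match[0].length).trim() === "") problems.push("empty skill body");
  return problems;
}

const example = [
  "---",
  "name: context-map",
  "description: Maps all files relevant to a task before making changes",
  "---",
  "",
  "# Context map",
].join("\n");

console.log(checkSkill(example));
console.log(checkSkill("# No frontmatter here"));
```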

`.github/workflows/` — 7 agentic workflows

| Workflow | Cadence | What it does |
| --- | --- | --- |
| `pr-retrospective` | Per PR merge | Mines merged PRs for learnings; updates `AGENTS.md` |
| `convention-drift` | Weekly | Audits the whole repo for convention drift; opens issues |
| `docs-freshness` | Weekly | Checks docs against recent code changes; flags stale content |
| `decisions-hygiene` | Weekly | Reviews architectural decision records for staleness |
| `ci-health` | Daily | Monitors CI pass rates and flags flaky tests |
| `consolidate-memory` | Weekly | Merges mature session lessons into permanent agent memory |
| `daily-workflow-updater` | Daily | Keeps agentic workflow definitions up to date |

Platform config files

See Supported platforms for the exact file paths written per platform.

Durable memory contract

AGENTS.md — top-level durable memory for repo guardrails, architecture notes, and quick links
docs/agents/learning-guide.md — concise routing guide for where new durable learnings should live
docs/agents/progress.md — lightweight running log for work that spans multiple sessions
docs/agents/session-handoff.md — restart context for the next session or reviewer
docs/agents/clean-state-checklist.md — wrap-up checklist before you hand off or pause work
.github/agents/decisions.md — long-lived technical or workflow decisions and invariants
.github/agents/user-directives.md — explicit user preferences that should shape future sessions

AGENTS.md starts as a stub and links to the other files so the installed memory bootstrap is immediately navigable.

For the smallest useful multi-session pack, start with AGENTS.md, docs/agents/learning-guide.md, docs/agents/progress.md, docs/agents/session-handoff.md, and docs/agents/clean-state-checklist.md.

`package.json` scripts

Three utility scripts are added to your package.json:

```
npm run gearbox:health      # Run harness health check (0-100 score + advisory subsystem rubric)
npm run gearbox:audit       # Run preflight checks
npm run gearbox:check-docs  # Check for documentation drift
```
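
To give a feel for how a 0-100 score over five dimensions could be produced, here is a minimal sketch. The dimension names, the equal weighting, and the function `healthScore` are all assumptions; the README states only that the score spans five install dimensions, not how they are named or combined.

```javascript
// Hypothetical scorer: each dimension is a value from 0 (absent) to 1 (complete),
// and all dimensions are weighted equally (an assumption).
function healthScore(dimensions) {
  const values = Object.values(dimensions);
  if (values.length === 0) return 0;
  const clamped = values.map((v) => Math.min(1, Math.max(0, v)));
  const mean = clamped.reduce((sum, v) => sum + v, 0) / clamped.length;
  return Math.round(mean * 100);
}

// Invented dimension names, for illustration only.
const example = { hooks: 1, scripts: 1, skills: 0.5, workflows: 0, memory: 1 };
console.log(healthScore(example)); // 70
```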

Supported platforms

| Platform | Config file(s) | Notes |
| --- | --- | --- |
| GitHub Copilot CLI | `.github/copilot/hooks.json` | Full hook coverage |
| Claude Code | `.claude/settings.json` | Full hook coverage |
| OpenAI Codex | `~/.codex/config.json` + `~/.codex/instructions.md` | ⚠ No `errorOccurred` hook; ⚠ no `sessionEnd` hook |
| Gemini CLI | `.gemini/settings.json` | ⚠ No `errorOccurred` hook |
| opencode | `opencode.ts` plugin | ⚠ `sessionEnd` partially mapped |
| pi.dev | `pi-plugin.ts` | Full hook coverage |

You can install config for multiple platforms in one run. The --platforms flag accepts a comma-separated list: copilot, claude, codex, gemini, opencode, pi.
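
A comma-separated platform list like this can be split and validated as in the sketch below. It illustrates the input format only and is not gearbox's argument parser; `parsePlatforms` is a hypothetical name, and the trimming, lower-casing, and de-duplication are assumptions.

```javascript
// Platform identifiers accepted by --platforms, as listed above.
const KNOWN_PLATFORMS = ["copilot", "claude", "codex", "gemini", "opencode", "pi"];

// Hypothetical parser: split the value, tidy each name, and separate
// recognised platforms from unrecognised ones.
function parsePlatforms(value) {
  const names = [
    ...new Set(
      value
        .split(",")
        .map((name) => name.trim().toLowerCase())
        .filter(Boolean)
    ),
  ];
  return {
    valid: names.filter((name) => KNOWN_PLATFORMS.includes(name)),
    unknown: names.filter((name) => !KNOWN_PLATFORMS.includes(name)),
  };
}

console.log(parsePlatforms("copilot,claude"));
console.log(parsePlatforms("cursor, pi"));
```

Returning the unknown names separately lets a caller report typos instead of silently ignoring them.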

The eval flywheel

The central idea in gearbox is a feedback loop that makes the AI agent better at your specific repo over time:

```
Session runs
    ↓
Hook scripts fire (pre-tool-use, post-tool-use, session-end, …)
    ↓
Event log accumulates structured session data (.gearbox/hooks/.runtime/)
    ↓
session-end hook writes a reusable session record
    ↓
markdown-eval synthesises `.gearbox/hooks/.runtime/latest-eval.md`
    ↓
Durable learnings are consolidated into `AGENTS.md`, `.github/agents/`, and docs
    ↓
Agent reads improved instructions at start of next session
    ↓
Better outcomes → more learnings → cycle continues
```

The pr-retrospective agentic workflow runs an additional flywheel turn after every PR merge, mining the git history and review comments for durable patterns.

AGENTS.md is the living memory entry point. Keep the full durable memory contract in version control so agents can follow repo conventions, durable decisions, and explicit user preferences from session one.
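
The middle of the flywheel, turning logged events into a session record and then into a Markdown summary, might look roughly like the sketch below. The event shape, the record fields, and the functions `summariseSession` and `renderEval` are all hypothetical; the actual formats used by the event log and by markdown-eval are not described in this README.

```javascript
// Hypothetical event shape: { event: "postToolUse", tool: "edit" }.
// Event names follow the canonical hook events; everything else is assumed.
function summariseSession(events) {
  const record = { prompts: 0, toolCalls: {}, errors: 0 };
  for (const e of events) {
    if (e.event === "userPromptSubmitted") record.prompts += 1;
    if (e.event === "postToolUse") {
      record.toolCalls[e.tool] = (record.toolCalls[e.tool] || 0) + 1;
    }
    if (e.event === "errorOccurred") record.errors += 1;
  }
  return record;
}

// Render the record as a small Markdown summary.
function renderEval(record) {
  const tools = Object.entries(record.toolCalls)
    .map(([tool, count]) => `- ${tool}: ${count}`)
    .join("\n");
  return [
    `Prompts: ${record.prompts}`,
    `Errors: ${record.errors}`,
    "Tool calls:",
    tools || "- none",
  ].join("\n");
}

const events = [
  { event: "userPromptSubmitted" },
  { event: "postToolUse", tool: "edit" },
  { event: "postToolUse", tool: "edit" },
  { event: "postToolUse", tool: "shell" },
  { event: "errorOccurred" },
];

console.log(renderEval(summariseSession(events)));
```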

Skills reference

33 portable, domain-agnostic skills. Drop them into any repo and they work without modification.

Process & quality

| Skill | Description |
| --- | --- |
| `brainstorming` | Explores user intent, requirements and design before any implementation work |
| `test-driven-development` | Enforces failing-test-first discipline before writing implementation code |
| `verification-before-completion` | Runs verification commands and confirms output before any completion claim |
| `systematic-debugging` | Structured debugging protocol for any bug, test failure, or unexpected behaviour |
| `receiving-code-review` | Handles code review feedback with technical rigour, not performative agreement |
| `requesting-code-review` | Verifies work meets requirements before submitting for review |
| `writing-plans` | Produces multi-step implementation plans from specs before touching code |
| `executing-plans` | Executes written implementation plans with review checkpoints in a separate session |

Agent architecture & meta-skills

| Skill | Description |
| --- | --- |
| `using-superpowers` | Establishes how to find and use skills; invoke before any response |
| `subagent-driven-development` | Runs independent tasks as parallel sub-agents within the current session |
| `dispatching-parallel-agents` | Dispatches 2+ independent tasks to parallel agents |
| `stuck-loop-detection` | Detects when the agent is burning tokens without progress; escalates with a structured summary |
| `agentic-eval` | Patterns for evaluating and improving agent outputs: self-critique, evaluator-optimizer pipelines, LLM-as-judge |

Code & architecture

| Skill | Description |
| --- | --- |
| `refactor` | Surgical refactoring without behaviour change: extract functions, rename, eliminate smells |
| `improve-codebase-architecture` | Finds architectural improvement opportunities, deepens shallow modules, reduces tight coupling |
| `tech-debt` | Identifies, categorises, and prioritises technical debt |
| `context-map` | Maps all files relevant to a task before making changes |
| `diff-triage` | Classifies staged/unstaged changes by intent and risk without touching the index |
| `adopt` | Compares the current codebase with a reference project to identify high-value adaptations |

Git & GitHub

| Skill | Description |
| --- | --- |
| `gh-cli` | Comprehensive GitHub CLI reference for repos, issues, PRs, Actions, releases, and more |
| `using-git-worktrees` | Isolates feature work in a git worktree before implementing plans |
| `finishing-a-development-branch` | Guides completion of development work: merge, PR, or cleanup options |
| `fix-merge-conflicts` | Resolves git merge conflicts with guided categorisation and cleanup |
| `ci-monitor` | Watches PR CI, runs a bounded fix loop, and updates PR status |

Documentation & knowledge

| Skill | Description |
| --- | --- |
| `documentation-writer` | Expert technical writer following the Diátaxis framework (tutorials, how-tos, reference, explanation) |
| `create-specification` | Creates a specification file optimised for AI consumption |
| `mermaid-diagrams` | Creates software diagrams in Mermaid: class, sequence, flowchart, ER, C4, state, git graphs |
| `memory-merger` | Merges mature lessons from a domain memory file into its instruction file |
| `session-lessons` | Mines recent session history for evidence-backed recommendations |
| `wrap-up` | End-of-session reflection to surface learnings and persist them |

Skill management

| Skill | Description |
| --- | --- |
| `skill-creator` | Creates new skills or improves existing ones; tightens descriptions for reliable triggering |
| `writing-skills` | Creates, edits, and verifies skills before deployment |
| `find-skills` | Helps discover skills when asking about capabilities |

License

MIT

gearbox is inspired by the agent harness patterns developed at PureGym and the obra/superpowers skill collection.
