
fix: gstack-memory-ingest calls non-existent gbrain put_page CLI verb#1328

Closed
smithjoshua wants to merge 1 commit into garrytan:main from smithjoshua:fix/memory-ingest-put-cli-verb

@smithjoshua (Contributor) commented May 5, 2026

Symptom

/setup-gbrain Step 7.5 (transcript & memory ingest gate) writes 0 pages on a clean install. With --quiet removed, every page errors with:

Unknown command: put_page
Run gbrain --help for available commands.

Root cause

bin/gstack-memory-ingest.ts:767-779 shells out via:

const args = ["put_page", "--slug", page.slug, "--title", page.title,
              "--type", page.type, "--tags", page.tags.join(",")];
execFileSync("gbrain", args, { input: page.body, ... });

The gbrain CLI (v0.18+ through current v0.26.7) does not expose a put_page subcommand. The verb is gbrain put <slug> and it accepts only <slug> positionally plus content via stdin or --content. There is no --title, --type, or --tags flag on the CLI. put_page is the MCP tool name; this looks like confusion between the MCP and CLI surfaces.

$ gbrain put --help
Usage: gbrain put <slug> [options]
Options:
  <slug>     Page slug (required)
  --content  Full markdown content with YAML frontmatter (required)

Repro (current main)

bun bin/gstack-memory-ingest.ts --bulk --include-unattributed --limit 1

Result: written: 0, failed: 1.

Confirmed on a clean install with gbrain v0.26.7 + gstack v1.26.3.0.

Fix

gbrain put reads tags/title/type from YAML frontmatter, so:

  1. Switch to the correct CLI verb (put instead of put_page).
  2. Inject title, type, tags into the frontmatter that buildTranscriptPage / buildArtifactPage already produce, instead of passing them as CLI flags.
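The two fix steps above amount to a string transform on the page body. Below is a minimal sketch, assuming a hypothetical `PageMeta` shape and hypothetical helper names (`metaYaml`, `injectFrontmatterFields`) that do not come from gstack. It locates the closing fence by searching for `"\n---"` rather than `"\n---\n"`, which is the change #1344 later made after finding that buildTranscriptPage's close fence is not always followed by a newline. Quoting scalars with `JSON.stringify` is a choice of this sketch, not of the patch.

```typescript
// Hypothetical page shape; the real gstack type may differ.
interface PageMeta {
  title: string;
  type: string;
  tags: string[];
}

// Render title/type/tags as YAML lines. JSON.stringify yields double-quoted
// strings, which YAML accepts, so values containing ":" or "#" stay intact.
function metaYaml(page: PageMeta): string {
  const tags = page.tags.length
    ? "tags:\n" + page.tags.map((t) => `  - ${JSON.stringify(t)}`).join("\n")
    : "tags: []";
  return `title: ${JSON.stringify(page.title)}\ntype: ${JSON.stringify(page.type)}\n${tags}`;
}

// Insert the metadata just before the closing fence of an existing frontmatter
// block. Returns null when the body has no frontmatter, so a caller can fall
// back to wrapping the body in a new block instead.
function injectFrontmatterFields(body: string, page: PageMeta): string | null {
  if (!body.startsWith("---\n")) return null;
  // Start at index 3 (the newline after the opening fence) so an empty
  // frontmatter block ("---\n---") is still found.
  const close = body.indexOf("\n---", 3);
  if (close === -1) return null; // opening fence with no close
  return body.slice(0, close) + "\n" + metaYaml(page) + body.slice(close);
}
```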

Bundled ergonomic improvements in the same function:

  • timeout: 30000 -> 60000. Auto-link reconciliation on dense transcripts hits 30s once the brain has a few hundred existing pages; the original limit caused timeouts in real backfills.
  • maxBuffer: 16 MB. Without this, Node truncates gbrain's stderr at the default 1 MB and callers see only Command failed: with no detail. This cost real debugging time on this very bug.
  • Surface stderr/stdout in the returned error message instead of the bare exception, for the same reason.
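Below is a minimal sketch of the corrected call with the bundled limits, assuming the `node:child_process` module (available in both Node and Bun). `runGbrainPut` and `PutResult` are hypothetical names. The `bin` parameter exists only so the sketch can run without gbrain installed. The only gbrain surface it relies on is `gbrain put <slug>` reading the page from stdin, as described above.

```typescript
import { execFileSync } from "node:child_process";

type PutResult = { ok: true } | { ok: false; error: string };

// Hypothetical wrapper; the real function in bin/gstack-memory-ingest.ts may differ.
function runGbrainPut(slug: string, body: string, bin = "gbrain"): PutResult {
  try {
    execFileSync(bin, ["put", slug], {
      input: body,                 // page content, including frontmatter, via stdin
      timeout: 60_000,             // auto-link reconciliation can exceed 30s
      maxBuffer: 16 * 1024 * 1024, // Node's default is 1 MiB; keep gbrain's full stderr
      encoding: "utf8",
      stdio: ["pipe", "pipe", "pipe"], // capture stderr instead of inheriting it
    });
    return { ok: true };
  } catch (err) {
    // execFileSync attaches the child's stdout/stderr to the thrown error.
    const e = err as { message?: string; stderr?: string | null; stdout?: string | null };
    const detail = [e.stderr, e.stdout].filter((s) => s && s.trim()).join("\n").trim();
    return { ok: false, error: detail || e.message || String(err) };
  }
}
```

Surfacing `stderr` before `stdout` follows the bullet above: the bare exception message is only the fallback when the child produced no output at all (for example, when the binary is missing).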

Tests

  • bun test test/gstack-memory-ingest.test.ts -> 15/15 pass
  • bun test on the three test files touching this code path (gstack-memory-ingest, gstack-memory-helpers, skill-validation) -> 362/362 pass

Validated end-to-end on a real machine: 122 Claude Code transcripts + 108 Codex transcripts written cleanly with this patch (via a temporary one-off wrapper that exercises the same gbrain put codepath this PR introduces).

Versions

  • gstack: v1.26.3.0 (current main, db9447c)
  • gbrain: v0.26.7
  • Bun: 1.3.12
  • macOS arm64
  • Affects every release since v1.26.0.0 (where transcript ingest landed)

`put_page` is the MCP tool name, not a CLI subcommand. The actual
gbrain verb is `put <slug>` with content via stdin and tags in YAML
frontmatter. Every transcript / memory ingest fails today on clean
installs.

Switch to the right verb and inject title/type/tags into the
frontmatter that buildTranscriptPage / buildArtifactPage already
produce.

Bundled in the same function:

- timeout: 30s → 60s. Auto-link reconciliation hits 30s once the
  brain has a few hundred pages.
- maxBuffer: 1MB → 16MB. Without it Node truncates gbrain's stderr
  and callers see only `Command failed:` with no detail.
- Surface stderr/stdout in the returned error instead of the bare
  exception.

Verified: bun test test/gstack-memory-ingest.test.ts -> 15/15 pass.
bun test on the three test files touching this path -> 362/362.
@metaWuming commented

+1, confirming this on macOS / gstack v1.26.3.0 / gbrain v0.18.2 / PGLite.

Independent diagnosis matched yours exactly: gstack-memory-ingest.ts:762-784 shells out with put_page --slug ... --title ... --type ... --tags ... while gbrain ≥0.18.x only registers put <slug> --content (slug positional, metadata in YAML frontmatter).

Concrete impact on my install: mcp__gbrain__get_health reports page_count: 0, embed_coverage: 0 despite git-sync to my brain repo working fine, so the brain manifests load empty across all v1.26 skills, exactly as #1305 describes.

Your fix (swap to put + inject title/type/tags into frontmatter) matches Brett's Option A in #1305 and is the minimum-diff path. Hoping this lands soon — the headline v1.26 transcript ingest feature is non-functional on every fresh setup until then.

@chriskedz commented

+1, independently hit and patched the identical bug on a clean install (gstack v1.26.3.0 / gbrain v0.27.0 / Supabase tier). Without this fix, bulk ingest writes 0/104 pages. With it, 104/104, 0 failures.

One edge case worth folding in before merge: the body.startsWith("---\n") branch only catches transcript pages (where buildTranscriptPage prepends the agent/session_id frontmatter). buildArtifactPage returns raw.slice(0, 200000) of a markdown file — and plenty of ~/.gstack/projects/*/learnings.jsonl, *-design-*.md, and ceo-plans/*.md don't start with ---. Under your current patch, those would still ingest, but title/type/tags get silently dropped, which breaks gbrain list --type learning and tag filters downstream.

Two-line addition handles it:

if (body.startsWith("---\n")) {
  // ...existing inject path...
} else {
  body = `---\ntitle: ${JSON.stringify(page.title)}\ntype: ${page.type}\ntags:\n${page.tags.map(t => `  - ${t}`).join("\n")}\n---\n\n` + body;
}

Verified shape end-to-end: artifact pages roundtrip through gbrain get <slug> with all three fields visible in the rendered frontmatter.
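One caveat about the snippet above: only the title goes through `JSON.stringify`, while type and tags are emitted raw. A tag containing `:` or `#` can change how the YAML parses, and an empty tag list leaves a bare `tags:` that YAML reads as null. Here is a sketch of the same wrap branch with every scalar quoted, relying on the fact that YAML 1.2 accepts JSON strings as double-quoted scalars. `PageMeta` and `wrapWithFrontmatter` are hypothetical names.

```typescript
// Hypothetical page shape; the real gstack type may differ.
interface PageMeta {
  title: string;
  type: string;
  tags: string[];
}

// Prepend a fresh frontmatter block to a page that has none (the "wrap" branch).
function wrapWithFrontmatter(body: string, page: PageMeta): string {
  // An empty list renders as "tags: []" rather than a bare "tags:" (null in YAML).
  const tags = page.tags.length
    ? "tags:\n" + page.tags.map((t) => `  - ${JSON.stringify(t)}`).join("\n")
    : "tags: []";
  return `---\ntitle: ${JSON.stringify(page.title)}\ntype: ${JSON.stringify(page.type)}\n${tags}\n---\n\n${body}`;
}
```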

Otherwise this is exactly the patch we want — the headline v1.26 transcript ingest feature is non-functional on every fresh setup until it lands. Hoping for a merge soon.

garrytan added a commit that referenced this pull request May 7, 2026
…n-valid source ids (#1344)

* fix: use correct `gbrain put <slug>` CLI verb in memory ingest

`put_page` is the MCP tool name, not a CLI subcommand. The actual
gbrain verb is `put <slug>` with content via stdin and tags in YAML
frontmatter. Every transcript / memory ingest fails today on clean
installs.

Switch to the right verb and inject title/type/tags into the
frontmatter that buildTranscriptPage / buildArtifactPage already
produce.

Bundled in the same function:

- timeout: 30s → 60s. Auto-link reconciliation hits 30s once the
  brain has a few hundred pages.
- maxBuffer: 1MB → 16MB. Without it Node truncates gbrain's stderr
  and callers see only `Command failed:` with no detail.
- Surface stderr/stdout in the returned error instead of the bare
  exception.

Verified: bun test test/gstack-memory-ingest.test.ts -> 15/15 pass.
bun test on the three test files touching this path -> 362/362.

* fix(sync-gbrain): generate gbrain-valid source ids for repos with dots or long names

`deriveCodeSourceId` previously concatenated the canonicalized remote with only `/`
and whitespace stripped, leaving dots from hostnames (`github.com`) and no length
cap. gbrain rejects any source id containing characters outside [a-z0-9-] or longer
than 32 chars, so `github.com/<org>/<repo>` produced `gstack-code-github.com-<org>-<repo>`
(40 chars, plus dots) and registration failed:

    code  source registration failed: Invalid source id
          "gstack-code-github.com-radubach-platform". Must be 1-32 lowercase alnum
          chars with optional interior hyphens.

Fix:
- Drop the host segment (`github.com` is the same for nearly every user and just
  consumes the 32-char budget). Use only the last two path segments (org-repo).
- Sanitize any remaining non-alnum to hyphens, then collapse and trim.
- For genuinely long org/repo names that still exceed the budget, keep the tail
  (most distinctive end of the slug) and append a 6-char sha1 hash for collision
  resistance.

Adds a regression test that spawns the CLI in temp git repos with controlled
remotes (dot in hostname, SCP-style, multi-dot host, long names forcing
hash-truncation) and asserts every derived id is ≤32 chars and matches the
gbrain validator regex.
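The three rules above can be sketched as follows. This is a hypothetical re-creation, not the shipped `deriveCodeSourceId`. The URL normalization, hashing the sanitized slug rather than some other input, and the resulting 13-character tail budget are all assumptions. The sketch also folds in the empty-slug fallback that a later commit in the same PR adds, because without it an input like `___` would produce the invalid `gstack-code-`.

```typescript
import { createHash } from "node:crypto";

const PREFIX = "gstack-code";
const MAX_LEN = 32;
// gbrain's validator, as quoted later in the same squashed commit message.
const VALID = /^[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?$/;

// Lowercase, map each run of non-alnum characters to one hyphen, trim the ends.
function sanitize(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

function sha6(s: string): string {
  return createHash("sha1").update(s).digest("hex").slice(0, 6);
}

// Hypothetical sketch of the described rules.
function deriveSourceIdSketch(remote: string): string {
  // Normalize URL and SCP-style remotes to "host/org/repo".
  const path = remote
    .replace(/^[a-z+]+:\/\//i, "") // strip scheme (https://, ssh://, git+ssh://)
    .replace(/^[^@/]+@/, "")       // strip user@
    .replace(":", "/")             // SCP separator "host:org" -> "host/org"
    .replace(/\.git$/, "");
  const segments = path.split("/").filter(Boolean);
  const slug = sanitize(segments.slice(-2).join("-")); // org-repo; host dropped
  if (!slug) return `${PREFIX}-${sha6(remote)}`;      // empty-slug fallback
  const full = `${PREFIX}-${slug}`;
  if (full.length <= MAX_LEN) return full;
  // Too long: keep the tail of the slug and append a 6-char sha1 prefix.
  const hash = sha6(slug);
  const room = MAX_LEN - PREFIX.length - hash.length - 2; // two joining hyphens
  const tail = slug.slice(-room).replace(/^-+/, "");
  return `${PREFIX}-${tail}-${hash}`;
}
```

As the P3 follow-up in this PR notes, dropping the host means `https://github.com/acme/foo` and `https://gitlab.com/acme/foo` derive the same id. The sketch has the same property.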

* fix(memory-ingest): hybrid frontmatter writer + tightened gbrain availability probe

PR #1328 (merged in the prior commit) correctly injects title/type/tags
into the YAML frontmatter that buildTranscriptPage already prepends. But
buildArtifactPage emits raw markdown without frontmatter, so design-docs,
learnings, and builder-profile-entries were landing in gbrain with empty
title/type/tags. Add the no-frontmatter wrap branch so artifact pages get
the same metadata the inject branch provides for transcripts.

Also bring in gbrainAvailable()'s --help probe (originally proposed in
PR #1341 by Alex Medina), with the regex tightened from /(^|\s)put(\s|$)/m
to /^\s+put\s/m. Anchoring on the indented subcommand format gbrain's
help actually uses keeps the probe from matching "put" appearing as
prose in help text, while still failing fast with one clean error if a
future gbrain renames or removes the put subcommand.

Updates the V1.5 NOTE doc block at the top of the file to describe the
current put-via-stdin shape rather than the legacy put_page flag form.

Co-Authored-By: Alex Medina <oficina@puntoverdemc.com>
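The difference between the two probe patterns is easiest to see side by side. This sketch assumes only the two regexes quoted in the commit message above. `helpListsPut` is a hypothetical helper, and the sample help text is invented for illustration rather than copied from gbrain. One caveat: `\s` also matches newlines, so a blank line followed by an unindented `put` would still satisfy the tightened pattern; `[ \t]+` would be stricter.

```typescript
// Loose pattern (PR #1341) and tightened pattern (#1344), as quoted above.
const LOOSE = /(^|\s)put(\s|$)/m;
const TIGHT = /^\s+put\s/m;

// Hypothetical helper: does `gbrain --help` output list an indented `put` subcommand?
function helpListsPut(helpText: string): boolean {
  return TIGHT.test(helpText);
}
```

In practice the help text would come from something like `execFileSync("gbrain", ["--help"], { encoding: "utf8" })`.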

* test+fix(memory-ingest): strengthen regression tests, fix inject for malformed-close frontmatter

Imports the shim-based regression tests from PR #1341 (Alex Medina) and
strengthens them to assert title, type, and tags actually arrive in put
stdin — not just `agent: claude-code`. Asserting the metadata fields
matches the regression class that's caused this fix wave: writers can
"succeed" while metadata is silently lost. The original PR #1341 tests
would have passed even with title/type/tags missing.

Strengthening the test surfaced a deeper issue. buildTranscriptPage joins
frontmatter array elements with "\n" and does not append a trailing
newline, so the close fence is "\n---<content>" directly, not "\n---\n".
PR #1328's inject branch searched for "\n---\n" and never matched —
which means even with PR #1328 alone, transcript pages were landing in
gbrain with no title/type/tags. Two-line fix: search for "\n---" only,
since the inject lands before the close fence regardless of what
follows it.

Also imports PR #1341's V1.5 NOTE doc-block update and the section
comment refresh so the prose stays accurate against the new writer
shape.

Co-Authored-By: Alex Medina <oficina@puntoverdemc.com>

* fix+test(gbrain-sync): handle empty-slug edge in constrainSourceId, add no-origin and basename-empty regression tests

PR #1330 (merged in the prior commit) addressed the dot-in-host and
length-overflow cases for source-id derivation, but constrainSourceId
silently returned "${prefix}-" when the input sanitized to an empty
slug — invalid per gbrain's `^[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?$`
validator on the trailing hyphen. Adds an explicit empty-slug branch
that falls back to a sha1-prefixed id ("gstack-code-<6hex>") so the
output stays gbrain-valid for every input shape.

Two new regression tests cover the corners PR #1330's coverage left
exposed:
- no-origin fallback: a cwd repo with no `origin` remote configured
  must still derive a valid id from the basename.
- basename-sanitizes-to-empty: a repo whose path basename is all
  non-alnum (e.g. "___") must produce the hash-only fallback, not
  an invalid trailing-hyphen id.

Both run the CLI inside temp git repos for genuine end-to-end
coverage (matches the pattern PR #1330 established for its own four
remote-shape cases).

Co-Authored-By: Richard Dubach <radubach@gmail.com>
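Reading the validator regex quoted above piece by piece shows why `"${prefix}-"` fails. The id must start and end with an alphanumeric character, and the optional middle allows at most 30 characters, which caps the total at 32. Here is a small check; `isValidSourceId` is a hypothetical name.

```typescript
// gbrain source-id validator, copied from the commit message above.
const SOURCE_ID = /^[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?$/;

// One alnum char, then optionally up to 30 alnum/hyphen chars and a final alnum
// char: 1 to 32 characters, no leading or trailing hyphen, no dots.
function isValidSourceId(id: string): boolean {
  return SOURCE_ID.test(id);
}
```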

* chore: bump VERSION to 1.26.5.0 + CHANGELOG entry for fix wave

PATCH bump. Three bug fixes (memory-ingest put_page CLI verb mismatch,
hybrid frontmatter writer for transcripts AND artifacts, gbrain-valid
source-id derivation for github-hosted repos), no new user capability.

CHANGELOG release-summary leads with what users can now do (clean-
install transcripts populate the brain, github-hosted repos register
code sources) and tabulates before/after numbers from real gbrain
v0.25.1 smoke output. Itemized changes credit @smithjoshua, @AZ-1224,
and @radubach for the originating PRs plus the additional hybrid
branch + strengthened tests added on top per Codex plan-review.

* docs(todos): file P2 (gbrain install-pin staleness) + P3 (source-id host-collision) follow-ups

Two follow-ups surfaced during the v1.26.5.0 fix-wave plan review.

P2 — Issue #1305 part 2: bin/gstack-gbrain-install pins gbrain to
v0.18.2 (commit 08b3698) but doesn't move when gstack ships features
that depend on newer gbrain ops or schema. Fresh /setup-gbrain on
v1.26.x lands users on schema 24 with v1.26 features expecting 32+.
Captured for a future fix-wave.

P3 — Codex P1.3 from the v1.26.5.0 plan review: deriveCodeSourceId
drops the host segment to fit gbrain's 32-char source-id budget,
which means github.com/acme/foo and gitlab.com/acme/foo collapse to
the same source id. Real but rare; PR #1330 author explicitly
considered this and chose budget over cross-host uniqueness. Captured
as a long-tail concern.

---------

Co-authored-by: Joshua Smith <joshualowellsmith@gmail.com>
Co-authored-by: Richard Dubach <radubach@gmail.com>
Co-authored-by: Alex Medina <oficina@puntoverdemc.com>
@smithjoshua (Contributor, Author) commented

Closing: superseded by #1344, which Garry merged in v1.26.5.0 (2026-05-07). Same put_page → put fix, plus the source-id validator bug we noticed in passing. Thanks for shipping the fix wave.

Tested end-to-end against my real transcript pile (228 files) using a temporary workaround script while #1328 was in review; the upstream v1.26.5.0 path replaces that workaround cleanly.

@smithjoshua smithjoshua closed this May 8, 2026
@smithjoshua smithjoshua deleted the fix/memory-ingest-put-cli-verb branch May 8, 2026 04:54
gonnabe88 pushed a commit to gonnabe88/gstack that referenced this pull request May 9, 2026
…n-valid source ids (garrytan#1344)
