fix(acp): propagate agent_key through MCP bridge so tools resolve identity by codebit0 · Pull Request #1094 · nextlevelbuilder/goclaw

codebit0 · 2026-05-04T13:47:41Z

Summary

Bug: Tools that key off agent_key (sessions_history, sessions_send, session ownership checks) failed with agent context required when invoked through the Gemini ACP path. The MCP bridge headers signed only the agent UUID, so ToolAgentKeyFromCtx returned empty downstream.
Fix: bridgeContextMiddleware already loads the agent row to populate ShellDenyGroups. Reuse that lookup to also inject store.WithAgentKey and tools.WithToolAgentKey from ag.AgentKey. No header / HMAC schema change.
Defensive fallback: ToolAgentKeyFromCtx adds a third fallback to store.AgentKeyFromContext so any future caller that sets only the store-level key still resolves correctly.
Adapter scaffold: Add internal/agent/scope.go with an AgentScope value object + ScopeFromContext adapter. Composes from existing store.* readers + RunContext so it is purely additive — no existing reader is modified. Provides a single read surface for new code (especially across process boundaries) without touching the legacy ctx-key fan-out.

Why this is OO-shaped, not a one-line patch

The recurring class of bugs is Domain Primitive Obsession: agent identity is fanned out across store.AgentKey, tools.ctxAgentKey, RunContext.AgentToolKey, and (for ACP) the MCP-bridge headers — so any boundary forgets one of them. The bridge fix is the immediate symptom fix; AgentScope gives new code a single value-object surface to read from, with the legacy readers as the fallback. Existing call sites are untouched, keeping the upstream merge surface tiny.

Files

internal/agent/scope.go (new) — AgentScope value object + adapter
internal/agent/scope_test.go (new) — 6 unit tests (zero-value / identity / inject / legacy composition / RunContext fallback / explicit-Inject precedence)
internal/gateway/server.go — bridge middleware injects agent_key from the agent row already loaded for shell deny groups
internal/tools/context_keys.go — ToolAgentKeyFromCtx adds store.AgentKeyFromContext as a third fallback

Test plan

go build ./... and go build -tags sqliteonly ./... clean
go vet ./... clean
go test ./internal/agent/ ./internal/tools/ -count=1 pass
Smoke: invoke sessions_history from a Gemini ACP agent and confirm it no longer errors agent context required

- ACP MCP 브리지 노출: Gemini가 skill_search 등 goclaw 빌트인 툴 사용 가능 - NewBridgeServer가 BuiltinToolStore에서 활성화된 툴 목록을 동적으로 로딩 - bridgeAlwaysExcluded(spawn, create_forum_topic, heartbeat) 제외 처리 - Gemini McpServer 헤더 포맷 수정: 스펙(object) 대신 Gemini CLI 0.36.x 실제 구현에 맞춰 []McpServerKV 배열 형식으로 변경 - ACP 퍼-세션 cwd 격리: cli-workspaces와 동일하게 세션별 독립 워크스페이스 생성 - enrichGeminiACPArgs: skills-store 등 5개 스킬 소스 디렉토리를 --include-directories로 추가 - buildACPMcpServersFunc: context 헤더 + HMAC 인증 + UI 등록 MCP 서버 통합 - is_system 시맨틱 정리: UpsertSystemSkill이 is_system을 강제하지 않음 - INSERT: is_system=false (운영자가 명시적으로 설정) - UPDATE: is_system 컬럼 유지 (시더 재실행이 운영자 설정 덮어쓰지 않음) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The Python dependency check used a 5s context deadline. Importing a typical skill bundle (numpy + vectorbt + pandas + psycopg2 + dotenv) takes ~4.4s on idle and easily crosses 5s under load. On context deadline, the previous code marked every package as missing and re-triggered pip install for the whole set, which produced more load and made the next check time out again — an install loop visible in journalctl as the same packages being installed every ~30 seconds. - Raise depCheckTimeout from 5s to 30s. - On context.DeadlineExceeded, log a warning and return nil (assume present) instead of treating every import as missing. - On other exec errors, surface the captured stderr so future failures are diagnosable instead of opaque.

ACP-mode agents (Atlas, Cortana, etc.) reach goclaw tools only through the /mcp/bridge MCP server, which exposes the subset of tools listed in builtin_tools with enabled=true. Several tools registered in the Go tool registry were never seeded into builtin_tools, so ACP agents could not call them and reported errors like 'datetime tool is not available'. Native-provider agents read tools.Registry directly and were unaffected, which masked the gap. - Seed datetime, delegate, mcp_tool_search, list_group_members, vault_read, vault_search in builtinToolSeedData. The seed reconcile step (DELETE WHERE name != ALL) means manual INSERTs were wiped on every restart; the seed function is the single source of truth. - Always construct an mcp.Manager (empty when no servers configured) and register mcp_tool_search. This lets the BM25 discovery tool be available to LLMs immediately; it returns 'no tools found' until external MCP servers are added, then becomes useful without a redeploy.

…or default args End-to-end Gemini CLI 0.40.1 ACP fixes, verified live with `datetime` tool returning correct time text through the full ACP→MCP bridge path. Changes: 1. ACP spec method name aliases (tool_bridge.go) The ACP spec uses snake_case (`fs/read_text_file`, `terminal/wait_for_exit`, `session/request_permission`); only Claude CLI ever emitted camelCase. Each switch case now accepts both, so newer Gemini builds no longer fall through to "unknown method" — the JSON-RPC error that Gemini's catch path stringifies as `[object Object]` in tool_call_update content because the rejection isn't a JS Error. 2. Kind-based session permission (tool_bridge.go, types.go) Replace the legacy `RequestPermissionRequest`/`Response` (flat outcome) with new `SessionRequestPermissionRequest`/`Response` that match the spec nested-outcome shape: `{outcome: {outcome: "selected", optionId}}` or `{outcome: {outcome: "cancelled"}}`. handleSessionPermission selects by `PermissionOption.Kind` (allow_once / allow_always / reject_once / reject_always) since OptionID is agent-defined and unstable across versions. Approve-all prefers allow_once over allow_always. 3. Provider-layer vendor-arg + include-directories injection (acp_provider.go, gateway_providers.go) Removes the cmd-level `enrichGeminiACPArgs` in favor of two cleaner primitives in the providers package, mirroring the architecture in merge/2026-04-29: - `WithIncludeDirectories(dirs []string)` option: callers pass candidate skill directories; NewACPProvider stat-filters and emits `--include-directories <dir>` pairs only for gemini. - `applyVendorDefaultArgs(binary, args)` helper: appends per-binary CLI defaults that goclaw's deployment requires regardless of user state. For gemini, injects `--skip-trust` so MCP discovery runs even when the per-session cwd inherits DO_NOT_TRUST from `~/.gemini/trustedFolders.json` (ACP sessions always run inside a goclaw-managed sandbox, so the user-facing trust gate is moot). The cmd layer now only enumerates candidate skill dirs via `acpSkillCandidateDirs(workspace)` and passes them as an option; binary gating and vendor flag emission live in the provider. 4. ACP ClientInfo.Name reset to empty Restored to "" (no ClientInfo identity broadcast); aligns with the pre-debug state. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

When the ACP process is respawned, resolveSession asks the agent to restore the previous session via session/load. Some agents (notably Gemini CLI 0.39.x with gemini-3.1-pro-preview) reply with success but an empty sessionId — likely because their internal session store was cleared with the process restart and they have no actual session to restore. Previously we stored that empty string as the new session ID, then sent session/prompt with sid="" which produces a JSON-RPC -32603 "Internal error" on the agent side. The error was suppressed for external channels (telegram), so users just saw "no response." Now treat empty sid as a soft failure equivalent to an explicit load error: fall through to NewSession to get a fresh, valid session ID. Conversation history is lost on this path (same as a real load failure), but at least the agent can respond instead of hanging. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Restores the per-session watchdog that fires session/cancel after prompt_timeout (default 10 min) of no session/update activity. Without this, an ACP agent that hangs silently (network glitch, model stall) keeps the goroutine blocked indefinitely. Three-piece port from 1e940b1 onto current source: - internal/providers/acp/session.go: promptInactivityTimeout package var, watchdog goroutine in Prompt() that polls lastActivity and cancels when stale - internal/providers/acp/process.go: promptTimeout field on ACPProcess + ProcessPool, SetPromptTimeout/getPromptTimeout for runtime config - internal/providers/acp_provider.go: WithACPSessionTTL, WithACPPromptTimeout options Also restores per-server logging in session/new (logs each MCP server URL/command at INFO so misconfig is visible). Cherry-pick of 1e940b1 — auto-merged cleanly with current acp_provider.go. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Ports 36e13bf from nextlevelbuilder#1069. Two related improvements that share the same plumbing: 1. GEMINI.md as out-of-band system prompt writeGeminiMD writes the system prompt to <session-cwd>/GEMINI.md once per session. Gemini CLI reads this automatically as project context, so we no longer need to repeat the (often >30KB) system prompt on every prompt turn — only the current user message is sent. When the system prompt content changes, the file is rewritten and the live ACP session is invalidated so the next request starts a fresh session that loads the updated instructions. 2. isNew flag for session reset history restore resolveSession now returns (sid, isNew, err). isNew=true means the session was just created via session/new (no prior context known to the agent). extractACPContent uses this to decide: - isNew=false: send only the current user message (system prompt comes from GEMINI.md, history is in agent state) - isNew=true: serialise full conversation transcript (system excluded) so episodic summaries + recent turns are restored even after a process respawn or session invalidation Auto-merged cleanly with the empty-sid fallback (18f57c8) — empty sid path now correctly produces isNew=true. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Generalises the previous gemini-only writeGeminiMD into per-binary context-file dispatch, fixing two correctness issues for non-gemini ACP providers: 1. Wrong filename written: every ACP provider was producing GEMINI.md in its session workspace — fine for Gemini CLI, ignored by Claude CLI (looks for CLAUDE.md) and Codex (AGENTS.md). The file was wasted disk. 2. System prompt loss for non-gemini binaries: extractACPContent's isNew=false short-circuit ("only send the current user message — GEMINI.md handles system prompt") fired for ALL binaries, including ones that don't read GEMINI.md. Continuation turns ran without any system prompt, leaving claude/codex agents instruction-less. Changes: - ACPProvider gains `binary` and `contextFileName` fields, populated from the new `contextFileNameForBinary(binary)` mapping (claude→CLAUDE.md, gemini→GEMINI.md, codex→AGENTS.md, else ""). - writeGeminiMD → writeContextFile: writes the binary-appropriate filename, skips entirely when contextFileName is empty. - extractACPContent gains `hasContextFile bool`. When false (unknown binary), every turn embeds the system prompt as a [System] block in front of the current user message — preserves pre-feature behaviour. When true, the existing isNew-aware optimised paths run. - Chat / ChatStream pass `hasContextFile = p.contextFileName != ""`. Verified live with atlas (gemini) — GEMINI.md still produced, datetime tool round-trip works. Non-gemini binaries exercised through unit test + vet only (no live claude/codex deployment to test against). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@modifier

…ered runs Cron jobs / heartbeat ticks / any run path that doesn't carry a request-scoped locale (no Accept-Language header, no WS connect param) used to default straight to "en", which forced English output even when the operator and the agent's persona had a different language preference. atlas with "Language: 한국어" in its IDENTITY.md was getting English cron meta-prompts that nudged the LLM into English replies. Resolution cascade (centralised in internal/i18n): request ctx (HTTP Accept-Language, WS connect param) ↓ GOCLAW_DEFAULT_LOCALE env override ↓ POSIX system locale (LC_ALL > LC_MESSAGES > LANG) ↓ "en" Changes: - internal/i18n/locale_default.go (new): Default() with resolveDefault() cascade, normalizePOSIXLocale() that strips ".UTF-8" / "@modifier" / "_REGION" suffixes (ko_KR.UTF-8 → ko). Result is sync.Once-cached; ResetDefaultForTest exposed for hermetic env-driven tests. - internal/i18n/i18n.go: register LocaleKO so the cascade can return "ko". Catalog entries themselves arrive separately in nextlevelbuilder#1081; until then i18n.T for missing ko keys falls back to English via existing lookup logic. - internal/store/context.go: LocaleFromContext now delegates to i18n.Default() when ctx has no value, instead of hardcoding "en". All existing call sites (loop_pipeline_callbacks, loop_finalize, etc) automatically pick up the cascade — no callsite changes needed. - cmd/gateway_cron.go: extract extraPrompt-building into buildCronExtraPrompt. Switch on store.LocaleFromContext(ctx) to return ko / vi / zh / en text. cron handler itself has no if/else for locale — the cascade lives in i18n. - Tests: locale_default_test.go covers env priority + POSIX parsing edge cases (C, POSIX, unsupported langs). gateway_cron_test.go covers the buildCronExtraPrompt locale matrix (ctx wins, env fallback, no-delivery variant). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- 백엔드: LocaleKO 상수, IsSupported에 ko 추가, catalog_ko.go 185개 문자열 - 웹 UI: 38개 네임스페이스 JSON 파일 (locales/ko/) - constants.ts SUPPORTED_LANGUAGES에 ko 추가 - index.ts에 ko 리소스 등록 및 getInitialLanguage에 ko 감지 - en/topbar.json languages에 한국어 옵션 추가 - webui handler: index.html에 no-cache 헤더 추가하여 배포 후 캐시 문제 방지

… levels Three small operational improvements ported from nextlevelbuilder#1069's 00a817f, selecting only the parts that don't depend on the channel watchdog infrastructure (which is a separate feature in nextlevelbuilder#1069 and not yet on dev). - channels: export MaskBotToken(err) — regex replaces "bot<id>:<secret>/" → "bot***/" in error strings. Apply to "failed to start channel" log in manager.go and "failed to sync telegram menu commands" log in the telegram channel. Without this, journalctl can leak the bot token whenever Telegram returns an error containing the upstream URL. - consolidation: episodic summarize timeout 30s → 120s. ACP-mode providers (Gemini CLI etc.) need 5~10s for spawn + initialize + session/new before the LLM call begins; 30s often timed out during the model response itself, leaving sessions without summaries. - tools/filesystem: rebalance security.path_escape log levels. resolvePathWithAllowed now emits warn (was silent) when the allow-list fallback rejects a path; the inner resolvePath drops to debug because resolvePathWithAllowed re-emits warn at its boundary, avoiding double-warns for routine deny cases. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Strip <current_reply_target> and <extra_context> before persisting to per-binary context file (CLAUDE.md / GEMINI.md / AGENTS.md). Those tags are turn-specific dynamic content; they're injected inline via extractACPContent each turn. - Compare stable content via sha256 sidecar (.sha256) instead of byte-exact. Skip-if-same now triggers no session invalidation when only dynamic portions of the system prompt change. - Net effect: ACP sessions stay alive across chat-target / extra-prompt changes, preserving CLI-side context accumulation. Avoids spurious session/new round-trips that previously dropped the agent's working memory. - Add 4 unit tests for stable-content extraction and write idempotence.

…ntity Symptom: tools that key off agent_key (sessions_history, sessions_send, session ownership checks) failed with "agent context required" when invoked through Gemini ACP. The MCP bridge headers signed only the agent UUID, so ToolAgentKeyFromCtx returned empty downstream. Bridge fix (gateway/server.go): - bridgeContextMiddleware already loads the agent row to populate ShellDenyGroups. Reuse that lookup to also inject store.WithAgentKey and tools.WithToolAgentKey when ag.AgentKey is non-empty. No header / HMAC schema change. Defensive fallback (tools/context_keys.go): - ToolAgentKeyFromCtx adds a third fallback to store.AgentKeyFromContext so any future caller that sets store.WithAgentKey but not the tools-package key still resolves correctly. Adapter scaffold (agent/scope.go): - Introduce AgentScope value object grouping agent_key, agent_id, tenant_id, user_id, session_key, agent_type into one immutable snapshot. Inject / ScopeFromContext form a single read surface for new code (especially across process boundaries like ACP) without modifying existing per-key readers. - ScopeFromContext composes from existing readers (store.* + RunContext) as the legacy fallback, so the value object is purely additive. - 6 unit tests cover IsZero / HasAgentIdentity / Inject / legacy-key composition / RunContext fallback / explicit Inject beats legacy.

codebit0 and others added 14 commits May 2, 2026 10:26

fix(cli): add missing X-GoClaw-User-Id header to gateway client

23f0889

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(acp): propagate agent_key through MCP bridge so tools resolve identity#1094

fix(acp): propagate agent_key through MCP bridge so tools resolve identity#1094
codebit0 wants to merge 14 commits intonextlevelbuilder:devfrom
codebit0:refactor/agent-scope-adapter

codebit0 commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant