fix(acp): propagate agent_key through MCP bridge so tools resolve identity#1094
Open
codebit0 wants to merge 14 commits intonextlevelbuilder:devfrom
Open
fix(acp): propagate agent_key through MCP bridge so tools resolve identity#1094codebit0 wants to merge 14 commits intonextlevelbuilder:devfrom
codebit0 wants to merge 14 commits intonextlevelbuilder:devfrom
Conversation
- ACP MCP 브리지 노출: Gemini가 skill_search 등 goclaw 빌트인 툴 사용 가능 - NewBridgeServer가 BuiltinToolStore에서 활성화된 툴 목록을 동적으로 로딩 - bridgeAlwaysExcluded(spawn, create_forum_topic, heartbeat) 제외 처리 - Gemini McpServer 헤더 포맷 수정: 스펙(object) 대신 Gemini CLI 0.36.x 실제 구현에 맞춰 []McpServerKV 배열 형식으로 변경 - ACP 퍼-세션 cwd 격리: cli-workspaces와 동일하게 세션별 독립 워크스페이스 생성 - enrichGeminiACPArgs: skills-store 등 5개 스킬 소스 디렉토리를 --include-directories로 추가 - buildACPMcpServersFunc: context 헤더 + HMAC 인증 + UI 등록 MCP 서버 통합 - is_system 시맨틱 정리: UpsertSystemSkill이 is_system을 강제하지 않음 - INSERT: is_system=false (운영자가 명시적으로 설정) - UPDATE: is_system 컬럼 유지 (시더 재실행이 운영자 설정 덮어쓰지 않음) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Python dependency check used a 5s context deadline. Importing a typical skill bundle (numpy + vectorbt + pandas + psycopg2 + dotenv) takes ~4.4s on idle and easily crosses 5s under load. On context deadline, the previous code marked every package as missing and re-triggered pip install for the whole set, which produced more load and made the next check time out again — an install loop visible in journalctl as the same packages being installed every ~30 seconds. - Raise depCheckTimeout from 5s to 30s. - On context.DeadlineExceeded, log a warning and return nil (assume present) instead of treating every import as missing. - On other exec errors, surface the captured stderr so future failures are diagnosable instead of opaque.
ACP-mode agents (Atlas, Cortana, etc.) reach goclaw tools only through the /mcp/bridge MCP server, which exposes the subset of tools listed in builtin_tools with enabled=true. Several tools registered in the Go tool registry were never seeded into builtin_tools, so ACP agents could not call them and reported errors like 'datetime tool is not available'. Native-provider agents read tools.Registry directly and were unaffected, which masked the gap. - Seed datetime, delegate, mcp_tool_search, list_group_members, vault_read, vault_search in builtinToolSeedData. The seed reconcile step (DELETE WHERE name != ALL) means manual INSERTs were wiped on every restart; the seed function is the single source of truth. - Always construct an mcp.Manager (empty when no servers configured) and register mcp_tool_search. This lets the BM25 discovery tool be available to LLMs immediately; it returns 'no tools found' until external MCP servers are added, then becomes useful without a redeploy.
…or default args
End-to-end Gemini CLI 0.40.1 ACP fixes, verified live with `datetime` tool
returning correct time text through the full ACP→MCP bridge path.
Changes:
1. ACP spec method name aliases (tool_bridge.go)
The ACP spec uses snake_case (`fs/read_text_file`,
`terminal/wait_for_exit`, `session/request_permission`); only Claude CLI
ever emitted camelCase. Each switch case now accepts both, so newer
Gemini builds no longer fall through to "unknown method" — the JSON-RPC
error that Gemini's catch path stringifies as `[object Object]` in
tool_call_update content because the rejection isn't a JS Error.
2. Kind-based session permission (tool_bridge.go, types.go)
Replace the legacy `RequestPermissionRequest`/`Response` (flat outcome)
with new `SessionRequestPermissionRequest`/`Response` that match the
spec nested-outcome shape: `{outcome: {outcome: "selected", optionId}}`
or `{outcome: {outcome: "cancelled"}}`. handleSessionPermission selects
by `PermissionOption.Kind` (allow_once / allow_always / reject_once /
reject_always) since OptionID is agent-defined and unstable across
versions. Approve-all prefers allow_once over allow_always.
3. Provider-layer vendor-arg + include-directories injection
(acp_provider.go, gateway_providers.go)
Removes the cmd-level `enrichGeminiACPArgs` in favor of two cleaner
primitives in the providers package, mirroring the architecture in
merge/2026-04-29:
- `WithIncludeDirectories(dirs []string)` option: callers pass
candidate skill directories; NewACPProvider stat-filters and emits
`--include-directories <dir>` pairs only for gemini.
- `applyVendorDefaultArgs(binary, args)` helper: appends per-binary
CLI defaults that goclaw's deployment requires regardless of user
state. For gemini, injects `--skip-trust` so MCP discovery runs
even when the per-session cwd inherits DO_NOT_TRUST from
`~/.gemini/trustedFolders.json` (ACP sessions always run inside a
goclaw-managed sandbox, so the user-facing trust gate is moot).
The cmd layer now only enumerates candidate skill dirs via
`acpSkillCandidateDirs(workspace)` and passes them as an option;
binary gating and vendor flag emission live in the provider.
4. ACP ClientInfo.Name reset to empty
Restored to "" (no ClientInfo identity broadcast); aligns with the
pre-debug state.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When the ACP process is respawned, resolveSession asks the agent to restore the previous session via session/load. Some agents (notably Gemini CLI 0.39.x with gemini-3.1-pro-preview) reply with success but an empty sessionId — likely because their internal session store was cleared with the process restart and they have no actual session to restore. Previously we stored that empty string as the new session ID, then sent session/prompt with sid="" which produces a JSON-RPC -32603 "Internal error" on the agent side. The error was suppressed for external channels (telegram), so users just saw "no response." Now treat empty sid as a soft failure equivalent to an explicit load error: fall through to NewSession to get a fresh, valid session ID. Conversation history is lost on this path (same as a real load failure), but at least the agent can respond instead of hanging. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Restores the per-session watchdog that fires session/cancel after prompt_timeout (default 10 min) of no session/update activity. Without this, an ACP agent that hangs silently (network glitch, model stall) keeps the goroutine blocked indefinitely. Three-piece port from 1e940b1 onto current source: - internal/providers/acp/session.go: promptInactivityTimeout package var, watchdog goroutine in Prompt() that polls lastActivity and cancels when stale - internal/providers/acp/process.go: promptTimeout field on ACPProcess + ProcessPool, SetPromptTimeout/getPromptTimeout for runtime config - internal/providers/acp_provider.go: WithACPSessionTTL, WithACPPromptTimeout options Also restores per-server logging in session/new (logs each MCP server URL/command at INFO so misconfig is visible). Cherry-pick of 1e940b1 — auto-merged cleanly with current acp_provider.go. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Ports 36e13bf from nextlevelbuilder#1069. Two related improvements that share the same plumbing: 1. GEMINI.md as out-of-band system prompt writeGeminiMD writes the system prompt to <session-cwd>/GEMINI.md once per session. Gemini CLI reads this automatically as project context, so we no longer need to repeat the (often >30KB) system prompt on every prompt turn — only the current user message is sent. When the system prompt content changes, the file is rewritten and the live ACP session is invalidated so the next request starts a fresh session that loads the updated instructions. 2. isNew flag for session reset history restore resolveSession now returns (sid, isNew, err). isNew=true means the session was just created via session/new (no prior context known to the agent). extractACPContent uses this to decide: - isNew=false: send only the current user message (system prompt comes from GEMINI.md, history is in agent state) - isNew=true: serialise full conversation transcript (system excluded) so episodic summaries + recent turns are restored even after a process respawn or session invalidation Auto-merged cleanly with the empty-sid fallback (18f57c8) — empty sid path now correctly produces isNew=true. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Generalises the previous gemini-only writeGeminiMD into per-binary
context-file dispatch, fixing two correctness issues for non-gemini
ACP providers:
1. Wrong filename written: every ACP provider was producing GEMINI.md in
its session workspace — fine for Gemini CLI, ignored by Claude CLI
(looks for CLAUDE.md) and Codex (AGENTS.md). The file was wasted disk.
2. System prompt loss for non-gemini binaries: extractACPContent's
isNew=false short-circuit ("only send the current user message —
GEMINI.md handles system prompt") fired for ALL binaries, including
ones that don't read GEMINI.md. Continuation turns ran without any
system prompt, leaving claude/codex agents instruction-less.
Changes:
- ACPProvider gains `binary` and `contextFileName` fields, populated
from the new `contextFileNameForBinary(binary)` mapping
(claude→CLAUDE.md, gemini→GEMINI.md, codex→AGENTS.md, else "").
- writeGeminiMD → writeContextFile: writes the binary-appropriate
filename, skips entirely when contextFileName is empty.
- extractACPContent gains `hasContextFile bool`. When false (unknown
binary), every turn embeds the system prompt as a [System] block in
front of the current user message — preserves pre-feature behaviour.
When true, the existing isNew-aware optimised paths run.
- Chat / ChatStream pass `hasContextFile = p.contextFileName != ""`.
Verified live with atlas (gemini) — GEMINI.md still produced, datetime
tool round-trip works. Non-gemini binaries exercised through unit test
+ vet only (no live claude/codex deployment to test against).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ered runs
Cron jobs / heartbeat ticks / any run path that doesn't carry a request-scoped
locale (no Accept-Language header, no WS connect param) used to default
straight to "en", which forced English output even when the operator and the
agent's persona had a different language preference. atlas with
"Language: 한국어" in its IDENTITY.md was getting English cron meta-prompts
that nudged the LLM into English replies.
Resolution cascade (centralised in internal/i18n):
request ctx (HTTP Accept-Language, WS connect param)
↓
GOCLAW_DEFAULT_LOCALE env override
↓
POSIX system locale (LC_ALL > LC_MESSAGES > LANG)
↓
"en"
Changes:
- internal/i18n/locale_default.go (new): Default() with resolveDefault()
cascade, normalizePOSIXLocale() that strips ".UTF-8" / "@modifier" / "_REGION"
suffixes (ko_KR.UTF-8 → ko). Result is sync.Once-cached;
ResetDefaultForTest exposed for hermetic env-driven tests.
- internal/i18n/i18n.go: register LocaleKO so the cascade can return "ko".
Catalog entries themselves arrive separately in nextlevelbuilder#1081; until then i18n.T
for missing ko keys falls back to English via existing lookup logic.
- internal/store/context.go: LocaleFromContext now delegates to i18n.Default()
when ctx has no value, instead of hardcoding "en". All existing call sites
(loop_pipeline_callbacks, loop_finalize, etc) automatically pick up the
cascade — no callsite changes needed.
- cmd/gateway_cron.go: extract extraPrompt-building into buildCronExtraPrompt.
Switch on store.LocaleFromContext(ctx) to return ko / vi / zh / en text.
cron handler itself has no if/else for locale — the cascade lives in i18n.
- Tests: locale_default_test.go covers env priority + POSIX parsing edge
cases (C, POSIX, unsupported langs). gateway_cron_test.go covers the
buildCronExtraPrompt locale matrix (ctx wins, env fallback, no-delivery
variant).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- 백엔드: LocaleKO 상수, IsSupported에 ko 추가, catalog_ko.go 185개 문자열 - 웹 UI: 38개 네임스페이스 JSON 파일 (locales/ko/) - constants.ts SUPPORTED_LANGUAGES에 ko 추가 - index.ts에 ko 리소스 등록 및 getInitialLanguage에 ko 감지 - en/topbar.json languages에 한국어 옵션 추가 - webui handler: index.html에 no-cache 헤더 추가하여 배포 후 캐시 문제 방지
… levels Three small operational improvements ported from nextlevelbuilder#1069's 00a817f, selecting only the parts that don't depend on the channel watchdog infrastructure (which is a separate feature in nextlevelbuilder#1069 and not yet on dev). - channels: export MaskBotToken(err) — regex replaces "bot<id>:<secret>/" → "bot***/" in error strings. Apply to "failed to start channel" log in manager.go and "failed to sync telegram menu commands" log in the telegram channel. Without this, journalctl can leak the bot token whenever Telegram returns an error containing the upstream URL. - consolidation: episodic summarize timeout 30s → 120s. ACP-mode providers (Gemini CLI etc.) need 5~10s for spawn + initialize + session/new before the LLM call begins; 30s often timed out during the model response itself, leaving sessions without summaries. - tools/filesystem: rebalance security.path_escape log levels. resolvePathWithAllowed now emits warn (was silent) when the allow-list fallback rejects a path; the inner resolvePath drops to debug because resolvePathWithAllowed re-emits warn at its boundary, avoiding double-warns for routine deny cases. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Strip <current_reply_target> and <extra_context> before persisting to per-binary context file (CLAUDE.md / GEMINI.md / AGENTS.md). Those tags are turn-specific dynamic content; they're injected inline via extractACPContent each turn. - Compare stable content via sha256 sidecar (.sha256) instead of byte-exact. Skip-if-same now triggers no session invalidation when only dynamic portions of the system prompt change. - Net effect: ACP sessions stay alive across chat-target / extra-prompt changes, preserving CLI-side context accumulation. Avoids spurious session/new round-trips that previously dropped the agent's working memory. - Add 4 unit tests for stable-content extraction and write idempotence.
…ntity Symptom: tools that key off agent_key (sessions_history, sessions_send, session ownership checks) failed with "agent context required" when invoked through Gemini ACP. The MCP bridge headers signed only the agent UUID, so ToolAgentKeyFromCtx returned empty downstream. Bridge fix (gateway/server.go): - bridgeContextMiddleware already loads the agent row to populate ShellDenyGroups. Reuse that lookup to also inject store.WithAgentKey and tools.WithToolAgentKey when ag.AgentKey is non-empty. No header / HMAC schema change. Defensive fallback (tools/context_keys.go): - ToolAgentKeyFromCtx adds a third fallback to store.AgentKeyFromContext so any future caller that sets store.WithAgentKey but not the tools-package key still resolves correctly. Adapter scaffold (agent/scope.go): - Introduce AgentScope value object grouping agent_key, agent_id, tenant_id, user_id, session_key, agent_type into one immutable snapshot. Inject / ScopeFromContext form a single read surface for new code (especially across process boundaries like ACP) without modifying existing per-key readers. - ScopeFromContext composes from existing readers (store.* + RunContext) as the legacy fallback, so the value object is purely additive. - 6 unit tests cover IsZero / HasAgentIdentity / Inject / legacy-key composition / RunContext fallback / explicit Inject beats legacy.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
agent_key(sessions_history,sessions_send, session ownership checks) failed withagent context requiredwhen invoked through the Gemini ACP path. The MCP bridge headers signed only the agent UUID, soToolAgentKeyFromCtxreturned empty downstream.bridgeContextMiddlewarealready loads the agent row to populateShellDenyGroups. Reuse that lookup to also injectstore.WithAgentKeyandtools.WithToolAgentKeyfromag.AgentKey. No header / HMAC schema change.ToolAgentKeyFromCtxadds a third fallback tostore.AgentKeyFromContextso any future caller that sets only the store-level key still resolves correctly.internal/agent/scope.gowith anAgentScopevalue object +ScopeFromContextadapter. Composes from existingstore.*readers +RunContextso it is purely additive — no existing reader is modified. Provides a single read surface for new code (especially across process boundaries) without touching the legacy ctx-key fan-out.Why this is OO-shaped, not a one-line patch
The recurring class of bugs is Domain Primitive Obsession: agent identity is fanned out across
store.AgentKey,tools.ctxAgentKey,RunContext.AgentToolKey, and (for ACP) the MCP-bridge headers — so any boundary forgets one of them. The bridge fix is the immediate symptom fix;AgentScopegives new code a single value-object surface to read from, with the legacy readers as the fallback. Existing call sites are untouched, keeping the upstream merge surface tiny.Files
internal/agent/scope.go(new) —AgentScopevalue object + adapterinternal/agent/scope_test.go(new) — 6 unit tests (zero-value / identity / inject / legacy composition / RunContext fallback / explicit-Inject precedence)internal/gateway/server.go— bridge middleware injectsagent_keyfrom the agent row already loaded for shell deny groupsinternal/tools/context_keys.go—ToolAgentKeyFromCtxaddsstore.AgentKeyFromContextas a third fallbackTest plan
go build ./...andgo build -tags sqliteonly ./...cleango vet ./...cleango test ./internal/agent/ ./internal/tools/ -count=1passsessions_historyfrom a Gemini ACP agent and confirm it no longer errorsagent context required