AI providers, models, and autonomous agents — exposed as a first-class Beeper chat network.
ai-bridge is a mautrix bridgev2 network connector that makes "talking to an AI" indistinguishable from talking to a person on Beeper. Each AI model is a contact. Each conversation is a room. Each response streams in live, carries its reasoning, tool calls, citations, and generated images, and is fully resumable across restarts.
There are three ways to engage with it, and this README covers all three:
- Run it — deploy the bridge against a homeserver and chat with models inside Beeper. Start at Quickstart and Configuration.
- Build a client — render or drive AI chats from a Matrix/Beeper client (the Beeper app, a web UI, a bot). Part 1 documents the exact wire protocol you consume and the events you must handle.
- Extend it — add a provider, model, tool, or agent behavior by hacking on this codebase. Part 2 maps the seams and the conventions to follow.
- The core idea
- Architecture at a glance
- Quickstart
- Mental model: how a message becomes an answer
- Part 1 — Building clients
- Part 2 — Extending the bridge
- Login & provider configuration
- Configuration reference
- ID scheme & persistence
- Testing with the faux provider
- Quirks & gotchas (read this)
- Package map
A traditional chat bridge maps an external network (WhatsApp, Signal) onto Matrix. ai-bridge does the same thing — except the "network" is the universe of AI models.
- A provider/model pair (e.g. `openai/gpt-5`, `anthropic/claude-...`) is a contact / ghost you can start a DM with.
- That DM is a room, and the room is bound to one persistent session (a branching conversation tree).
- Sending a message runs the model. The reply streams back as a live-editing Matrix message carrying a rich structured payload.
- The model can use tools (web search, URL fetch, session introspection, plus provider-native tools like image generation), reason, generate images, and cite sources — all surfaced in the same payload.
- Everything is persisted and resumable: if the bridge restarts mid-response, the run is re-attached and finished.
Because it is a real bridgev2 connector, it inherits Beeper's identity, provisioning, direct-media, and room-state machinery for free. Clients that already speak Matrix/Beeper get AI chats with zero new transport.
┌──────────────────────────────────────────────┐
Matrix / Beeper │ pkg/connector │
client ◀────────────▶│ bridgev2.NetworkConnector + Client │
(renders com.beeper.ai)│ rooms↔sessions, slash cmds, login, routes │
└───────┬───────────────────────┬──────────────┘
│ │
inbound msg → │ │ ← streamed reply
▼ │
┌─────────────────────┐ ┌───────────┴───────────┐
│ pkg/msgconv │ │ pkg/ai-stream │
│ Matrix ⇄ AI message │ │ run → AG-UI events → │
│ conversion │ │ anchor/stream/final │
└─────────┬───────────┘ └───────────┬───────────┘
│ │ emits agui.Event
▼ ▼
┌──────────────────────────────────────────────────┐
│ pkg/agent + pkg/agent/harness │
│ autonomous tool-loop, sessions, compaction │
│ pkg/chattools (web_search, fetch, …) │
└────────────────────────┬─────────────────────────┘
│ StreamFn
▼
┌──────────────────────────────────────────────────┐
│ pkg/ai │
│ provider/API registry · model catalog · stream │
│ pkg/ai/providers (OpenAI, Google, …) │
└──────────────────────────────────────────────────┘
Foundations: pkg/aiid (IDs/metadata) · pkg/aidb (bridge DB, resume) ·
pkg/agent/harness/session (per-conversation SQLite tree) ·
pkg/ag-ui (the wire event protocol)
The dependency direction is strict: pkg/ai never imports pkg/ai/providers (providers register into ai via a side-effect init()), and ai-stream/ag-ui are provider- and Matrix-agnostic at their core — the Matrix coupling lives in the connector and ai-stream/matrix.
The build uses the goolm tag (pure-Go olm, no C dependency).
./build.sh # → ./ai (go build -tags=goolm ./cmd/ai)
./run.sh # go run the bridge
./test.sh # go test -tags=goolm ./... (or pass packages)

The binary is a standard mautrix bridge (`cmd/ai/main.go`). It registers the AI connector and blank-imports `pkg/ai/providers` to populate the provider registry; without that import, `ai.Stream` panics. Generate a config the usual mautrix way; the AI-specific block is small (see Configuration reference).
Requirements:
- The Matrix connector must implement `MatrixConnectorWithBeeperStreams` (the bridge refuses to start otherwise; see `connector.go:53`).
- For per-room settings (model/prompt/tools) the connector must implement `MatrixConnectorWithArbitraryRoomState`; without it those features silently disappear from capabilities.
- For HTTP provider provisioning, it must implement `MatrixConnectorWithProvisioning`.
- Inbound. A user sends a message to an AI room. `pkg/msgconv/from_matrix.go` turns it into a `MatrixPrompt` (text + image/audio/text-file attachments, plus reply context). Image/audio attachments are rejected if the room's model can't accept that modality.
- Command check. If the body starts with `/` and matches a known command, it's handled as a slash command instead of a prompt. An unknown `/foo` is not an error: it's sent to the model verbatim.
- Session. The room's `PortalMetadata.SessionID` resolves the conversation's session (lazily created on first message). The model + reasoning + extra system prompt come from room state.
- Run. A `harness.AgentHarness` builds the LLM context from the session tree and starts an autonomous agent loop: stream a response → execute any tool calls → feed results back → repeat until the model stops calling tools.
- Stream out. Every provider event is folded into an `ai-stream.Run`, which emits a validated sequence of AG-UI events and projects them into Matrix payloads: one anchor placeholder, many stream carriers, one final edit.
- Persist & resume. The run is recorded in the bridge DB as an active stream so a restart can finish it. Each assistant turn is appended to the session tree. On completion the active-stream record is deleted.
- Compact. After each assistant message, autocompaction checks whether the context is near the window limit and summarizes older history if so.
A client is anything that renders an AI chat or drives it (responds to approvals, sends prompts). You consume Matrix messages as usual; the AI-specific richness lives in one event-content key.
Every AI message carries a com.beeper.ai object (BeeperAIKey) in its event content. Schema com.beeper.ai.v1, protocol "ag-ui". The shape is defined by BeeperAI in pkg/ai-stream/run.go:
You can render at three levels of fidelity:
- Lowest effort: read the Matrix `body` (the bridge always sets sensible plaintext/HTML) and ignore `com.beeper.ai`. You get a static final answer with no streaming or tool detail.
- Medium: read `message` (a `UIMessage`) from the anchor (for the live preview) and the final edit (for the consolidated result). Skip the incremental carriers.
- Full: replay the `envelopes` from every stream carrier to animate text/reasoning/tool-call deltas in real time.
A single logical assistant message moves through three Matrix events:
| Kind | Matrix mechanism | Purpose |
|---|---|---|
| anchor | the original posted message | placeholder; `message` = initial preview text |
| stream | events related to the anchor (`m.reference`) | incremental carriers; each holds a batch of sequenced AG-UI envelopes |
| final | an edit of the anchor | consolidated result; carries `com.beeper.dont_render_edited: true` |
Stream carriers are published incrementally — only events appended since the last publish are packed, with globally monotonic sequence numbers (Envelope.Seq) and deterministic transaction IDs (ai_stream_<runID>_<seq>) for idempotency. A client that understands AG-UI reconstructs the message by ordering envelopes by Seq; a client that doesn't simply shows the final edit.
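As a rough client-side illustration of the ordering rule above, the following sketch merges envelope batches from several carriers, drops duplicates, and sorts by sequence number. The `Envelope` struct, `OrderEnvelopes`, and `TxnID` are hypothetical stand-ins written for this example (only the `Seq` field name and the `ai_stream_<runID>_<seq>` pattern come from the text), and the decimal rendering of the sequence number is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// Envelope is a hypothetical stand-in for the bridge's envelope type.
// Only the Seq field name is taken from the README.
type Envelope struct {
	Seq  int64
	Type string // AG-UI event type, e.g. "TEXT_MESSAGE_CONTENT"
}

// OrderEnvelopes merges batches from several stream carriers, keeps the
// first copy of each sequence number, and returns them sorted by Seq.
func OrderEnvelopes(batches ...[]Envelope) []Envelope {
	seen := make(map[int64]bool)
	var out []Envelope
	for _, batch := range batches {
		for _, env := range batch {
			if seen[env.Seq] {
				continue // carrier delivered twice; ignore the repeat
			}
			seen[env.Seq] = true
			out = append(out, env)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Seq < out[j].Seq })
	return out
}

// TxnID builds an ID in the ai_stream_<runID>_<seq> shape (decimal seq assumed).
func TxnID(runID string, seq int64) string {
	return fmt.Sprintf("ai_stream_%s_%d", runID, seq)
}

func main() {
	late := []Envelope{{Seq: 3, Type: "TEXT_MESSAGE_END"}}
	early := []Envelope{{Seq: 1, Type: "RUN_STARTED"}, {Seq: 2, Type: "TEXT_MESSAGE_START"}}
	// Carriers may arrive out of order or more than once.
	for _, env := range OrderEnvelopes(late, early, early) {
		fmt.Println(env.Seq, env.Type)
	}
	fmt.Println(TxnID("run1", 3))
}
```

Deduplicating on `Seq` is what makes replaying a redelivered carrier harmless on the client side.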
pkg/ag-ui is a Go port of the AG-UI streaming spec — the vocabulary inside envelopes. An agui.Event is an open map[string]any (forward-compatible: unknown fields survive round-trips), with a type discriminator. The full catalog (events.go):
- Lifecycle: `RUN_STARTED`, `RUN_FINISHED`, `RUN_ERROR`
- Text: `TEXT_MESSAGE_START/_CONTENT/_END/_CHUNK`
- Reasoning: `REASONING_START/_END`, `REASONING_MESSAGE_START/_CONTENT/_END/_CHUNK`, `REASONING_ENCRYPTED_VALUE`
- Tools: `TOOL_CALL_START/_ARGS/_END/_CHUNK/_RESULT`
- Steps: `STEP_STARTED`, `STEP_FINISHED`
- State: `STATE_SNAPSHOT`, `STATE_DELTA`, `MESSAGES_SNAPSHOT`
- Activity: `ACTIVITY_SNAPSHOT`, `ACTIVITY_DELTA`
- Escape hatches: `RAW`, `CUSTOM`
The protocol is validated (validation.go): exactly one RUN_STARTED; nothing after a terminal RUN_FINISHED/RUN_ERROR; content/end events must follow a matching start; no sequence may be left open at termination; TOOL_CALL_END must not carry a result (use TOOL_CALL_RESULT). Tool-call states are awaiting-input → input-streaming → input-complete; tool-result states are streaming / complete / error. CUSTOM events named com.beeper.source / com.beeper.document / com.beeper.file / com.beeper.data carry citations, artifacts, and arbitrary data.
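To make two of these rules concrete (exactly one `RUN_STARTED`, and nothing after a terminal event), here is a minimal sketch of a lifecycle check. It is not the code in `validation.go`: the local `Event` type and the `ValidateLifecycle` function are hypothetical, and the start/end pairing rules are deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
)

// Event mirrors the "open map with a type discriminator" idea. It is a
// local, hypothetical type, not the pkg/ag-ui definition.
type Event map[string]any

func eventType(e Event) string {
	t, _ := e["type"].(string)
	return t
}

// ValidateLifecycle enforces two rules only: exactly one RUN_STARTED, and
// no event after a terminal RUN_FINISHED or RUN_ERROR.
func ValidateLifecycle(events []Event) error {
	started := 0
	terminated := false
	for i, e := range events {
		if terminated {
			return fmt.Errorf("event %d (%s) follows a terminal event", i, eventType(e))
		}
		switch eventType(e) {
		case "RUN_STARTED":
			started++
		case "RUN_FINISHED", "RUN_ERROR":
			terminated = true
		}
	}
	if started != 1 {
		return errors.New("expected exactly one RUN_STARTED")
	}
	return nil
}

func main() {
	ok := []Event{{"type": "RUN_STARTED"}, {"type": "TEXT_MESSAGE_START"},
		{"type": "TEXT_MESSAGE_END"}, {"type": "RUN_FINISHED"}}
	fmt.Println(ValidateLifecycle(ok))

	bad := append(append([]Event{}, ok...), Event{"type": "TEXT_MESSAGE_START"})
	fmt.Println(ValidateLifecycle(bad))
}
```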
pkg/ag-ui/capabilities.go defines AgentCapabilities — the agent self-description handshake (transport, tools, reasoning, multimodal in/out, human-in-the-loop, execution limits) that lets clients negotiate features.
The message field is a UIMessage (pkg/ai-stream/ui_message.go) — an ordered list of typed parts:
{ "id": "...", "role": "assistant",
"parts": [
{ "type": "thinking", "content": "...", "state": "done" },
{ "type": "text", "content": "...", "state": "streaming|done" },
{ "type": "tool-call", "toolCallId": "...", "name": "web_search",
"input": {...}, "output": {...}, "state": "input-complete|approval-requested|..." },
{ "type": "source-url", ... }, { "type": "file", ... },
{ "type": "data-com-beeper-data", ... }
] }

Part types: text, thinking (also used for step markers), tool-call, source-url, source-document, file, data-com-beeper-data. Notes for renderers:
- A text/reasoning block that re-opens after a tool call is split into `…-segment-N` parts; render them in order, don't merge.
- On termination, any tool-call part with no output gets a synthetic error output, so you never see a tool stuck "in progress."
- `tool-call` states include `approval-requested` and `approval-responded` (see below).
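A medium-fidelity renderer can be sketched as a single in-order pass over `parts`. The Go types below are hypothetical and cover only the JSON field names shown in the example payload (`id`, `role`, `parts`, `type`, `content`, `state`, `name`); the `[tool: …]` placeholder format is an invention of this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Part and UIMessage are hypothetical client-side types covering only the
// fields used in this sketch.
type Part struct {
	Type    string `json:"type"`
	Content string `json:"content,omitempty"`
	State   string `json:"state,omitempty"`
	Name    string `json:"name,omitempty"`
}

type UIMessage struct {
	ID    string `json:"id"`
	Role  string `json:"role"`
	Parts []Part `json:"parts"`
}

// RenderText walks the parts in order, concatenating text parts and marking
// tool calls with a placeholder. Thinking and other part types are skipped.
func RenderText(raw []byte) (string, error) {
	var msg UIMessage
	if err := json.Unmarshal(raw, &msg); err != nil {
		return "", fmt.Errorf("decode UIMessage: %w", err)
	}
	var b strings.Builder
	for _, p := range msg.Parts {
		switch p.Type {
		case "text":
			b.WriteString(p.Content)
		case "tool-call":
			fmt.Fprintf(&b, "[tool: %s]", p.Name)
		}
	}
	return b.String(), nil
}

func main() {
	raw := []byte(`{"id":"m1","role":"assistant","parts":[
		{"type":"thinking","content":"plan","state":"done"},
		{"type":"text","content":"Searching. ","state":"done"},
		{"type":"tool-call","name":"web_search","state":"input-complete"},
		{"type":"text","content":" Done.","state":"done"}]}`)
	out, err := RenderText(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because segments are emitted in list order and never merged, text that resumes after a tool call stays on the correct side of the tool marker.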
Matrix caps event content at 64 KiB. A rich UIMessage with full tool inputs/outputs and citations can exceed that. The final field tells you how the bridge handled it (pkg/ai-stream/final_payload.go):
- `delivery: "inline"`: `message.parts` is complete and authoritative.
- `delivery: "attachment"`: `message.parts` is empty; `final.partsRef` points to an uploaded JSON blob (`application/vnd.beeper.ai.final-parts+json`) with a `sha256`, `byteSize`, and `url`/`file`. Download it, verify the hash, and render its `message`.
- `textComplete`: whether the rendered body text was truncated (`[See more on supported clients]` is appended when so).
- `partsComplete`: whether `message.parts` is the full set.
A correct client must handle both delivery modes.
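For the attachment mode, the verification step might look like the sketch below. `PartsRef`, `Digest`, and `VerifyParts` are hypothetical names; the sketch assumes `sha256` is a hex-encoded digest and `byteSize` is the exact blob length, neither of which this README spells out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// PartsRef is a hypothetical subset of final.partsRef.
// Assumption: SHA256 holds a hex-encoded digest of the blob.
type PartsRef struct {
	SHA256   string
	ByteSize int
}

// Digest returns the hex-encoded SHA-256 of blob.
func Digest(blob []byte) string {
	sum := sha256.Sum256(blob)
	return hex.EncodeToString(sum[:])
}

// VerifyParts checks the downloaded blob against the size and hash in ref
// before the client trusts and renders it.
func VerifyParts(ref PartsRef, blob []byte) error {
	if len(blob) != ref.ByteSize {
		return fmt.Errorf("size mismatch: got %d bytes, want %d", len(blob), ref.ByteSize)
	}
	if !strings.EqualFold(Digest(blob), ref.SHA256) {
		return errors.New("sha256 mismatch")
	}
	return nil
}

func main() {
	blob := []byte(`{"parts":[]}`)
	ref := PartsRef{SHA256: Digest(blob), ByteSize: len(blob)}
	fmt.Println(VerifyParts(ref, blob))                     // matching blob
	fmt.Println(VerifyParts(ref, []byte(`{"parts":[1]}`))) // altered blob
}
```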
Some tool calls require explicit user approval before executing (pkg/ai-stream/approval.go). When that happens:
- The run interrupts (`RUN_FINISHED` with `outcome: interrupt`, reason `tool_call`), and the relevant `tool-call` part flips to `approval-requested` with the proposed `input`.
- A dedicated approval message is posted (relation type `com.beeper.ai.approval`, key under `com.beeper.ai.approval`, schema `com.beeper.ai.approval.v1`) listing the choices. Defaults: Allow once (`approve`), Always allow (`always_approve`), Deny (`deny`, danger style).
- The user responds with `/approve <approval-id> <approve|always|deny>`. The bridge resolves the choice, emits a `TOOL_CALL_RESULT`, and resumes the run.
- Approvals are queued (one active at a time) and time out to a denied result if unanswered.
The response schema (if you drive approvals programmatically rather than via slash commands) is { approved: bool, always?: bool, reason?: string, editedArgs?: {...} }.
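A client that builds this response in Go could model it roughly as follows. The field names and optionality come from the schema just quoted; the `ApprovalResponse` type name, the `Encode` helper, and the choice of `encoding/json` are assumptions of this sketch, and how the response is delivered to the bridge is not covered here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ApprovalResponse mirrors { approved, always?, reason?, editedArgs? }.
// The Go type itself is hypothetical; optional fields use omitempty.
type ApprovalResponse struct {
	Approved   bool           `json:"approved"`
	Always     bool           `json:"always,omitempty"`
	Reason     string         `json:"reason,omitempty"`
	EditedArgs map[string]any `json:"editedArgs,omitempty"`
}

// Encode serializes the response to its JSON wire form.
func Encode(r ApprovalResponse) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	deny, err := Encode(ApprovalResponse{Approved: false, Reason: "not now"})
	if err != nil {
		panic(err)
	}
	fmt.Println(deny) // {"approved":false,"reason":"not now"}

	edited, err := Encode(ApprovalResponse{
		Approved:   true,
		EditedArgs: map[string]any{"query": "narrower search"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(edited)
}
```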
Per-room capabilities (pkg/connector/capabilities.go) are tailored to the room's model:
- Formatting: Matrix rich text is fully supported for user messages; the bridge preserves `formatted_body` and advertises full support for the Matrix formatting feature set.
- Attachments: text-like files always (≤512 KB); images only if the model has `image` input (PNG/JPEG/WebP, ≤20 MB); audio + voice only if `audio` input (WAV/MP3/MPEG, ≤25 MB). Formatted captions are preserved.
- Location messages: fully supported as prompt text.
- `MaxTextLength`: 20000. Reply: full. Edit: rejected. Reaction: unsupported. Delete: full.
- Disappearing messages supported; typing notifications on; read receipts off.
Don't build a client that depends on editing AI messages or reacting to them as a general affordance.
Clients can offer slash commands as UI affordances; they're just messages. See the command table below.
AI-generated images and inline media are served through the bridge's direct-media path (pkg/connector/directmedia.go). A MediaID is either a sanitized human form or a self-describing base64-JSON blob (ai:<base64>) that encodes the session/entry/content-index needed to stream the bytes back (or a URL redirect). Clients just download the MXC URI as normal.
Everything here is done by editing this Go codebase and recompiling — there is no plugin system or external service interface. "Extending" means widening what the AI can do: wiring in a new provider or wire protocol, adding a model, changing agent behavior, or writing a tool.
These three concepts (pkg/ai/types.go) are distinct and often confused:
| Concept | Type | What it is | Example |
|---|---|---|---|
| Provider | `ai.Provider` | who you talk to / who bills you: the vendor or gateway. Determines API-key env var, base URL, compat quirks. | openai, anthropic, google, openrouter, groq, xai, … (~35) |
| API | `ai.Api` | the wire protocol (request/response shape). This is the dispatch key. | openai-responses, openai-completions, anthropic-messages, google-generative-ai, google-vertex, … |
| Model | `ai.Model` | a concrete model tying an ID to a provider, API, base URL, capabilities, and pricing. | gpt-5, claude-..., gemini-... |
The key relationship: many providers share one API. Groq, xAI, DeepSeek, Together, OpenRouter, etc. all speak openai-completions. Dispatch happens on model.API, then provider-specific behavior is layered via the Provider field and a free-form Compat map[string]any.
ai.Model carries everything callers need: Reasoning/ThinkingLevelMap/DefaultThinkingLevel, Input/Output modalities, Cost (USD/1M tokens), ContextWindow, MaxTokens, BuiltInTools, Headers, and Compat.
One uniform interface over every protocol (pkg/ai/stream.go):
ai.Stream(ctx, model, ai.Context{SystemPrompt, Messages, Tools}, StreamOptions) *AssistantMessageEventStream
ai.StreamSimple(ctx, model, ai.Context{SystemPrompt, Messages, Tools}, SimpleStreamOptions) *AssistantMessageEventStream // adds Reasoning + ThinkingBudgets
ai.Complete / ai.CompleteSimple(...) // block, return final Message

The returned `*AssistantMessageEventStream` is a buffered channel of `AssistantMessageEvent`. Event types: `start`, `text_start/_delta/_end`, `thinking_start/_delta/_end`, `toolcall_start/_delta/_end`, `raw` (Responses API), terminal `done` (with final `Message`), and `error`. Consume with `for ev := range s.Events()` or block with `s.Result()`. `StopReason` is one of `stop | length | toolUse | error | aborted` (providers upgrade `stop`→`toolUse` when a tool call is present).
StreamOptions exposes two interception hooks every provider honors:
- `OnPayload(payload, model) (newBody, replace, err)`: inspect/rewrite the outgoing request body before send.
- `OnResponse(ProviderResponse, model) error`: observe HTTP status/headers.
Content is the universal ContentBlock (text / thinking / toolCall / image / audio), with provenance and signatures for round-tripping reasoning across turns.
Same protocol, new vendor — usually no code:
- Add a model entry in ai-services, or configure a custom provider model with the right `API`, `Provider`, `BaseURL`.
- Map the provider to its API-key env var(s) in `pkg/ai/env_api_keys.go`.
- Add any `Compat` overrides (most base-URL patterns are auto-detected by `detectOpenAICompletionsCompat`).
A new wire protocol:
- Add an `Api` constant in `pkg/ai/types.go`.
- Implement `StreamSimpleX(ctx, model, ai.Context, SimpleStreamOptions) *ai.AssistantMessageEventStream` following the goroutine + state-machine template the existing providers use (create stream, spawn goroutine, push `start`, drive a stream-state machine that mutates the accumulating message and pushes incremental events, push `done`/`error`).
- Register it: add to `RegisterBuiltInAPIProviders` (`pkg/ai/providers/register_builtins.go`) or call `ai.RegisterAPIProviderWithSource` from your own package's `init()`. The registry auto-derives missing `Stream`/`CompleteSimple` from whatever you provide, and wraps everything with an API-mismatch guard.
Remember the side-effect import: consumers must import _ ".../pkg/ai/providers" (or your package) to populate the registry.
Provider-specific behaviors worth knowing live in pkg/ai/providers: OpenAI Completions vs Responses are very different protocols; Google configures thinking per model family (discrete level vs numeric budget); cross-cutting transformMessages downgrades unsupported modalities, strips/recovers reasoning across models, and injects synthetic tool results for dangling tool calls.
The model catalog is owned by ai-services. The bridge loads /models?feature=bridge:ai, applies each model's runtime metadata, and fails provider resolution if ai-services does not return a catalog. There is no bridge-generated fallback catalog.
Custom providers use the same ai-services catalog for model metadata. The bridge filters that catalog to the supported provider runtime (openai, openrouter, anthropic, or google-vertex) and then uses the user's configured base URL and API key for execution. Arbitrary model IDs and generic OpenAI-compatible providers are not accepted.
Reasoning levels form a ladder off < minimal < low < medium < high < xhigh; ClampThinkingLevel snaps a request to the nearest supported level from the model metadata the bridge was given.
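The clamping idea can be sketched as a nearest-neighbor search on the ladder. This is not the bridge's `ClampThinkingLevel`: the `ClampLevel` function below is hypothetical, and its tie-breaking rule (prefer the lower level when two supported levels are equally close) is an assumption of this example.

```go
package main

import "fmt"

// ladder lists the reasoning levels from weakest to strongest.
var ladder = []string{"off", "minimal", "low", "medium", "high", "xhigh"}

// rank returns the ladder position of level, or -1 if it is unknown.
func rank(level string) int {
	for i, l := range ladder {
		if l == level {
			return i
		}
	}
	return -1
}

// ClampLevel returns the supported level closest to requested on the ladder.
// Ties go to the lower level (an assumption of this sketch). It reports
// false when requested is unknown or no supported level is valid.
func ClampLevel(requested string, supported []string) (string, bool) {
	want := rank(requested)
	if want < 0 {
		return "", false
	}
	bestRank, bestDist := -1, len(ladder)
	for _, s := range supported {
		r := rank(s)
		if r < 0 {
			continue // ignore levels that are not on the ladder
		}
		d := r - want
		if d < 0 {
			d = -d
		}
		if d < bestDist || (d == bestDist && r < bestRank) {
			bestRank, bestDist = r, d
		}
	}
	if bestRank < 0 {
		return "", false
	}
	return ladder[bestRank], true
}

func main() {
	level, ok := ClampLevel("xhigh", []string{"off", "low", "high"})
	fmt.Println(level, ok) // high true
}
```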
Image generation is a separate path (pkg/ai/images.go, images_*.go): ai.GenerateImages(ctx, ImagesModel, ImagesContext, ImagesOptions) AssistantImages (synchronous, no streaming). Image model metadata must come from ai-services or explicit provider configuration; the bridge does not keep a generated image catalog. The built-in implementation supports OpenRouter; blank-import pkg/ai/providers/images to enable it. Models can also expose provider-native image_generation as a built-in tool (see chat tools).
pkg/agent is the autonomous tool-using loop. A run = one RunAgentLoop call; a turn = one assistant response plus execution of its tool calls. The loop streams a response, executes tool calls (parallel by default, sequential if any tool requests it), feeds results back, and repeats while the model keeps calling tools or messages are queued.
messages, err := agent.RunAgentLoop(ctx, prompts, agent.AgentContext{SystemPrompt, Messages, Tools},
    agent.AgentLoopConfig{Model: m, GetAPIKey: getKey, /* hooks… */}, emitSink, ai.StreamSimple)

Tools are `agent.AgentTool[any]`: an `ai.Tool` (name + JSON schema) plus an `Execute` closure. Arguments are schema-validated before `Execute` runs; returning an error becomes an error tool-result the model sees (it never crashes the loop). A tool can set `Terminate` to end the run (but only if every tool in the batch terminates). Rich hooks let you intercept everything: `BeforeToolCall` (can block), `AfterToolCall` (can rewrite results), `PrepareNextTurn` (swap context/model/thinking level), `ShouldStopAfterTurn`, `TransformContext`.
Gotcha: there is no built-in turn/iteration cap. A model that loops on tool calls runs forever unless you bound it via `ShouldStopAfterTurn`, a context deadline, or `Terminate`.
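One way to bound a run is to combine a context deadline with a turn counter. The sketch below shows only that pattern. `TurnLimiter` is a hypothetical helper, the loop in `main` is a stand-in for the agent loop, and the real `ShouldStopAfterTurn` hook signature is not shown in this README, so adapting the predicate to it is left open.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// TurnLimiter returns a predicate that reports true once it has been called
// max times. It is not synchronized, so call it from a single goroutine.
func TurnLimiter(max int) func() bool {
	turns := 0
	return func() bool {
		turns++
		return turns >= max
	}
}

func main() {
	// Wall-clock bound: the whole run is abandoned after two minutes.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	// Turn bound: stop after three turns even if tool calls keep coming.
	stop := TurnLimiter(3)

	// Stand-in for the agent loop: each iteration represents one turn.
	for turn := 1; ; turn++ {
		if ctx.Err() != nil {
			fmt.Println("deadline reached")
			break
		}
		fmt.Println("turn", turn)
		if stop() {
			fmt.Println("turn cap reached")
			break
		}
	}
}
```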
The stateful agent.Agent wrapper adds queueing (Steer/FollowUp), subscriptions, and abort — but assumes a single driver goroutine (its state is not internally synchronized).
pkg/agent/harness is the production façade and what the connector actually uses. On top of the loop it adds:
- Persistent sessions — a branching conversation tree (below).
- System-prompt resolution, model / thinking-level switching (persisted as tree entries).
- Three queues (`Steer`, `FollowUp`, `NextTurn`) and a phase state machine (`idle/turn/compaction/branch_summary`).
- A pub/sub + hook system: `Subscribe` for observers, `On(eventType, handler)` for behavior-modifying hooks (`before_agent_start`, `context`, `tool_call`, `tool_result`, `before_provider_request`, `before_provider_payload`, `after_provider_response`, `session_before_compact`, `session_before_tree`).
- Auth/header injection and compaction/summarization.
h, _ := harness.NewAgentHarness(harness.AgentHarnessOptions{
Session: session, Model: m, ThinkingLevel: agent.ThinkingLevelMedium,
SystemPromptFunc: buildPrompt, Tools: tools, ActiveToolNames: []string{"web_search"},
StreamFn: ai.StreamSimple, GetAPIKeyAndHeaders: resolveAuth,
CompactionSettings: cfg.Compaction.Settings(),
})
result, err := h.PromptWithResult(ctx, "do the thing")

Errors are typed with codes (`pkg/agent/harness/public_errors.go`): `CompactionError`, `BranchSummaryError`, `AgentHarnessError` (`busy`, `invalid_state`, `auth`, `hook`, …). Switch on the code for user-facing messages.
pkg/chattools provides the tools the AI gets out of the box:
| Tool | Purpose | Notes |
|---|---|---|
| `get_session` | Live chat metadata (current time/timezone, model, reasoning, search/fetch modes, attachments) | read-only; recomputes time per call |
| `fetch` | Fetch a full HTTP/HTTPS URL → readable text + metadata | Beeper mode: direct fetch (≤2 MiB, ≤20 000 chars) or AI-services `/tools/fetch` extraction with fallback; native mode: provider URL-context/fetch tool when available |
| `web_search` | Web search | Exa-backed Beeper search, enabled when room search mode is `beeper`; returns concise URL results for optional follow-up `fetch` calls |
Tools are gated per-room via the com.beeper.ai.tools state event. search may be off, beeper, or native; fetch may be off, beeper, or native. The legacy disabled array is still read for older room state. In beeper mode, web tools route through AI-services (/tools/web_search, /tools/fetch) using the appservice bearer token. In native mode, provider-native tools are injected where supported: OpenAI/OpenRouter search, OpenRouter fetch, Anthropic search/fetch, and Google search/URL context. If the selected provider API has no native equivalent, that native tool is unavailable. Search result URLs stay in the tool view; fetched pages, provider-native citation annotations, URL-context metadata, and final-answer URLs become canonical com.beeper.source artifacts for client source cards. Other provider-native built-ins, such as image_generation, are still injected from the model catalog (pkg/connector/builtin_tools.go).
fetch tries the URL directly first with Accept preferring Markdown, plain text, JSON, XML, and CSV. If the response is already agent-readable (Markdown/plain/JSON/XML/CSV/source-ish), it returns that result without backend extraction. If the response is HTML, it checks HTTP Link headers and HTML <link rel="alternate" type="text/markdown|text/plain"> for a readable alternate and fetches that directly. Only when the direct representation is not agent-ready does it call AI-services /tools/fetch. Local/private hosts, GitHub raw/gist URLs, GitLab-style raw paths, and source/text file extensions are treated as direct-fetch candidates.
Adding a tool:
- Write a constructor returning `agent.AgentTool[any]`; build the schema with `objectSchema(props, required)` and pull args with the `helpers.go` coercers (args arrive as `map[string]any`; JSON numbers are `float64`).
- Return `jsonResult(value)` for consistent text + `Details` output.
- Register in `chattools.Tools` (unconditionally or behind a config gate).
- Wire config in `pkg/connector/chat_tools.go` and honor `DisabledTools`.
- If it produces citable sources, add canonical source observations in `pkg/connector/sources.go` so URLs surface as message sources.
Security note: the direct fetch path intentionally bypasses AI-services for localhost/private/link-local addresses, raw asset URLs, and source-like files, and has no SSRF guard. Deployments that cannot allow bridge-origin egress to private networks should enforce an upstream deny-list or network policy before enabling `fetch`.
A conversation is a tree of immutable, append-only entries with a leaf_id pointer marking the current head (pkg/agent/harness/session). Branching = moving the leaf. Entry types: message, custom_message, model_change, thinking_level_change, compaction, branch_summary, label, session_info, custom, and navigation leaf markers.
BuildContext walks leaf → root, folds the branch into the LLM context, and applies any compaction entry (replacing everything before firstKeptEntryId with a summary). Fork copies a branch into a new session. Entry IDs are time-sortable 8-char UUIDv7 prefixes.
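The walk described above can be sketched on an in-memory tree. This is a simplified model rather than the `session` package's `BuildContext`: the `Entry` struct, its field names, the `"summary: "` prefix, and the flat `[]string` output are all assumptions made for illustration.

```go
package main

import "fmt"

// Entry is a hypothetical, simplified session-tree node.
type Entry struct {
	ID               string
	ParentID         string // empty for the root
	Kind             string // "message" or "compaction"
	Text             string // message text, or the compaction summary
	FirstKeptEntryID string // set on compaction entries only
}

// BuildContext walks leaf → root, reverses the path, and, if the branch
// contains a compaction entry, replaces everything before its
// FirstKeptEntryID with the summary.
func BuildContext(entries map[string]Entry, leafID string) []string {
	var path []Entry
	seen := make(map[string]bool) // guards against accidental cycles
	for id := leafID; id != "" && !seen[id]; {
		e, ok := entries[id]
		if !ok {
			break
		}
		seen[id] = true
		path = append(path, e)
		id = e.ParentID
	}
	for i, j := 0, len(path)-1; i < j; i, j = i+1, j-1 {
		path[i], path[j] = path[j], path[i]
	}

	// The latest compaction on the branch wins.
	var comp *Entry
	for i := range path {
		if path[i].Kind == "compaction" {
			comp = &path[i]
		}
	}

	var out []string
	kept := comp == nil
	if comp != nil {
		out = append(out, "summary: "+comp.Text)
	}
	for _, e := range path {
		if comp != nil && e.ID == comp.FirstKeptEntryID {
			kept = true
		}
		if kept && e.Kind == "message" {
			out = append(out, e.Text)
		}
	}
	return out
}

func main() {
	entries := map[string]Entry{
		"a": {ID: "a", Kind: "message", Text: "q1"},
		"b": {ID: "b", ParentID: "a", Kind: "message", Text: "a1"},
		"c": {ID: "c", ParentID: "b", Kind: "compaction", Text: "earlier chat", FirstKeptEntryID: "b"},
		"d": {ID: "d", ParentID: "c", Kind: "message", Text: "q2"},
		"e": {ID: "e", ParentID: "a", Kind: "message", Text: "alt"}, // a second branch
	}
	fmt.Println(BuildContext(entries, "d"))
	fmt.Println(BuildContext(entries, "e"))
}
```

Moving the leaf from `d` to `e` changes the context without rewriting any entry, which is the sense in which branching is just a pointer move.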
There are two implementations of this same tree: per-conversation SQLite files (session/sqlite_storage.go) and the shared bridge DB (pkg/aidb, tables ai_session/ai_session_entry). They share the SessionStorage interface.
When context approaches the model's window, pkg/agent/autocompact triggers harness.Compact. Two triggers: overflow (the provider reported context overflow, or usage exceeded the window) and threshold (contextTokens > contextWindow - ReserveTokens). Compaction summarizes older history with the LLM, keeps the most recent KeepRecentTokens, and writes a compaction entry. Defaults: enabled, ReserveTokens: 16384, KeepRecentTokens: 20000. Token counts are estimates (≈ chars/4; images ≈ 4800 chars), so thresholds are approximate. Users can also /compact manually.
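The threshold trigger and the token estimate reduce to a few lines of arithmetic. The sketch below restates them with the documented defaults; `EstimateTokens` and `ShouldCompact` are hypothetical names, and treating each image as a flat 4800 characters before dividing by four is a direct reading of the estimate above rather than the package's actual code.

```go
package main

import "fmt"

const (
	charsPerToken = 4    // tokens ≈ chars/4
	imageChars    = 4800 // each image counts as roughly 4800 chars
)

// EstimateTokens approximates the token count of a context made of
// textChars characters of text plus a number of images.
func EstimateTokens(textChars, images int) int {
	return (textChars + images*imageChars) / charsPerToken
}

// ShouldCompact implements the threshold trigger:
// contextTokens > contextWindow - reserveTokens.
func ShouldCompact(contextTokens, contextWindow, reserveTokens int) bool {
	return contextTokens > contextWindow-reserveTokens
}

func main() {
	const window, reserve = 128000, 16384 // reserve is the documented default

	tokens := EstimateTokens(440000, 2) // (440000 + 9600) / 4 = 112400
	fmt.Println(tokens, ShouldCompact(tokens, window, reserve))
}
```

With a 128000-token window and the default reserve, the trigger fires once the estimate exceeds 111616 tokens, so the example context of 112400 estimated tokens would be compacted.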
agent.StreamProxy (pkg/agent/proxy.go) is a drop-in StreamFn that proxies streaming to a remote HTTP service (POST {ProxyURL}/api/stream, SSE response) instead of calling a provider SDK directly. This lets the bridge centralize provider credentials behind an internal AI-services server — clients hold a proxy token, not raw provider keys. Its output stream is identical to ai.StreamSimple, so it's interchangeable anywhere a StreamFn is expected.
The bridge advertises the following login flows (`pkg/connector/login.go`):
| Flow | What it does |
|---|---|
| `beeper` | The default Beeper AI login. Loads its catalog and runtime proxy metadata from `ai-services.<domain>` derived from the user's homeserver; uses an appservice bearer token, no stored key. Read-only/managed. |
| `openai-responses` / `openai-completions` / `openai-codex-responses` / `anthropic-messages` / `google-vertex` | Custom provider: enter base URL + API key, the bridge loads matching model metadata from ai-services, then you pick a default model. |
| `chatgpt-device` | ChatGPT OAuth device-code flow (PKCE). Stores access + refresh tokens, auto-refreshes within 2 min of expiry. |
One Matrix user can hold multiple AI logins; there's a canonical "AI Chats" login per user. Provider configs (with secrets) live in UserLoginMetadata.Providers. API keys support env:NAME indirection. The beeper provider is special and read-only — it can't be added/updated/deleted.
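The `env:NAME` indirection can be illustrated with a small resolver. `ResolveAPIKey` is a hypothetical helper, not the bridge's implementation; treating an unset or empty variable as an error is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// ResolveAPIKey returns value unchanged unless it has the form env:NAME, in
// which case NAME is looked up through lookup (os.LookupEnv in production).
// An unset or empty variable is reported as an error.
func ResolveAPIKey(value string, lookup func(string) (string, bool)) (string, error) {
	if !strings.HasPrefix(value, "env:") {
		return value, nil // literal key stored directly in the config
	}
	name := strings.TrimPrefix(value, "env:")
	key, ok := lookup(name)
	if !ok || key == "" {
		return "", fmt.Errorf("environment variable %s is not set", name)
	}
	return key, nil
}

func main() {
	// EXAMPLE_AI_KEY is a made-up variable name for this demo.
	os.Setenv("EXAMPLE_AI_KEY", "sk-example")

	key, err := ResolveAPIKey("env:EXAMPLE_AI_KEY", os.LookupEnv)
	fmt.Println(key, err)

	_, err = ResolveAPIKey("env:EXAMPLE_MISSING_KEY", os.LookupEnv)
	fmt.Println(err)
}
```

Passing the lookup function in keeps the resolver testable without touching the process environment.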
Providers can also be managed at runtime:
- HTTP (if the Matrix connector supports provisioning): `GET/POST /v3/providers`, `GET/PUT/DELETE /v3/providers/{id}` (optional `?login_id=`).
- Bridge commands: `!ai providers`, `!ai provider <show|add|update|delete> …`. `add`/`update` carry a key and redact the command message.
Two control surfaces share the same handlers. Slash commands are parsed from message bodies; bridge commands use the !ai prefix. Coverage is intentionally asymmetric.
| Command | Slash | `!ai` | What it does |
|---|---|---|---|
| help | `/help [cmd]` | `!ai ai-help [cmd]` | list/describe commands |
| model | `/model [provider/model]` | `!ai model …` | show or set the room's model (persists `com.beeper.ai.model`) |
| reasoning | `/reasoning [off…xhigh]` | `!ai reasoning …` | show or set reasoning level (validated + clamped to the model) |
| system prompt | `/system-prompt [text\|clear]` | `!ai system-prompt …` | room-specific prompt appended to the default (`com.beeper.ai.additional_prompt`) |
| compact | `/compact [instructions]` | - | manually summarize context |
| abort | `/abort` | - | cancel the active response or compaction; clears queued messages |
| session | `/session` | - | diagnostics: IDs, model, token estimate, message stats, compaction count |
| providers | - | `!ai providers` / `!ai provider …` | list/manage configured providers |
Unknown `/foo` is not a command: it's sent to the model as a prompt. There is no `/new`: start a fresh conversation by creating a new chat (resolve a model contact with `createChat=true`).
The AI-specific config block (pkg/connector/config.go, defaults shown):
default_system_prompt: | # base prompt; room prompts are appended
  You are a helpful assistant running inside the Beeper apps. …
default_reasoning_level: "off" # off | minimal | low | medium | high | xhigh
fetch:
  timeout_ms: 10000
  max_bytes: 2097152 # 2 MiB
  max_chars: 20000
compaction:
  enabled: true
  reserve_tokens: 16384 # headroom kept below the context window
  keep_recent_tokens: 20000 # recent history preserved verbatim

Three scopes layer together: bridge-wide YAML → per-login provider configs (`UserLoginMetadata.Providers`) → per-room state (`com.beeper.ai.model` / `.additional_prompt` / `.tools`).
Relevant constants: default Beeper model beeper/default, title-generation model gpt-4.1-mini (fallback gpt-5-mini), default AI Services base URL derived from the user's homeserver domain.
All deterministic IDs are built/parsed in pkg/aiid:
| Entity | Format |
|---|---|
| network / bridge type | ai |
| login | default:<base64url(mxid)> |
| portal (room) | mxroom:<base64url(roomID)> |
| assistant ghost | assistant:ai |
| model contact | model:<encoded model> for Beeper AI, model:<encoded provider>:<encoded model> for custom providers |
| message | user:<entryID> / assistant:<entryID> |
| media | sanitized parts, or self-describing ai:<base64url(json)> |
The entryID is the spine: generated in the session tree, embedded in the Matrix MessageID, stored in the active-stream record, and carried in MessageMetadata — so any Matrix event traces back to its exact session-tree node.
Persistence (pkg/aidb) lives in the bridge DB:
- `ai_session` / `ai_session_entry`: the conversation tree.
- `ai_active_stream`: in-flight runs (full `aistream.Run` + metadata + status), so a restart can resume or finalize them. Records are upserted on run start and deleted on completion.
MessageMetadata (per delivered message) records SessionEntryID, Role, RunID, provider/model, ResponseID, ContentIndex, Usage, StopReason, StreamStatus — used for resume, editing, and reaction handling.
test/faux-provider is a standalone Node server that imitates a provider so tests and manual smoke checks don't hit real APIs:
node test/faux-provider/server.mjs --port 0 # prints {"url":"http://127.0.0.1:PORT"}
# queue a scripted response:
curl -s -X POST "$URL/__faux/responses" -H 'content-type: application/json' \
  --data '[{"content":"hello","stopReason":"stop"}]'

It serves `/v1/models`, `/v1/responses`, `/v1/chat/completions`, and `/api/stream` (the proxy SSE shape), plus `/__faux/*` control endpoints. Queued response content blocks mirror `pkg/ai` blocks (thinking, text, toolCall, …), so you can script multi-turn tool-use flows. Run the Go tests with `./test.sh` (which adds `-tags=goolm`).
- Provider registry is populated by import side-effect. Forget `import _ ".../pkg/ai/providers"` and `ai.Stream` panics.
- No agent iteration cap. Bound runs yourself (`ShouldStopAfterTurn`, context deadline, or `Terminate`).
- `Terminate` needs the whole batch. One terminating tool alongside non-terminating ones won't stop the loop.
- Edits rejected, reactions unsupported as general client affordances.
- Two delivery modes for the final message: handle both `inline` and `attachment` (`partsRef`).
- Per-room config depends on arbitrary-room-state support in the Matrix connector; writes use a private bridgev2 escape hatch (fragile across upstream changes).
- The `beeper` provider is read-only and its base URL may be unavailable (login then fails); its models come from a live catalog, not stored config.
- Reasoning is double-validated and clamped: setting a model can silently change the effective reasoning level.
- Two parallel session-tree implementations (`aidb` vs `session` SQLite files) with near-duplicate SQL and one subtle difference (`ON DELETE CASCADE`).
- Token counts are estimates (≈ chars/4), so compaction thresholds are approximate.
- The direct fetch path intentionally allows private-network/raw-asset egress and has no SSRF protection.
- `ProviderConfig` holds secrets (API keys, refresh tokens) in login metadata and serializes to JSON and YAML; don't log it.
- AG-UI `Event` is a map, not a struct: read typed fields via `Get`/`String`; unknown fields survive round-trips.
| Package | Responsibility |
|---|---|
| `cmd/ai` | bridge entry point (registers connector + providers) |
| `pkg/ai` | provider/API/model abstraction, streaming interface, env keys |
| `pkg/ai/providers` | built-in provider implementations (OpenAI Completions/Responses/Codex, Anthropic, Google GenAI/Vertex) + image generation |
| `pkg/ai-command` | shared slash-command parsing for visible `/...` commands and hidden `!ai ...` command messages |
| `pkg/ai-stream` | the Run model: AG-UI event accumulation, anchor/stream/final projection, shared approval commands/coordinator, final-payload sizing |
| `pkg/ag-ui` | the AG-UI wire event protocol, typed events, schema, validation, capabilities |
| `pkg/agent` | autonomous tool-using loop + stateful Agent + remote StreamProxy |
| `pkg/agent/harness` | production agent: sessions, hooks, queues, compaction, summarization |
| `pkg/agent/harness/session` | branching conversation tree (per-conversation SQLite) |
| `pkg/agent/autocompact` | compaction trigger policy |
| `pkg/chattools` | built-in tools: `get_session`, `fetch`, `web_search` |
| `pkg/connector` | the bridgev2 connector: rooms↔sessions, command adapters, login, provider catalog loading, capabilities, contacts, direct media, room state |
| `pkg/msgconv` | Matrix ⇄ AI message conversion |
| `pkg/aiid` | deterministic IDs + metadata types |
| `pkg/aidb` | bridge-DB persistence: session storage + active-stream resume |
| `test/faux-provider` | local fake provider for tests & smoke checks |
The `com.beeper.ai` payload shape (`BeeperAI` in `pkg/ai-stream/run.go`):

{
  "schema": "com.beeper.ai.v1",
  "protocol": "ag-ui",
  "kind": "anchor" | "stream" | "final",
  "threadId": "...",
  "runId": "...",
  "messageId": "msg-<runId>",
  "agent": { "id": "...", "displayName": "..." },
  "model": "provider/model",
  "message": { /* UIMessage; present on anchor & final */ },
  "events": [ /* sequenced AG-UI events; present on stream kind; final kind includes the terminal RUN_FINISHED/RUN_ERROR event */ ],
  "data": { /* arbitrary com.beeper.data values */ },
  "final": { "delivery": "inline|attachment", "textComplete": true, "partsComplete": true, "partsRef": {...} }
}