Merged
47 commits
748f9e3
feat(wiki): LLM-first wiki pipeline (corrected) — checkpoint
dimknaf May 17, 2026
8d62741
feat(wiki): hands-off autonomy + maintainer staleness guard
dimknaf May 17, 2026
1280aa3
chore(deps): refresh all dependencies to latest, re-pinned
dimknaf May 17, 2026
2c5fb20
feat(context): central <=1K preview for multi-item reads; get_entity …
dimknaf May 17, 2026
34fa04a
feat(context): sliceable get_entity for big bodies + protocol in prom…
dimknaf May 17, 2026
a03f077
feat(wiki): dedup-first writer priority + created_at freshness gate
dimknaf May 17, 2026
30a54e5
feat(agent): typed submit_result convention — every agent finish is a…
dimknaf May 18, 2026
dfe5197
refactor(scheduler): collapse three timers into one gated loop; no id…
dimknaf May 18, 2026
fa209bb
fix(maintainer): make recall-first the principle, not a suggestion
dimknaf May 18, 2026
9daaf79
fix(writer): forbid silent omission/thinning of prior valid content
dimknaf May 18, 2026
b52fd35
fix(maintainer): require attach target_wiki_id to be a tool-verified …
dimknaf May 19, 2026
5a3ce81
fix(wiki): reference existing wikis by catalog NUMBER, not uuid
dimknaf May 19, 2026
a890063
fix(jobs): stale-lease so abandoned 'assigned' jobs are reclaimable
dimknaf May 19, 2026
98b729f
feat(scheduler): parallel maintainer + writers per tick (concurrent f…
dimknaf May 19, 2026
a6e8c96
feat(scheduler): WIKI_ENABLED gate, default OFF (opt-in)
dimknaf May 20, 2026
8560cfa
fix(agent): drop output_type — restore tool use; keep typed submit_re…
dimknaf May 20, 2026
d4b9288
fix(recall): restore embedding-based ranking (was silently scoring 0 …
dimknaf May 20, 2026
c4e4a2f
feat(recall): keyword-mediated fuzzy + two-level diversity quota + na…
dimknaf May 20, 2026
d6bf836
docs: reflect keyword-mediated recall + two-level diversity quota + n…
dimknaf May 20, 2026
cf1caf7
test(agent): edge-case tests for final_answer rename + RunHooks count…
dimknaf May 20, 2026
0b70603
feat(agent): rename submit_result -> final_answer + RunHooks countdow…
dimknaf May 20, 2026
afa3d85
docs: rename submit_result mentions to final_answer
dimknaf May 20, 2026
0e38ca4
feat(agent): retry-with-correction when a run ends without final_answ…
dimknaf May 20, 2026
6b20b9f
fix(agent): strict_mode=False + lenient nullable coercion on final_an…
dimknaf May 20, 2026
3e9f802
fix(agent): embed literal JSON shape in Layer 4 correction message
dimknaf May 20, 2026
56ac9be
docs(skills): wiki awareness + always-ASK-before-saving + drop stale …
dimknaf May 20, 2026
a84c182
fix(agent): accept JSON-string payload (vLLM/Qwen tool-call format)
dimknaf May 20, 2026
67177de
tune(agent): bump default max_turns 15->20, threshold 5->8, soften co…
dimknaf May 20, 2026
9def2ce
docs(skills): document auto-ingest in agent skill — drop file in data…
dimknaf May 20, 2026
cb256a1
tune(scheduler): bump WIKI_AGENT_TIMEOUT default 600 -> 1200 (10 -> 2…
dimknaf May 20, 2026
8828ecf
chore(compose): remove --reload from api default — code changes now a…
dimknaf May 20, 2026
e3ee7c9
feat(scheduler): per-wiki cooldown for attach claims (across-tick bat…
dimknaf May 20, 2026
260ae48
docs(env): document WIKI_ATTACH_COOLDOWN_SECONDS
dimknaf May 20, 2026
fb48cf0
feat(writer): conservative existing-body framing + attach-mode recall…
dimknaf May 20, 2026
5e59f57
docs: small fixups after today's reload removal + max_turns bump
dimknaf May 20, 2026
e447a76
feat(agent): add local Gemma vLLM profile on workstation port 8009
dimknaf May 21, 2026
5ee286d
docs(frontend): finalised read-only wiki frontend plan
dimknaf May 21, 2026
6008042
feat(writer): section-edit tools + optional empty body for big-wiki a…
dimknaf May 21, 2026
c80551d
feat(writer): context-handoff via successor-respawn for big-wiki runs
dimknaf May 21, 2026
2414265
tune(writer): tighten body-empty contract + raise handoff budget afte…
dimknaf May 22, 2026
f896263
fix(writer): stub big bodies + retry transient BadRequestError
dimknaf May 22, 2026
6de8c7c
fix(writer): close the orphan loop when writer no-ops on an already-c…
dimknaf May 22, 2026
79bb275
fix(agent): unwrap double-escaped JSON in tool-call payload
dimknaf May 23, 2026
122d83f
docs(skills): bump agent-call timeout guidance to 10 min
dimknaf May 23, 2026
bb14868
test(conftest): session-teardown sweeps _pytest_* keyword artefacts
dimknaf May 24, 2026
ebbb47b
chore(release): v0.2.0 — public-ready docs, deepinfra default, CI sca…
dimknaf May 24, 2026
e73f83e
docs(changelog): enumerate wiki endpoints, env vars, migration 005, r…
dimknaf May 24, 2026
42 changes: 42 additions & 0 deletions .env.example
@@ -35,6 +35,48 @@ AGENT_MODEL=
# (visible via `docker logs braindb_api -f`). Response payload unchanged.
AGENT_VERBOSE=false

# Agent turn budget — how many tool-call turns the general /agent/query
# is allowed before the SDK forces termination. Default 20. Lowering
# this below ~15 degrades deep-research models (notably local Qwen via
# vLLM); raising it costs more LLM calls per query. The wiki maintainer
# / writer and the ingest watcher pass their own per-call values
# (30/30/40/30) and are unaffected by this default.
# AGENT_MAX_TURNS=20

# How many turns from the end of the run the agent gets a synthetic
# "start wrapping up" reminder injected as a user message. Default 8.
# Set to 0 to disable the reminder entirely (the SDK will still
# terminate at max_turns, but the model gets no warning). The reminder
# tone is automatic: soft "start wrapping up" when max_turns > 5,
# hard "call final_answer NOW" when max_turns <= 5 (which covers the
# Layer 4 retry path).
# AGENT_COUNTDOWN_THRESHOLD=8

# Ingest watcher poll interval (seconds) — how often the watcher sidecar
# scans data/sources/ for new files to ingest.
INGEST_POLL_INTERVAL=7

# Wiki scheduler HTTP read-timeout (seconds) on /wiki/maintain and
# /wiki/write calls. Default 1200 (20 min). Local quantised models
# (Qwen 27B AWQ-INT4 on vLLM) routinely take 6-15 min for a full wiki
# body; setting this below ~600 caused the scheduler to give up while
# the api kept working — queue drained slower than reality. Raise if
# you see "Read timed out" in the scheduler log AND the corresponding
# write actually committed (check `wikis_ext.revision`); lower only if
# you specifically want quicker scheduler turnover. The api itself is
# unbounded by this; this only controls the scheduler's patience.
# WIKI_AGENT_TIMEOUT=1200

# Per-wiki cooldown on attach claims (seconds). Default 300 (5 min).
# Once the OLDEST pending attach for a given wiki is this old, the
# writer claims ALL pending attaches for that wiki in a single batch.
# Below the cooldown, fresh attaches keep accumulating — they don't
# trigger a writer fire. Lets the writer fire once per cooldown window
# instead of once per attach job; on a hot subject like a high-volume
# person/topic wiki, this collapses 5-10 separate full-body
# regenerations into 1 per window — ~80% LLM cost reduction on the
# pattern we observed today. Self-limiting: each fire scoops up the
# whole pending queue for that wiki. Set to 0 to disable (revert to
# the old "fire on every attach" behaviour). Affects ATTACH only;
# consolidate and create paths are unchanged.
# WIKI_ATTACH_COOLDOWN_SECONDS=300
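The `AGENT_COUNTDOWN_THRESHOLD` and `WIKI_ATTACH_COOLDOWN_SECONDS` comments above each describe a small decision rule. Below is a minimal Python sketch of both. It assumes the reminder fires on every turn inside the final `threshold` turns, and that each pending attach job carries a `created_at` timestamp. The function names, the `PendingAttach` record and the reminder strings are hypothetical, not BrainDB's implementation:

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


def countdown_reminder(turns_used: int, max_turns: int, threshold: int = 8) -> Optional[str]:
    """Return the wrap-up reminder to inject before the next turn, or None.

    Hypothetical sketch: fires on every turn inside the last `threshold`
    turns; threshold 0 disables it (the SDK still stops at max_turns).
    """
    if threshold <= 0:
        return None
    remaining = max_turns - turns_used
    if remaining > threshold:
        return None
    if max_turns > 5:
        # Soft tone for normal runs.
        return f"{remaining} turns left: start wrapping up."
    # Hard tone for short budgets such as the Layer 4 retry path.
    return "Call final_answer NOW."


@dataclass(frozen=True)
class PendingAttach:
    job_id: str
    wiki_id: str
    created_at: float  # seconds since the epoch


def claim_attach_batches(pending: list[PendingAttach], now: float,
                         cooldown_seconds: float = 300) -> dict[str, list[str]]:
    """Claim ALL pending attaches of a wiki once its OLDEST one is old enough."""
    by_wiki: dict[str, list[PendingAttach]] = defaultdict(list)
    for job in pending:
        by_wiki[job.wiki_id].append(job)

    claims: dict[str, list[str]] = {}
    for wiki_id, jobs in by_wiki.items():
        oldest = min(job.created_at for job in jobs)
        if now - oldest >= cooldown_seconds:
            # One writer fire covers the whole pending queue for this wiki.
            claims[wiki_id] = [job.job_id for job in sorted(jobs, key=lambda j: j.created_at)]
    return claims


if __name__ == "__main__":
    jobs = [
        PendingAttach("a1", "w1", created_at=0),
        PendingAttach("a2", "w1", created_at=250),
        PendingAttach("b1", "w2", created_at=200),
    ]
    print(claim_attach_batches(jobs, now=310))  # {'w1': ['a1', 'a2']}
    print(countdown_reminder(turns_used=12, max_turns=20))  # 8 turns left: start wrapping up.
```

With the default 300 s cooldown, `w1` is claimed as one batch because its oldest attach is 310 s old. `w2` is only 110 s old, so it keeps accumulating. Passing `cooldown_seconds=0` claims every wiki that has anything pending, which approximates the old fire-on-every-attach behaviour.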
79 changes: 79 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,79 @@
name: tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  validator-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 10

    services:
      postgres:
        image: pgvector/pgvector:pg16
        env:
          POSTGRES_PASSWORD: password
          POSTGRES_DB: braindb
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 5s
          --health-timeout 5s
          --health-retries 10

    steps:
      - uses: actions/checkout@v4

      - name: Enable required postgres extensions
        run: |
          PGPASSWORD=password psql -h localhost -U postgres -d braindb \
            -c "CREATE EXTENSION IF NOT EXISTS pg_trgm; CREATE EXTENSION IF NOT EXISTS vector;"

      - name: Configure .env for the CI stack
        run: |
          cat > .env <<'EOF'
          DATABASE_URL=postgresql://postgres:password@host.docker.internal:5432/braindb
          API_PORT=8000
          LLM_PROFILE=deepinfra
          DEEPINFRA_API_KEY=ci-placeholder-key-not-used
          AGENT_VERBOSE=false
          WIKI_ENABLED=false
          EOF

      - name: Create the local-network the compose file expects
        run: docker network create local-network

      - name: Bring up the stack
        run: docker compose up -d --build

      - name: Wait for /health
        run: |
          for i in $(seq 1 60); do
            if curl -sf http://localhost:8000/health > /dev/null; then
              echo "API healthy after ${i} attempts"
              curl -s http://localhost:8000/health
              exit 0
            fi
            sleep 2
          done
          echo "API failed to become healthy"
          docker logs braindb_api --tail 100
          exit 1

      - name: Install pytest into the api container
        run: docker exec braindb_api pip install pytest pytest-asyncio --quiet

      - name: Run validator + handoff unit tests
        run: |
          docker exec braindb_api python -m pytest \
            tests/test_final_answer_rename.py \
            tests/test_handoff_hooks.py \
            -v

      - name: Dump api logs on failure
        if: failure()
        run: docker logs braindb_api --tail 200
3 changes: 3 additions & 0 deletions .gitignore
@@ -36,3 +36,6 @@ Thumbs.db
data/sources/*
!data/sources/.gitkeep
!data/sources/README.md

# Wiki review exports — generated, read-only inspection output
data/wiki_review/
81 changes: 58 additions & 23 deletions BRAINDB_GUIDE.md
@@ -22,20 +22,25 @@ The API runs at **http://localhost:8000**. Everything is done via HTTP calls.
### Before answering anything non-trivial, always call:
```
POST /api/v1/memory/context
-{"queries": ["topic 1", "topic 2"], "max_depth": 3, "max_results": 15}
+{"queries": ["bare-keyword-1", "bare-keyword-2", "one broader phrase"], "max_depth": 3}
```
This returns:
-- Direct matches (fuzzy + full-text) across all queries, merged by best score
+- Direct matches (keyword-mediated fuzzy + keyword-mediated embedding) across all queries
- Graph-connected entities up to 3 hops away (relevance fades: 100% -> 60% -> 30%)
+- Two-level diversity quota applied: per-search-term reservation (each query gets a guaranteed share) + per-keyword halving cap on the open remainder
- Always-on rules (always injected regardless of query)

-Each item has a `final_rank` score. Trust higher-ranked items more.
+Each item has a `final_rank` score. Trust higher-ranked items more. `max_results` defaults to 30; the scoring pool internally considers up to 500 candidates per query so narrow keywords aren't excluded before they're evaluated.

+**Query strategy.** Prefer **multiple narrow queries** (single keywords, bare names) over one long sentence. Keywords are short, so a short query matches them at high pg_trgm similarity; a long phrase dilutes the trigram set and pushes narrow-subject facts down the ranking. Examples:

-You can also pass a single query for backward compatibility:
```
-{"query": "single topic", "max_depth": 3}
+GOOD: "queries": ["Petros", "Selonda Saronikos fish farm", "Dimitrios manager"]
+BAD: "queries": ["Petros person identity profile relation to Dimitris"]
```

+The per-search-term quota reserves slots for each query you pass, so the bare-keyword query is guaranteed to surface its specific facts even when paired with broader angles. Single `query` (string) still works for backward compatibility.
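Why a long phrase dilutes the match: pg_trgm's `similarity()` divides the number of distinct trigrams two strings share by the number of distinct trigrams across both. The Python approximation below mimics pg_trgm's word handling: text is lower-cased, split into alphanumeric words, and each word is padded with two leading spaces and one trailing space. It is an illustration, not the SQL BrainDB runs, and PostgreSQL's word splitting can differ by locale:

```python
from __future__ import annotations

import re


def trigrams(text: str) -> set[str]:
    """pg_trgm-style trigram set: each lower-cased alphanumeric word is
    padded with two leading spaces and one trailing space."""
    grams: set[str] = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a: str, b: str) -> float:
    """Shared distinct trigrams / all distinct trigrams (like pg_trgm's similarity())."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


keyword = "Petros"
print(round(similarity("Petros", keyword), 2))  # 1.0
print(round(similarity("Petros person identity profile relation to Dimitris", keyword), 2))  # 0.15
```

The bare name matches the keyword perfectly. The long phrase contains the same seven `petros` trigrams, but it drags in about forty unrelated ones, so its score against the same keyword falls to roughly 0.15.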

### After learning something new, save it:
```
POST /api/v1/entities/facts — for objective facts
@@ -74,9 +79,19 @@ curl "http://localhost:8000/api/v1/entities?entity_type=fact&source=user-stated&
Query parameters: `entity_type`, `keyword`, `source`, `min_importance` (0-1), `limit` (1-200, default 50), `offset` (default 0).

### Get Entity by ID
+The **only full-content read**. Multi-item calls (context/search/list) return
+~1K previews ending `--truncated … get_entity("<id>")`; come here for the
+whole body.
```bash
curl http://localhost:8000/api/v1/entities/<UUID>
+# Large body? page it (don't pull it whole):
+curl "http://localhost:8000/api/v1/entities/<UUID>?offset=0&limit=8000"
```
+With `offset`/`limit` the response adds `content_meta`:
+`{total_chars, offset, returned, next_offset}` — keep fetching `next_offset`
+until it is `null`. Default (no params) = full body, unchanged. For big
+documents, prefer delegating the read to a subagent via `/api/v1/agent/query`
+so the content never floods the caller's context.
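The paging contract above (keep requesting `next_offset` until it is `null`) fits in a short loop. Below is a minimal standard-library Python sketch. It assumes the body slice is returned in a `content` field alongside `content_meta`, which this guide does not state. `read_full_body` and `http_fetcher` are hypothetical helper names:

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Optional

# fetch_page(offset, limit) -> one JSON response of GET /api/v1/entities/<UUID>
FetchPage = Callable[[int, int], Dict[str, Any]]


def read_full_body(fetch_page: FetchPage, limit: int = 8000) -> str:
    """Page through a large entity body until content_meta.next_offset is null."""
    parts: list[str] = []
    offset: Optional[int] = 0
    while offset is not None:
        page = fetch_page(offset, limit)
        parts.append(page["content"])  # assumption: the slice lives in "content"
        offset = page["content_meta"]["next_offset"]
    return "".join(parts)


def http_fetcher(entity_id: str, base_url: str = "http://localhost:8000") -> FetchPage:
    """Build a fetch_page callable for GET /api/v1/entities/<id>?offset=&limit=."""
    def fetch(offset: int, limit: int) -> Dict[str, Any]:
        query = urllib.parse.urlencode({"offset": offset, "limit": limit})
        url = f"{base_url}/api/v1/entities/{entity_id}?{query}"
        with urllib.request.urlopen(url, timeout=30) as resp:
            return json.load(resp)
    return fetch


# Usage against a running API (replace the placeholder id):
# body = read_full_body(http_fetcher("<UUID>"))
```

Passing the fetcher in as a callable keeps the loop testable without a live server, and the HTTP part stays a thin wrapper around `urllib.request.urlopen`.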

### Delete Entity
```bash
@@ -226,14 +241,23 @@ curl -X POST http://localhost:8000/api/v1/memory/search \
curl -X POST http://localhost:8000/api/v1/memory/context \
-H "Content-Type: application/json" \
-d '{
-"queries": ["user profile expertise", "project architecture decisions"],
+"queries": ["user-profile", "expertise", "project-decision"],
"max_depth": 3,
-"max_results": 15,
"include_always_on_rules": true
}'
```

-Each query runs fuzzy + full-text search independently. Seeds are merged keeping the **best score** per entity. One graph expansion runs on the combined seed set.
+Each query runs through TWO keyword-mediated pathways in parallel:
+- **Fuzzy** — `pg_trgm similarity(content, query)` over keyword entities.
+- **Embedding** — Qwen3-Embedding-0.6B (1024-dim) cosine similarity between the query and keyword-entity embeddings.

+Entities surface via `tagged_with` from the matched keywords. Per-entity score = `max(matched-keyword similarity)` on each pathway. Both signals are merged with the geometric mean (configurable `missing_signal_penalty` when only one signal fires).
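The per-entity scoring rule above, sketched in Python: each pathway keeps its best matched-keyword similarity, and the two pathways are combined with a geometric mean. The text does not specify how `missing_signal_penalty` enters the formula. This sketch assumes it stands in for the absent signal; the default of `0.5` and the function names are also hypothetical:

```python
import math
from typing import Optional, Sequence


def pathway_score(matched_keyword_sims: Sequence[float]) -> Optional[float]:
    """Per-entity score on one pathway: max similarity over its matched keywords."""
    return max(matched_keyword_sims) if matched_keyword_sims else None


def merge_signals(fuzzy: Optional[float], embedding: Optional[float],
                  missing_signal_penalty: float = 0.5) -> float:
    """Geometric mean of the fuzzy and embedding scores.

    Assumption: an absent signal is replaced by missing_signal_penalty.
    """
    if fuzzy is None and embedding is None:
        return 0.0
    f = missing_signal_penalty if fuzzy is None else fuzzy
    e = missing_signal_penalty if embedding is None else embedding
    return math.sqrt(f * e)


fuzzy = pathway_score([0.31, 0.64])      # best fuzzy keyword match: 0.64
embedding = pathway_score([0.81, 0.42])  # best embedding keyword match: 0.81
print(round(merge_signals(fuzzy, embedding), 2))  # 0.72
print(round(merge_signals(fuzzy, None), 2))       # 0.57 (fuzzy-only hit)
```

A geometric mean rewards entities that score on both pathways. A strong hit on one pathway cannot fully compensate for a weak hit on the other, which is presumably why a separate penalty knob exists for the one-signal case.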

+After scoring, **two diversity quotas** apply:
+1. **Per-search-term** — each query in `queries[]` reserves `ceil(max_results × per_query_share / num_queries)` slots filled from its own top-ranked entities. Knob: `per_query_share` (default 0.5; set to 0 to disable).
+2. **Per-keyword (halving)** — walking the remaining slots in `final_rank`-desc order, each new dominant keyword gets a halving allowance (50% / 25% / 12.5% ..., floor 1). Knob: `keyword_quota_halving` (default 0.5; set to 1.0 to disable).
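The two quotas can be sketched as one selection pass over ranked candidates. This Python version makes several assumptions, so treat it as one interpretation of the description above, not BrainDB's code:

- Each candidate is tagged with the single query that surfaced it and with its dominant keyword.
- "Halving allowance" means `floor(open_slots × keyword_quota_halving^(k+1))` (floor 1) for the k-th new keyword.
- Slots the keyword cap leaves empty are not backfilled.

```python
from __future__ import annotations

import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    entity_id: str
    final_rank: float
    query_index: int       # which entry of queries[] surfaced it (assumed: exactly one)
    dominant_keyword: str  # keyword behind its best score (assumed available)


def reserved_slots(max_results: int, per_query_share: float, num_queries: int) -> int:
    """Slots each query reserves: ceil(max_results * per_query_share / num_queries)."""
    if per_query_share <= 0 or num_queries <= 0:
        return 0
    return math.ceil(max_results * per_query_share / num_queries)


def select(candidates: list[Candidate], num_queries: int, max_results: int = 30,
           per_query_share: float = 0.5, keyword_quota_halving: float = 0.5) -> list[Candidate]:
    ranked = sorted(candidates, key=lambda c: c.final_rank, reverse=True)
    chosen: list[Candidate] = []
    taken: set[str] = set()

    # 1. Per-search-term reservation: each query's own top hits go in first.
    reserve = reserved_slots(max_results, per_query_share, num_queries)
    for q in range(num_queries):
        own = [c for c in ranked if c.query_index == q and c.entity_id not in taken]
        for c in own[:reserve]:
            if len(chosen) < max_results:
                chosen.append(c)
                taken.add(c.entity_id)

    # 2. Per-keyword halving cap on the open remainder, walked in rank order.
    open_slots = max_results - len(chosen)
    allowance: dict[str, int] = {}
    used: dict[str, int] = defaultdict(int)
    for c in ranked:
        if len(chosen) >= max_results:
            break
        if c.entity_id in taken:
            continue
        kw = c.dominant_keyword
        if kw not in allowance:  # k-th new keyword: open_slots * halving^(k+1), floor 1
            k = len(allowance)
            allowance[kw] = max(1, math.floor(open_slots * keyword_quota_halving ** (k + 1)))
        if used[kw] < allowance[kw]:
            chosen.append(c)
            taken.add(c.entity_id)
            used[kw] += 1

    return sorted(chosen, key=lambda c: c.final_rank, reverse=True)


# Query 0 has four hits (three share keyword "x"); query 1 has one weak hit.
pool = [Candidate("a", 0.9, 0, "x"), Candidate("b", 0.8, 0, "x"), Candidate("c", 0.7, 0, "x"),
        Candidate("d", 0.6, 0, "y"), Candidate("e", 0.2, 1, "z")]
print(reserved_slots(30, 0.5, 3))                              # 5
print([c.entity_id for c in select(pool, 2, max_results=4)])  # ['a', 'b', 'd', 'e']
print([c.entity_id for c in select(pool, 2, max_results=4,
                                   per_query_share=0, keyword_quota_halving=1.0)])  # ['a', 'b', 'c', 'd']
```

With both quotas on, the reservation pulls in `e` (the only hit for the second query) even though it ranks last. The keyword cap then admits `d` ahead of a third `x`-tagged entity. With both knobs set to their "disable" values, selection reduces to plain top-k by `final_rank`.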

+`max_results` defaults to 30 (LLM-visible cap). The internal scoring pool considers up to 500 keyword neighbours per query (`scoring_pool_keyword_neighbors`) and up to 500 fuzzy candidates (`scoring_pool_fuzzy`) — cheap pure-SQL/vector work, so narrow keywords aren't excluded before they're evaluated. None of these knobs are env-driven; tune them in [`braindb/config.py`](braindb/config.py) if needed.

**Single query** (backward-compatible):
```bash
@@ -277,8 +301,14 @@ curl "http://localhost:8000/api/v1/memory/log?since=2026-04-08T00:00:00Z"

Response includes: `id`, `timestamp`, `operation`, `entity_type`, `entity_id`, `details`, `context_note`.

-### Read-only SQL
-For ad-hoc exploration. Only `SELECT` and `WITH` queries; 5s timeout; 1000 row limit.
+### Read-only SQL — EXCEPTION tool, not for recall

+⚠ This is **not** a recall/discovery path. A flat SELECT has no embeddings, no
+graph, no ranking — it discards everything BrainDB is built for. Default to
+`POST /api/v1/memory/context` (and delegated `/api/v1/agent/query`) for all
+recall, discovery, and understanding. Use `/memory/sql` **only** for a
+specific structured/aggregate question those cannot express (counts, GROUP BY,
+activity-log joins). Only `SELECT` and `WITH` queries; 5s timeout; 1000 row limit.

```bash
curl -X POST http://localhost:8000/api/v1/memory/sql \
@@ -306,18 +336,17 @@ curl -X POST http://localhost:8000/api/v1/entities/datasources/ingest \

### BrainDB Agent — natural language queries

-`POST /api/v1/agent/query` — instead of orchestrating individual API calls, send a plain English request and let BrainDB's internal agent handle it. The agent uses the OpenAI Agents SDK with LiteLLM (provider pluggable via `LLM_PROFILE` — default `deepinfra`, `nim` also supported) and has access to all 21 BrainDB operations as function tools.
+`POST /api/v1/agent/query` — instead of orchestrating individual API calls, send a plain English request and let BrainDB's internal agent handle it. The agent uses the OpenAI Agents SDK with LiteLLM (provider pluggable via `LLM_PROFILE` — **`deepinfra` with `google/gemma-4-31B-it` is the recommended default**; `nim` and local vLLM are also supported) and has access to all 21 BrainDB operations as function tools.

```bash
curl -X POST http://localhost:8000/api/v1/agent/query \
-H "Content-Type: application/json" \
--d '{
-"query": "What do you know about the user role and recent projects?",
-"max_turns": 15
-}'
-# {"answer": "The user is ...", "max_turns": 15}
+-d '{"query": "What do you know about the user role and recent projects?"}'
+# {"answer": "The user is ...", "max_turns": 20}
```

+(`max_turns` is optional; the default — currently 20 — is used when omitted.)
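A minimal standard-library Python client for `/api/v1/agent/query`, assuming the JSON shapes shown above. `max_turns` is left out of the payload unless given, so the server default applies. The 600 s timeout follows this PR's 10-minute agent-call guidance in the skills docs. The helper names are hypothetical:

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:8000"


def build_agent_request(query: str, max_turns: Optional[int] = None,
                        base_url: str = BASE_URL) -> urllib.request.Request:
    """POST /api/v1/agent/query; omit max_turns to use the server default (currently 20)."""
    payload: Dict[str, Any] = {"query": query}
    if max_turns is not None:
        payload["max_turns"] = max_turns
    return urllib.request.Request(
        f"{base_url}/api/v1/agent/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_agent(query: str, max_turns: Optional[int] = None,
              timeout: float = 600.0) -> Dict[str, Any]:
    """Send the request and return the parsed JSON, e.g. {"answer": ..., "max_turns": 20}."""
    with urllib.request.urlopen(build_agent_request(query, max_turns), timeout=timeout) as resp:
        return json.load(resp)


# ask_agent("What do you know about the user role and recent projects?")
```

Keeping request construction separate from sending makes it easy to inspect the payload (or log it) before a long-running agent call.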

**Save via the agent**:
```bash
curl -X POST http://localhost:8000/api/v1/agent/query \
@@ -332,12 +361,12 @@ curl -X POST http://localhost:8000/api/v1/agent/query \
-d '{"query":"Delegate to a subagent: find near-duplicate facts and return top 10 pairs with their IDs."}'
```

-The agent has these tools internally: `recall_memory`, `quick_search`, `save_fact`, `save_thought`, `save_source`, `save_rule`, `ingest_file`, `get_entity`, `list_entities`, `update_entity`, `delete_entity`, `create_relation`, `view_entity_relations`, `delete_relation`, `view_tree`, `search_sql`, `view_log`, `get_stats`, `generate_embeddings`, `delegate_to_subagent`, `submit_result`.
+The agent has these tools internally: `recall_memory`, `quick_search`, `save_fact`, `save_thought`, `save_source`, `save_rule`, `ingest_file`, `get_entity`, `list_entities`, `update_entity`, `delete_entity`, `create_relation`, `view_entity_relations`, `delete_relation`, `view_tree`, `search_sql`, `view_log`, `get_stats`, `generate_embeddings`, `delegate_to_subagent`, `final_answer`.

**Setup (pick a provider)**:
-- **DeepInfra (default)**: set `LLM_PROFILE=deepinfra` and `DEEPINFRA_API_KEY=...` in `.env`. Get a key at https://deepinfra.com/
-- **NVIDIA NIM**: set `LLM_PROFILE=nim` and `NVIDIA_NIM_API_KEY=...` in `.env`. Get a key at https://build.nvidia.com/
-- **Self-hosted vLLM**: set `LLM_PROFILE=vllm_workstation` for a vLLM server bound to the Docker host's loopback at `:8002`. No API key needed if the server runs without auth. See [CONTRIBUTING.md](CONTRIBUTING.md) for how to add your own self-hosted profile.
+- **DeepInfra — recommended default**: set `LLM_PROFILE=deepinfra` and `DEEPINFRA_API_KEY=...` in `.env`. Fast (5–30s per agent call), cheap, validated end-to-end. Get a key at https://deepinfra.com/
+- **NVIDIA NIM** (free-tier alternative): set `LLM_PROFILE=nim` and `NVIDIA_NIM_API_KEY=...` in `.env`. Get a key at https://build.nvidia.com/
+- **Self-hosted vLLM** (advanced / offline / requires GPU workstation): set `LLM_PROFILE=vllm_workstation` (or `..._qwen`, `..._gemma`) — points at a vLLM server bound to the Docker host's loopback at `:8002` / `:8010` / `:8009` respectively. Reach it from the docker network via an SSH tunnel if the GPU is on a remote machine. No API key needed if the server runs without auth. See [CONTRIBUTING.md](CONTRIBUTING.md) for how to add your own self-hosted profile.
- Profiles live in `braindb/config.py::_LLM_PROFILES`. Add new providers there (e.g. `together`, `openai`) by adding a dict entry — no code change required.
- Optional override: set `AGENT_MODEL=` in `.env` to use a non-default model for the active profile.

@@ -384,13 +413,19 @@ This is complementary to `source_entity_id` (on facts — links to a specific so

## How Search Works

-The search uses a 4-tier scoring system:
+Two different paths, two different scoring models:

+**`POST /api/v1/memory/search`** (and the `quick_search` agent tool) — **content-matching** with a 4-tier score against entity content directly:
1. **Full-text AND match** (all query words match) — highest weight (1.0)
2. **Full-text OR match** (any query word matches) — lower weight (0.3)
3. **Content trigram similarity** — fuzzy character matching (0.5)
4. **Title trigram similarity** — fuzzy title matching (0.3)

-This means specific queries with terms that appear in stored content work best. Vague queries with stop words ("everything about X") may return fewer results. If you get 0 results, reformulate with more specific terms.
+This is for "find me entities whose CONTENT mentions these terms" — useful for arbitrary text matching, but it dilutes when the query is much longer than what's in the entity.

+**`POST /api/v1/memory/context`** (the sophisticated path) — **keyword-mediated**. Both the fuzzy and embedding pathways match the query against keyword entities (not entity bodies); entities surface via `tagged_with`. Then graph traversal, decay, two-level diversity quota, ranking. See the "Context" section above for the full pipeline.

+Use `/memory/search` for raw text matching; use `/memory/context` for everything that involves *understanding* a subject. If you get 0 results from either, reformulate with more specific terms.
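To make the four content-matching tiers concrete, here is a rough Python approximation. It makes three simplifications, and the real ranking lives in SQL, where the tiers may be combined differently:

- The tier weights are assumed to be summed.
- PostgreSQL full-text matching is approximated with plain lower-cased word sets (no stemming or stop-word removal).
- Trigram similarity follows pg_trgm's approach.

```python
from __future__ import annotations

import re

# Tier weights from the list above; summing them is an assumption.
WEIGHTS = {"fts_and": 1.0, "fts_or": 0.3, "content_trgm": 0.5, "title_trgm": 0.3}


def words(text: str) -> set[str]:
    return set(re.findall(r"[^\W_]+", text.lower()))


def trigram_similarity(a: str, b: str) -> float:
    """pg_trgm-style: shared distinct trigrams / all distinct trigrams."""
    def grams(text: str) -> set[str]:
        out: set[str] = set()
        for w in words(text):
            padded = f"  {w} "
            out.update(padded[i:i + 3] for i in range(len(padded) - 2))
        return out
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb) if ga and gb else 0.0


def search_score(query: str, content: str, title: str = "") -> float:
    q, c = words(query), words(content)
    fts_and = 1.0 if q and q <= c else 0.0  # every query word present
    fts_or = 1.0 if q & c else 0.0          # at least one query word present
    return (WEIGHTS["fts_and"] * fts_and
            + WEIGHTS["fts_or"] * fts_or
            + WEIGHTS["content_trgm"] * trigram_similarity(query, content)
            + WEIGHTS["title_trgm"] * trigram_similarity(query, title))


content = "Selonda fish farm in the Saronikos gulf"
print(search_score("fish farm", content) > search_score("everything about fish farm", content))  # True
```

Adding words that the content lacks ("everything about") forfeits the dominant AND tier. That is the dilution effect described above for long queries on this path.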

---

@@ -422,7 +457,7 @@ The `final_rank` in context results already accounts for decay.
3. **Notes are a log** — use `notes` on any entity to record how your understanding evolved
4. **always_on rules are limited to 10** — keep them high-signal; use on-demand rules for specifics
5. **access_count reinforces memory** — things you retrieve often stay important longer
-6. **Multi-query for better recall** — use `queries` (array) instead of `query` (single) to search multiple angles at once
+6. **Multi-query for better recall** — use `queries` (array) instead of `query` (single) AND prefer multiple **narrow** queries (single keywords / bare names) over one long phrase. Each query in `queries[]` reserves a share of result slots, so a bare keyword is guaranteed to surface its facts. `max_results` defaults to 30.
7. **Content should be concise** — 1-2 sentences, standalone, using full terms (not abbreviations)
8. **Use the tree endpoint** to explore how an entity connects to others: `GET /memory/tree/<id>`
9. **Use the list endpoint** to browse entities: `GET /entities?entity_type=fact&limit=50`