feat: pi sessions and list mode by kakkoyun · Pull Request #8 · arjunkmrm/recall

kakkoyun · 2026-05-09T14:20:53Z

Two additive features. Draft for review while I exercise the changes locally for a few more days.

What

1. Index pi (mariozechner/pi-coding-agent) sessions — ~/.pi/agent/sessions/**/*.jsonl, alongside the existing Claude and Codex globs. Same FTS5 schema, same DB. New --source pi filter and [pi] result tag. read_session.py auto-detects the format.

2. Make the positional query optional — when omitted, recall lists every session in the time window via direct SQL on the sessions table. Bypasses FTS entirely. Banner reads Listed N sessions ... (vs Found ...) so the caller can see which path ran. Useful when an agent / retro flow wants every session in a window rather than a search hit.
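For reviewers who want a picture of the list-mode path before opening the diff, here is a minimal sketch of an SQL-only listing. It is not the code in this PR. The list_sessions name and its parameters come from the commit message, but the column names (session_id, source, project, timestamp), the epoch-seconds timestamp, exact project matching, and the default values are all assumptions made for illustration.

```python
import sqlite3
import time


def list_sessions(conn, project=None, days=None, source=None, limit=50):
    """List sessions by recency with plain SQL; FTS is never touched.

    Assumed schema (hypothetical): sessions(session_id, source, project,
    timestamp), with timestamp stored as epoch seconds.
    """
    clauses, params = [], []
    if days is not None:
        # Keep only sessions newer than the start of the time window.
        clauses.append("timestamp >= ?")
        params.append(time.time() - days * 86400)
    if source:
        clauses.append("source = ?")
        params.append(source)
    if project:
        # Exact match is an assumption; the real filter may differ.
        clauses.append("project = ?")
        params.append(project)
    where = ("WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = (
        "SELECT session_id, source, project, timestamp "
        f"FROM sessions {where} ORDER BY timestamp DESC LIMIT ?"
    )
    params.append(limit)
    return conn.execute(sql, params).fetchall()
```

Because every filter is optional and bound as a parameter, the same function covers recall --days 7 and recall --days 7 --source pi without building a query string for FTS.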

Pi format reference: https://github.com/mariozechner/pi-mono · session-format docs ship with the @mariozechner/pi-coding-agent npm package under docs/session-format.md.

Diff


Commits	2
Files	5 (recall.py, read_session.py, SKILL.md, README.md, CHANGELOG.md)
+/−	+268 / −27

No schema change. No upstream API change to existing flags. Existing callers keep working unchanged; query is now optional but supplying it routes to the same search() path as before.
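The routing described above can be sketched roughly as follows. Only the nargs='?' positional and the three --source choices are taken from this PR; the --days type and default, and the choose_path helper, are hypothetical and exist only to make the example self-contained.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="recall")
    # nargs="?" makes the positional optional; it is None when omitted.
    parser.add_argument("query", nargs="?")
    # Type and default are assumptions for this sketch.
    parser.add_argument("--days", type=int, default=None)
    parser.add_argument("--source", choices=["claude", "codex", "pi"])
    return parser


def choose_path(argv):
    """Hypothetical helper: report which code path main() would take."""
    args = build_parser().parse_args(argv)
    return "list" if args.query is None else "search"
```

With this shape, choose_path(["--days", "7"]) selects the list path, while choose_path(["buffer", "--days", "7"]) selects the search path, so callers that always pass a query see no change.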

Pi parser

Mirrors parse_codex_session shape:

Header line {type: "session", id, cwd, version, ...} — extract cwd → project, id → session_id, timestamp → earliest_ts.
Subsequent {type: "message"} entries — keep only role ∈ {"user","assistant"} and only TextContent blocks (skip thinking, toolCall, image).
Skip every other top-level type: custom, custom_message, session_info, model_change, thinking_level_change, compaction, branch_summary, label. Skip non-conversational roles toolResult, bashExecution.
Same user+assistant-only behaviour as the existing parsers.
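The filtering rules above can be sketched as below. This is an illustration, not the parse_pi_session in the diff: the helper name parse_pi_lines, the {"type": "text", "text": ...} block shape, and the handling of plain-string content are assumptions on my side, while the {type: "message", message: {role, content}} envelope follows the commit message.

```python
import json


def parse_pi_lines(lines):
    """Hypothetical helper: turn pi JSONL lines into (meta, messages)."""
    meta = {"session_id": None, "project": None, "earliest_ts": None}
    messages = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # tolerate a truncated or corrupt line
        if not isinstance(entry, dict):
            continue
        kind = entry.get("type")
        if kind == "session":
            # Header line: id -> session_id, cwd -> project.
            meta["session_id"] = entry.get("id")
            meta["project"] = entry.get("cwd")
            meta["earliest_ts"] = entry.get("timestamp")
            continue
        if kind != "message":
            continue  # custom, model_change, compaction, label, ...
        msg = entry.get("message") or {}
        role = msg.get("role")
        if role not in ("user", "assistant"):
            continue  # toolResult, bashExecution
        content = msg.get("content")
        if isinstance(content, str):
            text = content  # assumed: content may be a bare string
        else:
            blocks = content if isinstance(content, list) else []
            # Keep text blocks only; thinking, toolCall, image are dropped.
            text = "\n".join(
                b.get("text", "")
                for b in blocks
                if isinstance(b, dict) and b.get("type") == "text"
            )
        if text.strip():
            messages.append((role, text))
    return meta, messages
```

Taking an iterable of lines rather than a path keeps the sketch easy to exercise against a handful of hand-written entries.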

Format detection

detect_format() returns one of pi | claude | codex from the first non-empty parseable line. Pi headers are tested first because they carry the most distinctive signature (type == "session" with cwd or version on the same line — only present on the header).
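A rough sketch of the pi-first check, under stated assumptions: the real detect_format() reads from a file and has its own Claude and Codex tests, which are not reproduced here. The fallback argument below is a hypothetical stand-in for those existing checks, and returning None for an unparseable input is also an assumption.

```python
import json


def first_parseable(lines):
    """Return the first non-empty line that parses to a JSON object."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    return None


def looks_like_pi_header(entry):
    # type == "session" plus cwd or version only occurs on a pi header.
    return entry.get("type") == "session" and (
        "cwd" in entry or "version" in entry
    )


def detect_format(lines, fallback):
    """Test the pi signature first, then defer to the existing checks.

    `fallback` is a hypothetical callable standing in for the Claude and
    Codex detection logic, which this sketch does not reproduce.
    """
    entry = first_parseable(lines)
    if entry is None:
        return None
    if looks_like_pi_header(entry):
        return "pi"
    return fallback(entry)
```

Checking pi first is cheap because it needs a single dictionary lookup on the header, and it cannot shadow the other formats as long as they never emit type == "session" with cwd or version on their first line.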

Verified locally

Real machine with all three agents in history:

$ rm -f /tmp/test.db ; python3 -c "..." # build fresh index against /tmp/test.db
Indexed 1841; total_sessions=1841, total_messages=29184
Per-source: [('claude', 1778), ('codex', 31), ('pi', 32)]

$ recall --days 7                        # list mode, mixed [claude] [codex] [pi]
$ recall --days 7 --source pi            # only pi
$ recall "buffer" --days 7 --source pi   # FTS path on pi sessions
$ read_session.py <pi-file> --pretty     # auto-detects pi, prints user/assistant turns

Reindex on this corpus: ~7s.

Backward compatibility

recall <query> ...: unchanged.
recall ... --source claude|codex: unchanged.
--source pi: new value.
recall ... (no query): new list-mode path.
DB schema: unchanged.
Run --reindex once after upgrade to pull pi sessions into the existing ~/.recall.db.

Why submit both as one PR

The list-mode commit lands on top of pi support and uses no pi-specific code, so the two are independent and easy to split if you'd prefer two PRs — say the word and I'll repoint commit 2 onto a separate branch. Bundled here because they ship together in the dotfiles deployment that drove this work.

Open questions

Pi headers in v2 sessions (auto-migrated to v3 on load by pi itself) carry the same cwd field, so detection works for both. Behaviour against very old (v1, linear, pre-tree) sessions hasn't been tested — none on this machine. If you have any v1 transcripts, I'd appreciate a sample for the regression test.
Happy to add per-source reindex benchmarks to the CHANGELOG if you'd like — current 0.4.0 entry has the totals but not the per-agent split.

Index a third session source: pi, whose sessions live at ~/.pi/agent/sessions/--<encoded-cwd>--/<ts>_<uuid>.jsonl in v3 JSONL format documented at https://github.com/mariozechner/pi-mono.

Pi headers are {type: 'session', id, cwd, version, ...}; subsequent entries carry a top-level 'type' field. We index only user/assistant TextContent from {type: 'message'} entries. Other types (custom, custom_message, session_info, model_change, thinking_level_change, compaction, branch_summary, label) and non-conversational roles (toolResult, bashExecution) are skipped, matching the user+assistant-only behaviour of the existing Claude and Codex parsers.

scripts/recall.py
- PI_DIR / PI_SESSIONS_DIR constants
- parse_pi_session(path) parser
- pi glob added to index_sessions sources list
- dispatch arm for source == 'pi'
- --source choices include 'pi'

scripts/read_session.py
- detect_format() recognises pi headers (type=='session' with cwd or version) before falling through to claude/codex checks
- iter_messages() pi branch handles {type: 'message', message: {role, content}}

SKILL.md / README.md / CHANGELOG.md updated with pi as a third source in install/index/query diagrams, examples, tags, and resume hint.

Verified on a real machine with claude+codex+pi history (1841 sessions, 29184 messages indexed in ~7s):

  recall 'buffer' --source pi          # only pi-tagged results
  recall 'buffer' --days 7             # mixed [claude] [codex] [pi]
  read_session.py <pi-file> --pretty   # auto-detects pi format

Make the positional query argument optional. When omitted, recall lists every session in the time window without text matching: it bypasses FTS entirely and queries the sessions table by (timestamp, source, project), sorted by recency.

Why: callers that want to enumerate every session in a window had no ergonomic way to do it. Bare-* queries to FTS5 error with 'unknown special query', and there is no "match-all" syntax in FTS5. Forcing callers to invent a term that probably matches everything is fragile. List mode is the right primitive: no query string, no FTS, just SQL.

Output banner reads 'Listed N sessions ...' (vs 'Found ...') and the empty-result message reads 'No sessions in the time window.' (vs 'No matching sessions found.'), so callers can see which path was taken.

scripts/recall.py
- list_sessions(conn, project, days, source, limit): SQL-only path, returns rows in the same shape as search() so main()'s rendering loop is unchanged
- parser.add_argument('query', nargs='?')
- main() routes to list_sessions when args.query is None, search otherwise

SKILL.md / CHANGELOG.md updated. No schema change. No reindex needed.

Verified on a real machine:

  recall --days 7                # every session, mixed agents
  recall --days 7 --source pi    # only pi-tagged sessions
  recall 'buffer' --days 7       # FTS path unchanged

arjunkmrm · 2026-05-12T11:46:12Z

awesome, let me know when it's good to go!

kakkoyun added 2 commits May 9, 2026 16:15

kakkoyun wants to merge 2 commits into arjunkmrm:main from kakkoyun:feat/pi-and-list-mode
