feat: pi sessions and list mode #8

Draft: kakkoyun wants to merge 2 commits into arjunkmrm:main from kakkoyun:feat/pi-and-list-mode

@kakkoyun kakkoyun commented May 9, 2026

Two additive features. Draft for review while I exercise the changes locally for a few more days.

What

1. Index pi (mariozechner/pi-coding-agent) sessions from ~/.pi/agent/sessions/**/*.jsonl, alongside the existing Claude and Codex globs. Same FTS5 schema, same DB. New --source pi filter and [pi] result tag. read_session.py auto-detects the format.

2. Make the positional query optional. When it is omitted, recall lists every session in the time window via direct SQL on the sessions table, bypassing FTS entirely. The banner reads "Listed N sessions ..." (vs "Found ...") so the caller can see which path ran. This is useful when an agent or retro flow wants every session in a window rather than a search hit.
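A minimal sketch of the list-mode path, assuming a sessions table with hypothetical columns session_id, source, project and timestamp (Unix epoch seconds). The list_sessions(conn, project, days, source, limit) signature comes from the commit message below; the column names, the exact-match project filter and the row shape are illustrative, not recall.py's actual code. The point it shows: every filter becomes a plain WHERE clause, no MATCH is involved, and rows come back newest first.

```python
import sqlite3
import time

# Hypothetical schema for illustration; recall.py's real sessions table
# may use different column names or timestamp encoding.
SCHEMA = """
CREATE TABLE sessions (
    session_id TEXT PRIMARY KEY,
    source     TEXT NOT NULL,  -- 'claude', 'codex' or 'pi'
    project    TEXT,
    timestamp  REAL NOT NULL   -- assumed: Unix epoch seconds
)
"""


def list_sessions(conn, project=None, days=None, source=None, limit=50):
    """Return sessions in the time window, newest first, without touching FTS."""
    clauses, params = [], []
    if days is not None:
        clauses.append("timestamp >= ?")
        params.append(time.time() - days * 86400)
    if source is not None:
        clauses.append("source = ?")
        params.append(source)
    if project is not None:
        clauses.append("project = ?")  # assumption: exact match on project
        params.append(project)
    where = ("WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = (
        "SELECT session_id, source, project, timestamp FROM sessions "
        f"{where} ORDER BY timestamp DESC LIMIT ?"
    )
    params.append(limit)
    return conn.execute(sql, params).fetchall()
```

In this sketch, list_sessions(conn, days=7, source="pi") plays the role of `recall --days 7 --source pi`.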

Pi format reference: https://github.com/mariozechner/pi-mono · session-format docs ship with the @mariozechner/pi-coding-agent npm package under docs/session-format.md.

Diff

  • Commits: 2
  • Files: 5 (recall.py, read_session.py, SKILL.md, README.md, CHANGELOG.md)
  • Lines: +268 / −27

No schema change. No upstream API change to existing flags. Existing callers keep working unchanged; query is now optional, but supplying it routes to the same search() path as before.

Pi parser

Mirrors the shape of parse_codex_session:

  • Header line {type: "session", id, cwd, version, ...} — extract cwd → project, id → session_id, timestamp → earliest_ts.
  • Subsequent {type: "message"} entries — keep only role ∈ {"user","assistant"} and only TextContent blocks (skip thinking, toolCall, image).
  • Skip every other top-level type: custom, custom_message, session_info, model_change, thinking_level_change, compaction, branch_summary, label. Skip non-conversational roles toolResult, bashExecution.
  • Same user+assistant-only behaviour as the existing parsers.
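A minimal sketch of those rules. It assumes the entry shapes quoted in this PR (header {type: "session", id, cwd, timestamp, ...}; messages {type: "message", message: {role, content}}), and further assumes that TextContent blocks look like {"type": "text", "text": ...} and that user content may also be a plain string. The returned dict is a hypothetical shape, not necessarily what parse_codex_session returns.

```python
import json

KEEP_ROLES = {"user", "assistant"}


def _text_blocks(content):
    """Yield text from TextContent blocks; thinking, toolCall and image are skipped."""
    if isinstance(content, str):  # assumption: plain-string content is possible
        yield content
        return
    for block in content or []:
        if isinstance(block, dict) and block.get("type") == "text":
            yield block.get("text") or ""


def parse_pi_session(path):
    """Parse one pi JSONL transcript into a hypothetical session dict."""
    session = {"session_id": None, "project": None, "earliest_ts": None, "messages": []}
    with open(path, encoding="utf-8") as f:
        for raw in f:
            raw = raw.strip()
            if not raw:
                continue
            try:
                entry = json.loads(raw)
            except json.JSONDecodeError:
                continue  # tolerate a partially written line
            if not isinstance(entry, dict):
                continue
            kind = entry.get("type")
            if kind == "session":  # header line
                session["session_id"] = entry.get("id")
                session["project"] = entry.get("cwd")
                session["earliest_ts"] = entry.get("timestamp")
            elif kind == "message":
                msg = entry.get("message") or {}
                if msg.get("role") not in KEEP_ROLES:
                    continue  # toolResult, bashExecution, ...
                text = "\n".join(t for t in _text_blocks(msg.get("content")) if t)
                if text:
                    session["messages"].append((msg["role"], text))
            # Any other type (custom, model_change, compaction, label, ...) is ignored.
    return session
```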

Format detection

detect_format() returns one of pi | claude | codex based on the first non-empty parseable line. Pi headers are tested first because they carry the most distinctive signature: type == "session" with cwd or version on the same line, which only the header has.
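A sketch of that detection order, assuming detect_format() is handed an iterable of lines (the real function in read_session.py may take a path instead). Only the pi branch follows the rule described above; the claude check (a sessionId or parentUuid key) and the codex fallback are placeholder heuristics, not the script's actual tests.

```python
import json


def _first_entry(lines):
    """Return the first non-empty line that parses as a JSON object, else None."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    return None


def detect_format(lines):
    entry = _first_entry(lines)
    if entry is None:
        return None
    # Pi first: only its header pairs type == "session" with cwd or version.
    if entry.get("type") == "session" and ("cwd" in entry or "version" in entry):
        return "pi"
    # Placeholder heuristic (assumption): Claude Code lines carry these keys.
    if "sessionId" in entry or "parentUuid" in entry:
        return "claude"
    return "codex"  # placeholder fallback
```

Because _first_entry stops at the first object, passing an open file handle only reads as far as the first parseable line.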

Verified locally

Real machine with all three agents in history:

$ rm -f /tmp/test.db ; python3 -c "..." # build fresh index against /tmp/test.db
Indexed 1841; total_sessions=1841, total_messages=29184
Per-source: [('claude', 1778), ('codex', 31), ('pi', 32)]

$ recall --days 7                        # list mode, mixed [claude] [codex] [pi]
$ recall --days 7 --source pi            # only pi
$ recall "buffer" --days 7 --source pi   # FTS path on pi sessions
$ read_session.py <pi-file> --pretty     # auto-detects pi, prints user/assistant turns

Reindex on this corpus: ~7s.

Backward compatibility

  • recall <query> ...: unchanged.
  • recall ... --source claude|codex: unchanged.
  • --source pi: new value.
  • recall ... (no query): new list-mode path.
  • DB schema: unchanged.
  • Run --reindex once after upgrade to pull pi sessions into the existing ~/.recall.db.

Why submit both as one PR

The list-mode commit lands on top of pi support but uses no pi-specific code, so the two are independent and easy to split. If you'd prefer two PRs, say the word and I'll repoint commit 2 onto a separate branch. They're bundled here because they ship together in the dotfiles deployment that drove this work.

Open questions

  • Pi headers in v2 sessions (auto-migrated to v3 on load by pi itself) carry the same cwd field, so detection works for both. Behaviour against very old (v1, linear, pre-tree) sessions hasn't been tested — none on this machine. If you have any v1 transcripts, I'd appreciate a sample for the regression test.
  • Happy to add per-source reindex benchmarks to the CHANGELOG if you'd like. The current 0.4.0 entry has the totals but not the per-agent split.

kakkoyun added 2 commits May 9, 2026 16:15
Index a third session source: pi, whose sessions live at
~/.pi/agent/sessions/--<encoded-cwd>--/<ts>_<uuid>.jsonl in v3 JSONL
format documented at https://github.com/mariozechner/pi-mono.

Pi headers are {type: 'session', id, cwd, version, ...}; subsequent
entries carry a top-level 'type' field. We index only user/assistant
TextContent from {type: 'message'} entries. Other types (custom,
custom_message, session_info, model_change, thinking_level_change,
compaction, branch_summary, label) and non-conversational roles
(toolResult, bashExecution) are skipped, matching the user+assistant-only
behaviour of the existing Claude and Codex parsers.

scripts/recall.py
  - PI_DIR / PI_SESSIONS_DIR constants
  - parse_pi_session(path) parser
  - pi glob added to index_sessions sources list
  - dispatch arm for source == 'pi'
  - --source choices include 'pi'

scripts/read_session.py
  - detect_format() recognises pi headers (type=='session' with cwd or
    version) before falling through to claude/codex checks
  - iter_messages() pi branch handles {type: 'message',
    message: {role, content}}

SKILL.md / README.md / CHANGELOG.md updated with pi as a third source
in install/index/query diagrams, examples, tags, and resume hint.

Verified on a real machine with claude+codex+pi history (1841 sessions,
29184 messages indexed in ~7s):

  recall 'buffer' --source pi           # only pi-tagged results
  recall 'buffer' --days 7              # mixed [claude] [codex] [pi]
  read_session.py <pi-file> --pretty    # auto-detects pi format

Make the positional query argument optional. When omitted, recall lists
every session in the time window without text matching — bypasses FTS
entirely and queries the sessions table by (timestamp, source, project),
sorted by recency.

Why: callers that want to enumerate every session in a window had no
ergonomic way to do it. A bare '*' query to FTS5 fails with 'unknown
special query', and FTS5 has no "match-all" syntax. Forcing callers to
invent a term that probably matches everything is fragile. List mode
is the right primitive: no query string, no FTS, just SQL.
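The FTS5 failure above can be reproduced with Python's sqlite3 module, assuming the linked SQLite was built with FTS5 (true of most builds). In SQLite's FTS5 source, a MATCH string that starts with '*' is treated as a request for an internal parameter rather than a full-text query, which is where 'unknown special query' comes from. The table names here are hypothetical stand-ins for recall's index.

```python
import sqlite3


def fts_match_all_attempt():
    """Show that MATCH '*' errors in FTS5 while plain SQL enumerates every row."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical tables standing in for recall's index.
    conn.execute("CREATE TABLE sessions (session_id TEXT PRIMARY KEY)")
    conn.execute("CREATE VIRTUAL TABLE messages_fts USING fts5(session_id, content)")
    conn.executemany("INSERT INTO sessions VALUES (?)", [("s1",), ("s2",)])
    conn.executemany(
        "INSERT INTO messages_fts VALUES (?, ?)",
        [("s1", "ring buffer overflow"), ("s2", "release notes")],
    )
    try:
        conn.execute(
            "SELECT session_id FROM messages_fts WHERE messages_fts MATCH ?", ("*",)
        ).fetchall()
        fts_error = None
    except sqlite3.OperationalError as exc:
        fts_error = str(exc)
    # List mode: no MATCH at all, just a query over the sessions table.
    listed = [r[0] for r in conn.execute("SELECT session_id FROM sessions ORDER BY session_id")]
    conn.close()
    return fts_error, listed
```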

Output banner reads 'Listed N sessions ...' (vs 'Found ...') and the
empty-result message reads 'No sessions in the time window.' (vs 'No
matching sessions found.'), so callers can see which path was taken.

scripts/recall.py
  - list_sessions(conn, project, days, source, limit) — SQL-only path,
    returns rows in the same shape as search() so main()'s rendering loop
    is unchanged
  - parser.add_argument('query', nargs='?')
  - main() routes to list_sessions when args.query is None, search otherwise
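A sketch of that routing with argparse. nargs='?' makes the positional optional, so args.query is None when it is omitted. Here list_sessions and search are passed in as stand-in callables, type=int for --days is an assumption, and the banner strings only mirror the first words quoted in this message ('Listed N sessions ...' / 'Found ...'); the wording after those words is illustrative.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="recall")
    parser.add_argument("query", nargs="?")  # optional: omit it for list mode
    parser.add_argument("--days", type=int)  # assumption: whole days
    parser.add_argument("--source", choices=["claude", "codex", "pi"])
    return parser


def run(args, list_sessions, search):
    """Route to list mode when no query was given, otherwise to FTS search."""
    if args.query is None:
        rows = list_sessions(args)
        banner = f"Listed {len(rows)} sessions" if rows else "No sessions in the time window."
    else:
        rows = search(args)
        banner = f"Found {len(rows)} sessions" if rows else "No matching sessions found."
    return banner, rows
```

Both paths return rows the same way, which is what lets a single rendering loop serve both.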

SKILL.md / CHANGELOG.md updated. No schema change. No reindex needed.

Verified on a real machine:
  recall --days 7                         # every session, mixed agents
  recall --days 7 --source pi             # only pi-tagged sessions
  recall 'buffer' --days 7                # FTS path unchanged
@arjunkmrm
Owner

awesome, let me know when it's good to go!
