Save and upload codex conversation transcript#9697
Conversation
Capture the codex session UUID via the codex-cli-sidecar SessionStart event and upload the on-disk rollout to the server alongside the block snapshot during periodic and final saves. REMOTE-1504 Co-Authored-By: Oz <oz-agent@warp.dev>
Oz completed a review and posted feedback on this pull request:
Overview
This PR adds Codex transcript discovery, envelope serialization, and upload alongside existing block snapshot saves, plus unit coverage for the transcript helper module.
Concerns
- `parse_session_meta` returns metadata for any first entry with a `payload`, so an early read where the first parsed entry is not `session_meta` can cache default `cwd`/`version` forever and make all later uploads keep incorrect metadata.
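A minimal sketch of the stricter parsing this concern asks for: only an entry explicitly tagged `session_meta` yields metadata, so a non-meta first entry can never cache defaults. The struct shape and JSON field names here are assumptions, and plain string scanning stands in for a real JSON parser to keep the example dependency-free.

```rust
// Hypothetical sketch; real code would use serde_json for parsing.
#[derive(Debug, PartialEq)]
struct SessionMeta {
    cwd: String,
    version: String,
}

// Pull a `"key":"value"` string field out of a JSONL line.
fn extract_field(line: &str, key: &str) -> Option<String> {
    let needle = format!("\"{key}\":\"");
    let start = line.find(&needle)? + needle.len();
    let end = line[start..].find('"')? + start;
    Some(line[start..end].to_string())
}

// Skip non-meta entries instead of caching default cwd/version from them.
fn parse_session_meta<'a>(lines: impl IntoIterator<Item = &'a str>) -> Option<SessionMeta> {
    lines
        .into_iter()
        .find(|l| l.contains("\"type\":\"session_meta\""))
        .and_then(|l| {
            Some(SessionMeta {
                cwd: extract_field(l, "cwd")?,
                version: extract_field(l, "version")?,
            })
        })
}

fn main() {
    let lines = [
        r#"{"type":"event","payload":{"msg":"turn started"}}"#,
        r#"{"type":"session_meta","payload":{"cwd":"/repo","version":"0.1.0"}}"#,
    ];
    let meta = parse_session_meta(lines).unwrap();
    assert_eq!(meta, SessionMeta { cwd: "/repo".into(), version: "0.1.0".into() });
    // A log with no session_meta entry yields None, so nothing bogus is cached.
    assert!(parse_session_meta([r#"{"type":"event"}"#]).is_none());
}
```

Returning `None` here lets the caller retry on the next save rather than pinning wrong metadata in a set-once cache.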
Verdict
Found: 0 critical, 1 important, 0 suggestions
Request changes
Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).
Powered by Oz
Oz completed another review and posted feedback on this pull request:
Overview
This PR adds Codex transcript discovery, envelope serialization, and upload alongside the existing block snapshot save path, with unit coverage for root resolution, rollout discovery, and metadata parsing.
Concerns
- Rollout discovery returns matching directory entries without rejecting symlinks, so a writable Codex sessions tree can redirect the privileged upload path to another JSON/JSONL file.
Security
- The transcript file finder should only accept regular files before returning a path for upload.
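A sketch of the hardening the review suggests, assuming a helper like the hypothetical `accept_rollout_candidate` below: `symlink_metadata` does not follow links, so a symlink planted in a writable sessions tree is rejected rather than redirecting the upload to another file.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical helper name; only regular files pass the check.
fn accept_rollout_candidate(path: &Path) -> io::Result<Option<PathBuf>> {
    // symlink_metadata inspects the entry itself without following links.
    let meta = fs::symlink_metadata(path)?;
    if meta.file_type().is_file() {
        Ok(Some(path.to_path_buf()))
    } else {
        // Symlinks, directories, fifos, etc. are skipped.
        Ok(None)
    }
}

fn main() -> io::Result<()> {
    let tmp = std::env::temp_dir().join("rollout-check.jsonl");
    fs::write(&tmp, b"{}")?;
    assert!(accept_rollout_candidate(&tmp)?.is_some());
    // A directory is not a regular file, so it is rejected.
    assert!(accept_rollout_candidate(&std::env::temp_dir())?.is_none());
    Ok(())
}
```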
Verdict
Found: 0 critical, 1 important, 0 suggestions
Request changes
Three `OnceLock` fields added for lazy, set-once caching:
- `session_id: OnceLock<Uuid>` — captured from `CLIAgentSessionsModel` when hooks emit `SessionStart`
- `transcript_path: OnceLock<PathBuf>` — resolved by `find_session_file` on first save, cached thereafter
- `session_metadata: OnceLock<CodexSessionMetadata>` — parsed from JSONL first line, cached thereafter
If we're reading the JSONL file already, I'm not sure we gain as much from caching this?
Caching the transcript path makes sense though
Yeah that's fair—I can remove that
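The set-once caching pattern under discussion can be sketched as below. The surrounding struct is hypothetical, and `String` stands in for `Uuid` to keep the example dependency-free; per the thread above, the metadata cache may be dropped, so only the two fields being kept are shown.

```rust
use std::path::PathBuf;
use std::sync::OnceLock;

// Illustrative shape of the set-once caches described in the diff.
struct CodexHarness {
    session_id: OnceLock<String>,       // set when hooks emit SessionStart
    transcript_path: OnceLock<PathBuf>, // resolved once, reused on later saves
}

impl CodexHarness {
    fn new() -> Self {
        Self {
            session_id: OnceLock::new(),
            transcript_path: OnceLock::new(),
        }
    }

    // First call pays for resolution; later calls return the cached path.
    fn transcript_path(&self) -> &PathBuf {
        self.transcript_path
            .get_or_init(|| PathBuf::from("/tmp/rollout.jsonl")) // stand-in resolver
    }
}

fn main() {
    let harness = CodexHarness::new();
    assert!(harness.session_id.set("uuid-1".into()).is_ok());
    assert!(harness.session_id.set("uuid-2".into()).is_err()); // set-once: second set fails
    assert_eq!(harness.transcript_path(), &PathBuf::from("/tmp/rollout.jsonl"));
}
```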
```rust
    Ok(None)
}

fn read_subdirs(parent: &Path) -> io::Result<Vec<PathBuf>> {
```
nit: thoughts on returning an `io::Result<impl Iterator<Item = PathBuf>>` so we can wrap the `fs::read_dir` iterator instead of buffering into a `Vec`?
Ahhh nice! Will do
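The iterator-returning shape suggested in the nit might look like this sketch: `fs::read_dir`'s iterator is wrapped lazily instead of being collected into a `Vec`, so entries are only touched as the caller consumes them.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Lazily yield the directory children of `parent`; no intermediate Vec.
fn read_subdirs(parent: &Path) -> io::Result<impl Iterator<Item = PathBuf>> {
    Ok(fs::read_dir(parent)?
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .filter(|path| path.is_dir()))
}

fn main() -> io::Result<()> {
    let root = std::env::temp_dir().join("read-subdirs-demo");
    fs::create_dir_all(root.join("child"))?;
    let subdirs: Vec<PathBuf> = read_subdirs(&root)?.collect();
    assert!(subdirs.contains(&root.join("child")));
    Ok(())
}
```

The returned `impl Iterator` owns the `ReadDir` handle, so no lifetime parameter tied to `parent` is needed.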
Hi, `CodexHarnessRunner::resolve_transcript_path` converts both `JoinError` and filesystem errors into `None`, so transcript upload failures become indistinguishable from "rollout file not yet written" and silently skip uploads. Severity: action required | Category: reliability. How to fix: preserve or log resolution errors instead of discarding them.
Found by Qodo.
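One way to address this, sketched under the assumption of a `Result`-returning finder (the `find_session_file` stub below is a stand-in for the PR's resolver): real failures stay `Err` and get logged, while "not written yet" remains `Ok(None)` so the caller can retry.

```rust
use std::io;
use std::path::{Path, PathBuf};

// Stand-in for the PR's rollout finder: "not written yet" is Ok(None).
fn find_session_file(root: &Path) -> io::Result<Option<PathBuf>> {
    let candidate = root.join("rollout.jsonl");
    if candidate.is_file() {
        Ok(Some(candidate))
    } else if root.is_dir() {
        Ok(None) // directory exists, rollout just not flushed yet
    } else {
        Err(io::Error::new(io::ErrorKind::NotFound, "sessions root missing"))
    }
}

// Keep failures as Err (and log them) instead of collapsing them into None,
// so a skipped upload is distinguishable from a file that isn't written yet.
fn resolve_transcript_path(root: &Path) -> io::Result<Option<PathBuf>> {
    match find_session_file(root) {
        Ok(found) => Ok(found),
        Err(e) => {
            eprintln!("transcript resolution failed: {e}"); // surface, don't swallow
            Err(e)
        }
    }
}

fn main() {
    let demo = std::env::temp_dir().join("resolve-demo");
    std::fs::create_dir_all(&demo).unwrap();
    // Existing dir without a rollout: retryable, not an error.
    assert!(matches!(resolve_transcript_path(&demo), Ok(None)));
    // Missing root: a real error that callers (and logs) can now see.
    assert!(resolve_transcript_path(Path::new("/definitely/missing/root")).is_err());
}
```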

Description
Add transcript upload for codex cloud agent runs, mirroring what we already have for claude transcripts.
Of note:
- Codex writes rollouts under `YYYY/MM/DD` folders, so we walk the dirs here to find the session JSONL before the first snapshot upload. This first walk shouldn't be expensive in practice, since we run in a context where there shouldn't be any other sessions (cloud agent).
- We cache `cwd` since we don't expect that to change over the course of the session.

Testing
Tested by running locally and confirming that the codex transcripts do get uploaded to GCS in the format that we expect. Also added unit tests for parsing metadata.
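The rollout discovery mentioned in the description can be sketched as a walk over the dated folders (names and matching rules here are illustrative): visit `sessions/YYYY/MM/DD` and return the first JSONL whose name contains the session id. With a single cloud-agent session the scan stays cheap.

```rust
use std::fs;
use std::path::{Path, PathBuf};

// Directory children of `parent`; errors degrade to an empty list.
fn subdirs(parent: &Path) -> Vec<PathBuf> {
    fs::read_dir(parent)
        .map(|rd| {
            rd.filter_map(|e| e.ok())
                .map(|e| e.path())
                .filter(|p| p.is_dir())
                .collect()
        })
        .unwrap_or_default()
}

// Walk root/YYYY/MM/DD for a rollout JSONL tagged with this session id.
fn find_session_jsonl(root: &Path, session_id: &str) -> Option<PathBuf> {
    for year in subdirs(root) {
        for month in subdirs(&year) {
            for day in subdirs(&month) {
                let Ok(entries) = fs::read_dir(&day) else { continue };
                for entry in entries.filter_map(|e| e.ok()) {
                    let path = entry.path();
                    let name = entry.file_name().to_string_lossy().into_owned();
                    if path.is_file() && name.ends_with(".jsonl") && name.contains(session_id) {
                        return Some(path);
                    }
                }
            }
        }
    }
    None
}

fn main() {
    let root = std::env::temp_dir().join("codex-sessions-demo");
    let day = root.join("2024").join("06").join("01");
    fs::create_dir_all(&day).unwrap();
    let rollout = day.join("rollout-abc123.jsonl");
    fs::write(&rollout, b"{}").unwrap();
    assert_eq!(find_session_jsonl(&root, "abc123"), Some(rollout));
    assert!(find_session_jsonl(&root, "missing").is_none());
}
```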