[fix] stop merging agentic turns at first non-COMPLETED turn #1323
Merged
merge_samples folds the per-turn Samples of a multi-turn agentic trajectory into one training Sample. _merge_sample_pair asserts that the accumulated turn is COMPLETED before appending the next, encoding the invariant that only the final turn may be non-COMPLETED. When an intermediate turn was TRUNCATED (hit --rollout-max-response-len mid-generation) yet the agent harness still produced later turns, the assertion crashed the rollout loop. Stop folding at the first non-COMPLETED turn so the trajectory ends there.
Code Review
This pull request updates the merge_samples function in miles/rollout/generate_utils/sample_utils.py to ensure that only a COMPLETED turn can be extended by a later turn. If an intermediate turn is truncated (i.e., its status is not COMPLETED), the merging loop breaks early. There are no review comments, so there is no feedback to provide.
When MILES_EXPERIMENTAL_ROLLOUT_REFACTOR is on, each LLM call within a multi-turn agentic trajectory produces its own Sample, and merge_samples folds them left into one training Sample via _merge_sample_pair. _merge_sample_pair asserts the accumulated turn is COMPLETED before appending the next, encoding the invariant that only the final turn may be non-COMPLETED.
A trajectory whose intermediate turn was TRUNCATED (hit --rollout-max-response-len mid-generation) but whose agent harness still produced later turns violated that invariant and crashed the rollout loop with AssertionError: a.status must be COMPLETED, got Status.TRUNCATED.
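The failure mode can be reproduced in isolation with a minimal sketch. Everything here is a simplified stand-in, not the code in miles: the Status enum and Sample dataclass carry only the fields needed for the illustration, the body of _merge_sample_pair is assumed, and merge_samples_unguarded is a hypothetical name for a plain left fold with no early stop.

```python
from dataclasses import dataclass, field
from enum import Enum
from functools import reduce


class Status(Enum):
    # Stand-in for the real status enum; only two members are modeled.
    COMPLETED = "completed"
    TRUNCATED = "truncated"


@dataclass
class Sample:
    # Stand-in: the real Sample carries many more fields.
    tokens: list = field(default_factory=list)
    status: Status = Status.COMPLETED


def _merge_sample_pair(a: Sample, b: Sample) -> Sample:
    # Invariant described in the PR: only a COMPLETED turn may be extended.
    assert a.status == Status.COMPLETED, f"a.status must be COMPLETED, got {a.status}"
    return Sample(tokens=a.tokens + b.tokens, status=b.status)


def merge_samples_unguarded(samples):
    # Hypothetical plain left fold that never stops early.
    return reduce(_merge_sample_pair, samples)


turns = [
    Sample([1, 2], Status.COMPLETED),
    Sample([3], Status.TRUNCATED),     # intermediate turn hit the length limit
    Sample([4, 5], Status.COMPLETED),  # the harness produced a later turn anyway
]

try:
    merge_samples_unguarded(turns)
    crashed = False
    message = ""
except AssertionError as exc:
    crashed = True
    message = str(exc)

print(crashed, message)
```

The assertion fires on the second fold step, when the accumulated Sample has already taken on the TRUNCATED status of the second turn.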
Fix: in merge_samples, stop folding at the first non-COMPLETED turn so the trajectory simply ends there (the merged Sample retains the truncated status). This keeps the assertion in _merge_sample_pair as a real invariant. No behavior change for fully COMPLETED trajectories.
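A minimal sketch of the guarded fold, assuming simplified stand-ins for Status, Sample, and _merge_sample_pair (the real definitions in miles/rollout/generate_utils/sample_utils.py have more fields and logic, and the actual diff may be shaped differently):

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Stand-in for the real status enum; only two members are modeled.
    COMPLETED = "completed"
    TRUNCATED = "truncated"


@dataclass
class Sample:
    # Stand-in: the real Sample carries many more fields.
    tokens: list = field(default_factory=list)
    status: Status = Status.COMPLETED


def _merge_sample_pair(a: Sample, b: Sample) -> Sample:
    # The assertion stays in place as a real invariant.
    assert a.status == Status.COMPLETED, f"a.status must be COMPLETED, got {a.status}"
    return Sample(tokens=a.tokens + b.tokens, status=b.status)


def merge_samples(samples):
    # Fold left, but stop as soon as the accumulated turn is not COMPLETED,
    # so later turns are dropped and the trajectory ends at that turn.
    merged = samples[0]
    for nxt in samples[1:]:
        if merged.status != Status.COMPLETED:
            break
        merged = _merge_sample_pair(merged, nxt)
    return merged


truncated_mid = merge_samples([
    Sample([1, 2], Status.COMPLETED),
    Sample([3], Status.TRUNCATED),     # folding stops after this turn
    Sample([4, 5], Status.COMPLETED),  # dropped
])
all_completed = merge_samples([
    Sample([1, 2], Status.COMPLETED),
    Sample([3], Status.COMPLETED),
    Sample([4, 5], Status.COMPLETED),
])
print(truncated_mid.tokens, truncated_mid.status)
print(all_completed.tokens, all_completed.status)
```

Because the check runs before each call to _merge_sample_pair, the assertion can no longer be reached with a non-COMPLETED accumulator from this loop, and a trajectory whose turns are all COMPLETED merges exactly as before.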
Repro: DeepSeek-V4-Flash agentic RL on Terminal-Bench-2/terminus-2; crashed right after step 0 on a polyglot task that truncated an intermediate turn.