Description
Hi team,
First of all, thank you for this fantastic repository! I've been exploring OpenManus-RL and it works great.
I am currently trying to enhance the agent's capabilities by integrating a long-term memory module (e.g. mem0). My goal is to train this memory-augmented agent using GRPO with the verl backend.
I would appreciate some guidance on the architectural changes required to achieve this properly. Specifically, I have questions regarding both the retrieval and storage phases:
- Retrieval (Injecting Memory):
  To inject retrieved memory into the context window during the rollout phase, which component should I prioritize modifying?
  - Should this be handled inside the Environment wrapper (treating memory as part of the observation)?
  - Or should I modify the Actor/Rollout Worker logic directly to intercept the prompt before it is sent to the model?
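To make the first option concrete, here is a minimal sketch of what I have in mind. Everything in it is hypothetical: `ToyEnv` stands in for a real environment, `MemoryAugmentedEnv` is a name I made up, and `keyword_retrieve` is a toy placeholder for a mem0-backed search. None of these are existing OpenManus-RL or verl classes.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ToyEnv:
    """Hypothetical stand-in for a text environment (not an OpenManus-RL class)."""
    task: str = "find the red key"
    steps: int = 0

    def reset(self) -> str:
        self.steps = 0
        return f"Task: {self.task}"

    def step(self, action: str) -> Tuple[str, float, bool]:
        self.steps += 1
        done = self.steps >= 2
        return f"You did: {action}", float(done), done


@dataclass
class MemoryAugmentedEnv:
    """Wraps an env and prepends retrieved memories to every observation."""
    env: ToyEnv
    retrieve: Callable[[str], List[str]]  # query -> memory strings (assumed interface)
    max_memories: int = 3

    def _augment(self, obs: str) -> str:
        memories = self.retrieve(obs)[: self.max_memories]
        if not memories:
            return obs  # nothing relevant: leave the observation untouched
        block = "\n".join(f"- {m}" for m in memories)
        return f"Relevant memories:\n{block}\n\n{obs}"

    def reset(self) -> str:
        return self._augment(self.env.reset())

    def step(self, action: str) -> Tuple[str, float, bool]:
        obs, reward, done = self.env.step(action)
        return self._augment(obs), reward, done


def keyword_retrieve(query: str) -> List[str]:
    """Toy retriever: naive keyword overlap against a fixed list of memories."""
    store = ["The red key is usually in the kitchen.", "Doors need a key to open."]
    words = {w.strip(".:,").lower() for w in query.split()}
    return [m for m in store if words & {w.strip(".:,").lower() for w in m.split()}]
```

With this design the rollout worker would stay unchanged, since the memory text simply arrives as part of the observation string. I am unsure whether that is the preferred approach here, hence the question.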
- Storage (Updating Memory):
  I also need to store the successful interactions (or full trajectories) back into mem0 to evolve the memory. Where is the best place to access the complete context for storage?
  - Is there a specific callback or a post-episode hook in the RolloutWorker where the full trajectory is available?
  - Or should this logic reside in the RewardManager since it evaluates the final outcome?
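For reference, the storage step I would like to hook in looks roughly like the sketch below. The `Trajectory` type, the `store_successful` function, and the `add_memory` callable are all hypothetical names of mine; `add_memory` is where a mem0 write would go, and I am assuming the final reward is available once the episode ends.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Trajectory:
    """Hypothetical record of one finished episode."""
    task: str
    steps: List[str] = field(default_factory=list)  # e.g. "observation -> action"
    final_reward: float = 0.0


def store_successful(
    trajectories: List[Trajectory],
    add_memory: Callable[[str], None],  # assumed write interface to the memory store
    threshold: float = 1.0,
) -> int:
    """Write every trajectory whose final reward meets the threshold; return the count."""
    stored = 0
    for traj in trajectories:
        if traj.final_reward >= threshold:
            add_memory(f"Task: {traj.task}\n" + "\n".join(traj.steps))
            stored += 1
    return stored
```

What I cannot tell from the code is where a call like this should live so that it sees the complete trajectory exactly once per episode.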
- Verl Compatibility:
  Since verl handles distributed rollouts, are there any specific constraints I need to be aware of when dynamically changing the prompt length (due to retrieved memories) across different interaction steps? I want to ensure this doesn't break the batch processing or PPO/GRPO data collection pipeline.
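My working assumption, which may be wrong, is that every prompt in a batch has to end up with the same fixed length. If so, I would cap the memory tokens at whatever budget the observation leaves and then left-pad, roughly as sketched below. `fit_prompt` and `PAD_ID` are hypothetical names of mine, not verl functions or settings.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical pad token id; the real one comes from the tokenizer


def fit_prompt(
    memory_ids: List[int],
    obs_ids: List[int],
    max_prompt_len: int,
    pad_id: int = PAD_ID,
) -> Tuple[List[int], List[int]]:
    """Trim memory tokens to the remaining budget, then left-pad to max_prompt_len."""
    if max_prompt_len <= 0:
        raise ValueError("max_prompt_len must be positive")
    obs_ids = obs_ids[-max_prompt_len:]       # keep the most recent observation tokens
    budget = max_prompt_len - len(obs_ids)    # room left for memory tokens
    ids = memory_ids[:budget] + obs_ids       # memory first, observation last
    pad = max_prompt_len - len(ids)
    input_ids = [pad_id] * pad + ids
    attention_mask = [0] * pad + [1] * len(ids)
    return input_ids, attention_mask
```

The observation takes priority over the memories here, so a long observation silently drops memory tokens. I would like to know whether that kind of fixed-length handling is what the pipeline expects.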
Any high-level advice or pointers to the relevant code sections (e.g., specific files in verl or openmanusagent) would be incredibly helpful!
Thanks again for your hard work.
Additional Information
No response