shreymodi1 (Contributor) commented Nov 17, 2025


name: Pull Request
about: Propose changes to the codebase
title: "Brief description of changes"
labels: ''
assignees: ''


Description

Please include a summary of the change and which issue is fixed or feature is implemented. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)
Implements # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Refactoring/Code cleanup
  • Build/CI/CD related changes
  • Other (please describe):

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

  • Test A
  • Test B

Test Configuration:

  • Firmware version:
  • Hardware:
  • Toolchain:
  • SDK:

Checklist:

  • My code follows the style guidelines of this project (ran black ., isort ., flake8 .)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Screenshots (if applicable)

If applicable, add screenshots to help showcase your changes.

Additional context

Add any other context about the PR here.


Note

Introduces a generic OpenEnv rollout processor with TRL and vLLM integrations, adds an updated_at timestamp and debug logging to the SQLite row store, improves logs server initialization/broadcast diagnostics, and adds OpenEnv integration tests.

  • OpenEnv Integrations:
    • Add generic OpenEnvRolloutProcessor to run rollouts against any OpenEnv HTTPEnvClient (eval_protocol/pytest/openenv_rollout_processor.py).
    • Add TRL helper create_openenv_rollout_func and vLLM helper create_openenv_vllm_rollout_func for GRPO pipelines with split modes (eval_protocol/pytest/integrations/openenv_trl.py, openenv_trl_vllm.py).
  • Storage/Logging:
    • SQLite store: add updated_at column with best-effort migration, set on upsert/create, order reads by updated_at desc, and add SQL/result debug logs (eval_protocol/dataset_logger/sqlite_evaluation_row_store.py).
    • Logs server: limit initial log payload via EP_LOGS_INIT_LIMIT, and add richer debug for init/broadcast (eval_protocol/utils/logs_server.py).
  • Tests:
    • Enable pytest plugin autoload (tests/pytest/conftest.py).
    • Add integration datasets and tests for BrowserGym, Echo (base URL/Hub), and TextArena Wordle using the new processor (tests/pytest/data/*, tests/pytest/test_openenv_*).

Written by Cursor Bugbot for commit 66ac02b. This will update automatically on new commits.
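
The storage and logging changes summarized above can be sketched roughly as follows. This is a minimal illustration using the standard-library sqlite3 module, not the code in this PR: the table name evaluation_rows, the columns rollout_id and data, and the helpers ensure_updated_at, upsert_row, read_rows, and initial_limit are all hypothetical. The treatment of an unset or invalid EP_LOGS_INIT_LIMIT as "no limit" is likewise an assumption.

```python
import os
import sqlite3
import time


def ensure_updated_at(conn: sqlite3.Connection) -> None:
    """Best-effort migration: add updated_at only if the column is missing."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(evaluation_rows)")}
    if "updated_at" in columns:
        return
    try:
        conn.execute("ALTER TABLE evaluation_rows ADD COLUMN updated_at REAL")
    except sqlite3.OperationalError:
        # Best effort: another process may have added the column first.
        pass


def upsert_row(conn: sqlite3.Connection, rollout_id: str, data: str, now=None) -> None:
    """Create or overwrite a row and stamp it with the current time."""
    timestamp = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO evaluation_rows (rollout_id, data, updated_at) "
        "VALUES (?, ?, ?)",
        (rollout_id, data, timestamp),
    )


def initial_limit(env=os.environ):
    """Read EP_LOGS_INIT_LIMIT; assumed here to mean 'no limit' when unset or invalid."""
    raw = env.get("EP_LOGS_INIT_LIMIT")
    if raw is None:
        return None
    try:
        value = int(raw)
    except ValueError:
        return None
    return value if value > 0 else None


def read_rows(conn: sqlite3.Connection, limit=None):
    """Return rows most recently updated first, optionally capped at `limit`."""
    sql = "SELECT rollout_id, data FROM evaluation_rows ORDER BY updated_at DESC"
    params = ()
    if limit is not None:
        sql += " LIMIT ?"
        params = (limit,)
    return conn.execute(sql, params).fetchall()
```

In this sketch, rows written before the migration have a NULL updated_at, which SQLite sorts last under ORDER BY ... DESC, so freshly updated rows are the ones that survive a cap such as read_rows(conn, limit=initial_limit()).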

for _ in range(num_generations):
    evaluation_rows.append(
        EvaluationRow(
            messages=[{"role": "user", "content": prompt}],
Bug: Standardize Message Object Initialization

messages is initialized with a dict {"role": "user", "content": prompt} instead of a Message object. The EvaluationRow.messages field expects a list of Message objects. This should be messages=[Message(role="user", content=prompt)] after importing Message from eval_protocol.models.
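
A minimal sketch of the suggested change is below. To keep it self-contained, Message and EvaluationRow are simplified dataclass stand-ins for the real classes in eval_protocol.models, whose full field sets are not shown in this thread. The build_rows wrapper and its outer loop over prompts are also assumptions added for illustration; only the inner loop appears in the excerpt under review.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    """Stand-in for eval_protocol.models.Message (assumed fields only)."""
    role: str
    content: str


@dataclass
class EvaluationRow:
    """Stand-in for eval_protocol.models.EvaluationRow (assumed fields only)."""
    messages: List[Message] = field(default_factory=list)


def build_rows(prompts: List[str], num_generations: int) -> List[EvaluationRow]:
    """Create num_generations rows per prompt, each holding one user Message."""
    evaluation_rows: List[EvaluationRow] = []
    for prompt in prompts:
        for _ in range(num_generations):
            evaluation_rows.append(
                EvaluationRow(
                    # Typed Message object instead of a raw dict.
                    messages=[Message(role="user", content=prompt)],
                )
            )
    return evaluation_rows
```

With the real library, the stand-in classes would be dropped in favor of importing Message from eval_protocol.models, as the comment suggests.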
