@shreymodi1 (Contributor) commented Dec 18, 2025

Note

Updates streaming compliance benchmarks and rollout plumbing.

  • Default model set to glm-4p6; imports pytest and adds @pytest.mark.skipif so that the reasoning and multi-tool tests run conditionally, controlled by EP_SUPPORTS_REASONING and EP_SUPPORTS_MULTIPLE_TOOL_CALLS
  • New helper _maybe_add_reasoning_effort, and _build_completion_params_from_payload updated to pass reasoning_effort only when it is supported; many tests now use these helpers
  • Adds raw_output: True to test params; rollout moves raw_output into extra_body and captures the response's raw_output into row.execution_metadata.raw_output
  • Model updates: ExecutionMetadata gains a raw_output field to store provider extras
  • Tests adjusted to include raw_output and to make reasoning checks/metrics conditional on support; minor param tweaks across streaming and non-streaming cases

Written by Cursor Bugbot for commit da015c5.

@dphuang2 (Collaborator) left a comment

Let's avoid test-specific environment variables in favor of globally configurable ones.

@dphuang2 (Collaborator) commented Dec 22, 2025

@shreymodi1 merged commit 37d2e02 into main on Dec 22, 2025
17 checks passed
@shreymodi1 deleted the shrey/updatedtests branch on December 22, 2025 at 19:48
3 participants