Skip to content

Conversation

@morgendave
Copy link
Collaborator

@morgendave morgendave commented Nov 24, 2025


name: Adding response quality validation for retry
about: response quality validation for retryable error
title: "Adding response quality validation for retry"
labels: ''
assignees: ''


Description

For bad rollouts that makes the final output bad which has quality issues, eg, gibberish, repetitions, etc, fail it with this plugin with customizations. Default no-op

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

Test with current tests and added decorator tests

Checklist:

  • My code follows the style guidelines of this project (ran black ., isort ., flake8 .)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Screenshots (if applicable)

If applicable, add screenshots to help showcase your changes.

Additional context

Add any other context about the PR here.


Note

Introduces RESPONSE_QUALITY_ERROR (103) and a post-processing hook to validate rollout results and trigger retries; updates retry config and adds tests.

  • Core/Models & Exceptions:
    • Add ResponseQualityError (code 103) in eval_protocol/exceptions.py and map in STATUS_CODE_TO_EXCEPTION.
    • Extend Status.Code with RESPONSE_QUALITY_ERROR and add Status.response_quality_error() in eval_protocol/models.py.
  • Pytest Runtime:
    • New post-processing plugin API in eval_protocol/pytest/rollout_result_post_processor.py with RolloutResultPostProcessor and NoOpRolloutResultPostProcessor (exported in eval_protocol/pytest/__init__.py).
    • Add optional post_processor to RolloutProcessorConfig in eval_protocol/pytest/types.py.
    • Integrate post-processing into retry flow in evaluation_test_utils.rollout_processor_with_retry(); raising ResponseQualityError triggers backoff retries.
  • Retry Config:
    • Include ResponseQualityError in DEFAULT_RETRYABLE_EXCEPTIONS and minor typing/format adjustments in eval_protocol/pytest/exception_config.py.
  • Tests:
    • Add tests/test_exception_config.py covering backoff decorator and inclusion of ResponseQualityError.
    • Extend tests/test_exceptions.py for code 103 mapping and behavior.
    • Update tests/test_retry_mechanism.py with scenarios verifying retry on ResponseQualityError.

Written by Cursor Bugbot for commit 2eeb478. This will update automatically on new commits. Configure here.

mode="pointwise",
exception_handler_config=ExceptionHandlerConfig(
backoff_config=BackoffConfig(max_tries=3, base_delay=0.1, strategy="constant"),
exception_backoff_overrides={
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will the default backoff retry logic not fit in this case? is it necessary to override it for quality error?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default will backoff, but I imagine we don't want to backoff in quality case. Though if we unify them the code complexity will be much smaller

Comment on lines 371 to 379
if config.post_processor is not None:
try:
config.post_processor.process(result)
except ResponseQualityError as quality_error:
# Re-raise ResponseQualityError to trigger retry logic
raise quality_error
except Exception as post_process_error:
# Wrap unexpected post-processor errors in ResponseQualityError
raise ResponseQualityError(f"Post-processor failed: {post_process_error}") from post_process_error
Copy link
Collaborator

@dphuang2 dphuang2 Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so all errors are ResponseQualityError if a post_processor is set?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, let me change this

Copy link
Collaborator

@dphuang2 dphuang2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make sure this works with remote rollout processor?

@morgendave morgendave force-pushed the response-validation-plugin branch from 3d209f5 to 2eeb478 Compare November 26, 2025 23:54
@morgendave morgendave mentioned this pull request Nov 27, 2025
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants