adding response quality validation for retry #344

morgendave · 2025-11-24T19:35:50Z

name: Adding response quality validation for retry
about: response quality validation for retryable error
title: "Adding response quality validation for retry"
labels: ''
assignees: ''

Description

For bad rollouts that makes the final output bad which has quality issues, eg, gibberish, repetitions, etc, fail it with this plugin with customizations. Default no-op

Type of change

New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

Test with current tests and added decorator tests

Checklist:

My code follows the style guidelines of this project (ran black ., isort ., flake8 .)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
Any dependent changes have been merged and published in downstream modules
I have checked my code and corrected any misspellings

Screenshots (if applicable)

If applicable, add screenshots to help showcase your changes.

Additional context

Add any other context about the PR here.

Note

Introduces RESPONSE_QUALITY_ERROR (103) and a post-processing hook to validate rollout results and trigger retries; updates retry config and adds tests.

Core/Models & Exceptions:
- Add ResponseQualityError (code 103) in eval_protocol/exceptions.py and map in STATUS_CODE_TO_EXCEPTION.
- Extend Status.Code with RESPONSE_QUALITY_ERROR and add Status.response_quality_error() in eval_protocol/models.py.
Pytest Runtime:
- New post-processing plugin API in eval_protocol/pytest/rollout_result_post_processor.py with RolloutResultPostProcessor and NoOpRolloutResultPostProcessor (exported in eval_protocol/pytest/__init__.py).
- Add optional post_processor to RolloutProcessorConfig in eval_protocol/pytest/types.py.
- Integrate post-processing into retry flow in evaluation_test_utils.rollout_processor_with_retry(); raising ResponseQualityError triggers backoff retries.
Retry Config:
- Include ResponseQualityError in DEFAULT_RETRYABLE_EXCEPTIONS and minor typing/format adjustments in eval_protocol/pytest/exception_config.py.
Tests:
- Add tests/test_exception_config.py covering backoff decorator and inclusion of ResponseQualityError.
- Extend tests/test_exceptions.py for code 103 mapping and behavior.
- Update tests/test_retry_mechanism.py with scenarios verifying retry on ResponseQualityError.

^{Written by Cursor Bugbot for commit 2eeb478. This will update automatically on new commits. Configure here.}

eval_protocol/pytest/evaluation_test_utils.py

mayinghan · 2025-11-25T21:23:22Z

tests/test_retry_mechanism.py

+    mode="pointwise",
+    exception_handler_config=ExceptionHandlerConfig(
+        backoff_config=BackoffConfig(max_tries=3, base_delay=0.1, strategy="constant"),
+        exception_backoff_overrides={


will the default backoff retry logic not fit in this case? is it necessary to override it for quality error?

Default will backoff, but I imagine we don't want to backoff in quality case. Though if we unify them the code complexity will be much smaller

dphuang2 · 2025-11-25T21:29:58Z

eval_protocol/pytest/evaluation_test_utils.py

+            if config.post_processor is not None:
+                try:
+                    config.post_processor.process(result)
+                except ResponseQualityError as quality_error:
+                    # Re-raise ResponseQualityError to trigger retry logic
+                    raise quality_error
+                except Exception as post_process_error:
+                    # Wrap unexpected post-processor errors in ResponseQualityError
+                    raise ResponseQualityError(f"Post-processor failed: {post_process_error}") from post_process_error


so all errors are ResponseQualityError if a post_processor is set?

My bad, let me change this

dphuang2

can we make sure this works with remote rollout processor?

cursor bot reviewed Nov 24, 2025

View reviewed changes

eval_protocol/pytest/evaluation_test_utils.py Show resolved Hide resolved

mayinghan reviewed Nov 25, 2025

View reviewed changes

dphuang2 reviewed Nov 25, 2025

View reviewed changes

morgendave added 3 commits November 26, 2025 15:54

adding response quality validation for retry

5a7be6d

Add rollout result post processor

a322224

simplify the config

2eeb478

morgendave force-pushed the response-validation-plugin branch from 3d209f5 to 2eeb478 Compare November 26, 2025 23:54

morgendave mentioned this pull request Nov 27, 2025

Response validation plugin #351

Merged

10 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

adding response quality validation for retry #344

adding response quality validation for retry #344

morgendave commented Nov 24, 2025 •

edited by cursor bot

Loading

Uh oh!

Uh oh!

mayinghan Nov 25, 2025

Uh oh!

morgendave Nov 25, 2025

Uh oh!

dphuang2 Nov 25, 2025 •

edited

Loading

Uh oh!

morgendave Nov 25, 2025

Uh oh!

dphuang2 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

adding response quality validation for retry #344

Are you sure you want to change the base?

adding response quality validation for retry #344

Conversation

morgendave commented Nov 24, 2025 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

How Has This Been Tested?

Checklist:

Screenshots (if applicable)

Additional context

Uh oh!

Uh oh!

mayinghan Nov 25, 2025

Choose a reason for hiding this comment

Uh oh!

morgendave Nov 25, 2025

Choose a reason for hiding this comment

Uh oh!

dphuang2 Nov 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

morgendave Nov 25, 2025

Choose a reason for hiding this comment

Uh oh!

dphuang2 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

morgendave commented Nov 24, 2025 •

edited by cursor bot

Loading

dphuang2 Nov 25, 2025 •

edited

Loading