-
Notifications
You must be signed in to change notification settings - Fork 10
adding response quality validation for retry #344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
morgendave
wants to merge
3
commits into
eval-protocol:main
Choose a base branch
from
morgendave:response-validation-plugin
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,57 @@ | ||
| """ | ||
| Rollout result post-processing plugin for quality checks. | ||
|
|
||
| This module provides an abstract base class for post-processing rollout results | ||
| to guard response quality. Post-processors can validate results and raise | ||
| ResponseQualityError if quality checks fail. | ||
| """ | ||
|
|
||
| from abc import ABC, abstractmethod | ||
|
|
||
| from eval_protocol.models import EvaluationRow | ||
|
|
||
|
|
||
| class RolloutResultPostProcessor(ABC): | ||
| """ | ||
| Abstract base class for rollout result post-processing plugins. | ||
|
|
||
| Post-processors validate rollout results and can raise ResponseQualityError | ||
| if quality checks fail. This allows for customizable quality guards that | ||
| can be overridden by users. | ||
| """ | ||
|
|
||
| @abstractmethod | ||
| def process(self, result: EvaluationRow) -> None: | ||
| """ | ||
| Process and validate a rollout result. | ||
|
|
||
| This method should perform quality checks on the result. If quality | ||
| checks fail, it should raise ResponseQualityError with an appropriate | ||
| message. | ||
|
|
||
| Args: | ||
| result: The EvaluationRow result from the rollout | ||
|
|
||
| Raises: | ||
| ResponseQualityError: If quality checks fail | ||
| """ | ||
| pass | ||
|
|
||
|
|
||
| class NoOpRolloutResultPostProcessor(RolloutResultPostProcessor): | ||
| """ | ||
| Default no-op implementation of RolloutResultPostProcessor. | ||
|
|
||
| This implementation does not perform any quality checks and always passes. | ||
| Use this as a default when no post-processing is needed. | ||
| """ | ||
|
|
||
| def process(self, result: EvaluationRow) -> None: | ||
| """ | ||
| No-op implementation that does not perform any quality checks. | ||
|
|
||
| Args: | ||
| result: The EvaluationRow result from the rollout | ||
| """ | ||
| pass | ||
|
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,114 @@ | ||
| """ | ||
| Unit tests for exception_config module. | ||
|
|
||
| Tests the BackoffConfig and ExceptionHandlerConfig classes, including: | ||
| 1. Backoff decorator creation | ||
| 2. Per-exception backoff overrides | ||
| 3. ResponseQualityError default no-backoff configuration | ||
| 4. Exception grouping to avoid double backoff | ||
| """ | ||
|
|
||
| import pytest | ||
| from eval_protocol.pytest.exception_config import BackoffConfig, ExceptionHandlerConfig, DEFAULT_RETRYABLE_EXCEPTIONS | ||
| from eval_protocol.exceptions import ResponseQualityError | ||
|
|
||
|
|
||
| def test_backoff_config_no_exceptions(): | ||
| """Test that BackoffConfig returns no-op decorator when no exceptions specified.""" | ||
| config = BackoffConfig() | ||
| decorator = config.get_backoff_decorator(set()) | ||
|
|
||
| # Should be a no-op decorator | ||
| def test_func(): | ||
| return "test" | ||
|
|
||
| decorated = decorator(test_func) | ||
| assert decorated() == "test" | ||
| assert decorated is test_func # Should be the same function | ||
|
|
||
|
|
||
| def test_backoff_config_no_overrides(): | ||
| """Test that BackoffConfig creates a single decorator.""" | ||
| config = BackoffConfig(strategy="constant", base_delay=0.1, max_tries=2) | ||
| exceptions = {ConnectionError, TimeoutError} | ||
|
|
||
| decorator = config.get_backoff_decorator(exceptions) | ||
| assert decorator is not None | ||
|
|
||
| # Decorator should be callable | ||
| def test_func(): | ||
| raise ConnectionError("test") | ||
|
|
||
| decorated = decorator(test_func) | ||
| assert callable(decorated) | ||
|
|
||
|
|
||
| def test_exception_handler_config_default_response_quality_error(): | ||
| """Test that ExceptionHandlerConfig includes ResponseQualityError by default.""" | ||
| config = ExceptionHandlerConfig() | ||
|
|
||
| # ResponseQualityError should be in retryable_exceptions | ||
| assert ResponseQualityError in config.retryable_exceptions | ||
|
|
||
|
|
||
| def test_exception_handler_config_get_backoff_decorator(): | ||
| """Test that ExceptionHandlerConfig.get_backoff_decorator() works correctly.""" | ||
| config = ExceptionHandlerConfig() | ||
| decorator = config.get_backoff_decorator() | ||
|
|
||
| assert decorator is not None | ||
| assert callable(decorator) | ||
|
|
||
| # Should be able to decorate a function | ||
| def test_func(): | ||
| raise ConnectionError("test") | ||
|
|
||
| decorated = decorator(test_func) | ||
| assert callable(decorated) | ||
|
|
||
|
|
||
| def test_backoff_config_expo_strategy(): | ||
|
|
||
| """Test that BackoffConfig creates expo decorator correctly.""" | ||
| config = BackoffConfig(strategy="expo", base_delay=1.0, max_tries=2) | ||
| exceptions = {ConnectionError} | ||
|
|
||
| decorator = config.get_backoff_decorator(exceptions) | ||
| assert decorator is not None | ||
|
|
||
| def test_func(): | ||
| raise ConnectionError("test") | ||
|
|
||
| decorated = decorator(test_func) | ||
| assert callable(decorated) | ||
|
|
||
|
|
||
| def test_backoff_config_constant_strategy(): | ||
| """Test that BackoffConfig creates constant decorator correctly.""" | ||
| config = BackoffConfig(strategy="constant", base_delay=0.1, max_tries=2) | ||
| exceptions = {ConnectionError} | ||
|
|
||
| decorator = config.get_backoff_decorator(exceptions) | ||
| assert decorator is not None | ||
|
|
||
| def test_func(): | ||
| raise ConnectionError("test") | ||
|
|
||
| decorated = decorator(test_func) | ||
| assert callable(decorated) | ||
|
|
||
|
|
||
| def test_backoff_config_invalid_strategy(): | ||
| """Test that BackoffConfig raises ValueError for invalid strategy.""" | ||
| config = BackoffConfig(strategy="invalid", base_delay=1.0, max_tries=2) | ||
| exceptions = {ConnectionError} | ||
|
|
||
| with pytest.raises(ValueError, match="Unknown backoff strategy"): | ||
| config.get_backoff_decorator(exceptions) | ||
|
|
||
|
|
||
| def test_exception_handler_config_response_quality_error_in_defaults(): | ||
| """Test that ResponseQualityError is in DEFAULT_RETRYABLE_EXCEPTIONS.""" | ||
| assert ResponseQualityError in DEFAULT_RETRYABLE_EXCEPTIONS | ||
|
|
||
|
|
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.