-
Notifications
You must be signed in to change notification settings - Fork 385
[Refactor] refactor http request error #1259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1ed910f to
af8f167
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors HTTP request error handling in the rollout system to provide more granular error management and improve system reliability. The refactoring introduces a new error classification system that categorizes HTTP errors and handles them according to different strategies:
- Timeout/Request errors: Retry the request
- Client errors (4xx): Skip the data item
- Server errors (5xx): Track failures per worker and deactivate workers exceeding threshold
- Unknown errors: Raise RuntimeError for immediate attention
Key changes include:
- New
httpx_utils.pymodule with error type enum and result dataclass - Refactored error handling in
worker.pyto use structured error results - Worker deactivation logic in controller based on repeated server errors
- Mock test utilities and comprehensive test suite for error scenarios
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
xtuner/v1/utils/httpx_utils.py |
New utility module defining HTTP error types, result dataclass, and status setting logic |
xtuner/v1/utils/rl_test_utils.py |
Added mock worker classes for simulating different HTTP error scenarios in tests |
xtuner/v1/ray/rollout/worker.py |
Refactored _safe_post_request and rollout_task methods to use structured error handling |
xtuner/v1/ray/rollout/controller.py |
Added worker deactivation logic based on server error counts; moved @ray.remote decorator to call sites |
xtuner/v1/ray/rollout/lmdeploy.py |
Removed @ray.remote decorator to allow dynamic application |
xtuner/v1/ray/environment/base_env.py |
Added support for injecting pre-initialized controllers for testing |
xtuner/v1/ray/environment/single_turn_env.py |
Enhanced environment initialization to support controller injection and handle skipped data |
xtuner/v1/ray/dataflow/flow.py |
Added skipped sample tracking and updated termination conditions |
xtuner/v1/ray/config/worker.py |
Added max_retry_per_worker configuration with automatic default calculation |
xtuner/v1/data_proto/rl_data.py |
Updated check_dataflow_item to handle skipped status; added comment documenting finish_reason values |
xtuner/v1/ray/evaluator.py |
Improved logging to use action_id instead of data object |
tests/ray/test_rollout.py |
Updated tests to use dynamic ray.remote decorator; removed outdated error test |
tests/ray/test_mock_rollout.py |
New comprehensive test suite for error handling scenarios |
tests/ray/test_evaluator.py |
Removed outdated error handling test |
Comments suppressed due to low confidence (1)
xtuner/v1/ray/environment/base_env.py:140
- This import of module ray is redundant, as it was previously imported on line 4.
import ray
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
84d55ac to
fcb40aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (1)
xtuner/v1/ray/environment/base_env.py:140
- This import of module ray is redundant, as it was previously imported on line 4.
import ray
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
f6ab33e to
b96b841
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 7 comments.
Comments suppressed due to low confidence (1)
xtuner/v1/ray/environment/base_env.py:140
- This import of module ray is redundant, as it was previously imported on line 4.
import ray
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
xtuner/v1/ray/environment/base_env.py:140
- This import of module ray is redundant, as it was previously imported on line 4.
import ray
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
961ac25 to
e44e83e
Compare
d8acdce to
5c8c03e
Compare
ec3175b to
e00a267
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (1)
xtuner/v1/ray/environment/base_env.py:145
- This import of module ray is redundant, as it was previously imported on line 5.
import ray
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
增加http server异常处理逻辑: