[pull] master from ray-project:master #4082
Merged
pull[bot] merged 7 commits into miqdigital:master on Apr 24, 2026
Conversation
## Description

Fix the flaky `test_no_process_leak_after_job_finishes`. The test uses a `PidActor` to track worker pids, and a `wait_for_condition` that waits for 3 pids is the source of the flakiness. The 3 pids should be:

1. the actor worker pid
2. the parent task worker pid
3. the child task worker pid

However, the parent task and the child task can sometimes run on the same worker, leaving only 2 distinct pids, in which case the `wait_for_condition` times out. The fix is to track pids with a plain list instead of a set.

## Related issues

Fixes anyscale#1429

Signed-off-by: Rueian Huang <rueiancsie@gmail.com>
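A minimal sketch of the fix described above (the names here are illustrative, not the exact test code): appending to a list counts every recorded pid, while a set would collapse a reused worker pid into one entry and stall the wait.

```python
# Track pids in a list, not a set, so a reused worker pid still counts
# toward the expected total. (Illustrative sketch, not the Ray test.)

def record_pid(pids: list, pid: int) -> None:
    # With a set, a parent task and a child task scheduled on the same
    # worker would collapse into one entry and the wait would time out.
    pids.append(pid)

pids = []
for pid in (1001, 1002, 1002):  # child task reuses the parent's worker
    record_pid(pids, pid)

assert len(pids) == 3       # list: all 3 recorded pids
assert len(set(pids)) == 2  # a set would report only 2
```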
Updating the mac depset to include torch, and adding `--config-settings editable_mode=compat` to `install_ray`.

Failing build: https://buildkite.com/organizations/ray-project/pipelines/postmerge-macos/builds/12099/jobs/019db846-ac1b-4805-807d-3f1627ae5ffa/log?force=true#1710-3040

FROM CLAUDE:

```
The failing test: //python/ray/tests:test_typing (specifically test_typing_good and test_typing_actor_async) on the macOS postmerge job. Both mypy and pyright report: Module "ray" has no attribute "ObjectRef" (also: init, remote, wait, get, method)

The chain of causes:
1. Setuptools ≥64 defaults to a PEP 660 editable install. For Ray's layout, setuptools picks strict mode, which installs a __editable___ray_finder.py MetaPathFinder rather than putting the source dir on sys.path.
2. Runtime import ray works fine — Python executes the finder.
3. mypy and pyright are static type checkers. They resolve imports by walking sys.path / filesystem, not by executing finders. They cannot follow PEP 660 strict-mode editables, and neither tool intends to implement this.
4. The macOS CI runner is persistent. A ray/ directory left in site-packages from a prior run still sits on sys.path. mypy/pyright find that stale copy first (it lacks current stubs / __init__.py exports) and report the missing-attribute errors. The finder-based fresh install is invisible to them.

What editable_mode=compat does: forces setuptools to emit a legacy .egg-link / easy-install.pth editable, which puts the current source tree directly on sys.path ahead of any stale copy. Both type checkers see the live source, test_typing passes.
```

Postmerge run: https://buildkite.com/ray-project/postmerge-macos/builds/12096

---------

Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
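The stale-copy shadowing described in step 4 can be demonstrated with plain `sys.path` ordering. This is a standalone illustration (the `pkg` package and directory names are made up, not Ray's layout): tools that resolve imports by scanning `sys.path` see whichever copy appears first, which is why a `.pth`-style compat editable that puts the live source tree ahead of a stale `site-packages` copy fixes the type-check failures.

```python
# Illustration: path-order shadowing. A directory earlier on sys.path
# wins for any tool that resolves imports by walking sys.path.
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
stale = os.path.join(root, "site-packages")  # stands in for an old leftover install
fresh = os.path.join(root, "src")            # stands in for the live source tree
for d in (stale, fresh):
    os.makedirs(os.path.join(d, "pkg"))
with open(os.path.join(stale, "pkg", "__init__.py"), "w") as f:
    f.write("VERSION = 'stale'\n")
with open(os.path.join(fresh, "pkg", "__init__.py"), "w") as f:
    f.write("VERSION = 'fresh'\n")

# A compat (.pth-based) editable puts the source dir *ahead* of the
# stale copy, so path-walking tools see the live code first.
sys.path.insert(0, stale)
sys.path.insert(0, fresh)
importlib.invalidate_caches()

import pkg
assert pkg.VERSION == "fresh"  # fresh copy shadows the stale one
```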
## Summary

Adds a release-unit test that validates every `anyscale_sdk_2026=true` release test's `cluster_compute` YAML by constructing `anyscale.compute_config.ComputeConfig.from_dict(...)` on the rendered content. This catches schema bugs (unknown fields, wrong types, invalid enums, duplicate worker names) at CI time instead of deep inside a release run.

- Iterates all three test-collection files (`release_tests.yaml`, `release_data_tests.yaml`, `release_multimodal_inference_benchmarks_tests.yaml`) so future additions are auto-covered.
- Rendering reuses `ray_release.template.load_test_cluster_compute`, so Jinja vars (`{{env["ANYSCALE_CLOUD_NAME"]}}` is the only one referenced today) are resolved the same way a real release run resolves them.
- Also accepts `--compute-config-file=PATH` to validate a single YAML in isolation (renders Jinja with `DEFAULT_CLOUD_ID` / `DEFAULT_CLOUD_NAME` fallbacks).
- Tagged `release_unit` so it's picked up automatically by the existing `:coral: reef: ci+release tooling tests` step in `.buildkite/cicd.rayci.yml` — no CI YAML edits needed.

Collects 61 parametrized cases today (every `anyscale_sdk_2026=true` test across all variations). `pytest_generate_tests` raises a `RuntimeError` if that list is ever empty, so the gate cannot silently disappear if the flag is retired.

**Notable design choices**:

- No `COMPUTE_CONFIG_MODEL_FIELDS` filter before `from_dict()`. The production code path filters because `set_cluster_compute()` adds runtime-only keys after YAML load; this test loads the raw YAML, so filtering would silently strip typos like `head_nod` and defeat the gate.
- `pytest_addoption` is **not** used because pytest only collects that hook from conftest.py or registered plugins, not test modules. To keep the test self-contained in one file, `--compute-config-file` is parsed out of `sys.argv` in `__main__` and passed through the `COMPUTE_CONFIG_FILE` env var.
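The empty-collection gate can be sketched as follows. This is a hedged illustration, not the PR's code: `collect_sdk_2026_cases` is a hypothetical stand-in for iterating the test-collection YAMLs, but the shape of the `pytest_generate_tests` hook and the hard `RuntimeError` on an empty case list follow the summary above.

```python
# Sketch of the parametrization gate: collect cases, and fail loudly if
# the collection ever comes back empty so the gate cannot silently
# become a no-op. Names are illustrative.
import os


def collect_sdk_2026_cases():
    # Stand-in for scanning the release test-collection YAML files for
    # tests flagged anyscale_sdk_2026=true.
    return ["case_a", "case_b"]


def pytest_generate_tests(metafunc):
    if "case" in metafunc.fixturenames:
        cases = collect_sdk_2026_cases()
        if not cases:
            raise RuntimeError(
                "no anyscale_sdk_2026=true tests collected; "
                "the compute-config gate would be a no-op"
            )
        metafunc.parametrize("case", cases)


# Single-file override is read from an env var because pytest_addoption
# is only honored in conftest.py / plugins, not in test modules.
single_file = os.environ.get("COMPUTE_CONFIG_FILE")
```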
Signed-off-by: sai.miduthuri <sai.miduthuri@anyscale.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… tests (#62642)

## Description

Separate Arrow serialization tests into unit and integration files. Pure Python/PyArrow logic tests are moved to `tests/unit/`, while tests requiring a running Ray cluster remain in the top-level integration test file.

**Moved to `tests/unit/`** (9 tests):

- `test_bytes_for_bits_manual`
- `test_bytes_for_bits_auto`
- `test_align_bit_offset_auto`
- `test_copy_buffer_if_needed`
- `test_copy_normal_buffer_if_needed`
- `test_copy_bitpacked_buffer_if_needed`
- `test_copy_offsets_buffer_if_needed`
- `test_fixed_shape_tensor_array_serialization`
- `test_variable_shape_tensor_serialization` (+ `_VariableShapeTensorType` helper class)

**Remain in integration file** (7 tests):

- `test_custom_arrow_data_serializer` (parametrized, uses `ray_start_regular_shared`)
- `test_custom_arrow_data_serializer_fallback` (uses `ray_start_regular_shared`)
- `test_arrow_scalar_conversion` (uses `ray_start_regular_shared`, `ray.data`)
- `test_arrow_object_and_array_support` (uses `ray_start_regular_shared`, `ray.data`)
- `test_custom_arrow_data_serializer_parquet_roundtrip` (uses `ray_start_regular_shared`)
- `test_arrow_schema_ipc_serialization` (uses `ray_start_regular_shared`)
- `test_custom_arrow_data_serializer_disable` (uses `shutdown_only`, `ray.init()`)

## Related issues

Related to #61125

--------

Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>
## Description

`AutoscalerMetricsReporter.report_instances()` computes `terminated` across the full instance snapshot, but it was incrementing `stopped_nodes` inside the per-node-type reporting loop. That caused the same terminated transition count to be added once per configured node type instead of once per reporting pass. In a mixed node-type snapshot, `autoscaler_stopped_nodes_total` could therefore be over-counted.

The fix moves the `stopped_nodes` increment out of the per-node-type loop so the counter is updated exactly once for each batch of newly terminated instances.

## Related issues

#62025

Signed-off-by: weimingdiit <weimingdiit@gmail.com>
Co-authored-by: Rueian <rueiancsie@gmail.com>
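The over-count can be shown with a reduced sketch (function and variable names are hypothetical, not the Ray source): a snapshot-wide count incremented inside a per-node-type loop gets multiplied by the number of node types.

```python
# Illustrative sketch of the bug and the fix described above.

def report_buggy(node_types, terminated_count):
    stopped_total = 0
    for _node_type in node_types:
        # Bug: a snapshot-wide count is added once per node type.
        stopped_total += terminated_count
    return stopped_total

def report_fixed(node_types, terminated_count):
    stopped_total = 0
    for _node_type in node_types:
        pass  # per-node-type metrics only
    # Fix: increment exactly once per reporting pass.
    stopped_total += terminated_count
    return stopped_total

# 2 newly terminated instances in a snapshot with 3 node types:
assert report_buggy(["cpu", "gpu", "mem"], 2) == 6  # over-counted 3x
assert report_fixed(["cpu", "gpu", "mem"], 2) == 2  # counted once
```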
## Description

Populate `instance_type_name` in the autoscaling state generated by the v2 reconciler. Previously `_fill_autoscaling_state()` only populated `ray_node_type_name` for pending instance requests, pending instances, and failed instance requests. As a result, downstream consumers could observe empty provider instance types in parsed autoscaler status. Use `autoscaling_config.get_provider_instance_type()` to fill `instance_type_name`.

## Related issues

#62100

--------

Signed-off-by: weimingdiit <weimingdiit@gmail.com>
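A minimal sketch of the fill, assuming the field and helper names from the description above; the dataclass and the config stub here are illustrative stand-ins, not the reconciler's actual types.

```python
# Sketch: resolve the provider instance type from the Ray node type so
# downstream consumers no longer see an empty instance_type_name.
from dataclasses import dataclass


@dataclass
class PendingInstanceRequest:
    ray_node_type_name: str
    instance_type_name: str = ""  # previously left empty


class FakeAutoscalingConfig:
    # Stand-in for the node-type -> provider-instance-type mapping.
    _mapping = {"worker-group-a": "m5.large"}

    def get_provider_instance_type(self, node_type: str) -> str:
        return self._mapping.get(node_type, "")


def fill_instance_type(req, config):
    # The fix: populate instance_type_name alongside ray_node_type_name.
    req.instance_type_name = config.get_provider_instance_type(
        req.ray_node_type_name
    )
    return req


req = fill_instance_type(
    PendingInstanceRequest("worker-group-a"), FakeAutoscalingConfig()
)
assert req.instance_type_name == "m5.large"
```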
## Description

This PR introduces the new `KubeRayIPPRProvider`, a utility that provides IPPR helpers and will be wired up with `KubeRayNodeProvider` and the autoscaler in the upcoming final IPPR PR. It provides the following helpers:

1. `validate_and_set_ippr_specs` (from the previous PR)
2. `sync_with_raylets` (from the previous PR)
3. `sync_ippr_status_from_pods`
4. `do_ippr_requests`
5. `get_ippr_statuses`

The first 3 helpers are invoked during the sync phase of each autoscaler reconciliation, to reconcile the Ray and Kubernetes sides. The last 2 helpers are invoked after the sync phase, so the autoscaler can decide whether to do IPPR during the bin-packing simulation.

This PR only introduces the `KubeRayIPPRProvider`; there are no behavior changes in the autoscaler yet. The actual behavior changes will come in the next PR.

--------

Signed-off-by: Rueian Huang <rueiancsie@gmail.com>
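The helper surface above can be summarized as a skeleton. The method names come from the description; the signatures and grouping comments are illustrative assumptions, not the actual `KubeRayIPPRProvider` code.

```python
# Skeleton of the helper surface described above (illustrative only).

class KubeRayIPPRProvider:
    # Invoked during the sync phase of each autoscaler reconciliation,
    # to reconcile the Ray and Kubernetes sides.
    def validate_and_set_ippr_specs(self, specs):  # from the previous PR
        raise NotImplementedError

    def sync_with_raylets(self):  # from the previous PR
        raise NotImplementedError

    def sync_ippr_status_from_pods(self, pods):
        raise NotImplementedError

    # Invoked after the sync phase, so the autoscaler can decide whether
    # to do IPPR during the bin-packing simulation.
    def do_ippr_requests(self, requests):
        raise NotImplementedError

    def get_ippr_statuses(self):
        raise NotImplementedError
```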
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.4)