[pull] master from ray-project:master by pull[bot] · Pull Request #4082 · miqdigital/ray

pull · 2026-04-24T19:35:29Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)


## Description

Fix the flaky `test_no_process_leak_after_job_finishes`. The test uses a `PidActor` to track worker pids, and a `wait_for_condition` that waits for 3 pids causes the flakiness. The 3 pids should be:

1. actor worker pid
2. parent task worker pid
3. child task worker pid

However, the parent task worker pid and the child task worker pid can sometimes be the same, so there may be only 2 distinct pids, and the `wait_for_condition` times out. The fix is to track pids with a normal list instead of a set.

## Related issues

Fixes anyscale#1429

Signed-off-by: Rueian Huang <rueiancsie@gmail.com>
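The set-versus-list behavior behind this flake can be reproduced without Ray. The following is a minimal sketch, not the test's actual code: `collect`, `ACTOR_PID`, and `TASK_PID` are hypothetical names and the pid values are made up.

```python
# Hypothetical pids: one actor worker, and one task worker that happened
# to run both the parent task and the child task.
ACTOR_PID, TASK_PID = 1001, 1002


def collect(reports, container):
    """Record each reported pid in `container`, which is a set or a list."""
    for pid in reports:
        if isinstance(container, set):
            container.add(pid)
        else:
            container.append(pid)
    return container


# Three reports arrive, but only two distinct pids exist.
reports = [ACTOR_PID, TASK_PID, TASK_PID]

as_set = collect(reports, set())
as_list = collect(reports, [])

# A condition such as `len(pids) == 3` can never hold for the set,
# so a wait loop on it would time out; the list reaches three entries.
print(len(as_set), len(as_list))
```

With a list, the wait condition counts reports rather than distinct processes, which is what the test needs.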

Updating mac depset to include torch

Adding --config-settings editable_mode=compat to install_ray

failing build: https://buildkite.com/organizations/ray-project/pipelines/postmerge-macos/builds/12099/jobs/019db846-ac1b-4805-807d-3f1627ae5ffa/log?force=true#1710-3040

FROM CLAUDE:

```
The failing test: //python/ray/tests:test_typing (specifically test_typing_good
and test_typing_actor_async) on the macOS postmerge job. Both mypy and pyright
report:

  Module "ray" has no attribute "ObjectRef" (also: init, remote, wait, get, method)

The chain of causes:

1. Setuptools ≥64 defaults to a PEP 660 editable install. For Ray's layout,
   setuptools picks strict mode, which installs a __editable___ray_finder.py
   MetaPathFinder rather than putting the source dir on sys.path.
2. Runtime import ray works fine: Python executes the finder.
3. mypy and pyright are static type checkers. They resolve imports by walking
   sys.path / filesystem, not by executing finders. They cannot follow PEP 660
   strict-mode editables, and neither tool intends to implement this.
4. The macOS CI runner is persistent. A ray/ directory left in site-packages
   from a prior run still sits on sys.path. mypy/pyright find that stale copy
   first (it lacks current stubs / __init__.py exports) and report the
   missing-attribute errors. The finder-based fresh install is invisible to them.

What editable_mode=compat does: forces setuptools to emit a legacy .egg-link /
easy-install.pth editable, which puts the current source tree directly on
sys.path ahead of any stale copy. Both type checkers see the live source,
test_typing passes.
```

postmerge run: https://buildkite.com/ray-project/postmerge-macos/builds/12096

---------

Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
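The gap between runtime imports and filesystem-based resolution can be illustrated with a toy finder. This is only an analogy under stated assumptions: `InMemoryFinder`, `resolve_statically`, and the module name `demo_editable_pkg` are hypothetical, and the finder that setuptools generates for a real editable install is more elaborate.

```python
import importlib
import importlib.abc
import importlib.machinery
import os
import sys

MODULE_NAME = "demo_editable_pkg"  # hypothetical package name


class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serves MODULE_NAME at import time without any file on sys.path."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname == MODULE_NAME:
            return importlib.machinery.ModuleSpec(fullname, self)
        return None

    def create_module(self, spec):
        return None  # fall back to the default module object

    def exec_module(self, module):
        # Give the module one attribute so there is something to look up.
        module.ObjectRef = type("ObjectRef", (), {})


def resolve_statically(name, search_paths):
    """Filesystem-only lookup, roughly how a static checker finds a module."""
    for base in search_paths:
        root = base or "."
        package_init = os.path.join(root, name, "__init__.py")
        module_file = os.path.join(root, name + ".py")
        if os.path.isfile(package_init) or os.path.isfile(module_file):
            return base
    return None


sys.meta_path.append(InMemoryFinder())

runtime_module = importlib.import_module(MODULE_NAME)   # the finder is executed
static_hit = resolve_statically(MODULE_NAME, sys.path)  # the finder is ignored
print(hasattr(runtime_module, "ObjectRef"), static_hit)
```

The import succeeds because the interpreter consults `sys.meta_path`, while the directory walk finds nothing because no file for the module exists on `sys.path`.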

)

## Summary

Adds a release-unit test that validates every `anyscale_sdk_2026=true` release test's `cluster_compute` YAML by constructing `anyscale.compute_config.ComputeConfig.from_dict(...)` on the rendered content. Catches schema bugs (unknown fields, wrong types, invalid enums, duplicate worker names) at CI time instead of deep inside a release run.

- Iterates all three test-collection files (`release_tests.yaml`, `release_data_tests.yaml`, `release_multimodal_inference_benchmarks_tests.yaml`) so future additions are auto-covered.
- Rendering reuses `ray_release.template.load_test_cluster_compute`, so Jinja vars (`{{env["ANYSCALE_CLOUD_NAME"]}}` is the only one referenced today) are resolved the same way a real release run resolves them.
- Also accepts `--compute-config-file=PATH` to validate a single YAML in isolation (renders Jinja with `DEFAULT_CLOUD_ID` / `DEFAULT_CLOUD_NAME` fallbacks).
- Tagged `release_unit` so it's picked up automatically by the existing `:coral: reef: ci+release tooling tests` step in `.buildkite/cicd.rayci.yml`, so no CI YAML edits are needed.

Collects 61 parametrized cases today (every `anyscale_sdk_2026=true` test across all variations). `pytest_generate_tests` raises a `RuntimeError` if that list is ever empty, so the gate cannot silently disappear if the flag is retired.

**Notable design choices**:

- No `COMPUTE_CONFIG_MODEL_FIELDS` filter before `from_dict()`. The production code path filters because `set_cluster_compute()` adds runtime-only keys after YAML load; this test loads the raw YAML, so filtering would silently strip typos like `head_nod` and defeat the gate.
- `pytest_addoption` is **not** used because pytest only collects that hook from conftest.py or registered plugins, not test modules. To keep the test self-contained in one file, `--compute-config-file` is parsed out of `sys.argv` in `__main__` and passed through the `COMPUTE_CONFIG_FILE` env var.

Signed-off-by: sai.miduthuri <sai.miduthuri@anyscale.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
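Two of the mechanisms described in this commit, the empty-collection guard and the flag that travels through an environment variable, might look roughly like the sketch below. It is not the test's actual code: `extract_config_flag`, `prepare_env`, `collect_cases`, and the `case` fixture name are hypothetical, and only the `--compute-config-file=PATH` form is handled.

```python
import os

ENV_VAR = "COMPUTE_CONFIG_FILE"
FLAG = "--compute-config-file"


def extract_config_flag(argv):
    """Split `--compute-config-file=PATH` out of argv.

    Returns (path_or_None, remaining_args).
    """
    path, remaining = None, []
    for arg in argv:
        if arg.startswith(FLAG + "="):
            path = arg.split("=", 1)[1]
        else:
            remaining.append(arg)
    return path, remaining


def prepare_env(argv, environ=os.environ):
    """Move the flag into the env var and return the args left for pytest."""
    path, remaining = extract_config_flag(argv)
    if path is not None:
        environ[ENV_VAR] = path
    return remaining


def collect_cases():
    """Hypothetical collector; the real test walks the release YAML files."""
    return []


def pytest_generate_tests(metafunc):
    """Parametrize `case`, failing loudly if nothing was collected."""
    if "case" not in metafunc.fixturenames:
        return
    cases = collect_cases()
    if not cases:
        raise RuntimeError("no anyscale_sdk_2026=true tests were collected")
    metafunc.parametrize("case", cases)
```

In a `__main__` block, the list returned by `prepare_env(sys.argv[1:])` would then be handed to `pytest.main`, so pytest never sees an option it does not recognize.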

… tests (#62642)

## Description

Separate Arrow serialization tests into unit and integration files. Pure Python/PyArrow logic tests are moved to `tests/unit/`, while tests requiring a running Ray cluster remain in the top-level integration test file.

**Moved to `tests/unit/`** (9 tests):

- `test_bytes_for_bits_manual`
- `test_bytes_for_bits_auto`
- `test_align_bit_offset_auto`
- `test_copy_buffer_if_needed`
- `test_copy_normal_buffer_if_needed`
- `test_copy_bitpacked_buffer_if_needed`
- `test_copy_offsets_buffer_if_needed`
- `test_fixed_shape_tensor_array_serialization`
- `test_variable_shape_tensor_serialization` (+ `_VariableShapeTensorType` helper class)

**Remain in integration file** (7 tests):

- `test_custom_arrow_data_serializer` (parametrized, uses `ray_start_regular_shared`)
- `test_custom_arrow_data_serializer_fallback` (uses `ray_start_regular_shared`)
- `test_arrow_scalar_conversion` (uses `ray_start_regular_shared`, `ray.data`)
- `test_arrow_object_and_array_support` (uses `ray_start_regular_shared`, `ray.data`)
- `test_custom_arrow_data_serializer_parquet_roundtrip` (uses `ray_start_regular_shared`)
- `test_arrow_schema_ipc_serialization` (uses `ray_start_regular_shared`)
- `test_custom_arrow_data_serializer_disable` (uses `shutdown_only`, `ray.init()`)

## Related issues

Related to #61125

## Additional information

.

---------

Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>

## Description

`AutoscalerMetricsReporter.report_instances()` computes `terminated` across the full instance snapshot, but it was incrementing `stopped_nodes` inside the per-node-type reporting loop. That caused the same terminated transition count to be added once per configured node type instead of once per reporting pass. In a mixed node-type snapshot, `autoscaler_stopped_nodes_total` could therefore be over-counted.

Move the `stopped_nodes` increment out of the per-node-type loop so the counter is updated exactly once for each batch of newly terminated instances.

## Related issues

#62025

Signed-off-by: weimingdiit <weimingdiit@gmail.com>
Co-authored-by: Rueian <rueiancsie@gmail.com>
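The over-count is easy to see in isolation. The sketch below is a simplified model, not the reporter's real code: the dictionary-based instances, the metric keys, and the function names `report_in_loop` and `report_once` are all hypothetical.

```python
from collections import Counter

# Hypothetical snapshot: two node types, two newly terminated instances.
NODE_TYPES = ["cpu_worker", "gpu_worker"]
INSTANCES = [
    {"node_type": "cpu_worker", "status": "RUNNING"},
    {"node_type": "cpu_worker", "status": "TERMINATED"},
    {"node_type": "gpu_worker", "status": "TERMINATED"},
]


def count_terminated(instances):
    """Terminated count over the full snapshot, not per node type."""
    return sum(1 for inst in instances if inst["status"] == "TERMINATED")


def count_running(instances, node_type):
    """Running instances of a single node type."""
    return sum(
        1
        for inst in instances
        if inst["node_type"] == node_type and inst["status"] == "RUNNING"
    )


def report_in_loop(instances, node_types):
    """Old shape: the snapshot-wide count is added once per node type."""
    metrics = Counter()
    terminated = count_terminated(instances)
    for node_type in node_types:
        metrics["running:" + node_type] = count_running(instances, node_type)
        metrics["stopped_nodes"] += terminated  # repeated for every node type
    return metrics


def report_once(instances, node_types):
    """Fixed shape: the counter is bumped once per reporting pass."""
    metrics = Counter()
    terminated = count_terminated(instances)
    for node_type in node_types:
        metrics["running:" + node_type] = count_running(instances, node_type)
    metrics["stopped_nodes"] += terminated
    return metrics


buggy = report_in_loop(INSTANCES, NODE_TYPES)
fixed = report_once(INSTANCES, NODE_TYPES)
print(buggy["stopped_nodes"], fixed["stopped_nodes"])
```

With two node types and two terminated instances, the in-loop version reports four stopped nodes while the fixed version reports two.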

Populate `instance_type_name` in the autoscaling state generated by the v2 reconciler.

Previously `_fill_autoscaling_state()` only populated `ray_node_type_name` for pending instance requests, pending instances, and failed instance requests. As a result, downstream consumers could observe empty provider instance types in parsed autoscaler status.

Use `autoscaling_config.get_provider_instance_type()` to fill `instance_type_name`.

## Related issues

#62100

---------

Signed-off-by: weimingdiit <weimingdiit@gmail.com>
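The shape of this change can be sketched with stand-in types. Everything here is an assumption made for illustration: `PendingInstance`, `FakeAutoscalingConfig`, `fill_pending`, and the example mapping are hypothetical, and the argument taken by `get_provider_instance_type()` is assumed to be the Ray node type name.

```python
from dataclasses import dataclass


@dataclass
class PendingInstance:
    """Hypothetical stand-in for an entry in the autoscaling state."""

    ray_node_type_name: str
    instance_type_name: str = ""


class FakeAutoscalingConfig:
    """Hypothetical config; only mirrors the method named in the commit."""

    def __init__(self, provider_types):
        self._provider_types = provider_types

    def get_provider_instance_type(self, ray_node_type_name):
        # Assumed signature: Ray node type name in, provider type out.
        return self._provider_types.get(ray_node_type_name, "")


def fill_pending(ray_node_types, autoscaling_config):
    """Populate both names instead of leaving instance_type_name empty."""
    return [
        PendingInstance(
            ray_node_type_name=name,
            instance_type_name=autoscaling_config.get_provider_instance_type(name),
        )
        for name in ray_node_types
    ]


config = FakeAutoscalingConfig({"gpu_worker": "example.large"})
state = fill_pending(["gpu_worker"], config)
print(state[0])
```

A consumer that parses this state then sees a provider instance type alongside the Ray node type rather than an empty string.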

## Description

This PR introduces the new `KubeRayIPPRProvider`, which is the utility that provides IPPR helpers and will be wired up with `KubeRayNodeProvider` and the autoscaler in the upcoming final IPPR PR.

The following are the helpers it provides:

1. `validate_and_set_ippr_specs` (in the previous PR)
2. `sync_with_raylets` (in the previous PR)
3. `sync_ippr_status_from_pods`
4. `do_ippr_requests`
5. `get_ippr_statuses`

The first 3 helpers will be invoked during the sync phase of each autoscaler reconciliation for reconciling both sides of Ray and Kubernetes. The last 2 helpers will be invoked after the sync phase for the autoscaler to decide whether to do IPPR or not during the bin packing simulation.

But this PR only introduces the `KubeRayIPPRProvider`. There are no actual behavior changes in the autoscaler yet. The actual behavior changes will come in the next PR.

---------

Signed-off-by: Rueian Huang <rueiancsie@gmail.com>
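The two-phase ordering described above can be pictured with a stub. This is a sketch under loud assumptions: `StubIPPRProvider` and `reconcile_once` are hypothetical, the zero-argument signatures are invented for illustration, and the relative order of the two post-sync helpers is a guess, since the description only says both run after the sync phase.

```python
class StubIPPRProvider:
    """Hypothetical stand-in that only records which helper was called."""

    def __init__(self):
        self.calls = []

    def validate_and_set_ippr_specs(self):
        self.calls.append("validate_and_set_ippr_specs")

    def sync_with_raylets(self):
        self.calls.append("sync_with_raylets")

    def sync_ippr_status_from_pods(self):
        self.calls.append("sync_ippr_status_from_pods")

    def get_ippr_statuses(self):
        self.calls.append("get_ippr_statuses")
        return {}

    def do_ippr_requests(self):
        self.calls.append("do_ippr_requests")


def reconcile_once(provider):
    """One reconciliation pass: sync both sides first, then decide on IPPR."""
    # Sync phase: reconcile the Ray side and the Kubernetes side.
    provider.validate_and_set_ippr_specs()
    provider.sync_with_raylets()
    provider.sync_ippr_status_from_pods()

    # Post-sync phase: order between these two calls is an assumption.
    statuses = provider.get_ippr_statuses()
    provider.do_ippr_requests()
    return statuses


provider = StubIPPRProvider()
reconcile_once(provider)
print(provider.calls)
```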

rueian and others added 7 commits April 24, 2026 09:52

pull Bot locked and limited conversation to collaborators Apr 24, 2026

pull Bot added the ⤵️ pull label Apr 24, 2026

pull Bot merged commit 3267463 into miqdigital:master Apr 24, 2026

pull[bot] merged 7 commits into miqdigital:master from ray-project:master


5 participants
