
[pull] master from ray-project:master#4082

Merged
pull[bot] merged 7 commits into miqdigital:master from ray-project:master on Apr 24, 2026


pull[bot] commented on Apr 24, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


rueian and others added 7 commits April 24, 2026 09:52
## Description

Fix the flaky `test_no_process_leak_after_job_finishes`.

The test uses a `PidActor` for tracking worker pids. There is a
`wait_for_condition` waiting for 3 pids that causes flakiness. The 3
pids should be:
1. actor worker pid
2. parent task worker pid
3. child task worker pid

However, the parent task and the child task can sometimes run on the
same worker, so they report the same pid. The set then holds only 2
distinct pids, and the `wait_for_condition` times out.

The fix is to track pids in a plain list instead of a set, so a repeated
pid still counts as a separate entry.
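
To illustrate why the container type matters, here is a minimal sketch. It is not the actual test code: the pid values and the `has_three_reports` predicate are made up for illustration and only stand in for the condition passed to `wait_for_condition`.

```python
# Hypothetical pid reports: the parent and child tasks happen to run
# on the same worker process, so one pid appears twice.
reported_pids = [101, 202, 202]  # actor, parent task, child task

pids_as_set = set(reported_pids)    # duplicates collapse: {101, 202}
pids_as_list = list(reported_pids)  # every report is kept


def has_three_reports(pids):
    # Stand-in for the predicate the test waits on.
    return len(pids) == 3


print(has_three_reports(pids_as_set))   # False: the wait would time out
print(has_three_reports(pids_as_list))  # True
```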

## Related issues
Fixes anyscale#1429

Signed-off-by: Rueian Huang <rueiancsie@gmail.com>
Updating mac depset to include torch

Adding --config-settings editable_mode=compat to install_ray
failing build:
https://buildkite.com/organizations/ray-project/pipelines/postmerge-macos/builds/12099/jobs/019db846-ac1b-4805-807d-3f1627ae5ffa/log?force=true#1710-3040
FROM CLAUDE:
 ```
The failing test: //python/ray/tests:test_typing (specifically
test_typing_good and test_typing_actor_async) on the macOS postmerge
job. Both mypy and pyright report:
Module "ray" has no attribute "ObjectRef" (also: init, remote, wait,
get, method)

The chain of causes:
1. Setuptools ≥64 defaults to a PEP 660 editable install. For Ray's
layout, setuptools picks strict mode, which installs a
__editable___ray_finder.py MetaPathFinder rather than putting the source
dir on sys.path.
2. Runtime import ray works fine — Python executes the finder.
3. mypy and pyright are static type checkers. They resolve imports by
walking sys.path / filesystem, not by executing finders. They cannot
follow PEP 660 strict-mode editables, and neither tool intends to
implement this.
4. The macOS CI runner is persistent. A ray/ directory left in
site-packages from a prior run still sits on sys.path. mypy/pyright find
that stale copy first (it lacks current stubs / __init__.py exports) and
report the missing-attribute errors. The finder-based fresh install is
invisible to them.

What editable_mode=compat does: forces setuptools to emit a legacy
.egg-link / easy-install.pth editable, which puts the current source
tree directly on sys.path ahead of any stale copy. Both type checkers
see the live source, test_typing passes.
```
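
The path-ordering argument in the explanation above can be modelled with a small sketch. This is a deliberately simplified stand-in for path-based import resolution, not how mypy or pyright are implemented; `resolve_package_dir`, `make_package`, and the temporary directory layout are all hypothetical.

```python
import os
import tempfile
from typing import List, Optional


def resolve_package_dir(name: str, search_path: List[str]) -> Optional[str]:
    """Return the first directory on search_path that contains package `name`."""
    for entry in search_path:
        candidate = os.path.join(entry, name)
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            return candidate
    return None


def make_package(root: str, name: str) -> str:
    """Create an empty package `name` under `root` and return its directory."""
    pkg = os.path.join(root, name)
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("")
    return pkg


with tempfile.TemporaryDirectory() as tmp:
    site_packages = os.path.join(tmp, "site-packages")
    source_tree = os.path.join(tmp, "src")
    stale = make_package(site_packages, "ray")  # leftover from a prior run
    live = make_package(source_tree, "ray")     # current checkout

    # Finder-based editable: the source tree is not on the search path,
    # so a path-walking resolver only sees the stale copy.
    strict_result = resolve_package_dir("ray", [site_packages])

    # Path-based editable: the source tree sits ahead of site-packages.
    compat_result = resolve_package_dir("ray", [source_tree, site_packages])
```

In this model the first lookup returns the stale directory and the second returns the live one, which matches the behavior the explanation attributes to `editable_mode=compat`.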

postmerge run: https://buildkite.com/ray-project/postmerge-macos/builds/12096

---------

Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>

## Summary

Adds a release-unit test that validates every `anyscale_sdk_2026=true`
release test's `cluster_compute` YAML by constructing
`anyscale.compute_config.ComputeConfig.from_dict(...)` on the rendered
content. Catches schema bugs (unknown fields, wrong types, invalid
enums, duplicate worker names) at CI time instead of deep inside a
release run.

- Iterates all three test-collection files (`release_tests.yaml`,
`release_data_tests.yaml`,
`release_multimodal_inference_benchmarks_tests.yaml`) so future
additions are auto-covered.
- Rendering reuses `ray_release.template.load_test_cluster_compute`, so
Jinja vars (`{{env["ANYSCALE_CLOUD_NAME"]}}` is the only one referenced
today) are resolved the same way a real release run resolves them.
- Also accepts `--compute-config-file=PATH` to validate a single YAML in
isolation (renders Jinja with `DEFAULT_CLOUD_ID` / `DEFAULT_CLOUD_NAME`
fallbacks).
- Tagged `release_unit` so it's picked up automatically by the existing
`:coral: reef: ci+release tooling tests` step in
`.buildkite/cicd.rayci.yml` — no CI YAML edits needed.

Collects 61 parametrized cases today (every `anyscale_sdk_2026=true`
test across all variations). `pytest_generate_tests` raises a
`RuntimeError` if that list is ever empty, so the gate cannot silently
disappear if the flag is retired.
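
A rough sketch of the empty-collection guard described above, under stated assumptions: `EXAMPLE_TESTS`, `select_sdk_2026_tests`, `build_cases`, and the `release_test` fixture name are hypothetical, and the real test loads its cases from the release YAML files rather than from an in-memory list. Only `metafunc.fixturenames` and `metafunc.parametrize` are actual pytest APIs.

```python
from typing import Any, Dict, List

# Hypothetical in-memory stand-in for the parsed test-collection files.
EXAMPLE_TESTS: List[Dict[str, Any]] = [
    {"name": "example_a", "anyscale_sdk_2026": True},
    {"name": "example_b"},  # flag absent: not covered by the gate
]


def select_sdk_2026_tests(tests: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only tests that opt in via the anyscale_sdk_2026 flag."""
    return [t for t in tests if t.get("anyscale_sdk_2026") is True]


def build_cases(tests: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the parametrized cases, failing loudly if there are none."""
    cases = select_sdk_2026_tests(tests)
    if not cases:
        # Without this, zero cases would mean zero tests and a green run.
        raise RuntimeError("no anyscale_sdk_2026=true release tests collected")
    return cases


def pytest_generate_tests(metafunc):
    # pytest calls this hook once per test function during collection.
    if "release_test" in metafunc.fixturenames:
        cases = build_cases(EXAMPLE_TESTS)
        metafunc.parametrize(
            "release_test", cases, ids=[c["name"] for c in cases]
        )
```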

**Notable design choices**:
- No `COMPUTE_CONFIG_MODEL_FIELDS` filter before `from_dict()`. The
production code path filters because `set_cluster_compute()` adds
runtime-only keys after YAML load; this test loads the raw YAML, so
filtering would silently strip typos like `head_nod` and defeat the
gate.
- `pytest_addoption` is **not** used because pytest only collects that
hook from conftest.py or registered plugins, not test modules. To keep
the test self-contained in one file, `--compute-config-file` is parsed
out of `sys.argv` in `__main__` and passed through the
`COMPUTE_CONFIG_FILE` env var.
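
The `sys.argv` handling in the second design choice might look roughly like the sketch below. The helper names are hypothetical and only the `--compute-config-file=PATH` form mentioned above is handled; the actual test file may parse the flag differently.

```python
from typing import List, MutableMapping, Optional, Tuple

FLAG = "--compute-config-file"
ENV_VAR = "COMPUTE_CONFIG_FILE"


def extract_compute_config_file(argv: List[str]) -> Tuple[Optional[str], List[str]]:
    """Split `--compute-config-file=PATH` out of argv.

    Returns (path or None, remaining arguments in their original order).
    """
    path = None
    remaining = []
    for arg in argv:
        if arg.startswith(FLAG + "="):
            path = arg[len(FLAG) + 1:]
        else:
            remaining.append(arg)
    return path, remaining


def export_compute_config_file(
    argv: List[str], environ: MutableMapping[str, str]
) -> List[str]:
    """Store the path in `environ` and return the arguments left for pytest."""
    path, remaining = extract_compute_config_file(argv)
    if path is not None:
        environ[ENV_VAR] = path
    return remaining
```

In a `__main__` block, this would be called with `sys.argv[1:]` and `os.environ`, and the returned list would be forwarded to `pytest.main`, so pytest never sees an option it does not know.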

Signed-off-by: sai.miduthuri <sai.miduthuri@anyscale.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… tests (#62642)

## Description

Separate Arrow serialization tests into unit and integration files.

Pure Python/PyArrow logic tests are moved to `tests/unit/`, while tests
requiring a running Ray cluster remain in the top-level integration test
file.

**Moved to `tests/unit/`** (9 tests):
- `test_bytes_for_bits_manual`
- `test_bytes_for_bits_auto`
- `test_align_bit_offset_auto`
- `test_copy_buffer_if_needed`
- `test_copy_normal_buffer_if_needed`
- `test_copy_bitpacked_buffer_if_needed`
- `test_copy_offsets_buffer_if_needed`
- `test_fixed_shape_tensor_array_serialization`
- `test_variable_shape_tensor_serialization` (+
`_VariableShapeTensorType` helper class)

**Remain in integration file** (7 tests):
- `test_custom_arrow_data_serializer` (parametrized, uses
`ray_start_regular_shared`)
- `test_custom_arrow_data_serializer_fallback` (uses
`ray_start_regular_shared`)
- `test_arrow_scalar_conversion` (uses `ray_start_regular_shared`,
`ray.data`)
- `test_arrow_object_and_array_support` (uses
`ray_start_regular_shared`, `ray.data`)
- `test_custom_arrow_data_serializer_parquet_roundtrip` (uses
`ray_start_regular_shared`)
- `test_arrow_schema_ipc_serialization` (uses
`ray_start_regular_shared`)
- `test_custom_arrow_data_serializer_disable` (uses `shutdown_only`,
`ray.init()`)

## Related issues

Related to #61125

## Additional information

.

---------

Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>
## Description
`AutoscalerMetricsReporter.report_instances()` computes `terminated`
across the full instance snapshot, but it was incrementing
`stopped_nodes` inside the per-node-type reporting loop.

That caused the same terminated transition count to be added once per
configured node type instead of once per reporting pass. In a mixed
node-type snapshot, `autoscaler_stopped_nodes_total` could therefore be
over-counted.

Move the `stopped_nodes` increment out of the per-node-type loop so the
counter is updated exactly once for each batch of newly terminated
instances.
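
The over-count can be reproduced with a simplified model. This is not the `AutoscalerMetricsReporter` code: both functions and their arguments are hypothetical, and the counter is reduced to a plain integer.

```python
from typing import List


def count_stopped_buggy(node_types: List[str], newly_terminated: int) -> int:
    """Increment inside the per-node-type loop (the old behavior)."""
    stopped = 0
    for _node_type in node_types:
        # Per-node-type metrics would be reported here.
        stopped += newly_terminated  # added once per configured node type
    return stopped


def count_stopped_fixed(node_types: List[str], newly_terminated: int) -> int:
    """Increment once per reporting pass (the new behavior)."""
    stopped = 0
    for _node_type in node_types:
        # Per-node-type metrics would be reported here.
        pass
    stopped += newly_terminated  # added exactly once
    return stopped


node_types = ["head", "cpu_worker", "gpu_worker"]
print(count_stopped_buggy(node_types, 2))  # 6
print(count_stopped_fixed(node_types, 2))  # 2
```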

## Related issues
#62025

Signed-off-by: weimingdiit <weimingdiit@gmail.com>
Co-authored-by: Rueian <rueiancsie@gmail.com>
Populate `instance_type_name` in the autoscaling state generated by the
v2 reconciler.

Previously `_fill_autoscaling_state()` only populated
`ray_node_type_name` for pending instance requests, pending instances,
and failed instance requests. As a result, downstream consumers could
observe empty provider instance types in parsed autoscaler status.

Use `autoscaling_config.get_provider_instance_type()` to fill
`instance_type_name`.
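
As a sketch of the idea only: the classes below are simplified stand-ins, not the real autoscaler protobuf messages or config class, and the single-argument signature of `get_provider_instance_type` is an assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PendingInstance:
    # Simplified stand-in for an entry in the autoscaling state.
    ray_node_type_name: str = ""
    instance_type_name: str = ""


class ExampleAutoscalingConfig:
    """Hypothetical config mapping Ray node types to provider instance types."""

    def __init__(self, provider_types: Dict[str, str]):
        self._provider_types = provider_types

    def get_provider_instance_type(self, ray_node_type: str) -> str:
        # Assumed signature; returns "" when the node type is unknown.
        return self._provider_types.get(ray_node_type, "")


def fill_pending_instance(
    ray_node_type: str, config: ExampleAutoscalingConfig
) -> PendingInstance:
    """Populate both names instead of only ray_node_type_name."""
    return PendingInstance(
        ray_node_type_name=ray_node_type,
        instance_type_name=config.get_provider_instance_type(ray_node_type),
    )


config = ExampleAutoscalingConfig({"worker.gpu": "example-gpu-large"})
pending = fill_pending_instance("worker.gpu", config)
```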


## Related issues
#62100

---------

Signed-off-by: weimingdiit <weimingdiit@gmail.com>
## Description

This PR introduces the new `KubeRayIPPRProvider`, which is the utility
that provides IPPR helpers and will be wired up with
`KubeRayNodeProvider` and the autoscaler in the upcoming final IPPR PR.

The following are the helpers it provides:

1. `validate_and_set_ippr_specs` (in the previous PR)
2. `sync_with_raylets` (in the previous PR)
3. `sync_ippr_status_from_pods`
4. `do_ippr_requests`
5. `get_ippr_statuses`

The first 3 helpers will be invoked during the sync phase of each
autoscaler reconciliation to reconcile state between Ray and
Kubernetes.

The last 2 helpers will be invoked after the sync phase, so the
autoscaler can decide during the bin packing simulation whether to do
IPPR.

But this PR only introduces the `KubeRayIPPRProvider`. There are no
actual behavior changes in the autoscaler yet. The actual behavior
changes will come in the next PR.
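
The intended call order can be sketched with a recording stub. Only the helper names come from the list above; the argument-free signatures, the `reconcile_once` driver, and the relative order of the last two helpers are assumptions made for illustration, since the actual wiring arrives in the next PR.

```python
from typing import List


class RecordingIPPRProvider:
    """Hypothetical stub that records which helper was called, in order."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def validate_and_set_ippr_specs(self) -> None:
        self.calls.append("validate_and_set_ippr_specs")

    def sync_with_raylets(self) -> None:
        self.calls.append("sync_with_raylets")

    def sync_ippr_status_from_pods(self) -> None:
        self.calls.append("sync_ippr_status_from_pods")

    def get_ippr_statuses(self) -> None:
        self.calls.append("get_ippr_statuses")

    def do_ippr_requests(self) -> None:
        self.calls.append("do_ippr_requests")


def reconcile_once(provider: RecordingIPPRProvider) -> None:
    # Sync phase: reconcile state between Ray and Kubernetes.
    provider.validate_and_set_ippr_specs()
    provider.sync_with_raylets()
    provider.sync_ippr_status_from_pods()

    # After the sync phase: inputs and actions for the scheduling decision.
    # The order of these two calls is an assumption of this sketch.
    provider.get_ippr_statuses()
    provider.do_ippr_requests()


provider = RecordingIPPRProvider()
reconcile_once(provider)
```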

---------

Signed-off-by: Rueian Huang <rueiancsie@gmail.com>
pull[bot] locked and limited conversation to collaborators on Apr 24, 2026
pull[bot] added the ⤵️ pull label on Apr 24, 2026
pull[bot] merged commit 3267463 into miqdigital:master on Apr 24, 2026