match: Use an aggregate equality comparison for constant array/slice patterns when possible#155216
match: Use an aggregate equality comparison for constant array/slice patterns when possible#155216jakubadamw wants to merge 6 commits intorust-lang:mainfrom
Conversation
…s when possible
When every element in an array or slice pattern is a constant and there
is no `..` subpattern, the match builder now emits a single call to
`PartialEq::eq` instead of comparing each element one by one.
This drastically reduces the number of MIR basic blocks for large
constant-array matches – e.g. a 64-element `[u8; 64]` match previously
generated 64 separate comparison blocks and now generates just one
`PartialEq::eq` call that LLVM can lower to a `memcmp()`
The optimisation is gated on having at least two constant elements.
Single-element arrays still use a plain scalar comparison.
Example:
```rust
const FOO: [u8; 64] = *b"0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef";
pub fn foo(x: &[u8; 64]) -> bool {
// Before: 64 basic blocks, one per byte.
// After: a single `PartialEq::eq()` call.
matches!(x, &FOO)
}
```
…on a large fixed-length array
|
Some changes occurred in match lowering cc @Nadrieril |
|
rustbot has assigned @JonathanBrouwer. Use Why was this reviewer chosen?The reviewer was selected based on:
|
This comment has been minimized.
This comment has been minimized.
…t contexts This is unless the `const_cmp` feature is enabled, in which case `PartialEq` becomes available in said contexts.
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
match: Use an aggregate equality comparison for constant array/slice patterns when possible
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (5a6dba6): comparison URL. Overall result: ❌✅ regressions and improvements - no action neededBenchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf. @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -0.0%, secondary 0.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -2.3%, secondary 14.4%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.1%, secondary 0.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 491.114s -> 490.988s (-0.03%) |
|
@rustbot reroll |
| impl<'a, 'tcx> Builder<'a, 'tcx> { | ||
| /// Check if we can use aggregate `PartialEq::eq` comparisons for constant array/slice patterns. | ||
| /// This is not possible in const contexts unless `#![feature(const_cmp, const_trait_impl)]` are enabled, | ||
| /// because`PartialEq` is not const-stable. |
There was a problem hiding this comment.
| /// because`PartialEq` is not const-stable. | |
| /// because `PartialEq` is not const-stable. |
| fn can_use_aggregate_eq(&self) -> bool { | ||
| let const_partial_eq_enabled = { | ||
| let features = self.tcx.features(); | ||
| features.enabled(sym::const_trait_impl) && features.enabled(sym::const_cmp) | ||
| }; | ||
| let in_const_context = self.tcx.is_const_fn(self.def_id.to_def_id()) | ||
| || !self.tcx.hir_body_owner_kind(self.def_id).is_fn_or_closure(); | ||
| !in_const_context || const_partial_eq_enabled | ||
| } |
There was a problem hiding this comment.
I'm wary of changing MIR building based on whether features are enabled, especially with one being a library feature. That feels subtle.
It's also unfortunate that MIR building would be conditional on being in a const context, but I'm not sure what could be done about that other than waiting for const_cmp to stabilize. I'd assumed from the discussion on #110870 that that was blocking this approach to simplifying array/slice pattern tests.
| if let PatKind::Constant { value } = pat.kind { | ||
| Some(ty::Const::new_value(tcx, value.valtree, value.ty)) | ||
| } else { | ||
| None | ||
| } |
There was a problem hiding this comment.
It might also be worth reconstructing aggregate constants for arrays/slices of arrays of constants, etc.? I'm not a specialization expert, but it looks like arrays of bytewise-comparable things are also bytewise-comparable, at least for common array lengths1. Since array and slice equality are specialized based on their element types' bytewise-comparability, we should be able to get better codegen for nested array patterns too (as long as the inner arrays are of one of those common lengths), I think?
Footnotes
| // When there is no `..`, all elements are constants, and | ||
| // there are at least two of them, collapse the individual | ||
| // element subpairs into a single aggregate comparison that | ||
| // is performed after the length check. | ||
| if slice.is_none() |
There was a problem hiding this comment.
An additional possibility: even if there is a .., the comparisons for the sub-slices before and after the .. could be done via aggregate equality when applicable. Credit to #121540, which I think did this?
Even if only handling the case with no .., it might be worth moving the special-casing into prefix_slice_suffix to share it between PatKind::Slice and PatKind::Array, since that's where the commonalities live.
|
☔ The latest upstream changes (presumably #155472) made this pull request unmergeable. Please resolve the merge conflicts. |
There was a problem hiding this comment.
The general approach feels unfortunate: we transformed constants into patterns in const_to_pat, and now we try to reverse that transformation. Have we tried keeping the original constant around in the output of const_to_pat and using it at runtime? Tho this has the same issue of const-dependent MIR lowering that @dianne pointed out.
There was a problem hiding this comment.
If we do that, maybe we could also synthesize a constant during THIR building for hand-written array/slice patterns, like this PR currently does in MIR building? That way, we wouldn't end up worse codegen for hand-written array patterns than for const array items used as patterns.
There was a problem hiding this comment.
I dunno, small handwritten arrays behave pretty much like tuples, it could be better codegen sometimes not to make them into constants:
match foo {
// compiles to three nested `if`s today, would become 8 sequential `if`s if turned into constants
[false, false, false] => ...,
[false, false, true] => ...,
[false, true, false] => ...,
...
}It's admittedly a stretch but I'm tempted to err on the side of respecting user intent especially before the MIR boundary.
There was a problem hiding this comment.
Oh, right. Maybe we should have a test for that in mir-opt/building/match/sort_candidates.rs or such if we start optimizing array/slice comparisons? I think as-is this PR may also turn that into 8 sequential tests, each a TestKind::AggregateEq for a different constant.
|
@dianne do you want to take over review here? |
There was a problem hiding this comment.
I don't have much context on the ctfe or const traits side of things to evaluate the approach. cc @oli-obk maybe? It feels like we have to make some sort of compromise here to avoid calling anything const-unstable. Possibly we could block on const_cmp stabilizing, or possibly we could have some workaround until then if we want to land this first.
I think I can handle the technical review, at least. Tentatively, r? me (though feel free to steal the assignment if that'd be easier ^^)
| TestKind::AggregateEq { value } => { | ||
| let tcx = self.tcx; | ||
| let success_block = target_block(TestBranch::Success); | ||
| let fail_block = target_block(TestBranch::Failure); | ||
|
|
||
| let aggregate_ty = value.ty; | ||
| let ref_ty = Ty::new_imm_ref(tcx, tcx.lifetimes.re_erased, aggregate_ty); | ||
|
|
||
| // The constant has type `[T; N]` (or `[T]`), but calling | ||
| // `PartialEq::eq` requires `&[T; N]` (or `&[T]`) operands. | ||
| // Valtree representations are the same with or without the | ||
| // reference wrapper, so we can reinterpret by replacing the type. | ||
| let expected_value = ty::Value { ty: ref_ty, valtree: value.valtree }; | ||
| let expected_operand = | ||
| self.literal_operand(test.span, Const::from_ty_value(tcx, expected_value)); | ||
|
|
||
| // Create a reference to the scrutinee place. | ||
| let actual_ref_place = self.temp(ref_ty, test.span); | ||
| self.cfg.push_assign( | ||
| block, | ||
| self.source_info(test.span), | ||
| actual_ref_place, | ||
| Rvalue::Ref(tcx.lifetimes.re_erased, BorrowKind::Shared, place), | ||
| ); | ||
|
|
||
| // Compare using `<T as PartialEq>::eq` where `T` is the array or slice type. | ||
| self.non_scalar_compare( | ||
| block, | ||
| success_block, | ||
| fail_block, | ||
| source_info, | ||
| aggregate_ty, | ||
| expected_operand, | ||
| Operand::Copy(actual_ref_place), | ||
| ); | ||
| } |
There was a problem hiding this comment.
Apart from missing the assertion that value.ty is str, this looks the same as the case for TestKind::StringEq. Could they be merged?
There was a problem hiding this comment.
If we do that, maybe we could also synthesize a constant during THIR building for hand-written array/slice patterns, like this PR currently does in MIR building? That way, we wouldn't end up worse codegen for hand-written array patterns than for const array items used as patterns.
When every element in an array or slice pattern is a constant and there is no
..subpattern, the match builder will now emit a single call toPartialEq::eqinstead of comparing each element from the value one by one against the respective constant in the pattern.This drastically reduces the number of MIR basic blocks for large constant-array matches – e.g. a 64-element
[u8; 64]match previously generated 64 separate comparison blocks and now generates just onePartialEq::eqcall that LLVM can lower to amemcmp(). The optimisation is gated on having at least two constant elements, meaning single-element arrays will still use a plain scalar comparison.Example:
Closes #103073.
Closes #110870.