Refactor nullif to use bitmap layout contract and buffer bitwise APIs #8877
base: main
- Add docs/arraydata_bitmap_layout_contract.md to document ArrayData bitmap invariants
- Add Buffer::bitwise_binary / Buffer::bitwise_unary helpers and tests in immutable.rs
- Refactor arrow-select/src/nullif.rs to:
  - Compute validity via compute_nullif_validity using buffer bitwise helpers
  - Build result ArrayData with offset = 0 per the layout contract
  - Fix offset/null-count handling for sliced arrays and nested types
alamb
left a comment
sorry @joe-ucp -- I don't understand what problem this PR is solving
Can you please remind me what the rationale for these changes is?
@@ -0,0 +1,76 @@
# ArrayData + Bitmap Layout Contract
If you want to add docs, let's do it in doc comments to keep it close to the code rather than in a separate document that is more likely to drift out of sync.
arrow-select/src/nullif.rs
Outdated
 * result_valid(i) = V(i) & !C(i)
 * result_value(i) = left_value(i) // when result_valid(i) == true
 *
 * This contract is the law. All nullif implementations must follow it.
I don't understand the "contract is the law" comment. We should be enforcing things with tests, not external README files.
The intent here wasn’t to replace tests, but to write down the invariants that the tests are asserting, especially around how offsets map to validity bits and how V(i) & !C(i) is interpreted for sliced arrays. I primarily used this to keep the "logic" in my own head. 😂
I’m happy to move that description out of the standalone markdown and into doc comments on the helpers in nullif.rs, and keep the tests as the way we actually enforce the behavior. That should keep the spec close to the code while still making the rules explicit.
If you’d prefer to avoid the extra MD file entirely, I can also drop it and keep the contract only in code + tests.
In general I think we should strive to follow the patterns that already exist in this codebase (e.g. inline comments rather than markdown files) when possible
Improving the overall documentation about how arrays work and how to think about validity maps is of course always welcome -- perhaps it would make a good separate PR
👍 Makes sense. In the latest update I’ve:
- Moved the bitmap/validity contract text out of the standalone markdown file and into doc comments next to the nullif helpers, so the explanation now lives right with the code.
- Dropped the extra .md file from this PR.
I agree a higher-level doc about how arrays/validity maps work would be useful as a separate follow-up PR once this one settles.
This one is mostly about alignment + safety, not fixing a known broken case in main. Concretely, this PR:
The existing tests still pass, and the new tests mainly serve to lock in those layout assumptions so nullif doesn’t become a special case the next time we touch bitwise code.
I am sorry @joe-ucp, I am not likely to have much time to review this PR any time soon. Given the limited amount of review capacity we have in this crate, I suspect general code refactors from non-maintainers that don't add features or fix issues are likely to be pretty low on the review list.
What issue does this PR close?
This PR extracts the nullif bitmap/layout improvements from the larger bitwise PR for independent review.
Rationale
This PR addresses three key goals:
- nullif logic: Refactor nullif to use the same core buffer-level bitwise primitives employed elsewhere.
- Encode the ArrayData bitmap layout contract in documentation and implementation, clarifying the result validity rule (V(i) & !C(i)).
- Fix and guard against bugs involving offsets and null counts, particularly for sliced arrays and nested types.
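For readers following along, the validity rule and the offset handling can be sketched in plain Rust without any Arrow types. This is only an illustration under stated assumptions: `nullif_validity_sketch`, `get_bit`, and `set_bit` are hypothetical names (not the PR's `compute_nullif_validity` or the `Buffer` helpers), bitmaps are LSB-first packed bytes as in the Arrow format, and C(i) is assumed to already mean "condition is true and valid". It works bit by bit for clarity rather than word by word.

```rust
/// Read bit `i` from an LSB-first packed bitmap (the bit order Arrow uses).
fn get_bit(bytes: &[u8], i: usize) -> bool {
    (bytes[i / 8] >> (i % 8)) & 1 == 1
}

/// Set bit `i` in an LSB-first packed bitmap.
fn set_bit(bytes: &mut [u8], i: usize) {
    bytes[i / 8] |= 1 << (i % 8);
}

/// Hypothetical helper: computes result_valid(i) = V(i) & !C(i) for `len`
/// logical elements. Each input is a (bitmap, bit offset) pair, so sliced
/// inputs are read starting at their own offset. `left_validity` is `None`
/// when the left array has no nulls. The output bitmap always starts at
/// logical index 0 (offset = 0). Returns the bitmap and its null count.
fn nullif_validity_sketch(
    left_validity: Option<(&[u8], usize)>,
    cond: (&[u8], usize),
    len: usize,
) -> (Vec<u8>, usize) {
    let mut out = vec![0u8; (len + 7) / 8];
    let mut null_count = 0;
    for i in 0..len {
        let v = left_validity.map_or(true, |(bits, off)| get_bit(bits, off + i));
        let c = get_bit(cond.0, cond.1 + i);
        if v && !c {
            set_bit(&mut out, i);
        } else {
            null_count += 1;
        }
    }
    (out, null_count)
}

fn main() {
    // Left validity sliced at bit offset 2: logical V = 1, 0, 1, 1.
    let left: &[u8] = &[0b0011_0100];
    // Condition sliced at bit offset 1: logical C = 0, 0, 1, 0.
    let cond: &[u8] = &[0b0000_1000];

    let (bits, nulls) = nullif_validity_sketch(Some((left, 2)), (cond, 1), 4);

    // V & !C = 1, 0, 0, 1, stored from bit 0 of the output.
    assert_eq!(bits, vec![0b0000_1001u8]);
    assert_eq!(nulls, 2);
    println!("{:08b} nulls={}", bits[0], nulls);
}
```

The point of the sketch is that the two inputs may carry different bit offsets, while the result is always written from bit 0, which is what "offset = 0, with buffers aligned to logical index 0" amounts to for the validity bitmap.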
Summary of Changes
- Add docs/arraydata_bitmap_layout_contract.md describing ArrayData bitmap invariants and nullif validity.
- Extend arrow-buffer/src/buffer/immutable.rs with Buffer::bitwise_binary and Buffer::bitwise_unary.
- Refactor arrow-select/src/nullif.rs to:
  - Build the result ArrayData with offset = 0, with buffers aligned to logical index 0.
- Scope is limited to nullif; no changes made to other kernels.
Testing
- cargo test -p arrow-select --lib nullif -- --nocapture
- cargo test -p arrow-select --lib
- cargo test -p arrow-buffer --lib
- cargo test -p arrow-arith --lib
All tests pass.
User-Facing Changes
nullif semantics are unchanged, but handling of edge cases with offsets/null bitmaps is now more robust and formally documented.
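One way to check "semantics unchanged" on sliced inputs is to compare against a tiny reference model over `Option` values. The sketch below is an assumption-laden illustration, not code from this PR: `nullif_model` is a hypothetical name, and it assumes a null condition is treated as false (so only a valid `true` nulls out the left value).

```rust
/// Hypothetical reference model of nullif over plain `Option` values.
/// Assumption: a null condition counts as false, so the left value is
/// kept unless the condition is exactly `Some(true)`.
fn nullif_model<T: Clone>(left: &[Option<T>], cond: &[Option<bool>]) -> Vec<Option<T>> {
    assert_eq!(left.len(), cond.len(), "inputs must have equal length");
    left.iter()
        .zip(cond)
        .map(|(l, c)| match c {
            Some(true) => None, // condition holds: result is null
            _ => l.clone(),     // otherwise keep the left value (or its null)
        })
        .collect()
}

fn main() {
    let left = vec![Some(1), None, Some(3), Some(4), Some(5)];
    let cond = vec![Some(false), Some(true), Some(true), None, Some(false)];

    let full = nullif_model(&left, &cond);
    assert_eq!(full, vec![Some(1), None, None, Some(4), Some(5)]);

    // Slicing the inputs first must agree with slicing the full result.
    let sliced = nullif_model(&left[2..], &cond[2..]);
    assert_eq!(sliced, full[2..].to_vec());
}
```

A kernel-level test could apply the same "slice then compute equals compute then slice" check to real arrays, using a model like this as the oracle.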