-
Notifications
You must be signed in to change notification settings - Fork 34
Make block_size flexible #235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: rfsaliev/cpp-runtime-binding
Are you sure you want to change the base?
Make block_size flexible #235
Conversation
ahuber21
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! This should do what we want it to do. I'd refine the tests a bit and maybe improve the syntax. We can brainstorm together what the cleanest implementation would look like.
| ) { | ||
| auto threadpool = default_threadpool(); | ||
|
|
||
| auto blocksize_bytes = svs::lib::PowerOfTwo(blocksize_exp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a runtime value, we'd probably need some validation. Do you think it could be a good idea to create a struct similar to PowerOfTwo and use that as function arg rather than the plain int. We could perform some boundary checks in the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that make sense.
What values of exponents are acceptable? In the best of my understanding, blocksize_bytes should be in range [4KB, 1GB] (based on possible page sizes
| HugepageX86Parameters{1 << 30, MAP_HUGETLB | MAP_HUGE_1GB}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that sounds reasonable. @mihaic did you ever experiment with even bigger hugepages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Xeon only goes to 1 GiB, we can have that as the limit.
| std::vector<size_t> labels(test_n); | ||
| std::iota(labels.begin(), labels.end(), 0); | ||
|
|
||
| int block_size_exp = 17; // block_size = 2^block_size_exp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beyond testing if passing the param works, we should also test if the block size actually changed.
| // Abstract interface for Dynamic Vamana-based indexes. | ||
| struct SVS_RUNTIME_API DynamicVamanaIndex : public VamanaIndex { | ||
| virtual Status add(size_t n, const size_t* labels, const float* x) noexcept = 0; | ||
| virtual Status add(size_t n, const size_t* labels, const float* x, int blocksize_exp = 30) noexcept = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if there's a cleaner way to communicate this value to init_impl(). Maybe @rfsaliev has an opinion?
rfsaliev
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR, there is some functionality to be implemented.
| struct SVS_RUNTIME_API DynamicVamanaIndex : public VamanaIndex { | ||
| virtual Status add(size_t n, const size_t* labels, const float* x) noexcept = 0; | ||
| virtual Status | ||
| add(size_t n, const size_t* labels, const float* x, int blocksize_exp = 30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this approach manages block size for an index created from scratch.
How will we manage block size for an index loaded from a stream?
| static StorageType init( | ||
| const svs::data::ConstSimpleDataView<float>& data, | ||
| Pool& pool, | ||
| svs::lib::PowerOfTwo SVS_UNUSED(blocksize_bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQDataset<>::compress() accepts customized allocator via allocator argument...
| static StorageType init( | ||
| const svs::data::ConstSimpleDataView<float>& data, | ||
| Pool& pool, | ||
| svs::lib::PowerOfTwo SVS_UNUSED(blocksize_bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LVQDataset<>::compress() accepts customized allocator via allocator argument...
| static StorageType init( | ||
| const svs::data::ConstSimpleDataView<float>& data, | ||
| Pool& pool, | ||
| svs::lib::PowerOfTwo SVS_UNUSED(blocksize_bytes), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LeanDataset<>::compress() accepts customized allocator via allocator argument...
This small PR adds an ability to set block_size value while building index. If parameter isn't set the default value of 2^30 is used.