Skip to content

Conversation

@razdoburdin
Copy link
Contributor

This small PR adds an ability to set block_size value while building index. If parameter isn't set the default value of 2^30 is used.

Copy link
Contributor

@ahuber21 ahuber21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! This should do what we want it to do. I'd refine the tests a bit and maybe improve the syntax. We can brainstorm together what the cleanest implementation would look like.

) {
auto threadpool = default_threadpool();

auto blocksize_bytes = svs::lib::PowerOfTwo(blocksize_exp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a runtime value, we'd probably need some validation. Do you think it could be a good idea to create a struct similar to PowerOfTwo and use that as function arg rather than the plain int. We could perform some boundary checks in the constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that make sense.

What values of exponents are acceptable? In the best of my understanding, blocksize_bytes should be in range [4KB, 1GB] (based on possible page sizes

HugepageX86Parameters{1 << 30, MAP_HUGETLB | MAP_HUGE_1GB},
) , right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that sounds reasonable. @mihaic did you ever experiment with even bigger hugepages?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Xeon only goes to 1 GiB, we can have that as the limit.

std::vector<size_t> labels(test_n);
std::iota(labels.begin(), labels.end(), 0);

int block_size_exp = 17; // block_size = 2^block_size_exp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beyond testing if passing the param works, we should also test if the block size actually changed.

// Abstract interface for Dynamic Vamana-based indexes.
struct SVS_RUNTIME_API DynamicVamanaIndex : public VamanaIndex {
virtual Status add(size_t n, const size_t* labels, const float* x) noexcept = 0;
virtual Status add(size_t n, const size_t* labels, const float* x, int blocksize_exp = 30) noexcept = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if there's a cleaner way to communicate this value to init_impl(). Maybe @rfsaliev has an opinion?

Copy link
Member

@rfsaliev rfsaliev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR, there is some functionality to be implemented.

struct SVS_RUNTIME_API DynamicVamanaIndex : public VamanaIndex {
virtual Status add(size_t n, const size_t* labels, const float* x) noexcept = 0;
virtual Status
add(size_t n, const size_t* labels, const float* x, int blocksize_exp = 30
Copy link
Member

@rfsaliev rfsaliev Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this approach manages block size for an index created from scratch.
How will we manage block size for an index loaded from a stream?

static StorageType init(
const svs::data::ConstSimpleDataView<float>& data,
Pool& pool,
svs::lib::PowerOfTwo SVS_UNUSED(blocksize_bytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQDataset<>::compress() accepts customized allocator via allocator argument...

static StorageType init(
const svs::data::ConstSimpleDataView<float>& data,
Pool& pool,
svs::lib::PowerOfTwo SVS_UNUSED(blocksize_bytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LVQDataset<>::compress() accepts customized allocator via allocator argument...

static StorageType init(
const svs::data::ConstSimpleDataView<float>& data,
Pool& pool,
svs::lib::PowerOfTwo SVS_UNUSED(blocksize_bytes),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LeanDataset<>::compress() accepts customized allocator via allocator argument...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants