Looking at your popcount code in
|
inline unsigned int popcount_32(unsigned int x) |
|
{ |
|
unsigned int m1 = 0x55555555; |
|
unsigned int m2 = 0x33333333; |
|
unsigned int m4 = 0x0f0f0f0f; |
|
unsigned int h01 = 0x01010101; |
|
x -= (x >> 1) & m1; /* put count of each 2 bits into those 2 bits */ |
|
x = (x & m2) + ((x >> 2) & m2); /* put count of each 4 bits in */ |
|
x = (x + (x >> 4)) & m4; /* put count of each 8 bits in partie droite 4bit piece*/ |
|
return (x * h01) >> 24; /* returns left 8 bits of x + (x<<8) + ... */ |
|
} |
|
|
|
|
|
inline unsigned int popcount_64(uint64_t x) |
|
{ |
|
unsigned int low = x & 0xffffffff ; |
|
unsigned int high = ( x >> 32LL) & 0xffffffff ; |
|
|
|
return (popcount_32(low) + popcount_32(high)); |
|
} |
it's perhaps worthwhile noting that you could speed-up your popcounts by using the GCC __builtin_pocount() and __builtin_popcountl() intrinsics when available, instead. Which would be compiled into a single POPCNT instruction on some CPUs - e.g. on not too old Intel ones.
Looking at your popcount code in
BBHash/BooPHF.h
Lines 170 to 189 in 6bb97c4
it's perhaps worthwhile noting that you could speed-up your popcounts by using the GCC
__builtin_pocount()and__builtin_popcountl()intrinsics when available, instead. Which would be compiled into a single POPCNT instruction on some CPUs - e.g. on not too old Intel ones.