一个程序员的辩白

01 Jul 2017

Lowest Bit Set

How do you solve the question “Determine position of lowest bit set”?

I’m not excepting you to solve it by now, but think about it.

The intuitive way to solve it may like Position of lowest bit set:

unsigned int lowest_bit_set_naive(unsigned int x)
{
    // x=0 is not properly handled by while-loop
    if (x == 0) return 0;

    unsigned int result = 0;
    while ((x & 1) == 0) {
        x >>= 1;
        result++;
    }

    return result;
}

Yes, it solved the question nicely, I don’t want be captious, 0 and 1 have the same value returned :-(

 

Position of lowest bit set also given an opaque solution to this question.

Much more like this(rearranged):

const unsigned int MultiplyDeBruijnBitPosition[32] = {
    // precomputed lookup table
    0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
    31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
}

unsigned int lowestBitSet(unsigned int x)
{
    // only lowest bit will remain 1, all others become 0
    x &= - (int) x;
    // DeBruijn constant
    x *= 0x077cb531;
    // the upper 5 bits are unique, skip the rest (32 - 5 = 27)
    x >>= 27;
    // convert to actual position
    return MultiplyDeBruijnBitPosition[x];
}

 

Even with comments, it’s not easy to understand this for some people(like me) at the first glance.

The most confusions come from:

  • What’s the De Bruijn constant?
  • How the lookup table precomputed?

Before answering those two questions, I need explain to you what’s the idea behind this opaque solution:

  • It applied a mapping concept, which means, the bit set position is one-to-one.

  • Generate a precomputed array to store the mapping result, which can lead us to real position.

Though we now know that the magic number can generate an unique 5-bits number. But how we can generate it?

 

My intuitive way is brute-force probing, which is easy and applicable.

#include <stdio.h>
#include <string.h>

#define LIM 32
#define BIT(x) (1 << (x))

static unsigned int map[] = {
    BIT(0),  BIT(1),  BIT(2),  BIT(3),
    BIT(4),  BIT(5),  BIT(6),  BIT(7),
    BIT(8),  BIT(9),  BIT(10), BIT(11),
    BIT(12), BIT(13), BIT(14), BIT(15),
    BIT(16), BIT(17), BIT(18), BIT(19),
    BIT(20), BIT(21), BIT(22), BIT(23),
    BIT(24), BIT(25), BIT(26), BIT(27),
    BIT(28), BIT(29), BIT(30), BIT(31),
};

void probing(void)
{
    unsigned int i, j;
    unsigned int buff[LIM];

    for (i = 0x1; i != 0; i++) {
        memset(buff, 0, sizeof(buff));

        for (j = 0; j < LIM; j++) {
            /* each bit should mapped into unique slot */
            if (++buff[(map[j] * i) >> 27] != 1)
                break;
        }

        if (j == LIM)
            break;
    }

    if (i == 0) {
        fprintf(stderr, "probe failed\n");
        return;
    }

    /* found, prepare and print the result */
    printf("%#010x:\n", i);

    for (j = 0; j < LIM; j++) {
        /* mapping from bitmap into index */
        buff[(map[j] * i) >> 27] = j;
    }

    for (j = 0; j < LIM; j++) {
        printf("%2u,", buff[j]);
        putchar(j % 8 != 7 ? ' ' : '\n');
    }
}

If you invoke probing(), it should generate following output:

0x04653adf:
 0,  1,  2,  6,  3, 11,  7, 16,
 4, 14, 12, 21,  8, 23, 17, 26,
31,  5, 10, 15, 13, 20, 22, 25,
30,  9, 19, 24, 29, 18, 28, 27,

Of which 0x04653adf is the minimal magic number, there are plenty of those magic numbers. 0x077cb531 is another case.

Based on this, we can make up our own version here:

static const int debj32[] = {
     0,  1,  2,  6,  3, 11,  7, 16,
     4, 14, 12, 21,  8, 23, 17, 26,
    31,  5, 10, 15, 13, 20, 22, 25,
    30,  9, 19, 24, 29, 18, 28, 27,
};

int lsb32(unsigned int x)
{
    if (x == 0)        /* zero with no bit set */
        return -1;
    x &= -x;
    x *= 0x04653adf;    /* minimal magic number */
    x >>= 27;
    return debj32[x];
}

inline int lsb32i(unsigned int x)
{
    return x ? debj32[((x & -x) * 0x04653adf) >> 27] : -1;
}

 

The magic thing is that it already adopted in Haskell hashtables package:

static uint8_t de_bruijn_table[] = {
    0,   1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
    31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
};

static inline int32_t mask(int32_t a, int32_t b) { return -(a == b); }

static inline int32_t first_bit_set(int32_t a) {
    int32_t zero_case = mask(0, a);
    uint32_t x = (uint32_t) (a & -a);
    x *= 0x077CB531;
    x >>= 27;
    return zero_case | de_bruijn_table[x];
}

Also its Haskell bits package:

instance Ranked Word32 where
  lsb n = fromIntegral $ go (unsafeShiftR ((n .&. (-n)) * 0x077CB531) 27) where
    go :: Word32 -> Word8
    go i = inlinePerformIO $ peekElemOff debruijn_lsb32 (fromIntegral i)
  {-# INLINE lsb #-}

And there’re some deBruijn tables you can use instantly. Although the number they used(i.e. 0x077CB531) is not optimal.

 

Note that lsb64() is extremely horrible, the magic number can be very huge. In this case, it’s 0x07edd5e59a4e28c2, most of the time it leads lsb64() to do the tedious multiplication.

One possible optimization can be mapping from lsb64() to lsb32().

When we get the lowest set bit(i.e. x & -x):

  • If x equals to zero, return -1 directly
  • Bit-and’ing x with 0xffffffff, if it’s not zero, just return lsb32().
  • Since lower 32-bit is zero, we just return 32 + lsb32(x >> 32).

It may makes ALU’s life easier.

 

Also, there’re several things you need to note about.

  • This solution isn’t efficient than you might think. Stephan Brumme already done the practical analysis for you.

  • It’s only applicable for finite range of data(of which each value must be unique).

  • Above case is perfect hashing(e.g. 32 possible results can be complete filled with 5 bits). If you want to mapping for, saying 10 known results, you must utilize 4 bits. Which left 6 unmapped slots(of those slots you can mark them up as invalid however)

  • If you using GCC, you can use __builtin_ffs() function to replace lsb32()

Built-in Function: int __builtin_ffs (int x): Returns one plus the index of the least significant 1-bit of x, or if x is zero, returns zero.

  • The magic number 0x07edd5e59a4e28c2 for lsb64() clearly not the minimal one, the minimal is 0x0218a392cd3d5dbf. But what I’d like to tell you is that the brute-force algorithm isn’t applicable to probe it. If you get interested in how to probe for it, references at bottom might help you.

 

In case you might want to use 64-bit version of LSB, here is code snippet(though I think you can use lsb32() to implement lsb64()):

static const int debj64[] = {
     0,  1,  2,  7,  3, 13,  8, 19,
     4, 25, 14, 28,  9, 34, 20, 40,
     5, 17, 26, 38, 15, 46, 29, 48,
    10, 31, 35, 54, 21, 50, 41, 57,
    63,  6, 12, 18, 24, 27, 33, 39,
    16, 37, 45, 47, 30, 53, 49, 56,
    62, 11, 23, 32, 36, 44, 52, 55,
    61, 22, 43, 51, 60, 42, 59, 58,
};

/*
 * Assuming `unsigned long long' sized 8 bytes
 */
int lsb64(unsigned long long x)
{
    if (x == 0)
        return -1;
    x &= -x;
    x *= 0x0218a392cd3d5dbf;
    x >>= 58;
    return debj64[x];
}

inline int lsb64i(unsigned long long x)
{
    return x ? debj64[((x & -x) * 0x0218a392cd3d5dbf) >> 58] : -1;
}

static const int debj32[] = {
     0,  1,  2,  6,  3, 11,  7, 16,
     4, 14, 12, 21,  8, 23, 17, 26,
    31,  5, 10, 15, 13, 20, 22, 25,
    30,  9, 19, 24, 29, 18, 28, 27,
};

/*
 * This should be fast enough
 */
inline int lsb64q(unsigned long long x)
{
    if (x == 0) return -1;
    int i = 0;
    if (x >> 32) i = 32, x >>= 32;
    return i | debj32[((unsigned int) ((x & -x) * 0x04653adf)) >> 27];
}

 

Other magic numbers you may need

#define OPAQUE8 0x17u
static const int debj8[] = {
     0,  1,  2,  4,  7,  3,  6,  5,
};

/*
 * The universal LSB, reduced panicky mul-op.
 */
int lsb64u(uint64_t x)
{
    if (x == 0) return -1;
    int pad = 0;

    while (!(x & 0xffu))
        pad += 8, x >>= 8;

    return pad | debj8[((unsigned char) ((x & -x) * OPAQUE8)) >> 5];
}

/* Left for exercise */
#define OPAQUE16 0x09afu
static const int debj16[] = {
     0,  1,  2,  5,  3,  9,  6, 11,
    15,  4,  8, 10, 14,  7, 13, 12,
};

 

As mentioned before, this method can be categorized into prefect hashing. In case you got interested, GNU already provided a good tool gperf for generating perfect hashing function. Check that out!

Phew! Writing as second language isn’t easy for me, if you found anything wrong, please let me know :-P

 

In an incoming article, I will generalize this method for string hashing. which reduce time complexity to O(n) for small fixed-size string matching.

 

References

Find first set - Wikipedia

Bit Twiddling Hacks

神秘常量复出!用0x077CB531计算末尾0的个数

Position of lowest bit set

BitScan - ChessProgramming Wiki

Nicolaas de Bruijn - ChessProgramming Wiki

De Bruijn Sequence Generator - ChessProgramming Wiki

Haskell de Bruijn LSB table

List of first 8192 64-bit De Bruijn constants for LSB(Lowest Set Bit) - gist

List of first 4096 32-bit De Bruijn constants for LSB - gist