Code Answer: Boolean[] vs. BitSet: Which is more efficient?

What is more efficient in terms of memory and CPU usage — an array of booleans or a BitSet? Specific BitSet methods are not used, only get/set/clear (==, =, Arrays.fill respectively for an array).

From stackoverflow

I believe that a BitSet is more memory- and CPU-efficient, is it can internally pack the bits into int, longs, or native data types, whereas a boolean[] requires a byte for each bit of data. Additionally, if you were to use the other methods (and, or, etc), you would find that the BitSet is more efficient, as there is no need to iterate through every element of an array; bitwise math is used instead.

EFraim : Memory efficient - probably true. CPU efficient - most certainly not. It is almost always less efficient to perform two bitwise operations (shift/and or shift/or) and up to two memory accesses (though most likely cached) than a single memory access on x86.

Jon Skeet : @EFraim: By reducing the amount of memory used you're increasing the chance of having everything in cache. Cache misses are very expensive. I wouldn't be at all surprised to see this factor make BitArray faster.

Ron : For example: a bitset would outperform boolean[] if the whole bitset fits in the cache, but not the boolean[], and random access were required.
Going from Java to CPU is totally VM specific. For instance, it used to be that a boolean was actually implemented as a 32-bit value (quite probably is true to this day).

Unless you know it is going to matter you are better off writing the code to be clear, profile it, and then fix the parts that are slow or consuming a lot of memory.

You can do this as you go. For instance I once decided to not call .intern() on Strings because when I ran the code in the profiler it slowed it down too much (despite using less memory).
From some benchmarks with Sun JDK 1.6 computing primes with a sieve (best of 10 iterations to warm up, give the JIT compiler a chance, and exclude random scheduling delays, Core 2 Duo T5600 1.83GHz):

BitSet is more memory efficient than boolean[] except for very small sizes. Each boolean in the array takes a byte. The numbers from runtime.freeMemory() are a bit muddled for BitSet, but less.

boolean[] is more CPU efficient except for very large sizes, where they are about even. E.g., for size 1 million boolean[] is about four times faster (e.g. 6ms vs 27ms), for ten and a hundred million they are about even.

basszero : Can you post your test?

basszero : I suspect that some of the BitSet style operations (and, or, not) are faster as BitSet instead of array. Worth noting which operations are better. The title is going to mislead everyone into never using a BitSet again

starblue : The test doesn't use set operations, and is biased towards writing.
It depends as always. Yes BitSet is more memory efficent, but as soon as you require multithreaded access boolean[] might be the better choice. For example for computing primes you only set the boolean to true and therefore you don't really need synchronization. Hans Boehm has written some paper about this and the same technique can be used for marking nodes in graph.

Randolpho : provided that your boolean array doesn't grow, that'd certainly be better for concurrent use.
- Boolean[] uses about 4-20 bytes per boolean value.
- boolean[] uses about 1 byte per boolean value.
- BitSet uses about 1 bit per boolean value.
Memory size might not be an issue for you in which case boolean[] might be simpler to code.
A bit left-field of your question, but if storage is a concern you may want to look into Huffman compression. For example, 00000001 could be squeezed down by frequency to something equivalent to {(7)0, (1)1}. A more "randomized" string 00111010 would require a more complex representation, e.g. {(2)0, (3)1, (1)0, (1)1, (1)0}, and take up more space. Depending on the structure of your bit data, you may get some storage benefit from its use, beyond BitSet.

Code Answer

Monday, April 25, 2011

Boolean[] vs. BitSet: Which is more efficient?

0 comments:

Post a Comment

Blog Archive