Package org.apache.orc.util
Class BloomFilter
java.lang.Object
org.apache.orc.util.BloomFilter
- Direct Known Subclasses:
BloomFilterUtf8
BloomFilter is a probabilistic data structure for set membership check. BloomFilters are
highly space efficient when compared to using a HashSet. Because of the probabilistic nature of
bloom filter false positive (element not present in bloom filter but test() says true) are
possible but false negatives are not possible (if element is present then test() will never
say false). The false positive probability is configurable (default: 5%) depending on which
storage requirement may increase or decrease. Lower the false positive probability greater
is the space requirement.
Bloom filters are sensitive to number of elements that will be inserted in the bloom filter.
During the creation of bloom filter expected number of entries must be specified. If the number
of insertions exceed the specified initial number of entries then false positive probability will
increase accordingly.
Internally, this implementation of bloom filter uses Murmur3 fast non-cryptographic hash algorithm. Although Murmur2 is slightly faster than Murmur3 in Java, it suffers from hash collisions for specific sequence of repeating bytes. Check the following link for more info https://code.google.com/p/smhasher/wiki/MurmurHash2Flaw
Note that this class is here for backwards compatibility, because it uses the JVM default character set for strings. All new users should BloomFilterUtf8, which always uses UTF8 for the encoding.
-
Nested Class Summary
Modifier and TypeClassDescriptionstatic class
Bare metal bit set implementation. -
Field Summary
-
Constructor Summary
ConstructorDescriptionBloomFilter
(long expectedEntries) BloomFilter
(long[] bits, int numFuncs) A constructor to support rebuilding the BloomFilter from a serialized representation.BloomFilter
(long expectedEntries, double fpp) -
Method Summary
Modifier and TypeMethodDescriptionvoid
add
(byte[] val) void
addBytes
(byte[] val, int offset, int length) void
addDouble
(double val) void
addLong
(long val) void
boolean
long[]
int
int
int
hashCode()
void
merge
(BloomFilter that) Merge the specified bloom filter with current bloom filter.void
reset()
long
boolean
test
(byte[] val) boolean
testBytes
(byte[] val, int offset, int length) boolean
testDouble
(double val) boolean
testLong
(long val) boolean
testString
(String val) toString()
-
Field Details
-
DEFAULT_FPP
public static final double DEFAULT_FPP- See Also:
-
-
Constructor Details
-
BloomFilter
public BloomFilter(long expectedEntries) -
BloomFilter
public BloomFilter(long expectedEntries, double fpp) -
BloomFilter
public BloomFilter(long[] bits, int numFuncs) A constructor to support rebuilding the BloomFilter from a serialized representation.- Parameters:
bits
- the serialized bitsnumFuncs
- the number of functions used
-
-
Method Details
-
equals
-
hashCode
public int hashCode() -
add
public void add(byte[] val) -
addBytes
public void addBytes(byte[] val, int offset, int length) -
addString
-
addLong
public void addLong(long val) -
addDouble
public void addDouble(double val) -
test
public boolean test(byte[] val) -
testBytes
public boolean testBytes(byte[] val, int offset, int length) -
testString
-
testLong
public boolean testLong(long val) -
testDouble
public boolean testDouble(double val) -
sizeInBytes
public long sizeInBytes() -
getBitSize
public int getBitSize() -
getNumHashFunctions
public int getNumHashFunctions() -
getBitSet
public long[] getBitSet() -
toString
-
merge
Merge the specified bloom filter with current bloom filter.- Parameters:
that
- - bloom filter to merge
-
reset
public void reset()
-