Class BloomFilter

java.lang.Object
org.apache.orc.util.BloomFilter
Direct Known Subclasses:
BloomFilterUtf8

public class BloomFilter extends Object
BloomFilter is a probabilistic data structure for set membership check. BloomFilters are highly space efficient when compared to using a HashSet. Because of the probabilistic nature of bloom filter false positive (element not present in bloom filter but test() says true) are possible but false negatives are not possible (if element is present then test() will never say false). The false positive probability is configurable (default: 5%) depending on which storage requirement may increase or decrease. Lower the false positive probability greater is the space requirement. Bloom filters are sensitive to number of elements that will be inserted in the bloom filter. During the creation of bloom filter expected number of entries must be specified. If the number of insertions exceed the specified initial number of entries then false positive probability will increase accordingly.

Internally, this implementation of bloom filter uses Murmur3 fast non-cryptographic hash algorithm. Although Murmur2 is slightly faster than Murmur3 in Java, it suffers from hash collisions for specific sequence of repeating bytes. Check the following link for more info https://code.google.com/p/smhasher/wiki/MurmurHash2Flaw

Note that this class is here for backwards compatibility, because it uses the JVM default character set for strings. All new users should BloomFilterUtf8, which always uses UTF8 for the encoding.

  • Field Details

  • Constructor Details

    • BloomFilter

      public BloomFilter(long expectedEntries)
    • BloomFilter

      public BloomFilter(long expectedEntries, double fpp)
    • BloomFilter

      public BloomFilter(long[] bits, int numFuncs)
      A constructor to support rebuilding the BloomFilter from a serialized representation.
      Parameters:
      bits - the serialized bits
      numFuncs - the number of functions used
  • Method Details

    • equals

      public boolean equals(Object other)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • add

      public void add(byte[] val)
    • addBytes

      public void addBytes(byte[] val, int offset, int length)
    • addString

      public void addString(String val)
    • addLong

      public void addLong(long val)
    • addDouble

      public void addDouble(double val)
    • test

      public boolean test(byte[] val)
    • testBytes

      public boolean testBytes(byte[] val, int offset, int length)
    • testString

      public boolean testString(String val)
    • testLong

      public boolean testLong(long val)
    • testDouble

      public boolean testDouble(double val)
    • sizeInBytes

      public long sizeInBytes()
    • getBitSize

      public int getBitSize()
    • getNumHashFunctions

      public int getNumHashFunctions()
    • getBitSet

      public long[] getBitSet()
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • merge

      public void merge(BloomFilter that)
      Merge the specified bloom filter with current bloom filter.
      Parameters:
      that - - bloom filter to merge
    • reset

      public void reset()