Package org.apache.orc.impl
Class RunLengthIntegerWriterV2
java.lang.Object
org.apache.orc.impl.RunLengthIntegerWriterV2
- All Implemented Interfaces:
IntegerWriter
A writer that performs lightweight compression over a sequence of integers.
There are four types of lightweight integer compression:
- SHORT_REPEAT
- DIRECT
- PATCHED_BASE
- DELTA
The description and format of each type are given below.
SHORT_REPEAT: Used for short repeated integer sequences.
- 1 byte header
- 2 bits for encoding type
- 3 bits for bytes required for repeating value
- 3 bits for repeat count (MIN_REPEAT + run length)
- Blob - repeat value (fixed bytes)
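The SHORT_REPEAT layout above can be sketched as follows. This is a minimal illustration, not the code used by RunLengthIntegerWriterV2: the class name `ShortRepeatSketch` is hypothetical, and the sketch assumes `MIN_REPEAT` is 3, that the byte count is stored as (bytes - 1), that the encoding type code is 0, and that the value is non-negative and written most significant byte first, as described in the ORC format specification.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch only; not the RunLengthIntegerWriterV2 implementation.
class ShortRepeatSketch {
    // Assumption: minimum run length for SHORT_REPEAT, per the ORC format specification.
    static final int MIN_REPEAT = 3;

    // Encodes `count` repetitions (3..10) of a non-negative value.
    static byte[] encode(long value, int count) {
        if (value < 0 || count < MIN_REPEAT || count > MIN_REPEAT + 7) {
            throw new IllegalArgumentException("out of range for this sketch");
        }
        // Number of bytes needed to hold the value (at least 1, at most 8).
        int numBits = 64 - Long.numberOfLeadingZeros(value);
        int numBytes = Math.max(1, (numBits + 7) / 8);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Header: 2 bits encoding type (assumed 0), 3 bits (bytes - 1),
        // 3 bits (count - MIN_REPEAT).
        out.write((0 << 6) | ((numBytes - 1) << 3) | (count - MIN_REPEAT));
        // Blob: the repeated value in a fixed number of bytes, big-endian.
        for (int i = numBytes - 1; i >= 0; i--) {
            out.write((int) ((value >>> (i * 8)) & 0xff));
        }
        return out.toByteArray();
    }
}
```

Under these assumptions, `ShortRepeatSketch.encode(10000L, 5)` yields the three bytes 0x0a, 0x27, 0x10: one header byte followed by the two-byte value.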
DIRECT: Used for random integer sequences whose bit-width requirement doesn't vary much.
- 2 bytes header (1st byte)
- 2 bits for encoding type
- 5 bits for fixed bit width of values in blob
- 1 bit for storing MSB of run length
- 2nd byte
- 8 bits for lower run length bits
- Blob - stores the direct values using fixed bit width. The data blob is (fixed width * run length) bits long
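As a rough illustration of the DIRECT header and blob size, the sketch below packs the two header bytes and computes the blob length. The class `DirectHeaderSketch` and its methods are hypothetical. It assumes the encoding type code for DIRECT is 1 and that the 9-bit run length field holds (run length - 1), following the ORC format specification. The mapping from an actual bit width to its 5-bit code is not shown here; the caller passes the code directly.

```java
// Illustrative sketch only; not the RunLengthIntegerWriterV2 implementation.
class DirectHeaderSketch {
    // Assumption: 2-bit encoding type code for DIRECT.
    static final int DIRECT = 1;

    // encodedWidth: the 5-bit code for the fixed bit width (0..31).
    // runLength: number of values in the run (1..512), assumed stored as runLength - 1.
    static byte[] header(int encodedWidth, int runLength) {
        if (encodedWidth < 0 || encodedWidth > 31 || runLength < 1 || runLength > 512) {
            throw new IllegalArgumentException("out of range for this sketch");
        }
        int stored = runLength - 1; // fits in 9 bits
        // 1st byte: 2 bits type, 5 bits width code, 1 bit MSB of run length.
        int first = (DIRECT << 6) | (encodedWidth << 1) | (stored >>> 8);
        // 2nd byte: lower 8 bits of run length.
        int second = stored & 0xff;
        return new byte[] {(byte) first, (byte) second};
    }

    // Blob size: (fixed width * run length) bits, rounded up to whole bytes.
    static int blobBytes(int fixedWidthBits, int runLength) {
        return (fixedWidthBits * runLength + 7) / 8;
    }
}
```

For example, with a width code of 15 and a run of 4 values, `header(15, 4)` produces 0x5e, 0x03, and four 16-bit values occupy `blobBytes(16, 4)`, that is 8 bytes.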
PATCHED_BASE: Used for random integer sequences whose bit-width requirement varies beyond a threshold.
- 4 bytes header (1st byte)
- 2 bits for encoding type
- 5 bits for fixed bit width of values in blob
- 1 bit for storing MSB of run length
- 2nd byte
- 8 bits for lower run length bits
- 3rd byte
- 3 bits for bytes required to encode base value
- 5 bits for patch width
- 4th byte
- 3 bits for patch gap width
- 5 bits for patch length
- Base value - Stored using a fixed number of bytes. If the MSB is set, the base value is negative; otherwise it is positive. Length of base value is (base width * 8) bits.
- Data blob - Base-reduced values stored using fixed bit width. Length of data blob is (fixed width * run length) bits.
- Patch blob - A list of gap and patch value pairs. Each entry in the patch list is (patch width + patch gap width) bits long. The gap between subsequent elements to be patched is stored in the upper part of the entry, whereas the patch value is stored in the lower part. Length of patch blob is ((patch width + patch gap width) * patch length) bits.
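The patch entry layout described above (gap in the upper bits, patch value in the lower bits) can be sketched as below. The class `PatchEntrySketch` and its method names are hypothetical and only illustrate the bit arithmetic; they assume the patch width is less than 64 and do not reproduce how the writer selects widths or gaps.

```java
// Illustrative sketch only; not the RunLengthIntegerWriterV2 implementation.
class PatchEntrySketch {
    // Packs one patch-list entry: gap in the upper part, patch value in the lower part.
    static long pack(long gap, long patch, int patchWidth) {
        long mask = (1L << patchWidth) - 1;
        return (gap << patchWidth) | (patch & mask);
    }

    // Recovers the gap from the upper part of an entry.
    static long gapOf(long entry, int patchWidth) {
        return entry >>> patchWidth;
    }

    // Recovers the patch value from the lower part of an entry.
    static long patchOf(long entry, int patchWidth) {
        return entry & ((1L << patchWidth) - 1);
    }

    // Patch blob size: ((patch width + patch gap width) * patch length) bits.
    static long patchBlobBits(int patchWidth, int patchGapWidth, int patchLength) {
        return (long) (patchWidth + patchGapWidth) * patchLength;
    }
}
```

With a 12-bit patch width, a gap of 3 and a patch value of 5 pack into the entry `(3 << 12) | 5`, and both fields can be read back with `gapOf` and `patchOf`.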
DELTA: Used for monotonically increasing or decreasing sequences, sequences with fixed delta values, or long repeated sequences.
- 2 bytes header (1st byte)
- 2 bits for encoding type
- 5 bits for fixed bit width of values in blob
- 1 bit for storing MSB of run length
- 2nd byte
- 8 bits for lower run length bits
- Base value - zigzag encoded value written as varint
- Delta base - zigzag encoded value written as varint
- Delta blob - only positive values. Monotonicity and ordering are decided based on the sign of the base value and delta base
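The base value and delta base are described above as zigzag encoded values written as varints. A minimal sketch of those two steps follows, using the common zigzag mapping and base-128 varint with a continuation bit. The class `ZigZagVarintSketch` is hypothetical and is not taken from the ORC code base.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch only; not the RunLengthIntegerWriterV2 implementation.
class ZigZagVarintSketch {
    // Zigzag maps signed values to unsigned ones: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...
    static long zigzag(long v) {
        return (v << 1) ^ (v >> 63);
    }

    // Base-128 varint: 7 payload bits per byte, least significant group first,
    // high bit set on every byte except the last.
    static byte[] varint(long u) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((u & ~0x7fL) != 0) {
            out.write((int) ((u & 0x7f) | 0x80));
            u >>>= 7;
        }
        out.write((int) u);
        return out.toByteArray();
    }

    // Zigzag encode, then write as a varint.
    static byte[] signedVarint(long v) {
        return varint(zigzag(v));
    }
}
```

For instance, `signedVarint(1)` is the single byte 0x02 and `signedVarint(-1)` is 0x01, so small deltas of either sign stay compact.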
-
Constructor Summary
- RunLengthIntegerWriterV2(PositionedOutputStream output, boolean signed, boolean alignedBitpacking)
-
Method Summary
- void changeIv
- long estimateMemory() - Estimate the amount of memory being used.
- void flush() - Flush the buffer.
- void getPosition(PositionRecorder recorder) - Get position from the stream.
- void write(long val) - Write the integer value.
-
Constructor Details
-
RunLengthIntegerWriterV2
public RunLengthIntegerWriterV2(PositionedOutputStream output, boolean signed, boolean alignedBitpacking)
-
-
Method Details
-
flush
Description copied from interface: IntegerWriter
Flush the buffer.
- Specified by:
flush in interface IntegerWriter
- Throws:
IOException
-
write
Description copied from interface: IntegerWriter
Write the integer value.
- Specified by:
write in interface IntegerWriter
- Throws:
IOException
-
getPosition
Description copied from interface: IntegerWriter
Get position from the stream.
- Specified by:
getPosition in interface IntegerWriter
- Throws:
IOException
-
estimateMemory
public long estimateMemory()
Description copied from interface: IntegerWriter
Estimate the amount of memory being used.
- Specified by:
estimateMemory in interface IntegerWriter
- Returns:
- number of bytes
-
changeIv
- Specified by:
changeIv in interface IntegerWriter
-