Package org.apache.orc.impl.reader
Class StripePlanner
java.lang.Object
org.apache.orc.impl.reader.StripePlanner
This class parses the stripe information and handles the necessary
filtering and selection.
It supports:
- column projection
- row group selection
- encryption
-
Nested Class Summary
-
Constructor Summary
Constructor and Description
StripePlanner(TypeDescription schema, ReaderEncryption encryption, DataReader dataReader, OrcFile.WriterVersion version, boolean ignoreNonUtf8BloomFilter, long maxBufferSize)
StripePlanner(TypeDescription schema, ReaderEncryption encryption, DataReader dataReader, OrcFile.WriterVersion version, boolean ignoreNonUtf8BloomFilter, long maxBufferSize, Set<Integer> filterColIds)
  Create a stripe parser.
-
Method Summary
Modifier and Type, Method, and Description
void clearStreams()
  Release all of the buffers for the current stripe.
getEncoding(int column)
getStream(StreamName name)
  Get the stream for the given name.
StripePlanner parseStripe(StripeInformation stripe, boolean[] columnInclude)
  Parse a new stripe.
BufferChunkList readData(OrcIndex index, boolean[] rowGroupInclude, boolean forceDirect, TypeReader.ReadPhase readPhase)
  Read the stripe data from the file.
BufferChunkList readFollowData(OrcIndex index, boolean[] rowGroupInclude, int rgIdx, boolean forceDirect)
readRowIndex(boolean[] sargColumns, OrcIndex output)
  Read and parse the indexes for the current stripe.
-
Constructor Details
-
StripePlanner
public StripePlanner(TypeDescription schema, ReaderEncryption encryption, DataReader dataReader, OrcFile.WriterVersion version, boolean ignoreNonUtf8BloomFilter, long maxBufferSize, Set<Integer> filterColIds)
Create a stripe parser.
Parameters:
schema - the file schema
encryption - the encryption information
dataReader - the underlying data reader
version - the file writer version
ignoreNonUtf8BloomFilter - ignore old non-utf8 bloom filters
maxBufferSize - the largest single buffer to use
filterColIds - Column Ids that identify the filter columns
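The maxBufferSize parameter is described as the largest single buffer to use. As a minimal sketch of what such a cap could mean for one contiguous read, the following splits a byte range into pieces no longer than the cap. BufferSplitSketch and split are hypothetical names, and whether StripePlanner divides its reads in this way is an assumption, not something this page states.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of ORC: splits one byte range under a size cap. */
final class BufferSplitSketch {
    /** Returns {offset, length} pairs covering the range, each at most maxBufferSize long. */
    static List<long[]> split(long offset, long length, long maxBufferSize) {
        if (maxBufferSize <= 0) {
            throw new IllegalArgumentException("maxBufferSize must be positive");
        }
        List<long[]> chunks = new ArrayList<>();
        while (length > 0) {
            long thisLength = Math.min(length, maxBufferSize);
            chunks.add(new long[] {offset, thisLength});
            offset += thisLength;
            length -= thisLength;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // A 250-byte range starting at offset 100, capped at 100 bytes per buffer.
        for (long[] chunk : split(100L, 250L, 100L)) {
            System.out.println("offset=" + chunk[0] + " length=" + chunk[1]);
        }
    }
}
```

With these inputs the range is covered by three pieces, the last one shorter than the cap.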
-
StripePlanner
public StripePlanner(TypeDescription schema, ReaderEncryption encryption, DataReader dataReader, OrcFile.WriterVersion version, boolean ignoreNonUtf8BloomFilter, long maxBufferSize)
-
StripePlanner
-
-
Method Details
-
parseStripe
public StripePlanner parseStripe(StripeInformation stripe, boolean[] columnInclude) throws IOException
Parse a new stripe. Resets the current stripe state.
Parameters:
stripe - the new stripe
columnInclude - an array with true for each column to read
Returns:
this for method chaining
Throws:
IOException
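The columnInclude argument is an array with true for each column to read. The sketch below builds such a mask from a list of column ids. ColumnIncludeSketch and includeMask are hypothetical names, and the sketch assumes the array is indexed by column id with one slot per column in the schema.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of ORC: builds an include mask indexed by column id. */
final class ColumnIncludeSketch {
    /** Marks each selected column id as true; all other columns stay false. */
    static boolean[] includeMask(int columnCount, int... selectedColumnIds) {
        boolean[] include = new boolean[columnCount];
        for (int id : selectedColumnIds) {
            if (id < 0 || id >= columnCount) {
                throw new IllegalArgumentException("column id out of range: " + id);
            }
            include[id] = true;
        }
        return include;
    }

    public static void main(String[] args) {
        // Five columns in total (assumed); read columns 0, 2 and 3 only.
        boolean[] include = includeMask(5, 0, 2, 3);
        System.out.println(Arrays.toString(include)); // prints [true, false, true, true, false]
    }
}
```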
-
readData
public BufferChunkList readData(OrcIndex index, boolean[] rowGroupInclude, boolean forceDirect, TypeReader.ReadPhase readPhase) throws IOException
Read the stripe data from the file.
Parameters:
index - null for no row filters, or the index to use for filtering
rowGroupInclude - null for all of the rows, or an array with a boolean for each row group in the current stripe
forceDirect - whether direct buffers should be created
readPhase - influences which columns are read; for example, if readPhase is LEADERS, only the data required for FILTER columns is read
Returns:
the buffers that were read
Throws:
IOException
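The rowGroupInclude and readPhase parameters both narrow what is read. The sketch below models the two rules as stated in the parameter descriptions: a null rowGroupInclude selects every row group, and the LEADERS phase restricts the read to the filter columns. ReadDataSketch, Phase, rowGroupsToRead and columnsToRead are hypothetical names, the Phase enum is a stand-in for TypeReader.ReadPhase, and treating ALL as "every included column" is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the selection rules; it reads no ORC data. */
final class ReadDataSketch {
    /** Stand-in for the read phase; only LEADERS is described in the text. */
    enum Phase { ALL, LEADERS }

    /** A null mask selects every row group; otherwise only the true entries. */
    static List<Integer> rowGroupsToRead(boolean[] rowGroupInclude, int rowGroupCount) {
        List<Integer> selected = new ArrayList<>();
        for (int rg = 0; rg < rowGroupCount; rg++) {
            if (rowGroupInclude == null || rowGroupInclude[rg]) {
                selected.add(rg);
            }
        }
        return selected;
    }

    /** In the LEADERS phase only included filter columns are read. */
    static List<Integer> columnsToRead(boolean[] columnInclude, Set<Integer> filterColIds, Phase phase) {
        List<Integer> columns = new ArrayList<>();
        for (int col = 0; col < columnInclude.length; col++) {
            boolean wanted = columnInclude[col]
                && (phase == Phase.ALL || filterColIds.contains(col));
            if (wanted) {
                columns.add(col);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        boolean[] columnInclude = {true, true, false, true};
        Set<Integer> filterColIds = Collections.singleton(1);
        System.out.println(rowGroupsToRead(null, 3));
        System.out.println(rowGroupsToRead(new boolean[] {true, false, true}, 3));
        System.out.println(columnsToRead(columnInclude, filterColIds, Phase.LEADERS));
        System.out.println(columnsToRead(columnInclude, filterColIds, Phase.ALL));
    }
}
```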
-
readFollowData
public BufferChunkList readFollowData(OrcIndex index, boolean[] rowGroupInclude, int rgIdx, boolean forceDirect) throws IOException
Throws:
IOException
-
getWriterTimezone
-
getStream
Get the stream for the given name. The name is assumed not to have the encryption set, because the TreeReaders don't know whether they are reading encrypted data. Assumes that readData has already been called on this stripe.
Parameters:
name - the column/kind of the stream
Returns:
a new stream with the options set correctly
Throws:
IOException
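The descriptions on this page imply a per-stripe call order: parseStripe resets the state and returns this, readData loads the buffers, getStream assumes readData has run, and clearStreams releases the buffers. The sketch below models that order with a small stand-in class. PlannerLifecycleSketch, its string arguments and its byte-array streams are all hypothetical, and throwing IllegalStateException on a misordered call is an illustrative choice, not documented StripePlanner behavior.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in that mirrors only the call order, not ORC's implementation. */
final class PlannerLifecycleSketch {
    private final Map<String, byte[]> streams = new HashMap<>();
    private String currentStripe;
    private boolean dataRead;

    /** Starts a new stripe, dropping state from the previous one; returns this for chaining. */
    PlannerLifecycleSketch parseStripe(String stripeName) {
        clearStreams();
        currentStripe = stripeName;
        return this;
    }

    /** Pretends to load one stream for the current stripe. */
    void readData() {
        if (currentStripe == null) {
            throw new IllegalStateException("parseStripe must be called first");
        }
        streams.put("column 1 DATA", new byte[] {1, 2, 3});
        dataRead = true;
    }

    /** Looks up a loaded stream; only valid after readData. */
    byte[] getStream(String name) {
        if (!dataRead) {
            throw new IllegalStateException("readData must be called before getStream");
        }
        return streams.get(name);
    }

    /** Releases everything held for the current stripe. */
    void clearStreams() {
        streams.clear();
        dataRead = false;
    }

    public static void main(String[] args) {
        PlannerLifecycleSketch planner = new PlannerLifecycleSketch();
        planner.parseStripe("stripe 0").readData();
        System.out.println(planner.getStream("column 1 DATA").length); // prints 3
        planner.clearStreams();
    }
}
```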
-
clearStreams
public void clearStreams()
Release all of the buffers for the current stripe.
-
getEncoding
-
readRowIndex
Read and parse the indexes for the current stripe.
Parameters:
sargColumns - the columns we can use bloom filters for
output - an OrcIndex to reuse
Returns:
the indexes for the required columns
Throws:
IOException
-