Class StripePlanner

java.lang.Object
org.apache.orc.impl.reader.StripePlanner

public class StripePlanner extends Object
This class parses the stripe information and performs the necessary filtering and selection.

It supports:

  • column projection
  • row group selection
  • encryption

  • Constructor Details

    • StripePlanner

      public StripePlanner(TypeDescription schema, ReaderEncryption encryption, DataReader dataReader, OrcFile.WriterVersion version, boolean ignoreNonUtf8BloomFilter, long maxBufferSize, Set<Integer> filterColIds)
      Create a stripe parser.
      Parameters:
      schema - the file schema
      encryption - the encryption information
      dataReader - the underlying data reader
      version - the file writer version
      ignoreNonUtf8BloomFilter - whether to ignore old non-UTF-8 bloom filters
      maxBufferSize - the largest single buffer to use
      filterColIds - column IDs that identify the filter columns
    • StripePlanner

      public StripePlanner(TypeDescription schema, ReaderEncryption encryption, DataReader dataReader, OrcFile.WriterVersion version, boolean ignoreNonUtf8BloomFilter, long maxBufferSize)
    • StripePlanner

      public StripePlanner(StripePlanner old)
  • Method Details

    • parseStripe

      public StripePlanner parseStripe(StripeInformation stripe, boolean[] columnInclude) throws IOException
      Parse a new stripe. Resets the current stripe state.
      Parameters:
      stripe - the new stripe
      columnInclude - an array with true for each column to read
      Returns:
      this for method chaining
      Throws:
      IOException
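
A minimal sketch of how a `columnInclude` array might be built before calling `parseStripe`. It assumes the usual ORC convention that column ids are assigned in pre-order starting at 0 for the root struct, and that the array holds one entry per column id. `ColumnIncludeSketch` and its `build` method are hypothetical helpers, not part of the ORC API, and the sketch does not add the parents of nested columns, which a real reader may require.

```java
import java.util.Set;

// Hypothetical helper for illustration; not part of org.apache.orc.
class ColumnIncludeSketch {
  /**
   * Builds a boolean array with true for each column id to read.
   * Assumes ids run from 0 (the root) to columnCount - 1.
   */
  static boolean[] build(int columnCount, Set<Integer> selectedIds) {
    if (columnCount < 1) {
      throw new IllegalArgumentException("Schema needs at least a root column");
    }
    boolean[] include = new boolean[columnCount];
    include[0] = true; // assumption: the root struct is always read
    for (int id : selectedIds) {
      if (id < 0 || id >= columnCount) {
        throw new IllegalArgumentException("Unknown column id " + id);
      }
      include[id] = true;
    }
    return include;
  }
}
```

The resulting array could then be passed as `planner.parseStripe(stripe, include)`. Because `parseStripe` returns `this`, a follow-up call such as `readData` can be chained onto it.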
    • readData

      public BufferChunkList readData(OrcIndex index, boolean[] rowGroupInclude, boolean forceDirect, TypeReader.ReadPhase readPhase) throws IOException
      Read the stripe data from the file.
      Parameters:
      index - null for no row filters or the index for filtering
      rowGroupInclude - null to read all of the rows, or an array with one boolean for each row group in the current stripe
      forceDirect - should direct buffers be created?
      readPhase - influences which columns are read; for example, if readPhase is LEADERS, only the data required for the FILTER columns is read
      Returns:
      the buffers that were read
      Throws:
      IOException
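
A sketch of how a `rowGroupInclude` array might be sized and filled. It assumes the stripe's row count and the row index stride are already known (10,000 rows is the default ORC stride) and that each row group covers at most one stride of rows. `RowGroupIncludeSketch` and its methods are hypothetical helpers, not part of the ORC API.

```java
import java.util.function.IntPredicate;

// Hypothetical helper for illustration; not part of org.apache.orc.
class RowGroupIncludeSketch {
  /** Number of row groups in a stripe: ceil(rowsInStripe / rowIndexStride). */
  static int rowGroupCount(long rowsInStripe, int rowIndexStride) {
    if (rowsInStripe < 0 || rowIndexStride <= 0) {
      throw new IllegalArgumentException("Invalid row count or stride");
    }
    return (int) ((rowsInStripe + rowIndexStride - 1) / rowIndexStride);
  }

  /**
   * Returns null when every row group is kept (the "all of the rows" case),
   * otherwise an array with one boolean per row group.
   */
  static boolean[] build(int groups, IntPredicate keep) {
    boolean[] include = new boolean[groups];
    boolean all = true;
    for (int g = 0; g < groups; g++) {
      include[g] = keep.test(g);
      all &= include[g];
    }
    return all ? null : include;
  }
}
```

Returning `null` when nothing is filtered out mirrors the documented contract of the `rowGroupInclude` parameter, so the caller does not need a separate "read everything" code path.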
    • readFollowData

      public BufferChunkList readFollowData(OrcIndex index, boolean[] rowGroupInclude, int rgIdx, boolean forceDirect) throws IOException
      Throws:
      IOException
    • getWriterTimezone

      public String getWriterTimezone()
    • getStream

      public InStream getStream(StreamName name) throws IOException
      Get the stream for the given name. It is assumed that the name does not have the encryption set, because the TreeReaders don't know whether they are reading encrypted data. Assumes that readData has already been called on this stripe.
      Parameters:
      name - the column/kind of the stream
      Returns:
      a new stream with the options set correctly
      Throws:
      IOException
    • clearStreams

      public void clearStreams()
      Release all of the buffers for the current stripe.
    • getEncoding

      public OrcProto.ColumnEncoding getEncoding(int column)
    • readRowIndex

      public OrcIndex readRowIndex(boolean[] sargColumns, OrcIndex output) throws IOException
      Read and parse the indexes for the current stripe.
      Parameters:
      sargColumns - the columns we can use bloom filters for
      output - an OrcIndex to reuse
      Returns:
      the indexes for the required columns
      Throws:
      IOException