Class Reader.Options

java.lang.Object
org.apache.orc.Reader.Options
All Implemented Interfaces:
Cloneable
Enclosing interface:
Reader

public static class Reader.Options extends Object implements Cloneable
Options for creating a RecordReader.
Since:
1.1.0
  • Constructor Details

    • Options

      public Options()
      Since:
      1.1.0
    • Options

      public Options(Configuration conf)
      Since:
      1.1.0
  • Method Details

    • include

      public Reader.Options include(boolean[] include)
      Set the list of columns to read.
      Parameters:
      include - a list of columns to read
      Returns:
      this
      Since:
      1.1.0
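A minimal sketch of how the boolean[] argument might be built. It assumes the array is indexed by column id and that id 0 is the root struct, which is not stated above; `IncludeMask` and its `of` helper are hypothetical names used only for illustration.

```java
import java.util.Arrays;

/** Sketch: build a boolean[] suitable for passing to include(boolean[]). */
final class IncludeMask {
    /**
     * @param totalColumns number of column ids in the schema (assumed to count the root)
     * @param wantedIds    column ids the caller wants to read (illustrative values)
     */
    static boolean[] of(int totalColumns, int... wantedIds) {
        if (totalColumns < 1) {
            throw new IllegalArgumentException("totalColumns must be at least 1");
        }
        boolean[] include = new boolean[totalColumns];
        include[0] = true; // assumption: id 0 is the root struct and stays selected
        for (int id : wantedIds) {
            if (id < 0 || id >= totalColumns) {
                throw new IllegalArgumentException("column id out of range: " + id);
            }
            include[id] = true;
        }
        return include;
    }

    public static void main(String[] args) {
        boolean[] include = of(5, 1, 3);
        System.out.println(Arrays.toString(include));
        // With the ORC library on the classpath, the array would be passed as:
        //   new Reader.Options().include(include)
    }
}
```

Every entry left false marks a column that the caller does not want read.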
    • range

      public Reader.Options range(long offset, long length)
      Set the range of bytes to read.
      Parameters:
      offset - the starting byte offset
      length - the number of bytes to read
      Returns:
      this
      Since:
      1.1.0
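One plausible use of range is to divide a file among several readers. The sketch below only computes non-overlapping (offset, length) pairs that cover a file of known length; `ByteRanges` and `split` are hypothetical helpers, and how the reader maps a byte range onto the data it returns is not described here.

```java
/** Sketch: split a file of known length into contiguous (offset, length) pairs. */
final class ByteRanges {
    /** Returns one {offset, length} row per part; the rows cover [0, fileLength) exactly. */
    static long[][] split(long fileLength, int parts) {
        if (fileLength < 0 || parts <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and parts > 0");
        }
        long[][] ranges = new long[parts][2];
        long chunk = fileLength / parts;
        long offset = 0;
        for (int i = 0; i < parts; i++) {
            // The last range absorbs the remainder left over by integer division.
            long length = (i == parts - 1) ? fileLength - offset : chunk;
            ranges[i][0] = offset;
            ranges[i][1] = length;
            offset += length;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : split(10, 3)) {
            System.out.println("offset=" + r[0] + " length=" + r[1]);
            // Each pair would be passed as: new Reader.Options().range(r[0], r[1])
        }
    }
}
```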
    • schema

      public Reader.Options schema(TypeDescription schema)
      Set the schema-on-read type description.
      Since:
      1.1.0
    • setRowFilter

      public Reader.Options setRowFilter(String[] filterColumnNames, Consumer<OrcFilterContext> filterCallback)
      Set a row-level filter. This is an advanced feature that lets the caller specify a list of columns that are read first, followed by a filter that is called to determine which rows, if any, should be read. Callers should expect the batches returned by the reader to use the selected array set by their filter. Typical use cases are predicates that SearchArgs cannot represent, such as relationships between columns (e.g. columnA == columnB).
      Parameters:
      filterColumnNames - an array of the column names that are read before the filter is applied. Only top-level columns in the reader's schema can be used here, and they must not be duplicated.
      filterCallback - a callback that performs the filtering during the call to RecordReader.nextBatch. The function should neither reference static fields nor modify the passed-in ColumnVectors; it should set the filter output using the selected array.
      Returns:
      this
      Since:
      1.7.0
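The selected-array idea can be illustrated without the ORC types. In this sketch two plain long[] arrays stand in for the column vectors of columnA and columnB, and the returned int[] holds the row indices that a filter would record as selected. `SelectedArraySketch` and `selectEqual` are hypothetical names; a real callback receives an OrcFilterContext instead of raw arrays.

```java
import java.util.Arrays;

/** Sketch: compute the row indices that pass a columnA == columnB predicate. */
final class SelectedArraySketch {
    /**
     * @param a    values of the first filter column (stand-in for a column vector)
     * @param b    values of the second filter column
     * @param size number of rows in the batch
     * @return indices of the rows where a[row] == b[row], in ascending order
     */
    static int[] selectEqual(long[] a, long[] b, int size) {
        int[] selected = new int[size];
        int count = 0;
        for (int row = 0; row < size; row++) {
            if (a[row] == b[row]) {
                selected[count++] = row; // keep this row
            }
        }
        // Trim to the rows actually selected; the inputs are left unmodified.
        return Arrays.copyOf(selected, count);
    }

    public static void main(String[] args) {
        long[] columnA = {1, 2, 3, 4};
        long[] columnB = {1, 0, 3, 5};
        System.out.println(Arrays.toString(selectEqual(columnA, columnB, 4)));
    }
}
```

Note that the sketch only reads its inputs, in line with the requirement that the callback must not modify the passed-in ColumnVectors.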
    • searchArgument

      public Reader.Options searchArgument(org.apache.hadoop.hive.ql.io.sarg.SearchArgument sarg, String[] columnNames)
      Set search argument for predicate push down.
      Parameters:
      sarg - the search argument
      columnNames - the column names for
      Returns:
      this
      Since:
      1.1.0
    • allowSARGToFilter

      public Reader.Options allowSARGToFilter(boolean allowSARGToFilter)
      Set allowSARGToFilter.
      Parameters:
      allowSARGToFilter -
      Returns:
      this
      Since:
      1.7.0
    • isAllowSARGToFilter

      public boolean isAllowSARGToFilter()
      Get allowSARGToFilter value.
      Returns:
      allowSARGToFilter
      Since:
      1.7.0
    • useZeroCopy

      public Reader.Options useZeroCopy(boolean value)
      Set whether to use zero copy from HDFS.
      Parameters:
      value - the new zero copy flag
      Returns:
      this
      Since:
      1.1.0
    • dataReader

      public Reader.Options dataReader(DataReader value)
      Set dataReader.
      Parameters:
      value - the new dataReader.
      Returns:
      this
      Since:
      1.1.0
    • skipCorruptRecords

      public Reader.Options skipCorruptRecords(boolean value)
      Set whether to skip corrupt records.
      Parameters:
      value - the new skip corrupt records flag
      Returns:
      this
      Since:
      1.1.0
    • tolerateMissingSchema

      public Reader.Options tolerateMissingSchema(boolean value)
      Set whether to make a best effort to tolerate schema evolution for files that do not have an embedded schema because they were written with a pre-HIVE-4243 writer.
      Parameters:
      value - the new tolerance flag
      Returns:
      this
      Since:
      1.2.0
    • forcePositionalEvolution

      public Reader.Options forcePositionalEvolution(boolean value)
      Set whether to force schema evolution to be positional instead of based on the column names.
      Parameters:
      value - force positional evolution
      Returns:
      this
      Since:
      1.3.0
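The difference between name-based and positional matching can be shown with a simplified model. This is not the library's schema evolution code: `EvolutionMatchSketch` and `match` are hypothetical, only flat lists of top-level column names are considered, and the example column names are made up.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: map reader columns to file columns by name or by position. */
final class EvolutionMatchSketch {
    /** For each reader column, the index of the matching file column, or -1 if none. */
    static int[] match(List<String> fileCols, List<String> readerCols, boolean positional) {
        int[] mapping = new int[readerCols.size()];
        for (int i = 0; i < mapping.length; i++) {
            if (positional) {
                // Positional: the i-th reader column maps to the i-th file column.
                mapping[i] = i < fileCols.size() ? i : -1;
            } else {
                // Name-based: look the reader column up by its name.
                mapping[i] = fileCols.indexOf(readerCols.get(i));
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList("_col0", "_col1");
        List<String> reader = Arrays.asList("id", "name");
        System.out.println(Arrays.toString(match(file, reader, false)));
        System.out.println(Arrays.toString(match(file, reader, true)));
    }
}
```

In this model, a file whose column names are placeholders matches nothing by name, while positional matching still pairs the columns up.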
    • positionalEvolutionLevel

      public Reader.Options positionalEvolutionLevel(int value)
      Set number of levels to force schema evolution to be positional instead of based on the column names.
      Parameters:
      value - number of levels of positional schema evolution
      Returns:
      this
      Since:
      1.5.11
    • isSchemaEvolutionCaseAware

      public Reader.Options isSchemaEvolutionCaseAware(boolean value)
      Set whether the comparison of field names in schema evolution is case sensitive.
      Parameters:
      value - true if field names should be compared case-sensitively during schema evolution
      Returns:
      this
      Since:
      1.5.0
    • includeAcidColumns

      public Reader.Options includeAcidColumns(boolean includeAcidColumns)
      Set whether ACID metadata columns should be decoded. If true, they are decoded; otherwise they are set to null.
      Since:
      1.5.3
    • getInclude

      public boolean[] getInclude()
      Since:
      1.1.0
    • getOffset

      public long getOffset()
      Since:
      1.1.0
    • getLength

      public long getLength()
      Since:
      1.1.0
    • getSchema

      public TypeDescription getSchema()
      Since:
      1.1.0
    • getSearchArgument

      public org.apache.hadoop.hive.ql.io.sarg.SearchArgument getSearchArgument()
      Since:
      1.1.0
    • getFilterCallback

      public Consumer<OrcFilterContext> getFilterCallback()
      Since:
      1.7.0
    • getPreFilterColumnNames

      public String[] getPreFilterColumnNames()
      Since:
      1.7.0
    • getColumnNames

      public String[] getColumnNames()
      Since:
      1.1.0
    • getMaxOffset

      public long getMaxOffset()
      Since:
      1.1.0
    • getUseZeroCopy

      public Boolean getUseZeroCopy()
      Since:
      1.1.0
    • getSkipCorruptRecords

      public Boolean getSkipCorruptRecords()
      Since:
      1.1.0
    • getDataReader

      public DataReader getDataReader()
      Since:
      1.1.0
    • getForcePositionalEvolution

      public boolean getForcePositionalEvolution()
      Since:
      1.3.0
    • getPositionalEvolutionLevel

      public int getPositionalEvolutionLevel()
      Since:
      1.5.11
    • getIsSchemaEvolutionCaseAware

      public boolean getIsSchemaEvolutionCaseAware()
      Since:
      1.5.0
    • getIncludeAcidColumns

      public boolean getIncludeAcidColumns()
      Since:
      1.5.3
    • clone

      public Reader.Options clone()
      Overrides:
      clone in class Object
      Since:
      1.1.0
    • toString

      public String toString()
      Overrides:
      toString in class Object
      Since:
      1.1.0
    • getTolerateMissingSchema

      public boolean getTolerateMissingSchema()
      Since:
      1.2.0
    • useSelected

      public boolean useSelected()
      Since:
      1.7.0
    • useSelected

      public Reader.Options useSelected(boolean newValue)
      Since:
      1.7.0
    • allowPluginFilters

      public boolean allowPluginFilters()
    • allowPluginFilters

      public Reader.Options allowPluginFilters(boolean allowPluginFilters)
    • pluginAllowListFilters

      public List<String> pluginAllowListFilters()
    • pluginAllowListFilters

      public Reader.Options pluginAllowListFilters(String... allowLists)
    • minSeekSize

      public int minSeekSize()
      Since:
      1.8.0
    • minSeekSize

      public Reader.Options minSeekSize(int minSeekSize)
      Since:
      1.8.0
    • minSeekSizeTolerance

      public double minSeekSizeTolerance()
      Since:
      1.8.0
    • minSeekSizeTolerance

      public Reader.Options minSeekSizeTolerance(double value)
      Since:
      1.8.0
    • getRowBatchSize

      public int getRowBatchSize()
      Since:
      1.9.0
    • rowBatchSize

      public Reader.Options rowBatchSize(int value)
      Since:
      1.9.0