Package org.apache.orc.impl
Class RecordReaderImpl
java.lang.Object
org.apache.orc.impl.RecordReaderImpl
- All Implemented Interfaces:
Closeable, AutoCloseable, RecordReader
-
Nested Class Summary
Nested Classes:
static final class
static class
static final class
-
Field Summary
Fields:
static final OrcProto.ColumnStatistics EMPTY_COLUMN_STATISTICS
protected final Path path
protected final TypeDescription schema
-
Constructor Summary
Constructors:
protected RecordReaderImpl(ReaderImpl fileReader, Reader.Options options)
-
Method Summary
Methods:
void close()
    Release the resources associated with the given reader.
static String encodeTranslatedSargColumn(int rootColumn, Integer indexInSourceTable)
static org.apache.hadoop.hive.ql.io.sarg.SearchArgument.TruthValue evaluatePredicate(ColumnStatistics stats, org.apache.hadoop.hive.ql.io.sarg.PredicateLeaf predicate, BloomFilter bloomFilter)
    Evaluate a predicate with respect to the statistics from the column that is referenced in the predicate.
static org.apache.hadoop.hive.ql.io.sarg.SearchArgument.TruthValue evaluatePredicate(ColumnStatistics stats, org.apache.hadoop.hive.ql.io.sarg.PredicateLeaf predicate, BloomFilter bloomFilter, boolean useUTCTimestamp)
    Evaluate a predicate with respect to the statistics from the column that is referenced in the predicate.
int getMaxDiskRangeChunkLimit()
float getProgress()
    Return the fraction of rows that have been read from the selected section of the file.
long getRowNumber()
    Get the row number of the row that will be returned by the following call to next().
static int[] mapSargColumnsToOrcInternalColIdx(List<org.apache.hadoop.hive.ql.io.sarg.PredicateLeaf> sargLeaves, SchemaEvolution evolution)
    Find the mapping from predicate leaves to columns.
static int[] mapTranslatedSargColumns(List<OrcProto.Type> types, List<org.apache.hadoop.hive.ql.io.sarg.PredicateLeaf> sargLeaves)
boolean nextBatch(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch)
    Read the next row batch.
protected boolean[] pickRowGroups()
    Pick the row groups that we need to load from the current stripe.
OrcIndex readRowIndex(int stripeIndex, boolean[] included, boolean[] readCols)
readStripeFooter(StripeInformation stripe)
void seekToRow(long rowNumber)
    Seek to a particular row number.
-
Field Details
-
EMPTY_COLUMN_STATISTICS
static final OrcProto.ColumnStatistics EMPTY_COLUMN_STATISTICS
-
path
protected final Path path
-
schema
protected final TypeDescription schema
-
-
Constructor Details
-
RecordReaderImpl
protected RecordReaderImpl(ReaderImpl fileReader, Reader.Options options) throws IOException
- Throws:
IOException
-
-
Method Details
-
mapSargColumnsToOrcInternalColIdx
public static int[] mapSargColumnsToOrcInternalColIdx(List<org.apache.hadoop.hive.ql.io.sarg.PredicateLeaf> sargLeaves, SchemaEvolution evolution)
Find the mapping from predicate leaves to columns.
- Parameters:
sargLeaves - the search argument that we need to map
evolution - the mapping from reader to file schema
- Returns:
- an array mapping the sarg leaves to concrete column numbers in the file
-
evaluatePredicate
public static org.apache.hadoop.hive.ql.io.sarg.SearchArgument.TruthValue evaluatePredicate(ColumnStatistics stats, org.apache.hadoop.hive.ql.io.sarg.PredicateLeaf predicate, BloomFilter bloomFilter)
Evaluate a predicate with respect to the statistics from the column that is referenced in the predicate.
- Parameters:
stats - the statistics for the column mentioned in the predicate
predicate - the leaf predicate we need to evaluate
- Returns:
- the set of truth values that may be returned for the given predicate.
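A minimal usage sketch follows. It assumes a hypothetical file example.orc with a LONG column named x at file column index 1, and passes null for the bloom filter on the assumption that none is available; a result of NO or NO_NULL means no row can satisfy the predicate.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hive.ql.io.sarg.PredicateLeaf;
    import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
    import org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory;
    import org.apache.orc.ColumnStatistics;
    import org.apache.orc.OrcFile;
    import org.apache.orc.Reader;
    import org.apache.orc.impl.RecordReaderImpl;

    public class PredicateCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical file with a LONG column "x" at file column index 1.
        Reader reader = OrcFile.createReader(new Path("example.orc"),
            OrcFile.readerOptions(conf));
        ColumnStatistics[] stats = reader.getStatistics();

        // Build a single-leaf search argument: x < 100.
        SearchArgument sarg = SearchArgumentFactory.newBuilder()
            .startAnd()
            .lessThan("x", PredicateLeaf.Type.LONG, 100L)
            .end()
            .build();
        PredicateLeaf leaf = sarg.getLeaves().get(0);

        // No bloom filter available, so pass null.
        SearchArgument.TruthValue result =
            RecordReaderImpl.evaluatePredicate(stats[1], leaf, null);

        // NO / NO_NULL means no row in the file can satisfy the predicate.
        boolean skippable = result == SearchArgument.TruthValue.NO
            || result == SearchArgument.TruthValue.NO_NULL;
        System.out.println("Predicate can be skipped: " + skippable);
      }
    }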
-
evaluatePredicate
public static org.apache.hadoop.hive.ql.io.sarg.SearchArgument.TruthValue evaluatePredicate(ColumnStatistics stats, org.apache.hadoop.hive.ql.io.sarg.PredicateLeaf predicate, BloomFilter bloomFilter, boolean useUTCTimestamp)
Evaluate a predicate with respect to the statistics from the column that is referenced in the predicate. Includes an option to specify whether timestamp column statistics values should be in UTC.
- Parameters:
stats - the statistics for the column mentioned in the predicate
predicate - the leaf predicate we need to evaluate
bloomFilter -
useUTCTimestamp -
- Returns:
- the set of truth values that may be returned for the given predicate.
-
pickRowGroups
protected boolean[] pickRowGroups() throws IOException
Pick the row groups that we need to load from the current stripe.
- Returns:
- an array with a boolean for each row group or null if all of the row groups must be read.
- Throws:
IOException
-
nextBatch
public boolean nextBatch(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch) throws IOException
Description copied from interface: RecordReader
Read the next row batch. The size of the batch to read cannot be controlled by the callers. Callers need to look at VectorizedRowBatch.size of the returned object to know the batch size read.
- Specified by:
nextBatch in interface RecordReader
- Parameters:
batch - a row batch object to read into
- Returns:
- were more rows available to read?
- Throws:
IOException
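A minimal read-loop sketch, assuming a hypothetical file example.orc: each iteration checks VectorizedRowBatch.size for the number of rows actually read, and also shows where getRowNumber() and getProgress() (documented below) fit.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
    import org.apache.orc.OrcFile;
    import org.apache.orc.Reader;
    import org.apache.orc.RecordReader;

    public class ReadLoop {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Reader reader = OrcFile.createReader(new Path("example.orc"),
            OrcFile.readerOptions(conf));
        try (RecordReader rows = reader.rows()) {
          VectorizedRowBatch batch = reader.getSchema().createRowBatch();
          while (rows.nextBatch(batch)) {
            // batch.size is the number of rows actually read into this batch.
            for (int r = 0; r < batch.size; ++r) {
              // process row r of the batch
            }
            // Optional bookkeeping: next row to be returned and fraction consumed.
            long nextRow = rows.getRowNumber();
            float progress = rows.getProgress();
          }
        }
      }
    }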
-
close
public void close() throws IOException
Description copied from interface: RecordReader
Release the resources associated with the given reader.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in interface RecordReader
- Throws:
IOException
-
getRowNumber
public long getRowNumber()
Description copied from interface: RecordReader
Get the row number of the row that will be returned by the following call to next().
- Specified by:
getRowNumber in interface RecordReader
- Returns:
- the row number from 0 to the number of rows in the file
-
getProgress
public float getProgress()
Return the fraction of rows that have been read from the selected section of the file.
- Specified by:
getProgress in interface RecordReader
- Returns:
- fraction between 0.0 and 1.0 of rows consumed
-
readRowIndex
public OrcIndex readRowIndex(int stripeIndex, boolean[] included, boolean[] readCols) throws IOException
- Throws:
IOException
-
seekToRow
public void seekToRow(long rowNumber) throws IOException
Description copied from interface: RecordReader
Seek to a particular row number.
- Specified by:
seekToRow in interface RecordReader
- Throws:
IOException
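Continuing the read-loop sketch under nextBatch above, a caller might position the reader before reading; the target row 1000 is only an illustrative value.

    // Position the reader so the next batch starts at row 1000 (hypothetical value).
    rows.seekToRow(1000L);
    VectorizedRowBatch batch = reader.getSchema().createRowBatch();
    if (rows.nextBatch(batch)) {
      // batch now holds up to batch.size rows beginning at row 1000
    }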
-
encodeTranslatedSargColumn
public static String encodeTranslatedSargColumn(int rootColumn, Integer indexInSourceTable)
-
mapTranslatedSargColumns
public static int[] mapTranslatedSargColumns(List<OrcProto.Type> types, List<org.apache.hadoop.hive.ql.io.sarg.PredicateLeaf> sargLeaves)
-
getCompressionCodec
-
getMaxDiskRangeChunkLimit
public int getMaxDiskRangeChunkLimit()
-