Package org.apache.orc.impl
Class ReaderImpl
java.lang.Object
org.apache.orc.impl.ReaderImpl
- All Implemented Interfaces:
Closeable, AutoCloseable, Reader
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.orc.Reader
Reader.Options
-
Field Summary
Modifier and Type | Field
protected int | bufferSize
protected final CompressionKind | compressionKind
protected final Configuration | conf
static final int | DEFAULT_COMPRESSION_BLOCK_SIZE
protected FSDataInputStream | file
protected final OrcFile.ReaderOptions | options
protected final Path | path
protected final int | rowIndexStride
protected List<OrcProto.StripeStatistics> | stripeStatistics
protected final OrcTail | tail
protected final List<OrcProto.Type> | types
protected final boolean | useUTCTimestamp
-
Constructor Summary
Constructor | Description
ReaderImpl(Path path, OrcFile.ReaderOptions options) | Constructor that lets the user specify additional options.
-
Method Summary
Modifier and Type | Method | Description
protected static void | checkOrcVersion(Path path, OrcProto.PostScript postscript) | Check to see if this ORC file is from a future version and if so, warn the user that we may not be able to read all of the column encodings.
void | close() |
ColumnStatistics[] | deserializeStats(TypeDescription schema, List<OrcProto.ColumnStatistics> fileStats) |
protected static void | ensureOrcFooter(ByteBuffer buffer, int psLen) | Deprecated.
protected static void | ensureOrcFooter(FSDataInputStream in, Path path, int psLen, ByteBuffer buffer) | Ensure this is an ORC file to prevent users from trying to read text files or RC files as ORC files.
static OrcTail | extractFileTail(ByteBuffer buffer) | Deprecated. Use extractFileTail(FileSystem, Path, long) instead.
static OrcTail | extractFileTail(ByteBuffer buffer, long fileLen, long modificationTime) | Deprecated. Use extractFileTail(FileSystem, Path, long) instead.
protected OrcTail | extractFileTail(FileSystem fs, Path path, long maxFileLength) |
static OrcProto.Metadata | extractMetadata(ByteBuffer bb, int metadataAbsPos, int metadataSize, InStream.StreamOptions options) |
EncryptionKey[] | getColumnEncryptionKeys() | Get the list of encryption keys for column encryption.
static int | getCompressionBlockSize(OrcProto.PostScript postScript) | Read compression block size from the postscript if it is set; otherwise, use the same 256k default the C++ implementation uses.
CompressionKind | getCompressionKind() | Get the compression kind.
int | getCompressionSize() | Get the buffer size for the compression.
long | getContentLength() | Get the length of the file.
boolean | getConvertToProlepticGregorian() | Should the returned values use the proleptic Gregorian calendar?
DataMaskDescription[] | getDataMasks() | Get the data masks for the unencrypted variant of the data.
ReaderEncryption | getEncryption() | Internal access to our view of the encryption.
EncryptionVariant[] | getEncryptionVariants() | Get the list of encryption variants for the data.
protected FileSystem | getFileSystem() |
protected Supplier<FileSystem> | getFileSystemSupplier() |
OrcTail | getFileTail() | Get the file tail (footer + postscript).
OrcFile.Version | getFileVersion() | Get the file format version.
static OrcFile.Version | getFileVersion(List<Integer> versionList) |
List<String> | getMetadataKeys() | Get the user metadata keys.
int | getMetadataSize() |
ByteBuffer | getMetadataValue(String key) | Get a user metadata value.
long | getNumberOfRows() | Get the number of rows in the file.
List<OrcProto.ColumnStatistics> | getOrcProtoFileStatistics() |
List<OrcProto.StripeStatistics> | getOrcProtoStripeStatistics() |
long | getRawDataSize() | Get the deserialized data size of the file.
long | getRawDataSizeFromColIndices(List<Integer> colIndices) | Get the deserialized data size of the specified column ids.
static long | getRawDataSizeFromColIndices(List<Integer> colIndices, List<OrcProto.Type> types, List<OrcProto.ColumnStatistics> stats) |
long | getRawDataSizeOfColumns(List<String> colNames) | Get the deserialized data size of the specified columns.
int | getRowIndexStride() | Get the number of rows per entry in the row index.
TypeDescription | getSchema() | Get the type of rows in this ORC file.
String | getSoftwareVersion() | Get the implementation and version of the software that wrote the file.
ColumnStatistics[] | getStatistics() | Get the statistics about the columns in the file.
List<StripeInformation> | getStripes() | Get the list of stripes.
List<StripeStatistics> | getStripeStatistics() | Get the stripe statistics for all of the columns.
List<StripeStatistics> | getStripeStatistics(boolean[] included) | Get the stripe statistics from the file.
List<OrcProto.Type> | getTypes() | Get the list of types contained in the file.
List<StripeStatistics> | getVariantStripeStatistics(EncryptionVariant variant) | Get the stripe statistics for a given variant.
List<Integer> | getVersionList() |
OrcFile.WriterVersion | getWriterVersion() | Get the version of the writer of this file.
static OrcFile.WriterVersion | getWriterVersion(int writerVersion) | Get the WriterVersion based on the ORC file postscript.
boolean | hasMetadataValue(String key) | Did the user set the given metadata value?
Reader.Options | options() | Create a default options object that can be customized for creating a RecordReader.
RecordReader | rows() | Create a RecordReader that reads everything with the default options.
RecordReader | rows(Reader.Options options) | Create a RecordReader that uses the options given.
FSDataInputStream | takeFile() | Take the file from the reader.
String | toString() |
boolean | writerUsedProlepticGregorian() | Was the file written using the proleptic Gregorian calendar?
-
Field Details
-
DEFAULT_COMPRESSION_BLOCK_SIZE
public static final int DEFAULT_COMPRESSION_BLOCK_SIZE
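getCompressionBlockSize is documented to read the block size from the postscript when it is set and otherwise fall back to a 256k default. A minimal sketch of that rule, assuming "256k" means 256 × 1024 bytes and that it is the value held by this constant; the optional postscript field is modeled as a java.util.OptionalLong, and BlockSizeSketch is a hypothetical name, not ORC code.

```java
import java.util.OptionalLong;

// Hypothetical sketch of the documented fallback rule; not the ORC implementation.
class BlockSizeSketch {
  // Assumption: "256k" means 256 * 1024 bytes and equals DEFAULT_COMPRESSION_BLOCK_SIZE.
  static final int DEFAULT_COMPRESSION_BLOCK_SIZE = 256 * 1024;

  // The optional postscript field is modeled as an OptionalLong (empty = not set).
  static int compressionBlockSize(OptionalLong postScriptValue) {
    if (postScriptValue.isPresent()) {
      return Math.toIntExact(postScriptValue.getAsLong());  // throws on int overflow
    }
    return DEFAULT_COMPRESSION_BLOCK_SIZE;
  }

  public static void main(String[] args) {
    System.out.println(compressionBlockSize(OptionalLong.empty()));     // 262144
    System.out.println(compressionBlockSize(OptionalLong.of(65_536)));  // 65536
  }
}
```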
-
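path
protected final Path path
-
options
protected final OrcFile.ReaderOptions options
-
compressionKind
protected final CompressionKind compressionKind
-
file
protected FSDataInputStream file
-
bufferSize
protected int bufferSize
-
stripeStatistics
protected List<OrcProto.StripeStatistics> stripeStatistics
-
types
protected final List<OrcProto.Type> types
-
rowIndexStride
protected final int rowIndexStride
-
conf
protected final Configuration conf
-
useUTCTimestamp
protected final boolean useUTCTimestamp
-
tail
protected final OrcTail tail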
-
-
Constructor Details
-
ReaderImpl
public ReaderImpl(Path path, OrcFile.ReaderOptions options) throws IOException
Constructor that lets the user specify additional options.
- Parameters:
path
- pathname for file
options
- options for reading
- Throws:
IOException
-
-
Method Details
-
getNumberOfRows
public long getNumberOfRows()
Description copied from interface: Reader
Get the number of rows in the file.
- Specified by:
getNumberOfRows in interface Reader
- Returns:
- the number of rows
-
getMetadataKeys
Description copied from interface: Reader
Get the user metadata keys.
- Specified by:
getMetadataKeys in interface Reader
- Returns:
- the set of metadata keys
-
getMetadataValue
Description copied from interface: Reader
Get a user metadata value.
- Specified by:
getMetadataValue in interface Reader
- Parameters:
key
- a key given by the user
- Returns:
- the bytes associated with the given key
-
hasMetadataValue
Description copied from interface: Reader
Did the user set the given metadata value?
- Specified by:
hasMetadataValue in interface Reader
- Parameters:
key
- the key to check
- Returns:
- true if the metadata value was set
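The user metadata accessors behave like a read-only map from string keys to byte values. A minimal sketch, assuming a hypothetical in-memory UserMetadataSketch class (not part of ORC) that stores values as ByteBuffers, shows the pattern of checking hasMetadataValue before calling getMetadataValue and then decoding the bytes. The sketch returns null for a missing key; the real method's behavior for missing keys is not documented here, so checking first is the safer habit.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the file's user metadata; not the ORC API.
class UserMetadataSketch {
  private final Map<String, ByteBuffer> metadata = new LinkedHashMap<>();

  void put(String key, String value) {
    metadata.put(key, ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8)));
  }

  List<String> getMetadataKeys() {
    return Collections.unmodifiableList(new ArrayList<>(metadata.keySet()));
  }

  boolean hasMetadataValue(String key) {
    return metadata.containsKey(key);
  }

  // Read-only view with its own position, so callers cannot disturb the stored buffer.
  ByteBuffer getMetadataValue(String key) {
    ByteBuffer value = metadata.get(key);
    return value == null ? null : value.asReadOnlyBuffer();
  }

  static String decodeUtf8(ByteBuffer buffer) {
    ByteBuffer copy = buffer.duplicate();
    byte[] bytes = new byte[copy.remaining()];
    copy.get(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    UserMetadataSketch md = new UserMetadataSketch();
    md.put("created.by", "etl-job-42");  // hypothetical key and value
    if (md.hasMetadataValue("created.by")) {
      System.out.println(decodeUtf8(md.getMetadataValue("created.by")));  // etl-job-42
    }
  }
}
```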
-
getCompressionKind
Description copied from interface: Reader
Get the compression kind.
- Specified by:
getCompressionKind in interface Reader
- Returns:
- the kind of compression in the file
-
getCompressionSize
public int getCompressionSize()
Description copied from interface: Reader
Get the buffer size for the compression.
- Specified by:
getCompressionSize in interface Reader
- Returns:
- number of bytes to buffer for the compression codec.
-
getStripes
Description copied from interface: Reader
Get the list of stripes.
- Specified by:
getStripes in interface Reader
- Returns:
- the information about the stripes in order
-
getContentLength
public long getContentLength()
Description copied from interface: Reader
Get the length of the file.
- Specified by:
getContentLength in interface Reader
- Returns:
- the number of bytes in the file
-
getTypes
Description copied from interface: Reader
Get the list of types contained in the file. The root type is the first type in the list.
-
getFileVersion
public static OrcFile.Version getFileVersion(List<Integer> versionList)
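getVersionList is described as returning the file version in order from major to minor, and this static overload maps such a list to a version constant. A minimal sketch of that kind of mapping, using a hypothetical FileVersionSketch enum rather than OrcFile.Version; it assumes 0.11 and 0.12 are the known format versions, treats a missing or empty list as 0.11, and maps anything unrecognized to FUTURE.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical version enum; the real type is OrcFile.Version.
enum FileVersionSketch {
  V_0_11(0, 11), V_0_12(0, 12), FUTURE(Integer.MAX_VALUE, Integer.MAX_VALUE);

  final int major;
  final int minor;

  FileVersionSketch(int major, int minor) {
    this.major = major;
    this.minor = minor;
  }

  static FileVersionSketch fromVersionList(List<Integer> versionList) {
    // Assumption: files without a version list are the oldest known format.
    if (versionList == null || versionList.isEmpty()) {
      return V_0_11;
    }
    int major = versionList.get(0);
    int minor = versionList.size() > 1 ? versionList.get(1) : 0;  // assumed default minor
    for (FileVersionSketch v : values()) {
      if (v != FUTURE && v.major == major && v.minor == minor) {
        return v;
      }
    }
    return FUTURE;  // written by a newer or unknown writer
  }

  public static void main(String[] args) {
    System.out.println(fromVersionList(Arrays.asList(0, 12)));  // V_0_12
    System.out.println(fromVersionList(Arrays.asList(2, 0)));   // FUTURE
  }
}
```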
-
getFileVersion
Description copied from interface: Reader
Get the file format version.
- Specified by:
getFileVersion in interface Reader
-
getWriterVersion
Description copied from interface: Reader
Get the version of the writer of this file.
- Specified by:
getWriterVersion in interface Reader
-
getSoftwareVersion
Description copied from interface: Reader
Get the implementation and version of the software that wrote the file. It defaults to "ORC Java" for old files. For current files, we include the version also.
- Specified by:
getSoftwareVersion in interface Reader
- Returns:
- the writer implementation and, when available, the version of the software
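The documented rule ("ORC Java" for old files, implementation plus version for current files) can be sketched as a small pure function. This assumes the file records an optional implementation name and an optional version string, both null when absent; the single-space join, the sample version "1.9.2", and the SoftwareVersionSketch name are illustrative assumptions, not ORC's code.

```java
// Hypothetical sketch of the documented defaulting rule; not the ORC implementation.
class SoftwareVersionSketch {
  static final String DEFAULT_IMPLEMENTATION = "ORC Java";

  // implementation and version are null when an (older) file did not record them.
  static String softwareVersion(String implementation, String version) {
    String name = (implementation == null) ? DEFAULT_IMPLEMENTATION : implementation;
    return (version == null || version.isEmpty()) ? name : name + " " + version;
  }

  public static void main(String[] args) {
    System.out.println(softwareVersion(null, null));           // ORC Java
    System.out.println(softwareVersion("ORC Java", "1.9.2"));  // ORC Java 1.9.2
  }
}
```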
-
getFileTail
Description copied from interface: Reader
Get the file tail (footer + postscript).
- Specified by:
getFileTail in interface Reader
- Returns:
- the file tail
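In the ORC file format, the final byte of the file holds the length of the serialized PostScript, which sits immediately before it. A minimal sketch of that first step of tail parsing, assuming the buffer ends exactly at end-of-file; decoding the PostScript (a protobuf message) and using it to locate the footer are omitted, and FileTailSketch is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch: locate the PostScript bytes in a buffer that ends at end-of-file.
class FileTailSketch {
  static byte[] postScriptBytes(ByteBuffer tail) {
    int end = tail.limit();
    if (end - tail.position() < 1) {
      throw new IllegalArgumentException("buffer is empty");
    }
    // The last byte is the PostScript length, read as unsigned (0..255).
    int psLen = tail.get(end - 1) & 0xff;
    int psStart = end - 1 - psLen;
    if (psStart < tail.position()) {
      throw new IllegalArgumentException("buffer too short for a " + psLen + "-byte PostScript");
    }
    byte[] ps = new byte[psLen];
    ByteBuffer view = tail.duplicate();  // leave the caller's position untouched
    view.position(psStart);
    view.get(ps);
    return ps;
  }

  public static void main(String[] args) {
    // two footer bytes, three PostScript bytes, then the length byte 3
    ByteBuffer tail = ByteBuffer.wrap(new byte[] {9, 9, 1, 2, 3, 3});
    System.out.println(Arrays.toString(postScriptBytes(tail)));  // [1, 2, 3]
  }
}
```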
-
getColumnEncryptionKeys
Description copied from interface: Reader
Get the list of encryption keys for column encryption.
- Specified by:
getColumnEncryptionKeys in interface Reader
- Returns:
- the set of encryption keys
-
getDataMasks
Description copied from interface: Reader
Get the data masks for the unencrypted variant of the data.
- Specified by:
getDataMasks in interface Reader
- Returns:
- the lists of data masks
-
getEncryptionVariants
Description copied from interface: Reader
Get the list of encryption variants for the data.
- Specified by:
getEncryptionVariants in interface Reader
-
getVariantStripeStatistics
public List<StripeStatistics> getVariantStripeStatistics(EncryptionVariant variant) throws IOException
Description copied from interface: Reader
Get the stripe statistics for a given variant. The StripeStatistics will have one entry for each column in the variant. This enables the user to get the stripe statistics for each variant regardless of which keys are available.
- Specified by:
getVariantStripeStatistics in interface Reader
- Parameters:
variant
- the encryption variant or null for unencrypted
- Returns:
- a list of stripe statistics (one per stripe)
- Throws:
IOException
- if the required key is not available
-
getEncryption
Internal access to our view of the encryption.
- Returns:
- the encryption information for this reader.
-
getRowIndexStride
public int getRowIndexStride()
Description copied from interface: Reader
Get the number of rows per entry in the row index.
- Specified by:
getRowIndexStride in interface Reader
- Returns:
- the number of rows per entry in the row index, or 0 if there is no row index
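As a worked example, assuming each row-index entry covers one consecutive group of getRowIndexStride() rows within a stripe, a stripe of 25,000 rows with a stride of 10,000 needs three entries (10,000 + 10,000 + 5,000). A minimal sketch of that ceiling division; RowIndexSketch is a hypothetical name.

```java
// Hypothetical helper: row-index entries a stripe needs for a given stride.
class RowIndexSketch {
  static long indexEntries(long rowsInStripe, int rowIndexStride) {
    if (rowIndexStride == 0) {
      return 0;  // a stride of 0 means the file has no row index
    }
    // One entry per group of rowIndexStride rows; a final partial group still gets one.
    return (rowsInStripe + rowIndexStride - 1) / rowIndexStride;
  }

  public static void main(String[] args) {
    System.out.println(indexEntries(25_000, 10_000));  // 3
    System.out.println(indexEntries(20_000, 10_000));  // 2
    System.out.println(indexEntries(25_000, 0));       // 0
  }
}
```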
-
getStatistics
Description copied from interface: Reader
Get the statistics about the columns in the file.
- Specified by:
getStatistics in interface Reader
- Returns:
- the information about the columns
-
deserializeStats
public ColumnStatistics[] deserializeStats(TypeDescription schema, List<OrcProto.ColumnStatistics> fileStats)
-
getSchema
Description copied from interface: Reader
Get the type of rows in this ORC file.
-
checkOrcVersion
protected static void checkOrcVersion(Path path, OrcProto.PostScript postscript) throws IOException
Check to see if this ORC file is from a future version and if so, warn the user that we may not be able to read all of the column encodings.
- Parameters:
path
- the data source path for error messages
postscript
- the parsed postscript
- Throws:
IOException
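The check described above amounts to comparing the file's [major, minor] version list against the newest version the reader knows and warning, rather than failing, when the file is newer. A minimal sketch, assuming a hypothetical newest known version of 0.12 and printing the warning to standard error; VersionCheckSketch and its methods are hypothetical names, not ORC's implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a "future version" check; not ORC's checkOrcVersion.
class VersionCheckSketch {
  // Hypothetical newest format this reader understands, as [major, minor].
  static final List<Integer> NEWEST_KNOWN = Arrays.asList(0, 12);

  // Lexicographic comparison: major first, then minor.
  static boolean isFutureVersion(List<Integer> fileVersion) {
    int n = Math.min(fileVersion.size(), NEWEST_KNOWN.size());
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(fileVersion.get(i), NEWEST_KNOWN.get(i));
      if (cmp != 0) {
        return cmp > 0;
      }
    }
    return false;  // equal on the shared positions: treated as known
  }

  static void checkVersion(String path, List<Integer> fileVersion) {
    if (isFutureVersion(fileVersion)) {
      // Warn only: the reader still attempts to read the file.
      System.err.println("ORC file " + path + " was written by future version "
          + fileVersion + "; some column encodings may not be readable");
    }
  }

  public static void main(String[] args) {
    checkVersion("/data/example.orc", Arrays.asList(0, 13));  // prints a warning to stderr
    checkVersion("/data/example.orc", Arrays.asList(0, 12));  // silent
  }
}
```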
-
getFileSystem
protected FileSystem getFileSystem() throws IOException
- Throws:
IOException
-
getFileSystemSupplier
protected Supplier<FileSystem> getFileSystemSupplier()
-
getWriterVersion
public static OrcFile.WriterVersion getWriterVersion(int writerVersion)
Get the WriterVersion based on the ORC file postscript.
- Parameters:
writerVersion
- the integer writer version
- Returns:
- the version of the software that produced the file
-
extractMetadata
public static OrcProto.Metadata extractMetadata(ByteBuffer bb, int metadataAbsPos, int metadataSize, InStream.StreamOptions options) throws IOException
- Throws:
IOException
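The parameters suggest the metadata section is addressed by an absolute position and a size inside a buffer that holds the file tail. A minimal sketch of just that slicing step with java.nio; decompression (presumably driven by InStream.StreamOptions) and protobuf parsing into OrcProto.Metadata are deliberately left out, and MetadataSliceSketch is a hypothetical name.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: isolate [absPos, absPos + size) of a buffer without copying.
class MetadataSliceSketch {
  static ByteBuffer sliceAt(ByteBuffer bb, int absPos, int size) {
    if (absPos < 0 || size < 0 || (long) absPos + size > bb.limit()) {
      throw new IllegalArgumentException("section at " + absPos + " of size " + size + " is out of range");
    }
    ByteBuffer view = bb.duplicate();  // independent position/limit, shared content
    view.position(absPos);             // absPos <= limit, so this is always valid
    view.limit(absPos + size);
    return view.slice();               // index 0 of the result is byte absPos of bb
  }

  public static void main(String[] args) {
    ByteBuffer tail = ByteBuffer.wrap(new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
    ByteBuffer metadata = sliceAt(tail, 3, 4);
    System.out.println(metadata.remaining() + " " + metadata.get(0));  // 4 3
    System.out.println(tail.position());                               // 0
  }
}
```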
-
extractFileTail
public static OrcTail extractFileTail(ByteBuffer buffer) throws IOException
Deprecated. Use extractFileTail(FileSystem, Path, long) instead. This is for backward compatibility.
- Throws:
IOException
-
getCompressionBlockSize
public static int getCompressionBlockSize(OrcProto.PostScript postScript)
Read compression block size from the postscript if it is set; otherwise, use the same 256k default the C++ implementation uses.
-
extractFileTail
public static OrcTail extractFileTail(ByteBuffer buffer, long fileLen, long modificationTime) throws IOException
Deprecated. Use extractFileTail(FileSystem, Path, long) instead. This is for backward compatibility.
- Throws:
IOException
-
extractFileTail
protected OrcTail extractFileTail(FileSystem fs, Path path, long maxFileLength) throws IOException
- Throws:
IOException
-
writerUsedProlepticGregorian
public boolean writerUsedProlepticGregorian()
Description copied from interface: Reader
Was the file written using the proleptic Gregorian calendar?
- Specified by:
writerUsedProlepticGregorian in interface Reader
-
getConvertToProlepticGregorian
public boolean getConvertToProlepticGregorian()
Description copied from interface: Reader
Should the returned values use the proleptic Gregorian calendar?
- Specified by:
getConvertToProlepticGregorian in interface Reader
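These two flags matter because dates before 1582-10-15 map to different day numbers in the hybrid Julian/Gregorian calendar (the default behavior of java.util.GregorianCalendar) and in the proleptic Gregorian calendar (used by java.time). A minimal sketch using only JDK classes to show the discrepancy; it says nothing about how ORC performs any conversion, and CalendarSketch is a hypothetical name.

```java
import java.time.LocalDate;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical illustration of hybrid vs proleptic Gregorian day numbers.
class CalendarSketch {
  private static final long MILLIS_PER_DAY = 86_400_000L;

  // Days since 1970-01-01 in the hybrid calendar (Julian before 1582-10-15).
  static long hybridEpochDay(int year, int month, int day) {
    GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
    cal.clear();
    cal.set(year, month - 1, day);  // Calendar months are 0-based
    return Math.floorDiv(cal.getTimeInMillis(), MILLIS_PER_DAY);
  }

  // Days since 1970-01-01 in the proleptic Gregorian calendar.
  static long prolepticEpochDay(int year, int month, int day) {
    return LocalDate.of(year, month, day).toEpochDay();
  }

  public static void main(String[] args) {
    System.out.println(hybridEpochDay(2020, 1, 1) == prolepticEpochDay(2020, 1, 1));  // true
    System.out.println(hybridEpochDay(1000, 1, 1) == prolepticEpochDay(1000, 1, 1));  // false
  }
}
```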
-
options
Description copied from interface: Reader
Create a default options object that can be customized for creating a RecordReader.
-
rows
Description copied from interface: Reader
Create a RecordReader that reads everything with the default options.
- Specified by:
rows in interface Reader
- Returns:
- a new RecordReader
- Throws:
IOException
-
rows
Description copied from interface: Reader
Create a RecordReader that uses the options given. This method can't be named rows, because many callers used rows(null) before the rows() method was introduced.
- Specified by:
rows in interface Reader
- Parameters:
options
- the options to read with
- Returns:
- a new RecordReader
- Throws:
IOException
-
getRawDataSize
public long getRawDataSize()
Description copied from interface: Reader
Get the deserialized data size of the file.
- Specified by:
getRawDataSize in interface Reader
- Returns:
- raw data size
-
getRawDataSizeFromColIndices
Description copied from interface: Reader
Get the deserialized data size of the specified column ids.
- Specified by:
getRawDataSizeFromColIndices in interface Reader
- Parameters:
colIndices
- internal column ids (check orcfiledump for column ids)
- Returns:
- raw data size of columns
-
getRawDataSizeFromColIndices
public static long getRawDataSizeFromColIndices(List<Integer> colIndices, List<OrcProto.Type> types, List<OrcProto.ColumnStatistics> stats) throws FileFormatException
- Throws:
FileFormatException
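The static overload takes column ids together with the per-column types and statistics, which suggests a per-column estimate summed over the requested ids. A simplified sketch under stated assumptions: the Kind enum, the ColumnStats holder, and the fixed widths (4 bytes per INT, 8 per LONG or DOUBLE, total length for strings) are hypothetical and do not reproduce ORC's real per-type rules, and the sketch throws IllegalArgumentException where the real method declares FileFormatException.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified raw-size estimate; ORC's real per-type rules are not reproduced.
class RawSizeSketch {
  enum Kind { INT, LONG, DOUBLE, STRING }

  // Hypothetical per-column statistics: value count, plus total bytes for strings.
  static final class ColumnStats {
    final long numberOfValues;
    final long totalLength;

    ColumnStats(long numberOfValues, long totalLength) {
      this.numberOfValues = numberOfValues;
      this.totalLength = totalLength;
    }
  }

  // colIndices, types and stats all use the same column ids (list positions).
  static long rawDataSize(List<Integer> colIndices, List<Kind> types, List<ColumnStats> stats) {
    long total = 0;
    for (int col : colIndices) {
      if (col < 0 || col >= types.size() || col >= stats.size()) {
        throw new IllegalArgumentException("unknown column id " + col);
      }
      ColumnStats s = stats.get(col);
      switch (types.get(col)) {
        case INT:    total += 4L * s.numberOfValues; break;  // assumed 4 bytes per value
        case LONG:
        case DOUBLE: total += 8L * s.numberOfValues; break;  // assumed 8 bytes per value
        case STRING: total += s.totalLength; break;          // sum of string lengths
        default: throw new IllegalArgumentException("unsupported type " + types.get(col));
      }
    }
    return total;
  }

  public static void main(String[] args) {
    List<Kind> types = Arrays.asList(Kind.INT, Kind.STRING, Kind.DOUBLE);
    List<ColumnStats> stats = Arrays.asList(
        new ColumnStats(10, 0), new ColumnStats(5, 42), new ColumnStats(3, 0));
    System.out.println(rawDataSize(Arrays.asList(0, 1), types, stats));  // 82
  }
}
```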
-
getRawDataSizeOfColumns
Description copied from interface: Reader
Get the deserialized data size of the specified columns.
- Specified by:
getRawDataSizeOfColumns in interface Reader
- Parameters:
colNames
- the list of column names
- Returns:
- raw data size of columns
-
getOrcProtoStripeStatistics
- Specified by:
getOrcProtoStripeStatistics in interface Reader
- Returns:
- Stripe statistics, in original protobuf form.
-
getOrcProtoFileStatistics
- Specified by:
getOrcProtoFileStatistics in interface Reader
- Returns:
- File statistics, in original protobuf form.
-
getStripeStatistics
Description copied from interface: Reader
Get the stripe statistics for all of the columns.
- Specified by:
getStripeStatistics in interface Reader
- Returns:
- a list of the statistics for each stripe in the file
- Throws:
IOException
-
getStripeStatistics
Description copied from interface: Reader
Get the stripe statistics from the file.
- Specified by:
getStripeStatistics in interface Reader
- Parameters:
included
- null for all columns, or an array in which the required columns are selected
- Returns:
- a list of the statistics for each stripe in the file
- Throws:
IOException
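The included parameter follows a boolean-mask convention: null selects all columns, otherwise the array is indexed by column id and true marks a required column. A minimal sketch of building and querying such a mask, assuming column ids run from 0 (the root) to columnCount - 1, matching the internal ids shown by orcfiledump; IncludeSketch and its methods are hypothetical names.

```java
// Hypothetical helpers for the `included` convention: index = column id, true = selected.
class IncludeSketch {
  // Assumes column ids run from 0 (the root) to columnCount - 1.
  static boolean[] includeColumns(int columnCount, int... columnIds) {
    boolean[] included = new boolean[columnCount];
    for (int id : columnIds) {
      included[id] = true;
    }
    return included;
  }

  // null stands for "all columns".
  static boolean isIncluded(boolean[] included, int columnId) {
    return included == null || (columnId < included.length && included[columnId]);
  }

  public static void main(String[] args) {
    boolean[] included = includeColumns(4, 0, 2);
    System.out.println(isIncluded(included, 2));  // true
    System.out.println(isIncluded(included, 1));  // false
    System.out.println(isIncluded(null, 3));      // true
  }
}
```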
-
getVersionList
- Specified by:
getVersionList in interface Reader
- Returns:
- List of integers representing version of the file, in order from major to minor.
-
getMetadataSize
public int getMetadataSize()
- Specified by:
getMetadataSize in interface Reader
- Returns:
- the size of the metadata, in bytes
-
toString
-
close
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
-
takeFile
Take the file from the reader. This allows the first RecordReader to use the same file, but additional RecordReaders will open new handles.
- Returns:
- a file handle, if one is available
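The hand-over described above (the first RecordReader reuses the already-open file, later ones open their own) is a take-once pattern. A minimal sketch, assuming a hypothetical HandleHolder class over any java.io.Closeable; in this sketch whoever takes the handle becomes responsible for closing it, which is an assumption rather than documented ORC behavior.

```java
import java.io.Closeable;
import java.util.function.Supplier;

// Hypothetical take-once holder; not ORC's ReaderImpl.
class HandleHolder<T extends Closeable> {
  private T file;  // handle opened earlier (e.g. while reading the tail), if any

  HandleHolder(T initial) {
    this.file = initial;
  }

  // Returns the open handle once; later calls return null.
  synchronized T takeFile() {
    T result = file;
    file = null;
    return result;
  }

  // What a new record reader might do: reuse the taken handle, else open a fresh one.
  T handleForNewReader(Supplier<T> opener) {
    T taken = takeFile();
    return taken != null ? taken : opener.get();
  }

  public static void main(String[] args) {
    Closeable opened = () -> System.out.println("closed");
    HandleHolder<Closeable> holder = new HandleHolder<>(opened);
    System.out.println(holder.takeFile() == opened);  // true: first caller gets the handle
    System.out.println(holder.takeFile() == null);    // true: nothing left to take
  }
}
```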
-
ensureOrcFooter
Deprecated. Use ensureOrcFooter(FSDataInputStream, Path, int, ByteBuffer) instead.
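ensureOrcFooter exists to stop users from reading text or RC files as ORC. Per the ORC file format specification, an ORC file begins with the three ASCII bytes "ORC", so a header check is one cheap way to reject such files. A minimal sketch, assuming the caller has already read the first bytes of the file; it is not ORC's ensureOrcFooter (whose parameters suggest it also inspects the PostScript), and MagicSketch and ensureOrcHeader are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical header magic check; not ORC's ensureOrcFooter.
class MagicSketch {
  static final byte[] ORC_MAGIC = "ORC".getBytes(StandardCharsets.US_ASCII);

  static boolean startsWithOrcMagic(byte[] fileHeader) {
    return fileHeader.length >= ORC_MAGIC.length
        && Arrays.equals(Arrays.copyOf(fileHeader, ORC_MAGIC.length), ORC_MAGIC);
  }

  static void ensureOrcHeader(String path, byte[] fileHeader) throws IOException {
    if (!startsWithOrcMagic(fileHeader)) {
      throw new IOException(path + " does not start with the ORC magic bytes");
    }
  }

  public static void main(String[] args) {
    byte[] csv = "id,name\n1,alice\n".getBytes(StandardCharsets.US_ASCII);
    try {
      ensureOrcHeader("data.csv", csv);
    } catch (IOException e) {
      System.out.println(e.getMessage());  // data.csv does not start with the ORC magic bytes
    }
  }
}
```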