Package org.apache.orc.impl
Class ReaderImpl
java.lang.Object
org.apache.orc.impl.ReaderImpl
- All Implemented Interfaces:
Closeable, AutoCloseable, Reader
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.orc.Reader
Reader.Options
-
Field Summary
Fields
protected int bufferSize
protected final CompressionKind compressionKind
protected final Configuration conf
static final int DEFAULT_COMPRESSION_BLOCK_SIZE
protected FSDataInputStream file
protected final OrcFile.ReaderOptions options
protected final Path path
protected final int rowIndexStride
protected List<OrcProto.StripeStatistics> stripeStatistics
protected final OrcTail tail
protected final List<OrcProto.Type> types
protected final boolean useUTCTimestamp
-
Constructor Summary
Constructors
ReaderImpl(Path path, OrcFile.ReaderOptions options): Constructor that lets the user specify additional options.
-
Method Summary
Methods
protected static void checkOrcVersion(Path path, OrcProto.PostScript postscript): Check to see if this ORC file is from a future version and, if so, warn the user that we may not be able to read all of the column encodings.
void close()
ColumnStatistics[] deserializeStats(TypeDescription schema, List<OrcProto.ColumnStatistics> fileStats)
protected static void ensureOrcFooter(ByteBuffer buffer, int psLen): Deprecated. Use ensureOrcFooter(FSDataInputStream, Path, int, ByteBuffer) instead.
protected static void ensureOrcFooter(FSDataInputStream in, Path path, int psLen, ByteBuffer buffer): Ensure this is an ORC file to prevent users from trying to read text files or RC files as ORC files.
static OrcTail extractFileTail(ByteBuffer buffer): Deprecated. Use extractFileTail(FileSystem, Path, long) instead.
static OrcTail extractFileTail(ByteBuffer buffer, long fileLen, long modificationTime): Deprecated. Use extractFileTail(FileSystem, Path, long) instead.
protected OrcTail extractFileTail(FileSystem fs, Path path, long maxFileLength)
static OrcProto.Metadata extractMetadata(ByteBuffer bb, int metadataAbsPos, int metadataSize, InStream.StreamOptions options)
getColumnEncryptionKeys(): Get the list of encryption keys for column encryption.
static int getCompressionBlockSize(OrcProto.PostScript postScript): Read the compression block size from the postscript if it is set; otherwise, use the same 256k default that the C++ implementation uses.
getCompressionKind(): Get the compression kind.
int getCompressionSize(): Get the buffer size for the compression.
long getContentLength(): Get the length of the file.
boolean getConvertToProlepticGregorian(): Should the returned values use the proleptic Gregorian calendar?
getDataMasks(): Get the data masks for the unencrypted variant of the data.
getEncryption(): Internal access to our view of the encryption.
getEncryptionVariants(): Get the list of encryption variants for the data.
protected FileSystem getFileSystem()
protected Supplier<FileSystem> getFileSystemSupplier()
getFileTail(): Get the file tail (footer + postscript).
getFileVersion(): Get the file format version.
static OrcFile.Version getFileVersion(List<Integer> versionList)
getMetadataKeys(): Get the user metadata keys.
int getMetadataSize()
getMetadataValue(String key): Get a user metadata value.
long getNumberOfRows(): Get the number of rows in the file.
getOrcProtoFileStatistics()
getOrcProtoStripeStatistics()
long getRawDataSize(): Get the deserialized data size of the file.
long getRawDataSizeFromColIndices(List<Integer> colIndices): Get the deserialized data size of the specified column ids.
static long getRawDataSizeFromColIndices(List<Integer> colIndices, List<OrcProto.Type> types, List<OrcProto.ColumnStatistics> stats)
long getRawDataSizeOfColumns(List<String> colNames): Get the deserialized data size of the specified columns.
int getRowIndexStride(): Get the number of rows per entry in the row index.
getSchema(): Get the type of rows in this ORC file.
getSoftwareVersion(): Get the implementation and version of the software that wrote the file.
getStatistics(): Get the statistics about the columns in the file.
getStripes(): Get the list of stripes.
getStripeStatistics(): Get the stripe statistics for all of the columns.
getStripeStatistics(boolean[] included): Get the stripe statistics from the file.
getTypes(): Get the list of types contained in the file.
List<StripeStatistics> getVariantStripeStatistics(EncryptionVariant variant): Get the stripe statistics for a given variant.
getVersionList()
getWriterVersion(): Get the version of the writer of this file.
static OrcFile.WriterVersion getWriterVersion(int writerVersion): Get the WriterVersion based on the ORC file postscript.
boolean hasMetadataValue(String key): Did the user set the given metadata value?
options(): Create a default options object that can be customized for creating a RecordReader.
rows(): Create a RecordReader that reads everything with the default options.
rows(Reader.Options options): Create a RecordReader that uses the options given.
takeFile(): Take the file from the reader.
toString()
boolean writerUsedProlepticGregorian(): Was the file written using the proleptic Gregorian calendar?
-
Field Details
-
DEFAULT_COMPRESSION_BLOCK_SIZE
public static final int DEFAULT_COMPRESSION_BLOCK_SIZE
-
path
-
options
-
compressionKind
-
file
-
bufferSize
protected int bufferSize
-
stripeStatistics
-
types
-
rowIndexStride
protected final int rowIndexStride
-
conf
-
useUTCTimestamp
protected final boolean useUTCTimestamp
-
tail
-
-
Constructor Details
-
ReaderImpl
Constructor that lets the user specify additional options.
- Parameters:
path - pathname for file
options - options for reading
- Throws:
IOException
-
-
Method Details
-
getNumberOfRows
public long getNumberOfRows()
Description copied from interface: Reader
Get the number of rows in the file.
- Specified by:
getNumberOfRows in interface Reader
- Returns:
- the number of rows
-
getMetadataKeys
Description copied from interface: Reader
Get the user metadata keys.
- Specified by:
getMetadataKeys in interface Reader
- Returns:
- the set of metadata keys
-
getMetadataValue
Description copied from interface: Reader
Get a user metadata value.
- Specified by:
getMetadataValue in interface Reader
- Parameters:
key - a key given by the user
- Returns:
- the bytes associated with the given key
-
hasMetadataValue
Description copied from interface: Reader
Did the user set the given metadata value?
- Specified by:
hasMetadataValue in interface Reader
- Parameters:
key - the key to check
- Returns:
- true if the metadata value was set
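The three metadata methods above form a small key/value contract: keys are strings, values are raw bytes, and hasMetadataValue lets a caller test for a key before fetching it. A minimal sketch of that contract, using a hypothetical in-memory class (UserMetadataSketch is not part of ORC) in place of the values stored in a real file:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a file's user metadata (not part of ORC). */
class UserMetadataSketch {
    // Insertion-ordered so getMetadataKeys() returns keys deterministically.
    private final Map<String, ByteBuffer> values = new LinkedHashMap<>();

    void put(String key, byte[] value) {
        values.put(key, ByteBuffer.wrap(value));
    }

    List<String> getMetadataKeys() {
        return new ArrayList<>(values.keySet());
    }

    boolean hasMetadataValue(String key) {
        return values.containsKey(key);
    }

    /** Returns a read-only view; throws for an unknown key (a choice made for this sketch). */
    ByteBuffer getMetadataValue(String key) {
        ByteBuffer value = values.get(key);
        if (value == null) {
            throw new IllegalArgumentException("no user metadata for key " + key);
        }
        // Independent position/limit, so callers cannot disturb the stored buffer.
        return value.asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        UserMetadataSketch meta = new UserMetadataSketch();
        meta.put("source", "nightly-export".getBytes(StandardCharsets.UTF_8));
        if (meta.hasMetadataValue("source")) {
            ByteBuffer buf = meta.getMetadataValue("source");
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes);
            System.out.println(new String(bytes, StandardCharsets.UTF_8)); // nightly-export
        }
    }
}
```

Checking hasMetadataValue first keeps the missing-key path out of exception handling; whether the real reader throws or returns null for an unknown key should be checked before relying on either behavior.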
-
getCompressionKind
Description copied from interface: Reader
Get the compression kind.
- Specified by:
getCompressionKind in interface Reader
- Returns:
- the kind of compression in the file
-
getCompressionSize
public int getCompressionSize()
Description copied from interface: Reader
Get the buffer size for the compression.
- Specified by:
getCompressionSize in interface Reader
- Returns:
- number of bytes to buffer for the compression codec.
-
getStripes
Description copied from interface: Reader
Get the list of stripes.
- Specified by:
getStripes in interface Reader
- Returns:
- the information about the stripes in order
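Because the stripes come back in order and each stripe carries its own row count, two quick sanity checks fall out: the per-stripe counts should add up to getNumberOfRows, and offsets should increase through the list. A minimal sketch, using a hypothetical Stripe class in place of ORC's own stripe information type:

```java
import java.util.List;

/** Hypothetical sketch: row-count and ordering checks over a stripe listing. */
class StripeListingSketch {
    /** Hypothetical stand-in for one stripe's metadata; not ORC's stripe information class. */
    static final class Stripe {
        final long offset;
        final long length;
        final long numberOfRows;

        Stripe(long offset, long length, long numberOfRows) {
            this.offset = offset;
            this.length = length;
            this.numberOfRows = numberOfRows;
        }
    }

    /** Sums the per-stripe row counts. */
    static long totalRows(List<Stripe> stripes) {
        long rows = 0;
        for (Stripe s : stripes) {
            rows += s.numberOfRows;
        }
        return rows;
    }

    /** True if each stripe starts at or after the end of the previous one. */
    static boolean inFileOrder(List<Stripe> stripes) {
        for (int i = 1; i < stripes.size(); i++) {
            Stripe prev = stripes.get(i - 1);
            if (stripes.get(i).offset < prev.offset + prev.length) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Stripe> stripes = List.of(new Stripe(3, 1000, 10_000), new Stripe(1003, 400, 5_000));
        System.out.println(totalRows(stripes) + " rows, ordered=" + inFileOrder(stripes));
    }
}
```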
-
getContentLength
public long getContentLength()
Description copied from interface: Reader
Get the length of the file.
- Specified by:
getContentLength in interface Reader
- Returns:
- the number of bytes in the file
-
getTypes
Description copied from interface: Reader
Get the list of types contained in the file. The root type is the first type in the list.
-
getFileVersion
-
getFileVersion
Description copied from interface: Reader
Get the file format version.
- Specified by:
getFileVersion in interface Reader
-
getWriterVersion
Description copied from interface: Reader
Get the version of the writer of this file.
- Specified by:
getWriterVersion in interface Reader
-
getSoftwareVersion
Description copied from interface: Reader
Get the implementation and version of the software that wrote the file. It defaults to "ORC Java" for old files; for current files, the version is included as well.
- Specified by:
getSoftwareVersion in interface Reader
- Returns:
- returns the writer implementation and hopefully the version of the software
-
getFileTail
Description copied from interface: Reader
Get the file tail (footer + postscript).
- Specified by:
getFileTail in interface Reader
- Returns:
- file tail
-
getColumnEncryptionKeys
Description copied from interface: Reader
Get the list of encryption keys for column encryption.
- Specified by:
getColumnEncryptionKeys in interface Reader
- Returns:
- the set of encryption keys
-
getDataMasks
Description copied from interface: Reader
Get the data masks for the unencrypted variant of the data.
- Specified by:
getDataMasks in interface Reader
- Returns:
- the lists of data masks
-
getEncryptionVariants
Description copied from interface: Reader
Get the list of encryption variants for the data.
- Specified by:
getEncryptionVariants in interface Reader
-
getVariantStripeStatistics
public List<StripeStatistics> getVariantStripeStatistics(EncryptionVariant variant) throws IOException
Description copied from interface: Reader
Get the stripe statistics for a given variant. The StripeStatistics will have one entry for each column in the variant. This enables the user to get the stripe statistics for each variant regardless of which keys are available.
- Specified by:
getVariantStripeStatistics in interface Reader
- Parameters:
variant - the encryption variant, or null for unencrypted
- Returns:
- a list of stripe statistics (one per stripe)
- Throws:
IOException - if the required key is not available
-
getEncryption
Internal access to our view of the encryption.
- Returns:
- the encryption information for this reader.
-
getRowIndexStride
public int getRowIndexStride()
Description copied from interface: Reader
Get the number of rows per entry in the row index.
- Specified by:
getRowIndexStride in interface Reader
- Returns:
- the number of rows per entry in the row index, or 0 if there is no row index.
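The stride translates into simple arithmetic over a stripe's rows. A minimal sketch, assuming each row-index entry covers rowIndexStride consecutive rows of a stripe and, as the return value above states, that a stride of 0 means there is no row index; RowIndexStrideSketch and its methods are hypothetical:

```java
/** Hypothetical sketch of row-index arithmetic for one stripe. */
class RowIndexStrideSketch {
    /** Number of row-index entries for a stripe; 0 when there is no row index. */
    static long indexEntries(long stripeRows, int rowIndexStride) {
        if (rowIndexStride == 0) {
            return 0;
        }
        // Ceiling division: a final partial group of rows still gets its own entry.
        return (stripeRows + rowIndexStride - 1) / rowIndexStride;
    }

    /** Index entry (row group) that covers the row at position rowInStripe. */
    static long rowGroupOf(long rowInStripe, int rowIndexStride) {
        return rowInStripe / rowIndexStride;
    }

    public static void main(String[] args) {
        System.out.println(indexEntries(25_000, 10_000)); // 3
        System.out.println(rowGroupOf(19_999, 10_000));   // 1
        System.out.println(indexEntries(25_000, 0));      // 0
    }
}
```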
-
getStatistics
Description copied from interface: Reader
Get the statistics about the columns in the file.
- Specified by:
getStatistics in interface Reader
- Returns:
- the information about the column
-
deserializeStats
public ColumnStatistics[] deserializeStats(TypeDescription schema, List<OrcProto.ColumnStatistics> fileStats)
-
getSchema
Description copied from interface: Reader
Get the type of rows in this ORC file.
-
checkOrcVersion
Check to see if this ORC file is from a future version and, if so, warn the user that we may not be able to read all of the column encodings.
- Parameters:
path - the data source path for error messages
postscript - the parsed postscript
- Throws:
IOException
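checkOrcVersion compares the file's format version with the newest one the reader knows. A minimal sketch of that comparison, assuming the version is a list of integers ordered from major to minor (as getVersionList describes) and assuming a hypothetical newest-known version of 0.12; OrcVersionCheckSketch and CURRENT are illustrative names, not ORC's:

```java
import java.util.List;

/** Hypothetical sketch: flag files whose format version is newer than the reader knows. */
class OrcVersionCheckSketch {
    // Assumption: the newest format version this hypothetical reader understands is 0.12.
    static final List<Integer> CURRENT = List.of(0, 12);

    /** Compares version components from major to minor; missing components count as 0. */
    static boolean isFromFuture(List<Integer> fileVersion) {
        for (int i = 0; i < CURRENT.size(); i++) {
            int fileComponent = i < fileVersion.size() ? fileVersion.get(i) : 0;
            int known = CURRENT.get(i);
            if (fileComponent != known) {
                return fileComponent > known;
            }
        }
        return false; // equal on every compared component
    }

    public static void main(String[] args) {
        List<Integer> fileVersion = List.of(1, 0);
        if (isFromFuture(fileVersion)) {
            // In line with the description above, this only warns instead of failing.
            System.err.println("File version " + fileVersion + " is newer than " + CURRENT
                + "; some column encodings may not be readable.");
        }
    }
}
```

Warning rather than failing lets a reader still try columns whose encodings it does understand.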
-
getFileSystem
- Throws:
IOException
-
getFileSystemSupplier
-
getWriterVersion
Get the WriterVersion based on the ORC file postscript.
- Parameters:
writerVersion - the integer writer version
- Returns:
- the version of the software that produced the file
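getWriterVersion(int) turns the integer stored in the postscript into an enum constant. A minimal sketch of such a lookup, assuming unknown ids map to a catch-all constant so that files from newer writers can still be opened; the enum below is a hypothetical placeholder, and its names and ids are not ORC's actual WriterVersion table:

```java
/** Hypothetical sketch of mapping an integer writer-version id to an enum with a catch-all. */
class WriterVersionSketch {
    /** Placeholder constants; the names and ids are illustrative only. */
    enum WriterVersion {
        ORIGINAL(0),
        SECOND(1),
        THIRD(2),
        FUTURE(Integer.MAX_VALUE); // catch-all for ids this reader does not recognize

        final int id;

        WriterVersion(int id) {
            this.id = id;
        }
    }

    /** Linear scan is fine for a handful of constants; unknown ids fall back to FUTURE. */
    static WriterVersion fromId(int id) {
        for (WriterVersion v : WriterVersion.values()) {
            if (v.id == id) {
                return v;
            }
        }
        return WriterVersion.FUTURE;
    }

    public static void main(String[] args) {
        System.out.println(fromId(1));  // SECOND
        System.out.println(fromId(99)); // FUTURE
    }
}
```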
-
extractMetadata
public static OrcProto.Metadata extractMetadata(ByteBuffer bb, int metadataAbsPos, int metadataSize, InStream.StreamOptions options) throws IOException
- Throws:
IOException
-
extractFileTail
Deprecated. Use extractFileTail(FileSystem, Path, long) instead. This is for backward compatibility.
- Throws:
IOException
-
getCompressionBlockSize
Read the compression block size from the postscript if it is set; otherwise, use the same 256k default that the C++ implementation uses.
-
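The rule above is a simple fallback: use the value recorded in the postscript when present, otherwise a fixed default. A minimal sketch, assuming "256k" means 256 * 1024 = 262,144 bytes and modelling the optional postscript field as an OptionalLong; CompressionBlockSizeSketch and FALLBACK_BLOCK_SIZE are hypothetical names:

```java
import java.util.OptionalLong;

/** Hypothetical sketch of "postscript value if set, else a 256k default". */
class CompressionBlockSizeSketch {
    // Assumption: "256k" means 256 * 1024 bytes.
    static final int FALLBACK_BLOCK_SIZE = 256 * 1024;

    /** postscriptValue stands in for the postscript's optional compression block size. */
    static int compressionBlockSize(OptionalLong postscriptValue) {
        return postscriptValue.isPresent()
            ? (int) postscriptValue.getAsLong()
            : FALLBACK_BLOCK_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(compressionBlockSize(OptionalLong.empty()));   // 262144
        System.out.println(compressionBlockSize(OptionalLong.of(65536))); // 65536
    }
}
```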
extractFileTail
@Deprecated public static OrcTail extractFileTail(ByteBuffer buffer, long fileLen, long modificationTime) throws IOException
Deprecated. Use extractFileTail(FileSystem, Path, long) instead. This is for backward compatibility.
- Throws:
IOException
-
extractFileTail
- Throws:
IOException
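The extractFileTail variants all work from the end of the file. In the ORC format, the last byte of a file holds the length of the serialized postscript (so the postscript is under 256 bytes), the postscript sits just before that byte, and the footer sits before the postscript, with its length recorded inside the postscript. A minimal sketch of the first step, locating the postscript bytes in an in-memory file image; FileTailSketch is hypothetical, does no protobuf decoding, and is not how the methods above are implemented:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch: locate the postscript bytes at the end of an ORC file image. */
class FileTailSketch {
    static byte[] postscriptBytes(byte[] file) {
        if (file.length == 0) {
            throw new IllegalArgumentException("empty file");
        }
        // Final byte: postscript length, read as an unsigned value.
        int psLen = file[file.length - 1] & 0xff;
        int psStart = file.length - 1 - psLen;
        if (psStart < 0) {
            throw new IllegalArgumentException("postscript length " + psLen + " exceeds file size");
        }
        // The postscript occupies the psLen bytes just before the final length byte.
        return Arrays.copyOfRange(file, psStart, file.length - 1);
    }

    public static void main(String[] args) {
        // Fake image: "ORC" header, two body bytes, a 2-byte "postscript", then its length.
        byte[] file = {'O', 'R', 'C', 'x', 'y', 'P', 'S', 2};
        System.out.println(new String(postscriptBytes(file), StandardCharsets.US_ASCII)); // PS
    }
}
```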
-
writerUsedProlepticGregorian
public boolean writerUsedProlepticGregorian()
Description copied from interface: Reader
Was the file written using the proleptic Gregorian calendar?
- Specified by:
writerUsedProlepticGregorian in interface Reader
-
getConvertToProlepticGregorian
public boolean getConvertToProlepticGregorian()
Description copied from interface: Reader
Should the returned values use the proleptic Gregorian calendar?
- Specified by:
getConvertToProlepticGregorian in interface Reader
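Why the two calendar flags above matter in Java: java.util.GregorianCalendar is by default a hybrid calendar that follows Julian rules before the October 1582 cutover, while java.time.LocalDate uses the proleptic Gregorian calendar for all dates. The sketch below (CalendarSketch is a hypothetical helper, not ORC code) shows that the same year-month-day label maps to different instants before the cutover and to the same instant after it; it does not show how ORC itself converts values.

```java
import java.time.LocalDate;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

/** Hypothetical helper comparing hybrid and proleptic Gregorian interpretations of a date. */
class CalendarSketch {
    static final long MILLIS_PER_DAY = 86_400_000L;

    /** Midnight UTC on year-month-day using java.util's default hybrid Julian/Gregorian rules. */
    static long hybridMillis(int year, int month, int day) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(year, month - 1, day); // Calendar months are 0-based
        return cal.getTimeInMillis();
    }

    /** Same date with the cutover moved to the start of time, giving a pure Gregorian calendar. */
    static long pureGregorianMillis(int year, int month, int day) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.setGregorianChange(new Date(Long.MIN_VALUE));
        cal.clear();
        cal.set(year, month - 1, day);
        return cal.getTimeInMillis();
    }

    /** java.time.LocalDate is proleptic Gregorian. */
    static long prolepticMillis(int year, int month, int day) {
        return LocalDate.of(year, month, day).toEpochDay() * MILLIS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.println(hybridMillis(2000, 1, 1) == prolepticMillis(2000, 1, 1)); // true
        System.out.println(hybridMillis(1000, 1, 1) == prolepticMillis(1000, 1, 1)); // false
    }
}
```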
-
options
Description copied from interface: Reader
Create a default options object that can be customized for creating a RecordReader.
-
rows
Description copied from interface: Reader
Create a RecordReader that reads everything with the default options.
- Specified by:
rows in interface Reader
- Returns:
- a new RecordReader
- Throws:
IOException
-
rows
Description copied from interface: Reader
Create a RecordReader that uses the options given. This method can't be named rows, because many callers used rows(null) before the rows() method was introduced.
- Specified by:
rows in interface Reader
- Parameters:
options - the options to read with
- Returns:
- a new RecordReader
- Throws:
IOException
-
getRawDataSize
public long getRawDataSize()
Description copied from interface: Reader
Get the deserialized data size of the file.
- Specified by:
getRawDataSize in interface Reader
- Returns:
- raw data size
-
getRawDataSizeFromColIndices
Description copied from interface: Reader
Get the deserialized data size of the specified column ids.
- Specified by:
getRawDataSizeFromColIndices in interface Reader
- Parameters:
colIndices - internal column ids (check orcfiledump for column ids)
- Returns:
- raw data size of columns
-
getRawDataSizeFromColIndices
public static long getRawDataSizeFromColIndices(List<Integer> colIndices, List<OrcProto.Type> types, List<OrcProto.ColumnStatistics> stats) throws FileFormatException
- Throws:
FileFormatException
-
getRawDataSizeOfColumns
Description copied from interface: Reader
Get the deserialized data size of the specified columns.
- Specified by:
getRawDataSizeOfColumns in interface Reader
- Parameters:
colNames - the list of column names
- Returns:
- raw data size of columns
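getRawDataSizeOfColumns takes names while getRawDataSizeFromColIndices takes column ids, which suggests the name-based variant resolves names to ids and then sums per-column sizes. A minimal sketch of that two-step shape, assuming the per-column raw sizes are already known (the static overload's types and stats parameters suggest the real reader derives them from the type list and column statistics); RawDataSizeSketch and its maps are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: resolve column names to ids, then sum per-column raw sizes. */
class RawDataSizeSketch {
    /** Sums the known raw sizes of the given column ids; ids with no entry count as 0. */
    static long rawDataSizeFromColIndices(List<Integer> colIndices, Map<Integer, Long> rawSizeById) {
        long total = 0;
        for (int id : colIndices) {
            total += rawSizeById.getOrDefault(id, 0L);
        }
        return total;
    }

    /** Resolves names to ids first, rejecting unknown names, then delegates. */
    static long rawDataSizeOfColumns(List<String> colNames, Map<String, Integer> idByName,
                                     Map<Integer, Long> rawSizeById) {
        List<Integer> ids = new ArrayList<>();
        for (String name : colNames) {
            Integer id = idByName.get(name);
            if (id == null) {
                throw new IllegalArgumentException("unknown column: " + name);
            }
            ids.add(id);
        }
        return rawDataSizeFromColIndices(ids, rawSizeById);
    }

    public static void main(String[] args) {
        Map<String, Integer> idByName = Map.of("name", 1, "age", 2);
        Map<Integer, Long> rawSizeById = Map.of(1, 1_200L, 2, 400L);
        System.out.println(rawDataSizeOfColumns(List.of("name", "age"), idByName, rawSizeById)); // 1600
    }
}
```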
-
getOrcProtoStripeStatistics
- Specified by:
getOrcProtoStripeStatistics in interface Reader
- Returns:
- Stripe statistics, in original protobuf form.
-
getOrcProtoFileStatistics
- Specified by:
getOrcProtoFileStatistics in interface Reader
- Returns:
- File statistics, in original protobuf form.
-
getStripeStatistics
Description copied from interface: Reader
Get the stripe statistics for all of the columns.
- Specified by:
getStripeStatistics in interface Reader
- Returns:
- a list of the statistics for each stripe in the file
- Throws:
IOException
-
getStripeStatistics
Description copied from interface: Reader
Get the stripe statistics from the file.
- Specified by:
getStripeStatistics in interface Reader
- Parameters:
included - null for all columns, or an array where the required columns are selected
- Returns:
- a list of the statistics for each stripe in the file
- Throws:
IOException
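The included parameter follows a common convention: null selects every column, otherwise entry i says whether column id i is wanted. A minimal sketch of interpreting such a mask, assuming it is indexed by column id with the root at id 0; IncludedColumnsSketch is hypothetical, and ids past the end of a short mask are treated as excluded:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the "null means all columns" convention for an included mask. */
class IncludedColumnsSketch {
    /** Returns the column ids selected by included; a null mask selects every column. */
    static List<Integer> selectedColumns(boolean[] included, int columnCount) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 0; id < columnCount; id++) {
            boolean wanted = included == null || (id < included.length && included[id]);
            if (wanted) {
                ids.add(id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(selectedColumns(null, 3));                             // [0, 1, 2]
        System.out.println(selectedColumns(new boolean[] {true, false, true}, 3)); // [0, 2]
    }
}
```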
-
getVersionList
- Specified by:
getVersionList in interface Reader
- Returns:
- List of integers representing version of the file, in order from major to minor.
-
getMetadataSize
public int getMetadataSize()
- Specified by:
getMetadataSize in interface Reader
- Returns:
- the size of the metadata, in bytes.
-
toString
-
close
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
-
takeFile
Take the file from the reader. This allows the first RecordReader to use the same file, but additional RecordReaders will open new handles.
- Returns:
- a file handle, if one is available
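takeFile hands the reader's already-open file to exactly one caller; later callers get nothing back and must open their own handle. A minimal sketch of that hand-off-once behavior, assuming "no handle available" is signalled by null; TakeOnceSketch is hypothetical and uses a generic T in place of the stream type:

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of handing an open file to the first caller only. */
class TakeOnceSketch<T> {
    private final AtomicReference<T> file;

    TakeOnceSketch(T openedFile) {
        this.file = new AtomicReference<>(openedFile);
    }

    /** Returns the handle on the first call and null afterwards. */
    T takeFile() {
        // getAndSet makes the hand-off atomic, so two threads cannot both receive the handle.
        return file.getAndSet(null);
    }

    public static void main(String[] args) {
        TakeOnceSketch<String> reader = new TakeOnceSketch<>("shared-handle");
        System.out.println(reader.takeFile()); // shared-handle
        System.out.println(reader.takeFile()); // null
    }
}
```

The AtomicReference is a choice made for this sketch; whether the real class guards the hand-off against concurrent callers is not stated here.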
-
ensureOrcFooter(ByteBuffer buffer, int psLen)
Deprecated. Use ensureOrcFooter(FSDataInputStream, Path, int, ByteBuffer) instead.
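ensureOrcFooter exists to stop users from reading text files or RC files as ORC files. ORC files begin with the three ASCII bytes "ORC", so a first-pass version of such a check can compare a file's leading bytes with that magic. A minimal sketch under that assumption; OrcMagicSketch is hypothetical, and since the real method also takes the postscript length and a buffer, its exact logic is not reproduced here:

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: reject inputs that do not begin with the ORC magic bytes. */
class OrcMagicSketch {
    static final byte[] MAGIC = "ORC".getBytes(StandardCharsets.US_ASCII);

    static boolean startsWithOrcMagic(byte[] header) {
        if (header.length < MAGIC.length) {
            return false; // too short to hold the magic
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] csv = "id,name\n1,a\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(startsWithOrcMagic(csv)); // false
    }
}
```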