Package org.apache.orc
Interface Reader
- All Superinterfaces:
AutoCloseable, Closeable
- All Known Implementing Classes:
ReaderImpl
The interface for reading ORC files.
A single Reader can support multiple concurrent RecordReaders.
- Since:
- 1.1.0
-
Nested Class Summary
- static class Reader.Options: Options for creating a RecordReader.
-
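Method Summary
- EncryptionKey[] getColumnEncryptionKeys(): Get the list of encryption keys for column encryption.
- CompressionKind getCompressionKind(): Get the compression kind.
- int getCompressionSize(): Get the buffer size for the compression.
- long getContentLength(): Get the length of the file.
- boolean getConvertToProlepticGregorian(): Should the returned values use the proleptic Gregorian calendar?
- DataMaskDescription[] getDataMasks(): Get the data masks for the unencrypted variant of the data.
- EncryptionVariant[] getEncryptionVariants(): Get the list of encryption variants for the data.
- OrcProto.FileTail getFileTail(): Get the file tail (footer + postscript).
- OrcFile.Version getFileVersion(): Get the file format version.
- List<String> getMetadataKeys(): Get the user metadata keys.
- int getMetadataSize()
- ByteBuffer getMetadataValue(String key): Get a user metadata value.
- long getNumberOfRows(): Get the number of rows in the file.
- List<OrcProto.ColumnStatistics> getOrcProtoFileStatistics(): Deprecated. Use getStatistics() instead.
- List<OrcProto.StripeStatistics> getOrcProtoStripeStatistics(): Deprecated. Use getStripeStatistics() instead.
- long getRawDataSize(): Get the deserialized data size of the file.
- long getRawDataSizeFromColIndices(List<Integer> colIds): Get the deserialized data size of the specified column ids.
- long getRawDataSizeOfColumns(List<String> colNames): Get the deserialized data size of the specified columns.
- int getRowIndexStride(): Get the number of rows per entry in the row index.
- TypeDescription getSchema(): Get the type of rows in this ORC file.
- String getSoftwareVersion(): Get the implementation and version of the software that wrote the file.
- ColumnStatistics[] getStatistics(): Get the statistics about the columns in the file.
- List<StripeInformation> getStripes(): Get the list of stripes.
- List<StripeStatistics> getStripeStatistics(): Get the stripe statistics for all of the columns.
- List<StripeStatistics> getStripeStatistics(boolean[] include): Get the stripe statistics from the file.
- List<OrcProto.Type> getTypes(): Deprecated. Use getSchema() instead.
- List<StripeStatistics> getVariantStripeStatistics(EncryptionVariant variant): Get the stripe statistics for a given variant.
- List<Integer> getVersionList()
- OrcFile.WriterVersion getWriterVersion(): Get the version of the writer of this file.
- boolean hasMetadataValue(String key): Did the user set the given metadata value?
- Reader.Options options(): Create a default options object that can be customized for creating a RecordReader.
- RecordReader rows(): Create a RecordReader that reads everything with the default options.
- RecordReader rows(Reader.Options options): Create a RecordReader that uses the options given.
- boolean writerUsedProlepticGregorian(): Was the file written using the proleptic Gregorian calendar?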
-
Method Details
-
getNumberOfRows
long getNumberOfRows()
Get the number of rows in the file.
- Returns:
- the number of rows
- Since:
- 1.1.0
-
getRawDataSize
long getRawDataSize()
Get the deserialized data size of the file.
- Returns:
- raw data size
- Since:
- 1.1.0
-
getRawDataSizeOfColumns
long getRawDataSizeOfColumns(List<String> colNames)
Get the deserialized data size of the specified columns.
- Parameters:
colNames - the list of column names
- Returns:
- raw data size of columns
- Since:
- 1.1.0
-
getRawDataSizeFromColIndices
long getRawDataSizeFromColIndices(List<Integer> colIds)
Get the deserialized data size of the specified column ids.
- Parameters:
colIds - internal column ids (check orcfiledump for column ids)
- Returns:
- raw data size of columns
- Since:
- 1.1.0
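Column ids follow a pre-order walk of the schema: the root struct is column 0 and each type receives the next id before its children are numbered, so struct<a:int,b:struct<c:string,d:int>> yields root 0, a 1, b 2, b.c 3, b.d 4. The same ids index the array returned by getStatistics(). The stdlib-only sketch below reproduces that numbering on a hypothetical Field class (not an ORC type); with the real library, TypeDescription.getId() on a node of getSchema() reports the id directly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical schema node for this sketch only; not part of the ORC API. */
final class Field {
  final String name;
  final List<Field> children = new ArrayList<>();

  Field(String name, Field... children) {
    this.name = name;
    this.children.addAll(List.of(children));
  }
}

final class PreOrderIds {
  /** Assigns ids in pre-order: the root is 0, and a node is numbered before its children. */
  static Map<String, Integer> assign(Field root) {
    Map<String, Integer> ids = new LinkedHashMap<>();
    ids.put(root.name, 0);
    for (Field child : root.children) {
      walk(child, "", ids);
    }
    return ids;
  }

  private static void walk(Field node, String prefix, Map<String, Integer> ids) {
    String path = prefix.isEmpty() ? node.name : prefix + "." + node.name;
    ids.put(path, ids.size()); // next unused id
    for (Field child : node.children) {
      walk(child, path, ids);
    }
  }

  public static void main(String[] args) {
    // struct<a:int,b:struct<c:string,d:int>>
    Field schema = new Field("root",
        new Field("a"),
        new Field("b", new Field("c"), new Field("d")));
    System.out.println(assign(schema)); // {root=0, a=1, b=2, b.c=3, b.d=4}
  }
}
```

Under this numbering, getRawDataSizeFromColIndices(List.of(3)) would ask for the size of b.c alone.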
-
getMetadataKeys
List<String> getMetadataKeys()
Get the user metadata keys.
- Returns:
- the set of metadata keys
- Since:
- 1.1.0
-
getMetadataValue
ByteBuffer getMetadataValue(String key)
Get a user metadata value.
- Parameters:
key - a key given by the user
- Returns:
- the bytes associated with the given key
- Since:
- 1.1.0
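getMetadataValue returns the raw bytes the writer stored, and ORC does not prescribe how to interpret them. Assuming the writer stored UTF-8 text (a convention, not a requirement), the sketch below decodes the value through duplicate(), so the caller's buffer position is left unchanged. The key name in the usage comment is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class MetadataText {
  /** Decodes the remaining bytes of value as UTF-8 without moving its position. */
  static String decodeUtf8(ByteBuffer value) {
    ByteBuffer view = value.duplicate(); // shares content, independent position and limit
    return StandardCharsets.UTF_8.decode(view).toString();
  }

  public static void main(String[] args) {
    // Stand-in for reader.getMetadataValue("created.by"); "created.by" is a hypothetical key.
    ByteBuffer value = ByteBuffer.wrap("etl-job-42".getBytes(StandardCharsets.UTF_8));
    System.out.println(decodeUtf8(value)); // etl-job-42
    System.out.println(value.remaining()); // 10
  }
}
```

Guarding the call with hasMetadataValue(key) avoids asking for a key the writer never set.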
-
hasMetadataValue
boolean hasMetadataValue(String key)
Did the user set the given metadata value?
- Parameters:
key - the key to check
- Returns:
- true if the metadata value was set
- Since:
- 1.1.0
-
getCompressionKind
CompressionKind getCompressionKind()
Get the compression kind.
- Returns:
- the kind of compression in the file
- Since:
- 1.1.0
-
getCompressionSize
int getCompressionSize()
Get the buffer size for the compression.
- Returns:
- number of bytes to buffer for the compression codec.
- Since:
- 1.1.0
-
getRowIndexStride
int getRowIndexStride()
Get the number of rows per entry in the row index.
- Returns:
- the number of rows per entry in the row index, or 0 if there is no row index
- Since:
- 1.1.0
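Each row index entry covers getRowIndexStride() rows, so a stripe with n rows has ceil(n / stride) entries, the last of which may be partial, and a stride of 0 means no row index was written. The helper below is a sketch of that arithmetic; the stripe row count would come from StripeInformation.getNumberOfRows() on an element of getStripes().

```java
final class RowGroups {
  /** Number of row-index entries for one stripe, or 0 when the file has no row index. */
  static long entriesPerStripe(long stripeRows, int rowIndexStride) {
    if (rowIndexStride <= 0 || stripeRows <= 0) {
      return 0; // stride 0: no row index
    }
    // Ceiling division: the final entry may cover fewer than rowIndexStride rows.
    return (stripeRows + rowIndexStride - 1) / rowIndexStride;
  }

  public static void main(String[] args) {
    System.out.println(entriesPerStripe(25_000, 10_000)); // 3
    System.out.println(entriesPerStripe(20_000, 10_000)); // 2
    System.out.println(entriesPerStripe(25_000, 0));      // 0
  }
}
```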
-
getStripes
List<StripeInformation> getStripes()
Get the list of stripes.
- Returns:
- the information about the stripes in order
- Since:
- 1.1.0
-
getContentLength
long getContentLength()
Get the length of the file.
- Returns:
- the number of bytes in the file
- Since:
- 1.1.0
-
getStatistics
ColumnStatistics[] getStatistics()
Get the statistics about the columns in the file.
- Returns:
- the information about the columns
- Since:
- 1.1.0
-
getSchema
TypeDescription getSchema()
Get the type of rows in this ORC file.
- Since:
- 1.1.0
-
getTypes
List<OrcProto.Type> getTypes()
Deprecated. Use getSchema() instead.
Get the list of types contained in the file. The root type is the first type in the list.
- Returns:
- the list of flattened types
- Since:
- 1.1.0
-
getFileVersion
OrcFile.Version getFileVersion()
Get the file format version.
- Since:
- 1.1.0
-
getWriterVersion
OrcFile.WriterVersion getWriterVersion()
Get the version of the writer of this file.
- Since:
- 1.1.0
-
getSoftwareVersion
String getSoftwareVersion()
Get the implementation and version of the software that wrote the file. For old files, this defaults to "ORC Java"; for current files, the version is included as well.
- Returns:
- the writer implementation and, when it was recorded, the version of the software
- Since:
- 1.5.13
-
getFileTail
OrcProto.FileTail getFileTail()
Get the file tail (footer + postscript).
- Returns:
- the file tail
- Since:
- 1.1.0
-
getColumnEncryptionKeys
EncryptionKey[] getColumnEncryptionKeys()
Get the list of encryption keys for column encryption.
- Returns:
- the set of encryption keys
- Since:
- 1.6.0
-
getDataMasks
DataMaskDescription[] getDataMasks()
Get the data masks for the unencrypted variant of the data.
- Returns:
- the list of data masks
- Since:
- 1.6.0
-
getEncryptionVariants
EncryptionVariant[] getEncryptionVariants()
Get the list of encryption variants for the data.
- Since:
- 1.6.0
-
getVariantStripeStatistics
List<StripeStatistics> getVariantStripeStatistics(EncryptionVariant variant) throws IOException
Get the stripe statistics for a given variant. Each StripeStatistics has one entry for each column in the variant. This lets the user get the stripe statistics for each variant regardless of which keys are available.
- Parameters:
variant - the encryption variant, or null for the unencrypted data
- Returns:
- a list of stripe statistics (one per stripe)
- Throws:
IOException - if the required key is not available
- Since:
- 1.6.0
-
options
Reader.Options options()
Create a default options object that can be customized for creating a RecordReader.
- Returns:
- a new default Options object
- Since:
- 1.2.0
-
rows
RecordReader rows() throws IOException
Create a RecordReader that reads everything with the default options.
- Returns:
- a new RecordReader
- Throws:
IOException
- Since:
- 1.1.0
-
rows
RecordReader rows(Reader.Options options) throws IOException
Create a RecordReader that uses the options given. This method can't be named rows, because many callers used rows(null) before the rows() method was introduced.
- Parameters:
options - the options to read with
- Returns:
- a new RecordReader
- Throws:
IOException
- Since:
- 1.1.0
-
getVersionList
List<Integer> getVersionList()
- Returns:
- the version of the file as a list of integers, ordered from major to minor
- Since:
- 1.1.0
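Because getVersionList() orders components from major to minor (for example [0, 12] for the 0.12 file format), a component-by-component comparison answers "is this file at least version X?". The helper below is a sketch; treating missing trailing components as 0 is an assumption of this example, not documented ORC behavior.

```java
import java.util.List;

final class FormatVersion {
  /** Compares version lists major-first; missing trailing components count as 0 (assumption). */
  static int compare(List<Integer> a, List<Integer> b) {
    int n = Math.max(a.size(), b.size());
    for (int i = 0; i < n; i++) {
      int x = i < a.size() ? a.get(i) : 0;
      int y = i < b.size() ? b.get(i) : 0;
      if (x != y) {
        return Integer.compare(x, y);
      }
    }
    return 0;
  }

  static boolean isAtLeast(List<Integer> fileVersion, List<Integer> required) {
    return compare(fileVersion, required) >= 0;
  }

  public static void main(String[] args) {
    List<Integer> fileVersion = List.of(0, 12); // stand-in for reader.getVersionList()
    System.out.println(isAtLeast(fileVersion, List.of(0, 11))); // true
    System.out.println(isAtLeast(fileVersion, List.of(1)));     // false
  }
}
```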
-
getMetadataSize
int getMetadataSize()
- Returns:
- the size of the metadata, in bytes
- Since:
- 1.1.0
-
getOrcProtoStripeStatistics
List<OrcProto.StripeStatistics> getOrcProtoStripeStatistics()
Deprecated. Use getStripeStatistics() instead.
- Returns:
- Stripe statistics, in original protobuf form.
- Since:
- 1.1.0
-
getStripeStatistics
List<StripeStatistics> getStripeStatistics() throws IOException
Get the stripe statistics for all of the columns.
- Returns:
- a list of the statistics for each stripe in the file
- Throws:
IOException
- Since:
- 1.2.0
-
getStripeStatistics
List<StripeStatistics> getStripeStatistics(boolean[] include) throws IOException
Get the stripe statistics from the file.
- Parameters:
include - null for all columns, or an array in which the required columns are selected
- Returns:
- a list of the statistics for each stripe in the file
- Throws:
IOException
- Since:
- 1.6.0
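The include array is indexed by column id, so it needs one slot for each id in the schema (0 through TypeDescription.getMaximumId()); passing null selects every column. The sketch below builds such an array from explicit ids. Whether selecting a nested column also requires selecting its parents is not stated here, so this hypothetical helper marks only the ids it is given.

```java
import java.util.Arrays;

final class IncludeMask {
  /**
   * Returns a boolean[] of length maxColumnId + 1 with only the given ids set.
   * maxColumnId would come from reader.getSchema().getMaximumId().
   */
  static boolean[] of(int maxColumnId, int... columnIds) {
    boolean[] include = new boolean[maxColumnId + 1];
    for (int id : columnIds) {
      if (id < 0 || id > maxColumnId) {
        throw new IllegalArgumentException("column id out of range: " + id);
      }
      include[id] = true;
    }
    return include;
  }

  public static void main(String[] args) {
    // struct<a:int,b:struct<c:string,d:int>> has ids 0..4; select b.c (id 3).
    System.out.println(Arrays.toString(of(4, 3))); // [false, false, false, true, false]
  }
}
```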
-
getOrcProtoFileStatistics
List<OrcProto.ColumnStatistics> getOrcProtoFileStatistics()
Deprecated. Use getStatistics() instead.
- Returns:
- File statistics, in original protobuf form.
- Since:
- 1.1.0
-
writerUsedProlepticGregorian
boolean writerUsedProlepticGregorian()
Was the file written using the proleptic Gregorian calendar?
- Since:
- 1.5.9
-
getConvertToProlepticGregorian
boolean getConvertToProlepticGregorian()
Should the returned values use the proleptic Gregorian calendar?
- Since:
- 1.5.9
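Dates can be interpreted in the hybrid Julian/Gregorian calendar of java.util.GregorianCalendar or in the proleptic Gregorian calendar of java.time. The two agree after the October 1582 cutover but map the same year-month-day label to different days before it, which is the mismatch writerUsedProlepticGregorian() and getConvertToProlepticGregorian() let a reader account for. This stdlib-only sketch does not call ORC; it only demonstrates the drift.

```java
import java.time.LocalDate;
import java.util.GregorianCalendar;
import java.util.TimeZone;

final class CalendarDrift {
  /** Days since 1970-01-01 for y-m-d in java.util's hybrid Julian/Gregorian calendar (UTC). */
  static long hybridEpochDay(int year, int month, int day) {
    GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
    cal.clear();
    cal.set(year, month - 1, day); // Calendar months are 0-based
    return Math.floorDiv(cal.getTimeInMillis(), 86_400_000L);
  }

  /** Days since 1970-01-01 for y-m-d in the proleptic Gregorian calendar of java.time. */
  static long prolepticEpochDay(int year, int month, int day) {
    return LocalDate.of(year, month, day).toEpochDay();
  }

  public static void main(String[] args) {
    // Nonzero: before 1582 the same label names different days in the two calendars.
    System.out.println(hybridEpochDay(1000, 1, 1) - prolepticEpochDay(1000, 1, 1));
    // After the cutover the calendars agree.
    System.out.println(hybridEpochDay(2000, 1, 1) - prolepticEpochDay(2000, 1, 1)); // 0
  }
}
```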
-