Interface Reader

All Superinterfaces:
AutoCloseable, Closeable
All Known Implementing Classes:
ReaderImpl

public interface Reader extends Closeable
The interface for reading ORC files.

One Reader can support multiple concurrent RecordReaders.

Since:
1.1.0
  • Method Details

    • getNumberOfRows

      long getNumberOfRows()
      Get the number of rows in the file.
      Returns:
      the number of rows
      Since:
      1.1.0
    • getRawDataSize

      long getRawDataSize()
      Get the deserialized data size of the file.
      Returns:
      raw data size
      Since:
      1.1.0
    • getRawDataSizeOfColumns

      long getRawDataSizeOfColumns(List<String> colNames)
      Get the deserialized data size of the specified columns.
      Parameters:
      colNames - the list of column names
      Returns:
      raw data size of columns
      Since:
      1.1.0
    • getRawDataSizeFromColIndices

      long getRawDataSizeFromColIndices(List<Integer> colIds)
      Get the deserialized data size of the specified column ids.
      Parameters:
      colIds - the internal column ids (check orcfiledump for column ids)
      Returns:
      raw data size of columns
      Since:
      1.1.0
    • getMetadataKeys

      List<String> getMetadataKeys()
      Get the user metadata keys.
      Returns:
      the list of metadata keys
      Since:
      1.1.0
    • getMetadataValue

      ByteBuffer getMetadataValue(String key)
      Get a user metadata value.
      Parameters:
      key - a key given by the user
      Returns:
      the bytes associated with the given key
      Since:
      1.1.0
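      getMetadataValue returns raw bytes, so the caller decides how to interpret them. The following is a minimal sketch, assuming the value was written as UTF-8 text, which the file format does not guarantee. MetadataValues and decodeUtf8 are hypothetical names, not part of the ORC API, and the buffer in main stands in for a value returned by the reader.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the ORC API.
final class MetadataValues {
    private MetadataValues() {}

    /** Decodes the remaining bytes as UTF-8 without moving the caller's buffer position. */
    static String decodeUtf8(ByteBuffer value) {
        // duplicate() shares the bytes but has an independent position and limit
        return StandardCharsets.UTF_8.decode(value.duplicate()).toString();
    }

    public static void main(String[] args) {
        // Stand-in for reader.getMetadataValue("some.key"); assumed to hold UTF-8 text
        ByteBuffer value = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeUtf8(value)); // prints "hello"
        System.out.println(value.remaining()); // prints 5: the original buffer was not consumed
    }
}
```

      Decoding a duplicate keeps the original buffer reusable, which matters if the same value is read more than once.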
    • hasMetadataValue

      boolean hasMetadataValue(String key)
      Did the user set the given metadata value?
      Parameters:
      key - the key to check
      Returns:
      true if the metadata value was set
      Since:
      1.1.0
    • getCompressionKind

      CompressionKind getCompressionKind()
      Get the compression kind.
      Returns:
      the kind of compression in the file
      Since:
      1.1.0
    • getCompressionSize

      int getCompressionSize()
      Get the buffer size for the compression.
      Returns:
      number of bytes to buffer for the compression codec.
      Since:
      1.1.0
    • getRowIndexStride

      int getRowIndexStride()
      Get the number of rows per entry in the row index.
      Returns:
      the number of rows per entry in the row index, or 0 if there is no row index.
      Since:
      1.1.0
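      The stride returned by getRowIndexStride can be combined with a stripe's row count to estimate how many row index entries cover that stripe. The sketch below assumes each entry covers one stride of rows, with a shorter final entry. RowIndexMath and its methods are hypothetical names, not part of the ORC API, and the numbers in main are illustrative.

```java
// Hypothetical helper, not part of the ORC API.
final class RowIndexMath {
    private RowIndexMath() {}

    /** Number of row index entries needed to cover a stripe, or 0 when there is no row index. */
    static long indexEntries(long rowsInStripe, int rowIndexStride) {
        if (rowIndexStride <= 0) {
            return 0; // a stride of 0 means the file has no row index
        }
        // Ceiling division: a partially filled last entry still counts
        return (rowsInStripe + rowIndexStride - 1) / rowIndexStride;
    }

    /** Zero-based index entry that contains the given row offset within its stripe. */
    static long entryForRow(long rowInStripe, int rowIndexStride) {
        if (rowIndexStride <= 0) {
            throw new IllegalArgumentException("file has no row index");
        }
        return rowInStripe / rowIndexStride;
    }

    public static void main(String[] args) {
        int stride = 10_000; // stand-in for reader.getRowIndexStride()
        System.out.println(indexEntries(25_000, stride)); // prints 3
        System.out.println(entryForRow(12_345, stride));  // prints 1
    }
}
```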
    • getStripes

      List<StripeInformation> getStripes()
      Get the list of stripes.
      Returns:
      the information about the stripes in order
      Since:
      1.1.0
    • getContentLength

      long getContentLength()
      Get the length of the file.
      Returns:
      the number of bytes in the file
      Since:
      1.1.0
    • getStatistics

      ColumnStatistics[] getStatistics()
      Get the statistics about the columns in the file.
      Returns:
      the information about the columns
      Since:
      1.1.0
    • getSchema

      TypeDescription getSchema()
      Get the type of rows in this ORC file.
      Since:
      1.1.0
    • getTypes

      List<OrcProto.Type> getTypes()
      Deprecated.
      use getSchema instead
      Get the list of types contained in the file. The root type is the first type in the list.
      Returns:
      the list of flattened types
      Since:
      1.1.0
    • getFileVersion

      OrcFile.Version getFileVersion()
      Get the file format version.
      Since:
      1.1.0
    • getWriterVersion

      OrcFile.WriterVersion getWriterVersion()
      Get the version of the writer of this file.
      Since:
      1.1.0
    • getSoftwareVersion

      String getSoftwareVersion()
      Get the implementation and version of the software that wrote the file. It defaults to "ORC Java" for old files. For current files, the version is included as well.
      Returns:
      the writer implementation and, when available, the version of the software
      Since:
      1.5.13
    • getFileTail

      OrcProto.FileTail getFileTail()
      Get the file tail (footer + postscript).
      Returns:
      the file tail
      Since:
      1.1.0
    • getColumnEncryptionKeys

      EncryptionKey[] getColumnEncryptionKeys()
      Get the list of encryption keys for column encryption.
      Returns:
      the set of encryption keys
      Since:
      1.6.0
    • getDataMasks

      DataMaskDescription[] getDataMasks()
      Get the data masks for the unencrypted variant of the data.
      Returns:
      the list of data masks
      Since:
      1.6.0
    • getEncryptionVariants

      EncryptionVariant[] getEncryptionVariants()
      Get the list of encryption variants for the data.
      Since:
      1.6.0
    • getVariantStripeStatistics

      List<StripeStatistics> getVariantStripeStatistics(EncryptionVariant variant) throws IOException
      Get the stripe statistics for a given variant. The StripeStatistics will have 1 entry for each column in the variant. This enables the user to get the stripe statistics for each variant regardless of which keys are available.
      Parameters:
      variant - the encryption variant or null for unencrypted
      Returns:
      a list of stripe statistics (one per stripe)
      Throws:
      IOException - if the required key is not available
      Since:
      1.6.0
    • options

      Reader.Options options()
      Create a default options object that can be customized for creating a RecordReader.
      Returns:
      a new default Options object
      Since:
      1.2.0
    • rows

      RecordReader rows() throws IOException
      Create a RecordReader that reads everything with the default options.
      Returns:
      a new RecordReader
      Throws:
      IOException
      Since:
      1.1.0
    • rows

      RecordReader rows(Reader.Options options) throws IOException
      Create a RecordReader that uses the options given. This method can't be named rows, because many callers used rows(null) before the rows() method was introduced.
      Parameters:
      options - the options to read with
      Returns:
      a new RecordReader
      Throws:
      IOException
      Since:
      1.1.0
    • getVersionList

      List<Integer> getVersionList()
      Returns:
      List of integers representing version of the file, in order from major to minor.
      Since:
      1.1.0
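      Because getVersionList returns the components ordered from major to minor, a dotted version string can be built by joining them. This is a minimal sketch. VersionLists and format are hypothetical names, not part of the ORC API, and the list in main is an illustrative stand-in for the reader's return value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the ORC API.
final class VersionLists {
    private VersionLists() {}

    /** Joins version components (major first) with dots, e.g. [0, 12] becomes "0.12". */
    static String format(List<Integer> versionList) {
        return versionList.stream()
                .map(String::valueOf)
                .collect(Collectors.joining("."));
    }

    public static void main(String[] args) {
        // Stand-in for reader.getVersionList()
        List<Integer> version = Arrays.asList(0, 12);
        System.out.println(format(version)); // prints "0.12"
    }
}
```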
    • getMetadataSize

      int getMetadataSize()
      Returns:
      the size of the metadata, in bytes
      Since:
      1.1.0
    • getOrcProtoStripeStatistics

      List<OrcProto.StripeStatistics> getOrcProtoStripeStatistics()
      Deprecated.
      Returns:
      Stripe statistics, in original protobuf form.
      Since:
      1.1.0
    • getStripeStatistics

      List<StripeStatistics> getStripeStatistics() throws IOException
      Get the stripe statistics for all of the columns.
      Returns:
      a list of the statistics for each stripe in the file
      Throws:
      IOException
      Since:
      1.2.0
    • getStripeStatistics

      List<StripeStatistics> getStripeStatistics(boolean[] include) throws IOException
      Get the stripe statistics from the file.
      Parameters:
      include - null for all columns or an array where the required columns are selected
      Returns:
      a list of the statistics for each stripe in the file
      Throws:
      IOException
      Since:
      1.6.0
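      The include argument selects columns through a boolean array. The sketch below shows one way to build such an array. It assumes the array is indexed by internal column id, with the root type at id 0 and a length equal to the total number of column ids in the schema. That layout is an assumption here and should be checked against the implementation in use. ColumnSelection and includeColumns are hypothetical names, not part of the ORC API.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the ORC API.
final class ColumnSelection {
    private ColumnSelection() {}

    /**
     * Builds a selection array with one slot per column id.
     * Assumption: slots are indexed by internal column id, root type at 0.
     */
    static boolean[] includeColumns(int columnCount, int... columnIds) {
        boolean[] include = new boolean[columnCount]; // all false by default
        for (int id : columnIds) {
            if (id < 0 || id >= columnCount) {
                throw new IllegalArgumentException("column id out of range: " + id);
            }
            include[id] = true;
        }
        return include;
    }

    public static void main(String[] args) {
        // Select the root (0) and column id 2 out of four column ids
        boolean[] include = includeColumns(4, 0, 2);
        System.out.println(Arrays.toString(include)); // prints [true, false, true, false]
    }
}
```

      Passing null instead of an array requests statistics for all columns, as the parameter description states.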
    • getOrcProtoFileStatistics

      List<OrcProto.ColumnStatistics> getOrcProtoFileStatistics()
      Deprecated.
      Use getStatistics() instead.
      Returns:
      File statistics, in original protobuf form.
      Since:
      1.1.0
    • getSerializedFileFooter

      ByteBuffer getSerializedFileFooter()
      Returns:
      Serialized file metadata read from disk for the purposes of caching, etc.
      Since:
      1.1.0
    • writerUsedProlepticGregorian

      boolean writerUsedProlepticGregorian()
      Was the file written using the proleptic Gregorian calendar?
      Since:
      1.5.9
    • getConvertToProlepticGregorian

      boolean getConvertToProlepticGregorian()
      Should the returned values use the proleptic Gregorian calendar?
      Since:
      1.5.9
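      For background on why these two flags exist: the proleptic Gregorian calendar applies Gregorian rules to all dates, while a hybrid Julian/Gregorian calendar switches rules at the 1582 cutover, so the same year-month-day maps to different day numbers for old dates. The sketch below illustrates that difference using only JDK classes. It does not reproduce ORC's own conversion logic, and CalendarComparison is a hypothetical name.

```java
import java.time.LocalDate;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Illustration only; not ORC's conversion code.
final class CalendarComparison {
    private CalendarComparison() {}

    /** Days since 1970-01-01 under the proleptic Gregorian calendar (java.time). */
    static long prolepticEpochDay(int year, int month, int day) {
        return LocalDate.of(year, month, day).toEpochDay();
    }

    /** Days since 1970-01-01 under the hybrid Julian/Gregorian calendar (java.util). */
    static long hybridEpochDay(int year, int month, int day) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();                  // reset all fields so only the date below is set
        cal.set(year, month - 1, day); // Calendar months are zero-based
        return Math.floorDiv(cal.getTimeInMillis(), 86_400_000L);
    }

    public static void main(String[] args) {
        // Modern dates agree between the two calendars
        System.out.println(prolepticEpochDay(2000, 1, 1) == hybridEpochDay(2000, 1, 1)); // prints true
        // Dates before the 1582 cutover do not
        System.out.println(prolepticEpochDay(1500, 1, 1) == hybridEpochDay(1500, 1, 1)); // prints false
    }
}
```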