Class ReaderImpl

java.lang.Object
org.apache.orc.impl.ReaderImpl
All Implemented Interfaces:
Closeable, AutoCloseable, Reader

public class ReaderImpl extends Object implements Reader
  • Field Details

  • Constructor Details

    • ReaderImpl

      public ReaderImpl(Path path, OrcFile.ReaderOptions options) throws IOException
      Constructor that lets the user specify additional options.
      Parameters:
      path - pathname for file
      options - options for reading
      Throws:
      IOException
  • Method Details

    • getNumberOfRows

      public long getNumberOfRows()
      Description copied from interface: Reader
      Get the number of rows in the file.
      Specified by:
      getNumberOfRows in interface Reader
      Returns:
      the number of rows
    • getMetadataKeys

      public List<String> getMetadataKeys()
      Description copied from interface: Reader
      Get the user metadata keys.
      Specified by:
      getMetadataKeys in interface Reader
      Returns:
      the list of metadata keys
    • getMetadataValue

      public ByteBuffer getMetadataValue(String key)
      Description copied from interface: Reader
      Get a user metadata value.
      Specified by:
      getMetadataValue in interface Reader
      Parameters:
      key - a key given by the user
      Returns:
      the bytes associated with the given key
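Because getMetadataValue returns raw bytes, the caller decides how to interpret them. The following is a minimal sketch of decoding such a value as text, assuming the writer stored a UTF-8 string; that encoding is an assumption, since the API only promises bytes. The class and method names are hypothetical, and the buffer in main stands in for a value returned by the reader.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class MetadataValueDecoding {
    // Decodes the remaining bytes as UTF-8 (assumed encoding).
    // duplicate() leaves the caller's position and limit untouched.
    static String decodeUtf8(ByteBuffer value) {
        return StandardCharsets.UTF_8.decode(value.duplicate()).toString();
    }

    public static void main(String[] args) {
        // Stand-in for the buffer returned by getMetadataValue(key).
        ByteBuffer value = ByteBuffer.wrap("owner=etl".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeUtf8(value));
    }
}
```

Decoding a duplicate keeps the original buffer reusable, which matters if the same value is read more than once.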
    • hasMetadataValue

      public boolean hasMetadataValue(String key)
      Description copied from interface: Reader
      Check whether the user set the given metadata value.
      Specified by:
      hasMetadataValue in interface Reader
      Parameters:
      key - the key to check
      Returns:
      true if the metadata value was set
    • getCompressionKind

      public CompressionKind getCompressionKind()
      Description copied from interface: Reader
      Get the compression kind.
      Specified by:
      getCompressionKind in interface Reader
      Returns:
      the kind of compression in the file
    • getCompressionSize

      public int getCompressionSize()
      Description copied from interface: Reader
      Get the buffer size for the compression.
      Specified by:
      getCompressionSize in interface Reader
      Returns:
      number of bytes to buffer for the compression codec.
    • getStripes

      public List<StripeInformation> getStripes()
      Description copied from interface: Reader
      Get the list of stripes.
      Specified by:
      getStripes in interface Reader
      Returns:
      the information about the stripes in order
    • getContentLength

      public long getContentLength()
      Description copied from interface: Reader
      Get the length of the file.
      Specified by:
      getContentLength in interface Reader
      Returns:
      the number of bytes in the file
    • getTypes

      public List<OrcProto.Type> getTypes()
      Description copied from interface: Reader
      Get the list of types contained in the file. The root type is the first type in the list.
      Specified by:
      getTypes in interface Reader
      Returns:
      the list of flattened types
    • getFileVersion

      public static OrcFile.Version getFileVersion(List<Integer> versionList)
    • getFileVersion

      public OrcFile.Version getFileVersion()
      Description copied from interface: Reader
      Get the file format version.
      Specified by:
      getFileVersion in interface Reader
    • getWriterVersion

      public OrcFile.WriterVersion getWriterVersion()
      Description copied from interface: Reader
      Get the version of the writer of this file.
      Specified by:
      getWriterVersion in interface Reader
    • getSoftwareVersion

      public String getSoftwareVersion()
      Description copied from interface: Reader
      Get the implementation and version of the software that wrote the file. It defaults to "ORC Java" for old files. For current files, we include the version also.
      Specified by:
      getSoftwareVersion in interface Reader
      Returns:
      the writer implementation and, when available, the version of the software
    • getFileTail

      public OrcProto.FileTail getFileTail()
      Description copied from interface: Reader
      Get the file tail (footer + postscript).
      Specified by:
      getFileTail in interface Reader
      Returns:
      the file tail
    • getColumnEncryptionKeys

      public EncryptionKey[] getColumnEncryptionKeys()
      Description copied from interface: Reader
      Get the list of encryption keys for column encryption.
      Specified by:
      getColumnEncryptionKeys in interface Reader
      Returns:
      the set of encryption keys
    • getDataMasks

      public DataMaskDescription[] getDataMasks()
      Description copied from interface: Reader
      Get the data masks for the unencrypted variant of the data.
      Specified by:
      getDataMasks in interface Reader
      Returns:
      the list of data masks
    • getEncryptionVariants

      public ReaderEncryptionVariant[] getEncryptionVariants()
      Description copied from interface: Reader
      Get the list of encryption variants for the data.
      Specified by:
      getEncryptionVariants in interface Reader
    • getVariantStripeStatistics

      public List<StripeStatistics> getVariantStripeStatistics(EncryptionVariant variant) throws IOException
      Description copied from interface: Reader
      Get the stripe statistics for a given variant. The StripeStatistics will have 1 entry for each column in the variant. This enables the user to get the stripe statistics for each variant regardless of which keys are available.
      Specified by:
      getVariantStripeStatistics in interface Reader
      Parameters:
      variant - the encryption variant or null for unencrypted
      Returns:
      a list of stripe statistics (one per stripe)
      Throws:
      IOException - if the required key is not available
    • getEncryption

      public ReaderEncryption getEncryption()
      Internal access to our view of the encryption.
      Returns:
      the encryption information for this reader.
    • getRowIndexStride

      public int getRowIndexStride()
      Description copied from interface: Reader
      Get the number of rows per entry in the row index.
      Specified by:
      getRowIndexStride in interface Reader
      Returns:
      the number of rows per entry in the row index, or 0 if there is no row index.
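The stride value can be used for simple arithmetic on the row index. The sketch below estimates how many index entries cover a stripe, assuming each entry covers at most one stride of rows and that a stride of 0 means no index exists. The class and method names are hypothetical and are not part of the ORC API.

```java
final class RowIndexStrideSketch {
    // Number of row-index entries needed to cover rowsInStripe rows,
    // assuming each entry covers up to rowIndexStride rows.
    // A stride of 0 (no row index) yields 0 entries.
    static long indexEntries(long rowsInStripe, int rowIndexStride) {
        if (rowIndexStride <= 0 || rowsInStripe <= 0) {
            return 0;
        }
        // Ceiling division without floating point.
        return (rowsInStripe + rowIndexStride - 1) / rowIndexStride;
    }

    public static void main(String[] args) {
        // 25,000 rows with a stride of 10,000 need three entries.
        System.out.println(indexEntries(25_000L, 10_000));
    }
}
```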
    • getStatistics

      public ColumnStatistics[] getStatistics()
      Description copied from interface: Reader
      Get the statistics about the columns in the file.
      Specified by:
      getStatistics in interface Reader
      Returns:
      the information about the columns
    • deserializeStats

      public ColumnStatistics[] deserializeStats(TypeDescription schema, List<OrcProto.ColumnStatistics> fileStats)
    • getSchema

      public TypeDescription getSchema()
      Description copied from interface: Reader
      Get the type of rows in this ORC file.
      Specified by:
      getSchema in interface Reader
    • ensureOrcFooter

      protected static void ensureOrcFooter(FSDataInputStream in, Path path, int psLen, ByteBuffer buffer) throws IOException
      Ensure this is an ORC file to prevent users from trying to read text files or RC files as ORC files.
      Parameters:
      in - the file being read
      path - the filename for error messages
      psLen - the postscript length
      buffer - the tail of the file
      Throws:
      IOException
    • ensureOrcFooter

      protected static void ensureOrcFooter(ByteBuffer buffer, int psLen) throws IOException
      Ensure this is an ORC file to prevent users from trying to read text files or RC files as ORC files.
      Parameters:
      psLen - the postscript length
      buffer - the tail of the file
      Throws:
      IOException
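To illustrate the idea behind the footer check, here is a simplified sketch that looks for the "ORC" magic at the end of the postscript. It assumes the tail buffer ends with the postscript followed by a single postscript-length byte; the class and method names are hypothetical. The real ensureOrcFooter throws IOException rather than returning a boolean and may handle cases this sketch ignores.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class OrcFooterCheckSketch {
    static final byte[] MAGIC = "ORC".getBytes(StandardCharsets.US_ASCII);

    // Returns true when the bytes just before the final length byte are "ORC".
    // Assumed tail layout: [... postscript ...][1 byte: postscript length].
    static boolean endsWithOrcMagic(ByteBuffer tail, int psLen) {
        int fullLength = MAGIC.length + 1; // magic plus the length byte
        if (psLen < MAGIC.length || tail.remaining() < fullLength) {
            return false;
        }
        int start = tail.limit() - fullLength;
        for (int i = 0; i < MAGIC.length; i++) {
            // Absolute get: does not change the buffer's position.
            if (tail.get(start + i) != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up 5-byte postscript ending in "ORC", followed by its length.
        byte[] tail = {0x08, 0x01, 'O', 'R', 'C', 5};
        System.out.println(endsWithOrcMagic(ByteBuffer.wrap(tail), 5));
    }
}
```

A text file or RC file would normally fail this comparison, which is the kind of misuse the method is documented to guard against.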
    • checkOrcVersion

      protected static void checkOrcVersion(Path path, OrcProto.PostScript postscript) throws IOException
      Check to see if this ORC file is from a future version and if so, warn the user that we may not be able to read all of the column encodings.
      Parameters:
      path - the data source path for error messages
      postscript - the parsed postscript
      Throws:
      IOException
    • getFileSystem

      protected FileSystem getFileSystem() throws IOException
      Throws:
      IOException
    • getFileSystemSupplier

      protected Supplier<FileSystem> getFileSystemSupplier()
    • getWriterVersion

      public static OrcFile.WriterVersion getWriterVersion(int writerVersion)
      Get the WriterVersion based on the ORC file postscript.
      Parameters:
      writerVersion - the integer writer version
      Returns:
      the version of the software that produced the file
    • extractMetadata

      public static OrcProto.Metadata extractMetadata(ByteBuffer bb, int metadataAbsPos, int metadataSize, InStream.StreamOptions options) throws IOException
      Throws:
      IOException
    • extractFileTail

      public static OrcTail extractFileTail(ByteBuffer buffer) throws IOException
      Deprecated.
      Use extractFileTail(FileSystem, Path, long) instead. This is for backward compatibility.
      Throws:
      IOException
    • getCompressionBlockSize

      public static int getCompressionBlockSize(OrcProto.PostScript postScript)
      Read compression block size from the postscript if it is set; otherwise, use the same 256k default the C++ implementation uses.
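The fallback described above can be sketched with plain Java. This example uses OptionalLong as a stand-in for the optional postscript field and interprets "256k" as 256 * 1024 bytes; both choices are assumptions of the sketch, and the class and method names are hypothetical.

```java
import java.util.OptionalLong;

final class CompressionBlockSizeSketch {
    // Assumed meaning of the documented "256k" default.
    static final int DEFAULT_BLOCK_SIZE = 256 * 1024;

    // Uses the declared size when present, otherwise the default.
    // Math.toIntExact fails loudly if the declared value does not fit in an int.
    static int blockSizeOrDefault(OptionalLong declared) {
        return declared.isPresent()
            ? Math.toIntExact(declared.getAsLong())
            : DEFAULT_BLOCK_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(blockSizeOrDefault(OptionalLong.empty()));
        System.out.println(blockSizeOrDefault(OptionalLong.of(64 * 1024)));
    }
}
```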
    • extractFileTail

      public static OrcTail extractFileTail(ByteBuffer buffer, long fileLen, long modificationTime) throws IOException
      Deprecated.
      Use extractFileTail(FileSystem, Path, long) instead. This is for backward compatibility.
      Throws:
      IOException
    • extractFileTail

      protected OrcTail extractFileTail(FileSystem fs, Path path, long maxFileLength) throws IOException
      Throws:
      IOException
    • getSerializedFileFooter

      public ByteBuffer getSerializedFileFooter()
      Specified by:
      getSerializedFileFooter in interface Reader
      Returns:
      Serialized file metadata read from disk for the purposes of caching, etc.
    • writerUsedProlepticGregorian

      public boolean writerUsedProlepticGregorian()
      Description copied from interface: Reader
      Was the file written using the proleptic Gregorian calendar?
      Specified by:
      writerUsedProlepticGregorian in interface Reader
    • getConvertToProlepticGregorian

      public boolean getConvertToProlepticGregorian()
      Description copied from interface: Reader
      Should the returned values use the proleptic Gregorian calendar?
      Specified by:
      getConvertToProlepticGregorian in interface Reader
    • options

      public Reader.Options options()
      Description copied from interface: Reader
      Create a default options object that can be customized for creating a RecordReader.
      Specified by:
      options in interface Reader
      Returns:
      a new default Options object
    • rows

      public RecordReader rows() throws IOException
      Description copied from interface: Reader
      Create a RecordReader that reads everything with the default options.
      Specified by:
      rows in interface Reader
      Returns:
      a new RecordReader
      Throws:
      IOException
    • rows

      public RecordReader rows(Reader.Options options) throws IOException
      Description copied from interface: Reader
      Create a RecordReader that uses the options given. This method can't be named rows, because many callers used rows(null) before the rows() method was introduced.
      Specified by:
      rows in interface Reader
      Parameters:
      options - the options to read with
      Returns:
      a new RecordReader
      Throws:
      IOException
    • getRawDataSize

      public long getRawDataSize()
      Description copied from interface: Reader
      Get the deserialized data size of the file.
      Specified by:
      getRawDataSize in interface Reader
      Returns:
      raw data size
    • getRawDataSizeFromColIndices

      public long getRawDataSizeFromColIndices(List<Integer> colIndices)
      Description copied from interface: Reader
      Get the deserialized data size of the specified column ids.
      Specified by:
      getRawDataSizeFromColIndices in interface Reader
      Parameters:
      colIndices - internal column ids (check orcfiledump for column ids)
      Returns:
      raw data size of columns
    • getRawDataSizeFromColIndices

      public static long getRawDataSizeFromColIndices(List<Integer> colIndices, List<OrcProto.Type> types, List<OrcProto.ColumnStatistics> stats) throws FileFormatException
      Throws:
      FileFormatException
    • getRawDataSizeOfColumns

      public long getRawDataSizeOfColumns(List<String> colNames)
      Description copied from interface: Reader
      Get the deserialized data size of the specified columns.
      Specified by:
      getRawDataSizeOfColumns in interface Reader
      Parameters:
      colNames - the list of column names
      Returns:
      raw data size of columns
    • getOrcProtoStripeStatistics

      public List<OrcProto.StripeStatistics> getOrcProtoStripeStatistics()
      Specified by:
      getOrcProtoStripeStatistics in interface Reader
      Returns:
      Stripe statistics, in original protobuf form.
    • getOrcProtoFileStatistics

      public List<OrcProto.ColumnStatistics> getOrcProtoFileStatistics()
      Specified by:
      getOrcProtoFileStatistics in interface Reader
      Returns:
      File statistics, in original protobuf form.
    • getStripeStatistics

      public List<StripeStatistics> getStripeStatistics() throws IOException
      Description copied from interface: Reader
      Get the stripe statistics for all of the columns.
      Specified by:
      getStripeStatistics in interface Reader
      Returns:
      a list of the statistics for each stripe in the file
      Throws:
      IOException
    • getStripeStatistics

      public List<StripeStatistics> getStripeStatistics(boolean[] included) throws IOException
      Description copied from interface: Reader
      Get the stripe statistics from the file.
      Specified by:
      getStripeStatistics in interface Reader
      Parameters:
      included - null for all columns or an array where the required columns are selected
      Returns:
      a list of the statistics for each stripe in the file
      Throws:
      IOException
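One way to build the included argument is to start from the column ids of interest. The sketch below assumes the array has one slot per column id, with true marking a selected column. Whether the root column (id 0) must also be marked is not stated here, so the sketch leaves that to the caller. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class IncludedColumnsSketch {
    // Builds a selection array indexed by column id (assumed layout).
    static boolean[] includedFor(int columnCount, int... selectedIds) {
        boolean[] included = new boolean[columnCount];
        for (int id : selectedIds) {
            if (id < 0 || id >= columnCount) {
                throw new IllegalArgumentException("column id out of range: " + id);
            }
            included[id] = true;
        }
        return included;
    }

    public static void main(String[] args) {
        // Select column ids 1 and 3 out of four columns.
        System.out.println(Arrays.toString(includedFor(4, 1, 3)));
    }
}
```

Passing null instead of such an array requests statistics for all columns, as the parameter description notes.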
    • getVersionList

      public List<Integer> getVersionList()
      Specified by:
      getVersionList in interface Reader
      Returns:
      List of integers representing version of the file, in order from major to minor.
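Since the list is ordered from major to minor, it can be rendered as a dotted version string for logging. This is a small sketch with a hypothetical helper; the sample list in main is illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class VersionListSketch {
    // Joins the components in order, major first, separated by dots.
    static String dotted(List<Integer> versionList) {
        return versionList.stream()
            .map(String::valueOf)
            .collect(Collectors.joining("."));
    }

    public static void main(String[] args) {
        // Stand-in for the list returned by getVersionList().
        System.out.println(dotted(Arrays.asList(0, 12)));
    }
}
```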
    • getMetadataSize

      public int getMetadataSize()
      Specified by:
      getMetadataSize in interface Reader
      Returns:
      the size of the metadata, in bytes
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • close

      public void close() throws IOException
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException
    • takeFile

      public FSDataInputStream takeFile()
      Take the file from the reader. This allows the first RecordReader to use the same file, but additional RecordReaders will open new handles.
      Returns:
      a file handle, if one is available
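The hand-off described for takeFile resembles a take-once pattern: the first caller receives the existing handle and later callers receive nothing, so they must open their own. The sketch below illustrates that pattern with a generic holder. It is not the ReaderImpl implementation; returning null once the handle is gone and using an atomic reference are assumptions of the sketch.

```java
import java.util.concurrent.atomic.AtomicReference;

final class TakeOnceSketch<T> {
    private final AtomicReference<T> handle;

    TakeOnceSketch(T initial) {
        this.handle = new AtomicReference<>(initial);
    }

    // Returns the held handle to the first caller and null to later callers.
    T take() {
        return handle.getAndSet(null);
    }

    public static void main(String[] args) {
        TakeOnceSketch<String> holder = new TakeOnceSketch<>("open-handle");
        System.out.println(holder.take()); // first caller gets the handle
        System.out.println(holder.take()); // later callers get null
    }
}
```

A caller that receives null would open a fresh handle itself, which matches the documented behavior for additional RecordReaders.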