Interface Writer

All Superinterfaces:
AutoCloseable, Closeable
All Known Subinterfaces:
WriterInternal
All Known Implementing Classes:
WriterImpl, WriterImplV2

public interface Writer extends Closeable
The interface for writing ORC files.
Since:
1.1.0
  • Method Details

    • getSchema

      TypeDescription getSchema()
      Get the schema for this writer.
      Returns:
      the file schema
      Since:
      1.1.0
    • addUserMetadata

      void addUserMetadata(String key, ByteBuffer value)
      Add arbitrary metadata to the ORC file. This may be called at any point until the Writer is closed. If the same key is passed a second time, the second value replaces the first.
      Parameters:
      key - a key to label the data with.
      value - the contents of the metadata.
      Since:
      1.1.0
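      The value parameter is a raw ByteBuffer, so callers choose their own encoding. The following is a minimal sketch of one way to encode string values as UTF-8. MetadataSink is a hypothetical stand-in that mirrors only the addUserMetadata signature so the example runs without the ORC library, and the map merely models the documented replace-on-duplicate-key behavior; it is not how the writer stores metadata.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class UserMetadataSketch {
    /** Hypothetical stand-in mirroring only Writer.addUserMetadata(String, ByteBuffer). */
    interface MetadataSink {
        void addUserMetadata(String key, ByteBuffer value);
    }

    /** Encode a string as UTF-8 bytes wrapped in a ByteBuffer. */
    static ByteBuffer utf8(String text) {
        return ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
    }

    /** Decode a UTF-8 value without moving the caller's buffer position. */
    static String decode(ByteBuffer value) {
        return StandardCharsets.UTF_8.decode(value.duplicate()).toString();
    }

    /** Adds one key twice; the backing map models "the second value replaces the first". */
    static Map<String, ByteBuffer> demo() {
        Map<String, ByteBuffer> stored = new LinkedHashMap<>();
        MetadataSink sink = (key, value) -> stored.put(key, value);
        sink.addUserMetadata("pipeline.version", utf8("1"));
        sink.addUserMetadata("pipeline.version", utf8("2"));
        sink.addUserMetadata("source", utf8("nightly-export"));
        return stored;
    }
}
```

      With a real Writer, the equivalent call would be writer.addUserMetadata("pipeline.version", UserMetadataSketch.utf8("2")); the key names here are illustrative only.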
    • addRowBatch

      void addRowBatch(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch) throws IOException
      Add a row batch to the ORC file.
      Parameters:
      batch - the rows to add
      Throws:
      IOException
      Since:
      1.1.0
    • close

      void close() throws IOException
      Flush all of the buffers and close the file. No methods on this writer should be called afterwards.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException
      Since:
      1.1.0
    • getRawDataSize

      long getRawDataSize()
      Return the deserialized data size. The raw data size is computed when the file footer is written, so the value is available only after the writer is closed.
      Returns:
      raw data size
      Since:
      1.1.0
    • getNumberOfRows

      long getNumberOfRows()
      Return the number of rows in the file. The row count is updated when stripes are flushed. To get an accurate row count, call this method after closing the writer.
      Returns:
      row count
      Since:
      1.1.0
    • writeIntermediateFooter

      long writeIntermediateFooter() throws IOException
      Write an intermediate footer on the file such that if the file is truncated to the returned offset, it would be a valid ORC file.
      Returns:
      the offset that would be a valid end location for an ORC file
      Throws:
      IOException
      Since:
      1.1.0
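      One possible use of the returned offset is recovery after an interrupted write: a caller that recorded the offset can later cut the file back to it. The sketch below does this for a file on the local filesystem using java.nio. The class and method names are hypothetical, and files stored through another filesystem API would need that API's own truncate operation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class TruncateToFooterSketch {
    /**
     * Truncate a local file to validEnd, an offset previously returned by
     * writeIntermediateFooter(). Returns the resulting file size.
     */
    static long truncateTo(Path file, long validEnd) throws IOException {
        if (validEnd < 0) {
            throw new IllegalArgumentException("offset must be non-negative");
        }
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            if (validEnd > channel.size()) {
                throw new IOException("file is shorter than the recorded offset");
            }
            channel.truncate(validEnd);
            return channel.size();
        }
    }

    /** Self-contained demonstration on a temporary 10-byte file (not a real ORC file). */
    static long demo() {
        try {
            Path file = Files.createTempFile("orc-truncate-sketch", ".bin");
            try {
                Files.write(file, new byte[10]);
                return truncateTo(file, 4L);
            } finally {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```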
    • appendStripe

      void appendStripe(byte[] stripe, int offset, int length, StripeInformation stripeInfo, OrcProto.StripeStatistics stripeStatistics) throws IOException
      Fast stripe append to an ORC file. This method is used to merge ORC files quickly. When merging, the caller passes each stripe of the source file in binary form along with its stripe information and stripe statistics. After appending the last stripe of a file, use appendUserMetadata() to append any user metadata. This form only supports files with no column encryption. Use appendStripe(byte[], int, int, StripeInformation, StripeStatistics[]) for files with encryption.
      Parameters:
      stripe - stripe as byte array
      offset - offset within byte array
      length - length of stripe within byte array
      stripeInfo - stripe information
      stripeStatistics - unencrypted stripe statistics
      Throws:
      IOException
      Since:
      1.1.0
    • appendStripe

      void appendStripe(byte[] stripe, int offset, int length, StripeInformation stripeInfo, StripeStatistics[] stripeStatistics) throws IOException
      Fast stripe append to an ORC file. This method is used to merge ORC files quickly. When merging, the caller passes each stripe of the source file in binary form along with its stripe information and stripe statistics. After appending the last stripe of a file, use addUserMetadata(String, ByteBuffer) to append any user metadata.
      Parameters:
      stripe - stripe as byte array
      offset - offset within byte array
      length - length of stripe within byte array
      stripeInfo - stripe information
      stripeStatistics - stripe statistics, with the last one being for the unencrypted data and the others being for each encryption variant.
      Throws:
      IOException
      Since:
      1.6.0
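      Both appendStripe overloads take the stripe as a byte array plus an offset and length into that array. The sketch below shows one way to obtain such an array by reading a byte range from a local file. The class and method names are hypothetical, the example assumes the source file is on the local filesystem, and the demo reads from a small temporary file rather than a real ORC file.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class StripeBytesSketch {
    /** Read exactly length bytes starting at offset; fails if the range passes end of file. */
    static byte[] readRange(Path file, long offset, int length) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(length);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = offset;
            while (buffer.hasRemaining()) {
                int n = channel.read(buffer, position);
                if (n < 0) {
                    throw new EOFException("range extends past end of file");
                }
                position += n;
            }
        }
        return buffer.array();
    }

    /** Self-contained demonstration: reads bytes 3..6 of a 10-byte temporary file. */
    static byte[] demo() {
        try {
            Path file = Files.createTempFile("stripe-sketch", ".bin");
            try {
                Files.write(file, new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
                return readRange(file, 3L, 4);
            } finally {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

      In a merge, the range would presumably come from the source file's StripeInformation (its offset and length), and the resulting array would be passed as appendStripe(bytes, 0, bytes.length, stripeInfo, stripeStatistics). That wiring is an assumption and is not shown here.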
    • appendUserMetadata

      void appendUserMetadata(List<OrcProto.UserMetadataItem> userMetadata)
      Deprecated.
      Update the current user metadata with a list of new values.
      Parameters:
      userMetadata - user metadata
      Since:
      1.1.0
    • getStatistics

      ColumnStatistics[] getStatistics() throws IOException
      Get the statistics about the columns in the file. The result reflects the moment at which the method is called and is based on all of the data written so far. Invoking this method has a cost, so use it judiciously.
      Returns:
      the information about the columns
      Throws:
      IOException
      Since:
      1.1.0
    • getStripes

      List<StripeInformation> getStripes() throws IOException
      Get the stripe information about the file. The result reflects the moment at which the method is called and includes only stripes that have been completed. After the writer is closed, it gives the complete stripe information.
      Returns:
      stripe information
      Throws:
      IOException
      Since:
      1.6.8
    • estimateMemory

      long estimateMemory()
      Estimate the memory currently used by the writer to buffer the stripe. This method helps the write engine control the refresh policy of the ORC writer.
      Returns:
      the number of bytes
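      As an illustration of how an engine might use this estimate, the sketch below sums the estimates of several open writers and, when the total exceeds a budget, picks the largest one as a candidate to act on. The policy, the class, and the method names are hypothetical and are not part of ORC. Each writer is represented by a LongSupplier so the example runs without the ORC library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.LongSupplier;

final class MemoryBudgetSketch {
    /**
     * Returns the key of the writer with the largest estimate when the combined
     * estimate exceeds budgetBytes, or empty when the total is within budget.
     */
    static <K> Optional<K> pickLargestIfOverBudget(Map<K, LongSupplier> estimators,
                                                   long budgetBytes) {
        long total = 0;
        long largest = -1;
        K largestKey = null;
        for (Map.Entry<K, LongSupplier> entry : estimators.entrySet()) {
            long bytes = entry.getValue().getAsLong();
            total += bytes;
            if (bytes > largest) {
                largest = bytes;
                largestKey = entry.getKey();
            }
        }
        return total > budgetBytes ? Optional.ofNullable(largestKey) : Optional.empty();
    }

    /** Demonstration with three fixed estimates of 10, 30, and 20 bytes. */
    static Optional<String> demo(long budgetBytes) {
        Map<String, LongSupplier> estimators = new LinkedHashMap<>();
        estimators.put("part-0", () -> 10L);
        estimators.put("part-1", () -> 30L);
        estimators.put("part-2", () -> 20L);
        return pickLargestIfOverBudget(estimators, budgetBytes);
    }
}
```

      With real writers, each supplier could be a method reference such as writer::estimateMemory. What the engine does with the selected writer (for example, closing it and starting a new file) is an engine decision outside this interface.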