Package org.apache.orc
Interface Writer
- All Superinterfaces:
AutoCloseable, Closeable
- All Known Subinterfaces:
WriterInternal
- All Known Implementing Classes:
WriterImpl, WriterImplV2
The interface for writing ORC files.
- Since:
- 1.1.0
-
Method Summary
- void addRowBatch(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch): Add a row batch to the ORC file.
- void addUserMetadata(String key, ByteBuffer value): Add arbitrary metadata to the ORC file.
- void appendStripe(byte[] stripe, int offset, int length, StripeInformation stripeInfo, OrcProto.StripeStatistics stripeStatistics): Fast stripe append to an ORC file.
- void appendStripe(byte[] stripe, int offset, int length, StripeInformation stripeInfo, StripeStatistics[] stripeStatistics): Fast stripe append to an ORC file.
- void appendUserMetadata(List<OrcProto.UserMetadataItem> userMetadata): Deprecated.
- void close(): Flush all of the buffers and close the file.
- long estimateMemory(): Estimate the memory currently used by the writer to buffer the stripe.
- long getNumberOfRows(): Return the number of rows in the file.
- long getRawDataSize(): Return the deserialized data size.
- TypeDescription getSchema(): Get the schema for this writer.
- ColumnStatistics[] getStatistics(): Get the statistics about the columns in the file.
- List<StripeInformation> getStripes(): Get the stripe information about the file.
- long writeIntermediateFooter(): Write an intermediate footer on the file such that if the file is truncated to the returned offset, it would be a valid ORC file.
-
Method Details
-
getSchema
TypeDescription getSchema()
Get the schema for this writer.
- Returns:
- the file schema
- Since:
- 1.1.0
-
addUserMetadata
void addUserMetadata(String key, ByteBuffer value)
Add arbitrary metadata to the ORC file. This may be called at any point until the Writer is closed. If the same key is passed a second time, the second value replaces the first.
- Parameters:
key - a key to label the data with.
value - the contents of the metadata.
- Since:
- 1.1.0
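The replace-on-duplicate-key rule can be pictured with a small in-memory stand-in. This is a minimal sketch, not the ORC implementation. `UserMetadataSketch`, `MetadataBuffer` and `utf8` are hypothetical names. Rejecting calls after `close()` is a choice made for this sketch; the interface only says the method may be called until the writer is closed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class UserMetadataSketch {

  // Hypothetical stand-in for the writer's pending user metadata.
  static final class MetadataBuffer {
    private final Map<String, ByteBuffer> entries = new LinkedHashMap<>();
    private boolean closed = false;

    // Mirrors the documented contract: callable until close(),
    // and a repeated key replaces the earlier value.
    void addUserMetadata(String key, ByteBuffer value) {
      if (closed) {
        // Sketch-only guard; not specified by the interface.
        throw new IllegalStateException("writer already closed");
      }
      // A read-only view has its own position, so later position changes
      // on the caller's buffer do not affect the stored value.
      entries.put(key, value.asReadOnlyBuffer());
    }

    void close() {
      closed = true;
    }

    String getAsString(String key) {
      ByteBuffer stored = entries.get(key);
      if (stored == null) {
        return null;
      }
      ByteBuffer copy = stored.duplicate(); // reading must not move the stored position
      byte[] bytes = new byte[copy.remaining()];
      copy.get(bytes);
      return new String(bytes, StandardCharsets.UTF_8);
    }

    int size() {
      return entries.size();
    }
  }

  static ByteBuffer utf8(String s) {
    return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    MetadataBuffer meta = new MetadataBuffer();
    meta.addUserMetadata("job.id", utf8("run-1"));
    meta.addUserMetadata("job.id", utf8("run-2")); // replaces "run-1"
    meta.close();
    System.out.println(meta.getAsString("job.id")); // run-2
  }
}
```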
-
addRowBatch
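void addRowBatch(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch) throws IOException
Add a row batch to the ORC file.
- Parameters:
batch - the rows to add
- Throws:
IOException
- Since: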
- 1.1.0
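A common calling pattern is to fill a batch row by row, hand it to addRowBatch whenever it reaches capacity, and flush the last partial batch before closing. In the real API the batch is a `VectorizedRowBatch`, typically created from the schema with `TypeDescription.createRowBatch()`, and it exposes the `size` field plus `getMaxSize()` and `reset()`. The sketch below swaps in hypothetical stand-ins (`SimpleBatch`, `BatchSink`) so it runs with the JDK alone. It shows only the loop structure.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class RowBatchLoopSketch {

  // Hypothetical stand-in for a single-column VectorizedRowBatch.
  static final class SimpleBatch {
    final long[] values;
    int size; // rows currently filled, like VectorizedRowBatch.size

    SimpleBatch(int maxSize) {
      values = new long[maxSize];
    }

    int getMaxSize() {
      return values.length;
    }

    void reset() {
      size = 0;
    }
  }

  // Hypothetical stand-in for the part of Writer used here.
  interface BatchSink {
    void addRowBatch(SimpleBatch batch) throws IOException;
  }

  // Hands the batch over whenever it is full, then flushes the final partial batch.
  static void writeAll(long[] rows, SimpleBatch batch, BatchSink sink) throws IOException {
    batch.reset();
    for (long row : rows) {
      batch.values[batch.size++] = row;
      if (batch.size == batch.getMaxSize()) {
        sink.addRowBatch(batch);
        batch.reset();
      }
    }
    if (batch.size != 0) {
      sink.addRowBatch(batch);
      batch.reset();
    }
  }

  public static void main(String[] args) throws IOException {
    List<Integer> batchSizes = new ArrayList<>();
    long[] rows = new long[10];
    for (int i = 0; i < rows.length; i++) {
      rows[i] = i;
    }
    writeAll(rows, new SimpleBatch(4), b -> batchSizes.add(b.size));
    System.out.println(batchSizes); // [4, 4, 2]
  }
}
```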
-
close
void close() throws IOException
Flush all of the buffers and close the file. No methods on this writer should be called afterwards.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
- Since:
- 1.1.0
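Because Writer extends Closeable, try-with-resources guarantees that close() runs even when a write fails partway through. In real code the resource would come from `OrcFile.createWriter(path, options)`. Here, a hypothetical `TrackingWriter` records whether close() ran so the behavior can be checked without ORC on the classpath.

```java
import java.io.Closeable;
import java.io.IOException;

final class CloseSketch {

  // Hypothetical Closeable writer that records whether close() ran.
  static final class TrackingWriter implements Closeable {
    boolean closed = false;
    int batches = 0;

    void addRowBatch() throws IOException {
      if (closed) {
        // The interface only says no methods should be called after close();
        // failing fast here is a choice made for this sketch.
        throw new IOException("writer is closed");
      }
      batches++;
    }

    @Override
    public void close() throws IOException {
      closed = true; // the real writer flushes its buffers and closes the file here
    }
  }

  static TrackingWriter writeThenMaybeFail(boolean fail) {
    TrackingWriter writer = new TrackingWriter();
    try (TrackingWriter w = writer) {
      w.addRowBatch();
      if (fail) {
        throw new IOException("simulated failure mid-write");
      }
      w.addRowBatch();
    } catch (IOException e) {
      // close() has already run by the time this handler executes
      System.out.println("write failed: " + e.getMessage());
    }
    return writer;
  }

  public static void main(String[] args) {
    TrackingWriter w = writeThenMaybeFail(true);
    System.out.println(w.closed + " " + w.batches); // true 1
  }
}
```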
-
getRawDataSize
long getRawDataSize()
Return the deserialized data size. The raw data size is computed when the file footer is written, so its value is available only after the writer is closed.
- Returns:
- raw data size
- Since:
- 1.1.0
-
getNumberOfRows
long getNumberOfRows()
Return the number of rows in the file. The row count is updated when stripes are flushed, so for an accurate count this method should be called after the writer is closed.
- Returns:
- row count
- Since:
- 1.1.0
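The timing rules for getRawDataSize() and getNumberOfRows() can be modeled with a few counters. The row count advances only when a stripe is flushed, and the raw data size is filled in only when the footer is written at close(). This is a hypothetical model, not ORC's bookkeeping. `CountersSketch`, `CountingWriter` and the per-row byte sizes are illustrative. Flushing after a fixed number of rows is a simplification, because the real writer decides stripe boundaries from its own size settings. The model returns 0 for the raw size before close(), while the interface does not say what the real method returns at that point.

```java
final class CountersSketch {

  // Hypothetical model of when the two counters become accurate.
  static final class CountingWriter {
    private final int rowsPerStripe;
    private long bufferedRows = 0;
    private long bufferedBytes = 0;
    private long flushedRows = 0;
    private long flushedBytes = 0;
    private long rawDataSize = 0; // stays 0 until close() in this model

    CountingWriter(int rowsPerStripe) {
      this.rowsPerStripe = rowsPerStripe;
    }

    void addRow(long rowBytes) {
      bufferedRows++;
      bufferedBytes += rowBytes;
      if (bufferedRows == rowsPerStripe) {
        flushStripe();
      }
    }

    private void flushStripe() {
      flushedRows += bufferedRows; // row count only moves on a stripe flush
      flushedBytes += bufferedBytes;
      bufferedRows = 0;
      bufferedBytes = 0;
    }

    void close() {
      if (bufferedRows > 0) {
        flushStripe();
      }
      rawDataSize = flushedBytes; // computed "when writing the file footer"
    }

    long getNumberOfRows() {
      return flushedRows;
    }

    long getRawDataSize() {
      return rawDataSize;
    }
  }

  public static void main(String[] args) {
    CountingWriter w = new CountingWriter(3);
    for (int i = 0; i < 7; i++) {
      w.addRow(8);
    }
    System.out.println(w.getNumberOfRows() + " " + w.getRawDataSize()); // 6 0
    w.close();
    System.out.println(w.getNumberOfRows() + " " + w.getRawDataSize()); // 7 56
  }
}
```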
-
appendStripe
void appendStripe(byte[] stripe, int offset, int length, StripeInformation stripeInfo, OrcProto.StripeStatistics stripeStatistics) throws IOException
Fast stripe append to an ORC file. This interface is used for fast merges of ORC files. When merging, the file being merged passes each stripe in binary form along with its stripe information and stripe statistics. After appending the last stripe of a file, use appendUserMetadata() (deprecated in favor of addUserMetadata(String, ByteBuffer)) to append any user metadata. This form only supports files with no column encryption. Use appendStripe(byte[], int, int, StripeInformation, StripeStatistics[]) for files with encryption.
- Parameters:
stripe - stripe as byte array
offset - offset within byte array
length - length of stripe within byte array
stripeInfo - stripe information
stripeStatistics - unencrypted stripe statistics
- Throws:
IOException
- Since:
- 1.1.0
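A fast merge copies each source stripe's bytes verbatim instead of decoding and re-encoding rows. With the ORC API, a merge tool would read `getLength()` bytes starting at `getOffset()` for each source `StripeInformation` and pass them, together with the stripe's statistics, to appendStripe. The sketch below replaces files with byte arrays and uses a hypothetical `StripeInfo` class and `appendStripe` helper, so it illustrates only the offset/length bookkeeping and buffer reuse.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

final class StripeMergeSketch {

  // Hypothetical stand-in for StripeInformation: where a stripe starts
  // in its source file and how many bytes it spans.
  static final class StripeInfo {
    final long offset;
    final long length;

    StripeInfo(long offset, long length) {
      this.offset = offset;
      this.length = length;
    }
  }

  // Hypothetical stand-in for Writer.appendStripe: copies the slice
  // [offset, offset + length) of the buffer without re-encoding it.
  static void appendStripe(ByteArrayOutputStream out, byte[] stripe, int offset, int length) {
    out.write(stripe, offset, length);
  }

  // Copies every stripe of one source "file" into the output,
  // reading each stripe into a reusable buffer first.
  static void mergeFile(byte[] sourceFile, List<StripeInfo> stripes, ByteArrayOutputStream out) {
    byte[] buffer = new byte[0];
    for (StripeInfo info : stripes) {
      int len = Math.toIntExact(info.length);
      if (buffer.length < len) {
        buffer = new byte[len]; // grow only when a larger stripe appears
      }
      System.arraycopy(sourceFile, Math.toIntExact(info.offset), buffer, 0, len);
      appendStripe(out, buffer, 0, len);
    }
  }

  public static void main(String[] args) {
    // Source "file": a 3-byte header, then two stripes of 4 and 2 bytes.
    byte[] file = {'O', 'R', 'C', 1, 2, 3, 4, 5, 6};
    List<StripeInfo> stripes = Arrays.asList(new StripeInfo(3, 4), new StripeInfo(7, 2));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    mergeFile(file, stripes, out);
    System.out.println(Arrays.toString(out.toByteArray())); // [1, 2, 3, 4, 5, 6]
  }
}
```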
-
appendStripe
void appendStripe(byte[] stripe, int offset, int length, StripeInformation stripeInfo, StripeStatistics[] stripeStatistics) throws IOException
Fast stripe append to an ORC file. This interface is used for fast merges of ORC files. When merging, the file being merged passes each stripe in binary form along with its stripe information and stripe statistics. After appending the last stripe of a file, use addUserMetadata(String, ByteBuffer) to append any user metadata.
- Parameters:
stripe - stripe as byte array
offset - offset within byte array
length - length of stripe within byte array
stripeInfo - stripe information
stripeStatistics - stripe statistics, with the last entry for the unencrypted data and the others for each encryption variant.
- Throws:
IOException
- Since:
- 1.6.0
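The `StripeStatistics[]` argument has a fixed layout: one entry per encryption variant, followed by the unencrypted statistics as the last element. A small helper can make that ordering explicit. In this sketch the generic type `S` stands in for `StripeStatistics`, and `StatsLayoutSketch` is a hypothetical name. By the same rule, a file without encryption would pass a single-element array, an inference from the parameter description rather than something the interface states.

```java
import java.util.Arrays;
import java.util.List;

final class StatsLayoutSketch {

  // Builds the array in the documented order: variant statistics first
  // (in variant order), unencrypted statistics last.
  static <S> S[] buildStripeStatistics(List<S> perVariant, S unencrypted, S[] template) {
    // copyOf keeps the runtime array type of the template and pads with nulls.
    S[] result = Arrays.copyOf(template, perVariant.size() + 1);
    for (int i = 0; i < perVariant.size(); i++) {
      result[i] = perVariant.get(i);
    }
    result[perVariant.size()] = unencrypted;
    return result;
  }

  public static void main(String[] args) {
    String[] stats = buildStripeStatistics(
        Arrays.asList("variant-0", "variant-1"), "unencrypted", new String[0]);
    System.out.println(Arrays.toString(stats)); // [variant-0, variant-1, unencrypted]
  }
}
```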
-
appendUserMetadata
void appendUserMetadata(List<OrcProto.UserMetadataItem> userMetadata)
Deprecated. Use addUserMetadata(String, ByteBuffer) instead.
Update the current user metadata with a list of new values.
- Parameters:
userMetadata - user metadata
- Since:
- 1.1.0
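Code that still builds a list of `OrcProto.UserMetadataItem` can move to the non-deprecated method by looping over the items. With the generated protobuf class, each call would likely be `writer.addUserMetadata(item.getName(), item.getValue().asReadOnlyByteBuffer())`. The sketch below runs with the JDK alone: a hypothetical `Item` class stands in for the protobuf message, and a `BiConsumer` stands in for the writer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

final class MetadataMigrationSketch {

  // Hypothetical stand-in for OrcProto.UserMetadataItem (a name/value pair).
  static final class Item {
    final String name;
    final byte[] value;

    Item(String name, byte[] value) {
      this.name = name;
      this.value = value;
    }
  }

  // Replaces appendUserMetadata(list) with one addUserMetadata call per item.
  static void addAll(List<Item> items, BiConsumer<String, ByteBuffer> addUserMetadata) {
    for (Item item : items) {
      addUserMetadata.accept(item.name, ByteBuffer.wrap(item.value).asReadOnlyBuffer());
    }
  }

  public static void main(String[] args) {
    Map<String, ByteBuffer> written = new LinkedHashMap<>();
    List<Item> items = Arrays.asList(
        new Item("owner", "etl".getBytes(StandardCharsets.UTF_8)),
        new Item("owner", "batch".getBytes(StandardCharsets.UTF_8)));
    addAll(items, written::put);
    // A repeated key keeps the later value, matching addUserMetadata's contract.
    System.out.println(written.size()); // 1
  }
}
```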
-
getStatistics
ColumnStatistics[] getStatistics() throws IOException
Get the statistics about the columns in the file. The result reflects the state at the time of the call: it uses all of the data written so far to compute the statistics. Invoking this method has a cost, so it should be used judiciously.
- Returns:
- the information about the columns
- Throws:
IOException
- Since:
- 1.1.0
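Because getStatistics() reports a point-in-time view and is not free, a caller that wants periodic progress figures might sample it every N batches rather than after every batch. The sketch below shows such a throttle. `StatsSampler` is hypothetical, and the `Supplier` stands in for a call to `writer.getStatistics()`, which in the real interface can throw IOException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class StatsSamplerSketch {

  // Hypothetical helper: requests statistics only on every Nth batch and keeps
  // each snapshot, since each reflects the data written at that moment.
  static final class StatsSampler<T> {
    private final int everyNBatches;
    private final Supplier<T> statistics;
    private final List<T> snapshots = new ArrayList<>();
    private int batchesSeen = 0;

    StatsSampler(int everyNBatches, Supplier<T> statistics) {
      this.everyNBatches = everyNBatches;
      this.statistics = statistics;
    }

    void afterBatch() {
      batchesSeen++;
      if (batchesSeen % everyNBatches == 0) {
        snapshots.add(statistics.get());
      }
    }

    List<T> snapshots() {
      return snapshots;
    }
  }

  public static void main(String[] args) {
    long[] rowsWritten = {0};
    StatsSampler<Long> sampler = new StatsSampler<>(3, () -> rowsWritten[0]);
    for (int batch = 0; batch < 7; batch++) {
      rowsWritten[0] += 1024; // pretend each batch adds 1024 rows
      sampler.afterBatch();
    }
    System.out.println(sampler.snapshots()); // [3072, 6144]
  }
}
```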
-
getStripes
List<StripeInformation> getStripes() throws IOException
Get the stripe information about the file. The result reflects the state at the time of the call and includes only stripes that have been completed. After the writer is closed, it gives the complete stripe information.
- Returns:
- stripe information
- Throws:
IOException
- Since:
- 1.6.8
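One use of getStripes() is to summarize the completed stripes, for example to total their rows and find where the stripe data currently ends. Each real element is a `StripeInformation` with accessors such as `getOffset()`, `getLength()` and `getNumberOfRows()`. The sketch uses a hypothetical `Stripe` class with those three fields and assumes the list is in file order. Before close() the list covers only completed stripes, so these totals can lag behind what has been passed to addRowBatch.

```java
import java.util.Arrays;
import java.util.List;

final class StripeSummarySketch {

  // Hypothetical stand-in for StripeInformation.
  static final class Stripe {
    final long offset;
    final long length;
    final long numberOfRows;

    Stripe(long offset, long length, long numberOfRows) {
      this.offset = offset;
      this.length = length;
      this.numberOfRows = numberOfRows;
    }
  }

  static long totalRows(List<Stripe> stripes) {
    long rows = 0;
    for (Stripe s : stripes) {
      rows += s.numberOfRows;
    }
    return rows;
  }

  // Byte position just past the last completed stripe; 0 when none exist yet
  // (a convention of this sketch). Assumes stripes are listed in file order.
  static long endOfLastStripe(List<Stripe> stripes) {
    if (stripes.isEmpty()) {
      return 0;
    }
    Stripe last = stripes.get(stripes.size() - 1);
    return last.offset + last.length;
  }

  public static void main(String[] args) {
    List<Stripe> stripes = Arrays.asList(new Stripe(3, 100, 5000), new Stripe(103, 80, 4000));
    // prints: 9000 rows, data ends at byte 183
    System.out.println(totalRows(stripes) + " rows, data ends at byte " + endOfLastStripe(stripes));
  }
}
```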
-
estimateMemory
long estimateMemory()
Estimate the memory currently used by the writer to buffer the stripe. This method helps the writing engine control its refresh policy for the ORC writer.
- Returns:
- the number of bytes
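An engine can compare estimateMemory() against a budget to decide when to act, for instance when to finish the current file and start a new one. The sketch below is a hypothetical policy. `MemoryPolicySketch`, `MemoryBudgetPolicy`, the budget values and the `LongSupplier` standing in for `writer::estimateMemory` are all assumptions. What the engine does when the check fires is left to the engine.

```java
import java.util.function.LongSupplier;

final class MemoryPolicySketch {

  // Hypothetical policy: signals once the writer's buffered-stripe
  // estimate reaches the configured budget.
  static final class MemoryBudgetPolicy {
    private final long budgetBytes;
    private final LongSupplier estimateMemory;

    MemoryBudgetPolicy(long budgetBytes, LongSupplier estimateMemory) {
      if (budgetBytes <= 0) {
        throw new IllegalArgumentException("budget must be positive");
      }
      this.budgetBytes = budgetBytes;
      this.estimateMemory = estimateMemory;
    }

    boolean shouldRoll() {
      return estimateMemory.getAsLong() >= budgetBytes;
    }
  }

  public static void main(String[] args) {
    long[] buffered = {0};
    MemoryBudgetPolicy policy = new MemoryBudgetPolicy(64L * 1024 * 1024, () -> buffered[0]);
    buffered[0] = 10L * 1024 * 1024;
    System.out.println(policy.shouldRoll()); // false
    buffered[0] = 70L * 1024 * 1024;
    System.out.println(policy.shouldRoll()); // true
  }
}
```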