Package org.apache.orc
Interface Writer
- All Superinterfaces:
AutoCloseable, Closeable
- All Known Subinterfaces:
WriterInternal
- All Known Implementing Classes:
WriterImpl, WriterImplV2
The interface for writing ORC files.
- Since:
- 1.1.0
-
Method Summary
Modifier and Type | Method | Description
void | addRowBatch(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch) | Add a row batch to the ORC file.
void | addUserMetadata(String key, ByteBuffer value) | Add arbitrary metadata to the ORC file.
void | appendStripe(byte[] stripe, int offset, int length, StripeInformation stripeInfo, OrcProto.StripeStatistics stripeStatistics) | Fast stripe append to ORC file.
void | appendStripe(byte[] stripe, int offset, int length, StripeInformation stripeInfo, StripeStatistics[] stripeStatistics) | Fast stripe append to ORC file.
void | appendUserMetadata(List<OrcProto.UserMetadataItem> userMetadata) | Deprecated. Use addUserMetadata(String, ByteBuffer) instead.
void | close() | Flush all of the buffers and close the file.
long | estimateMemory() | Estimate the memory currently used by the writer to buffer the stripe.
long | getNumberOfRows() | Return the number of rows in file.
long | getRawDataSize() | Return the deserialized data size.
TypeDescription | getSchema() | Get the schema for this writer.
ColumnStatistics[] | getStatistics() | Get the statistics about the columns in the file.
List<StripeInformation> | getStripes() | Get the stripe information about the file.
long | writeIntermediateFooter() | Write an intermediate footer on the file such that if the file is truncated to the returned offset, it would be a valid ORC file.
-
Method Details
-
getSchema
TypeDescription getSchema()
Get the schema for this writer.
- Returns:
- the file schema
- Since:
- 1.1.0
-
addUserMetadata
void addUserMetadata(String key, ByteBuffer value)
Add arbitrary metadata to the ORC file. This may be called at any point until the Writer is closed. If the same key is passed a second time, the second value replaces the first.
- Parameters:
key - a key to label the data with.
value - the contents of the metadata.
- Since:
- 1.1.0
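
Because the value parameter is a ByteBuffer, text values have to be encoded before they are passed in. The following is a minimal sketch of one way to do that, assuming UTF-8 is the desired encoding; the class and helper names (UserMetadataValues, encode, decode) and the key "job.name" are hypothetical and not part of the ORC API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of org.apache.orc.
class UserMetadataValues {
    /** Encode a string as a ByteBuffer suitable for the value argument. */
    static ByteBuffer encode(String text) {
        return ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
    }

    /** Decode a value back to text without moving the buffer's position. */
    static String decode(ByteBuffer value) {
        // duplicate() shares the bytes but has its own position and limit.
        return StandardCharsets.UTF_8.decode(value.duplicate()).toString();
    }

    public static void main(String[] args) {
        ByteBuffer value = encode("nightly-export");
        // With a real writer this would be:
        //   writer.addUserMetadata("job.name", value);
        System.out.println(decode(value)); // prints nightly-export
    }
}
```

Decoding through duplicate() leaves the original buffer untouched, so the same buffer can still be handed to the writer afterwards.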
-
addRowBatch
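void addRowBatch(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch) throws IOException
Add a row batch to the ORC file.
- Parameters:
batch - the rows to add
- Throws:
IOException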
- Since:
- 1.1.0
-
close
void close() throws IOException
Flush all of the buffers and close the file. No methods on this writer should be called afterwards.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
- Since:
- 1.1.0
-
getRawDataSize
long getRawDataSize()
Return the deserialized data size. The raw data size is computed when the file footer is written, so the value is available only after the writer is closed.
- Returns:
- raw data size
- Since:
- 1.1.0
-
getNumberOfRows
long getNumberOfRows()
Return the number of rows in the file. The row count is updated when stripes are flushed, so call this method after closing the writer to get an accurate count.
- Returns:
- row count
- Since:
- 1.1.0
-
appendStripe
void appendStripe(byte[] stripe, int offset, int length, StripeInformation stripeInfo, OrcProto.StripeStatistics stripeStatistics) throws IOException
Fast stripe append to ORC file. This interface is used for fast merging of ORC files. When merging, the caller passes each stripe of the file being merged in binary form, along with its stripe information and stripe statistics. After appending the last stripe of a file, use appendUserMetadata() to append any user metadata. This form only supports files with no column encryption. Use appendStripe(byte[], int, int, StripeInformation, StripeStatistics[]) for files with encryption.
- Parameters:
stripe - stripe as byte array
offset - offset within byte array
length - length of stripe within byte array
stripeInfo - stripe information
stripeStatistics - unencrypted stripe statistics
- Throws:
IOException
- Since:
- 1.1.0
-
appendStripe
void appendStripe(byte[] stripe, int offset, int length, StripeInformation stripeInfo, StripeStatistics[] stripeStatistics) throws IOException
Fast stripe append to ORC file. This interface is used for fast merging of ORC files. When merging, the caller passes each stripe of the file being merged in binary form, along with its stripe information and stripe statistics. After appending the last stripe of a file, use addUserMetadata(String, ByteBuffer) to append any user metadata.
- Parameters:
stripe - stripe as byte array
offset - offset within byte array
length - length of stripe within byte array
stripeInfo - stripe information
stripeStatistics - stripe statistics, with the last one being for the unencrypted data and the others being for each encryption variant.
- Throws:
IOException
- Since:
- 1.6.0
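
Both appendStripe overloads describe the stripe as a region of a byte array through the (stripe, offset, length) triple. The following sketch shows how such a triple selects a region, with a bounds check a caller might run before handing the buffer over. The class and method names (StripeRange, checkRange, slice) are hypothetical, and whether the ORC implementation performs an equivalent check is not stated here.

```java
import java.util.Arrays;

// Hypothetical caller-side helper; not part of org.apache.orc.
class StripeRange {
    /** Verify that [offset, offset + length) lies inside the buffer. */
    static void checkRange(byte[] buffer, int offset, int length) {
        // Written as a subtraction so that offset + length cannot overflow.
        if (offset < 0 || length < 0 || offset > buffer.length - length) {
            throw new IndexOutOfBoundsException(
                "offset=" + offset + " length=" + length + " buffer=" + buffer.length);
        }
    }

    /** Copy out the bytes the triple refers to (for illustration only). */
    static byte[] slice(byte[] buffer, int offset, int length) {
        checkRange(buffer, offset, length);
        return Arrays.copyOfRange(buffer, offset, offset + length);
    }

    public static void main(String[] args) {
        byte[] buffer = {9, 9, 1, 2, 3, 9};   // stripe bytes sit at indexes 2..4
        byte[] stripe = slice(buffer, 2, 3);
        System.out.println(Arrays.toString(stripe)); // prints [1, 2, 3]
    }
}
```

Passing an offset and length avoids copying when the stripe was read into a larger, reused buffer; the copy in slice is only there to make the selected region visible.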
-
appendUserMetadata
void appendUserMetadata(List<OrcProto.UserMetadataItem> userMetadata)
Deprecated. Use addUserMetadata(String, ByteBuffer) instead.
Update the current user metadata with a list of new values.
- Parameters:
userMetadata - user metadata
- Since:
- 1.1.0
-
getStatistics
ColumnStatistics[] getStatistics() throws IOException
Get the statistics about the columns in the file. The output reflects the time at which the method is called: it uses all of the data written so far to compute the statistics. Invoking this method has a cost, so use it judiciously.
- Returns:
- the information about the columns
- Throws:
IOException
- Since:
- 1.1.0
-
getStripes
List<StripeInformation> getStripes() throws IOException
Get the stripe information about the file. The output reflects the time at which the method is called: it returns the stripes that have been completed so far. After the writer is closed, it gives the complete stripe information.
- Returns:
- stripe information
- Throws:
IOException
- Since:
- 1.6.8
-
estimateMemory
long estimateMemory()
Estimate the memory currently used by the writer to buffer the stripe. This method helps the write engine control the refresh policy of the ORC writer.
- Returns:
- the number of bytes
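
One way a write engine could use this estimate is to compare it against a memory budget and react once the budget is reached. The sketch below is an assumption about such a policy, not behavior defined by ORC: MemoryBudget, overBudget, and the 64 MiB figure are hypothetical, and the estimate is supplied through a LongSupplier so that writer::estimateMemory could be plugged in.

```java
import java.util.function.LongSupplier;

// Hypothetical policy object; not part of org.apache.orc.
class MemoryBudget {
    private final LongSupplier estimate;   // e.g. writer::estimateMemory
    private final long budgetBytes;

    MemoryBudget(LongSupplier estimate, long budgetBytes) {
        if (budgetBytes <= 0) {
            throw new IllegalArgumentException("budgetBytes must be positive");
        }
        this.estimate = estimate;
        this.budgetBytes = budgetBytes;
    }

    /** True when the estimated buffered bytes have reached the budget. */
    boolean overBudget() {
        return estimate.getAsLong() >= budgetBytes;
    }

    public static void main(String[] args) {
        long[] buffered = {0};   // stands in for the writer's estimate
        MemoryBudget budget = new MemoryBudget(() -> buffered[0], 64L * 1024 * 1024);

        buffered[0] = 10L * 1024 * 1024;
        System.out.println(budget.overBudget()); // prints false

        buffered[0] = 80L * 1024 * 1024;
        System.out.println(budget.overBudget()); // prints true
    }
}
```

What the engine does when overBudget() returns true (for example, closing the current file and starting a new one) is left to the engine.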
-