Package org.apache.orc.impl
Class WriterImpl
java.lang.Object
org.apache.orc.impl.WriterImpl
- All Implemented Interfaces:
Closeable, AutoCloseable, WriterInternal, MemoryManager.Callback, Writer
- Direct Known Subclasses:
WriterImplV2
An ORC file writer. The file is divided into stripes, which are the natural
unit of work when reading. Each stripe is buffered in memory until the
buffered data reaches the stripe size, and is then written out broken down by
column. Each column is written by a TreeWriter that is specific to that
type of column. TreeWriters may have child TreeWriters that handle the
sub-types. Each TreeWriter writes its column's data as a set of
streams.
This class is unsynchronized, like most Stream objects, so everything from the creation of the writer via OrcFile through every access to a single instance must happen on a single thread.
There are no known cases today where these operations are spread across different threads.
Caveat: the MemoryManager is created when the WriterOptions are created, so that step has to be confined to a single thread as well.
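The single-thread requirement above can be met by routing both creation and every later call through one dedicated thread. The following is a minimal sketch of that pattern, not ORC code: `UnsafeWriter`, `addRows`, `rowCount`, and `writeConfined` are hypothetical stand-ins introduced only for illustration, and the owner-thread check is an assumption added to make misuse visible.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ConfinedWriterSketch {

    /** Hypothetical stand-in for an unsynchronized writer; not an ORC class. */
    static final class UnsafeWriter {
        private final Thread owner = Thread.currentThread();
        private long rows;

        void addRows(int count) {
            // Fail fast if any thread other than the creating one touches the instance.
            if (Thread.currentThread() != owner) {
                throw new IllegalStateException("writer used off its owner thread");
            }
            rows += count;
        }

        long rowCount() {
            return rows;
        }
    }

    /** Creates and drives the writer entirely on one executor thread. */
    static long writeConfined(int batches, int rowsPerBatch) throws Exception {
        ExecutorService single = Executors.newSingleThreadExecutor();
        try {
            // Creation also runs on the executor thread, mirroring the caveat
            // that the setup step must be confined as well.
            Callable<UnsafeWriter> create = UnsafeWriter::new;
            UnsafeWriter writer = single.submit(create).get();
            for (int i = 0; i < batches; i++) {
                single.submit(() -> writer.addRows(rowsPerBatch)).get();
            }
            Callable<Long> count = writer::rowCount;
            return single.submit(count).get();
        } finally {
            single.shutdown();
            single.awaitTermination(5, TimeUnit.SECONDS);
        }
    }
}
```

With a real writer, the same shape would apply: the options, the writer, and each subsequent call would all be submitted to the same single-thread executor.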
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
void | addRowBatch(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch) | Add a row batch to the ORC file.
void | addUserMetadata(String name, ByteBuffer value) | Add arbitrary meta-data to the ORC file.
void | appendStripe(byte[] stripe, int offset, int length, StripeInformation stripeInfo, OrcProto.StripeStatistics stripeStatistics) | Fast stripe append to ORC file.
void | appendStripe(byte[] stripe, int offset, int length, StripeInformation stripeInfo, StripeStatistics[] stripeStatistics) | Fast stripe append to ORC file.
void | appendUserMetadata(List<OrcProto.UserMetadataItem> userMetadata) | Update the current user metadata with a list of new values.
boolean | checkMemory(double newScale) | The scale factor for the stripe size has changed and thus the writer should adjust their desired size appropriately.
void | close() | Flush all of the buffers and close the file.
static CompressionCodec | createCodec(CompressionKind kind) |
long | estimateMemory() | Estimate the memory currently used by the writer to buffer the stripe.
static int | getEstimatedBufferSize(long stripeSize, int numColumns, int bs) |
long | getNumberOfRows() | Row count gets updated when flushing the stripes.
long | getRawDataSize() | Raw data size will be computed when writing the file footer.
TypeDescription | getSchema() | Get the schema for this writer.
ColumnStatistics[] | getStatistics() | Get the statistics about the columns in the file.
List<StripeInformation> | getStripes() | Get the stripe information about the file.
void | increaseCompressionSize(int newSize) | Increase the buffer size for this writer.
long | writeIntermediateFooter() | Write an intermediate footer on the file such that if the file is truncated to the returned offset, it would be a valid ORC file.
-
Constructor Details
-
WriterImpl
- Throws:
IOException
-
-
Method Details
-
getEstimatedBufferSize
public static int getEstimatedBufferSize(long stripeSize, int numColumns, int bs)
-
increaseCompressionSize
public void increaseCompressionSize(int newSize)
Description copied from interface: WriterInternal
Increase the buffer size for this writer. This function is internal only and should only be called by the ORC file merger.
- Specified by:
increaseCompressionSize in interface WriterInternal
- Parameters:
newSize - the new buffer size.
-
createCodec
-
checkMemory
Description copied from interface: MemoryManager.Callback
The scale factor for the stripe size has changed, so the writer should adjust its desired size accordingly.
- Specified by:
checkMemory in interface MemoryManager.Callback
- Parameters:
newScale - the current scale factor for memory allocations
- Returns:
true if the writer was over the limit
- Throws:
IOException
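To make the scale-factor callback concrete, here is a toy model of how a shared memory pool might drive it. It is a sketch under stated assumptions, not the ORC implementation: `ToyManager`, `ToyWriter`, `register`, `scale`, and `notifyWriters` are hypothetical names, and both the scaling rule (pool divided by total requested stripe size, capped at 1.0) and the flush-when-over-limit behavior are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class MemoryScaleSketch {

    /** Hypothetical analogue of the callback interface described above. */
    interface Callback {
        boolean checkMemory(double newScale);
    }

    /** Toy writer that compares its buffered bytes with a scaled stripe size. */
    static final class ToyWriter implements Callback {
        final long stripeSize;
        long bufferedBytes;
        int flushes;

        ToyWriter(long stripeSize) {
            this.stripeSize = stripeSize;
        }

        @Override
        public boolean checkMemory(double newScale) {
            long limit = Math.round(stripeSize * newScale);
            if (bufferedBytes > limit) {
                // Over the scaled limit: pretend to flush the buffered stripe.
                bufferedBytes = 0;
                flushes++;
                return true;
            }
            return false;
        }
    }

    /** Toy manager: scales every writer down once requests exceed the pool. */
    static final class ToyManager {
        final long pool;
        final List<ToyWriter> writers = new ArrayList<>();

        ToyManager(long pool) {
            this.pool = pool;
        }

        void register(ToyWriter writer) {
            writers.add(writer);
        }

        double scale() {
            long total = 0;
            for (ToyWriter writer : writers) {
                total += writer.stripeSize;
            }
            return total <= pool ? 1.0 : (double) pool / total;
        }

        /** Notifies every writer and returns how many were over their limit. */
        int notifyWriters() {
            double current = scale();
            int over = 0;
            for (ToyWriter writer : writers) {
                if (writer.checkMemory(current)) {
                    over++;
                }
            }
            return over;
        }
    }
}
```

In this model, two writers that each request the whole pool are both scaled to half of their stripe size, so only a writer buffering more than that half reports being over the limit.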
-
getSchema
Description copied from interface: Writer
Get the schema for this writer.
-
addUserMetadata
Description copied from interface: Writer
Add arbitrary meta-data to the ORC file. This may be called at any point until the Writer is closed. If the same key is passed a second time, the second value will replace the first.
- Specified by:
addUserMetadata in interface Writer
- Parameters:
name - a key to label the data with.
value - the contents of the metadata.
-
addRowBatch
public void addRowBatch(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch) throws IOException
Description copied from interface: Writer
Add a row batch to the ORC file.
- Specified by:
addRowBatch in interface Writer
- Parameters:
batch - the rows to add
- Throws:
IOException
-
close
Description copied from interface: Writer
Flush all of the buffers and close the file. No methods on this writer should be called afterwards.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in interface Writer
- Throws:
IOException
-
getRawDataSize
public long getRawDataSize()
The raw data size is computed when the file footer is written, so its value is available only after the writer has been closed.
- Specified by:
getRawDataSize in interface Writer
- Returns:
raw data size
-
getNumberOfRows
public long getNumberOfRows()
The row count is updated when stripes are flushed. To get an accurate row count, call this method after the writer is closed.
- Specified by:
getNumberOfRows in interface Writer
- Returns:
row count
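Why the row count is only accurate after close can be seen in a toy model of stripe buffering. This is a sketch, not the ORC implementation: `StripeBufferSketch`, `addBatch`, and `stripeCount` are hypothetical names, and flushing as soon as the buffered bytes reach the stripe size is a simplifying assumption.

```java
final class StripeBufferSketch {
    private final long stripeSizeBytes;
    private long bufferedBytes;
    private long bufferedRows;
    private long flushedRows;
    private int stripes;
    private boolean closed;

    StripeBufferSketch(long stripeSizeBytes) {
        this.stripeSizeBytes = stripeSizeBytes;
    }

    /** Buffers a batch and flushes a stripe once the size threshold is reached. */
    void addBatch(int rows, long bytes) {
        if (closed) {
            throw new IllegalStateException("writer is closed");
        }
        bufferedRows += rows;
        bufferedBytes += bytes;
        if (bufferedBytes >= stripeSizeBytes) {
            flushStripe();
        }
    }

    private void flushStripe() {
        if (bufferedRows == 0) {
            return;
        }
        // Rows only become visible in the count when their stripe is flushed.
        flushedRows += bufferedRows;
        stripes++;
        bufferedRows = 0;
        bufferedBytes = 0;
    }

    /** Flushes whatever is still buffered, then marks the writer closed. */
    void close() {
        if (!closed) {
            flushStripe();
            closed = true;
        }
    }

    long getNumberOfRows() {
        return flushedRows;
    }

    int stripeCount() {
        return stripes;
    }
}
```

In this model, rows that are still buffered are invisible to `getNumberOfRows()` until either the stripe threshold is crossed or `close()` forces the final flush.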
-
appendStripe
public void appendStripe(byte[] stripe, int offset, int length, StripeInformation stripeInfo, OrcProto.StripeStatistics stripeStatistics) throws IOException
Description copied from interface: Writer
Fast stripe append to ORC file. This interface is used for fast ORC file merge with other ORC files. When merging, the caller should pass each stripe of the file being merged in binary form, along with its stripe information and stripe statistics. After appending the last stripe of a file, use appendUserMetadata() to append any user metadata. This form only supports files with no column encryption. Use Writer.appendStripe(byte[], int, int, StripeInformation, StripeStatistics[]) for files with encryption.
- Specified by:
appendStripe in interface Writer
- Parameters:
stripe - stripe as byte array
offset - offset within byte array
length - length of stripe within byte array
stripeInfo - stripe information
stripeStatistics - unencrypted stripe statistics
- Throws:
IOException
-
appendStripe
public void appendStripe(byte[] stripe, int offset, int length, StripeInformation stripeInfo, StripeStatistics[] stripeStatistics) throws IOException
Description copied from interface: Writer
Fast stripe append to ORC file. This interface is used for fast ORC file merge with other ORC files. When merging, the caller should pass each stripe of the file being merged in binary form, along with its stripe information and stripe statistics. After appending the last stripe of a file, use Writer.addUserMetadata(String, ByteBuffer) to append any user metadata.
- Specified by:
appendStripe in interface Writer
- Parameters:
stripe - stripe as byte array
offset - offset within byte array
length - length of stripe within byte array
stripeInfo - stripe information
stripeStatistics - stripe statistics, with the last one being for the unencrypted data and the others being for each encryption variant.
- Throws:
IOException
-
appendUserMetadata
Description copied from interface: Writer
Update the current user metadata with a list of new values.
- Specified by:
appendUserMetadata in interface Writer
- Parameters:
userMetadata - user metadata
-
getStatistics
Description copied from interface: Writer
Get the statistics about the columns in the file. The output depends on when the method is called: it uses all of the data written so far to provide the statistics. Note that invoking this method has a cost, so it should be used judiciously.
- Specified by:
getStatistics in interface Writer
- Returns:
the information about the column
-
getStripes
Description copied from interface: Writer
Get the stripe information about the file. The output depends on when the method is called: it returns only the stripes that have been completed so far. After the writer is closed, it gives the complete stripe information.
- Specified by:
getStripes in interface Writer
- Returns:
stripe information
- Throws:
IOException
-
getCompressionCodec
-
estimateMemory
public long estimateMemory()
Description copied from interface: Writer
Estimate the memory currently used by the writer to buffer the stripe. This method helps a write engine control the refresh policy of the ORC writer.
- Specified by:
estimateMemory in interface Writer
- Returns:
the number of bytes
-