Package org.apache.orc.impl.writer
Interface TreeWriter
All Known Implementing Classes:
BinaryTreeWriter, BooleanTreeWriter, ByteTreeWriter, CharTreeWriter, DateTreeWriter, Decimal64TreeWriter, DecimalTreeWriter, DoubleTreeWriter, EncryptionTreeWriter, FloatTreeWriter, IntegerTreeWriter, ListTreeWriter, MapTreeWriter, StringBaseTreeWriter, StringTreeWriter, StructTreeWriter, TimestampTreeWriter, TreeWriterBase, UnionTreeWriter, VarcharTreeWriter
public interface TreeWriter
The interface for the type-specific column writers. It provides
the generic API that they must all implement.
-
Nested Class Summary
-
Method Summary
Modifier and Type | Method | Description
void | addStripeStatistics(StripeStatistics[] stripeStatistics) | During a stripe append, we need to handle the stripe statistics.
void | createRowIndexEntry() | Create a row index entry at the current point in the stripe.
long | estimateMemory() | Estimate the memory currently used to buffer the stripe.
void | flushStreams() | Flush the TreeWriter stream.
void | getCurrentStatistics(ColumnStatistics[] output) | Get the current file statistics for each column.
long | getRawDataSize() | Estimate the memory used if the file was read into Hive's Writable types.
void | prepareStripe(int stripeId) | Set up for the next stripe.
void | writeBatch(org.apache.hadoop.hive.ql.exec.vector.ColumnVector vector, int offset, int length) | Write a ColumnVector to the file.
void | writeFileStatistics() | Write the FileStatistics for each column in each encryption variant.
void | writeRootBatch(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch, int offset, int length) | Write a VectorizedRowBatch to the file.
void | writeStripe(int requiredIndexEntries) | Write the stripe out to the file.
-
Method Details
-
estimateMemory
long estimateMemory()
Estimate the memory currently used to buffer the stripe.
Returns:
the number of bytes
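One plausible use of this estimate is to compare it against a target stripe size and flush once the buffered data reaches it. The following is a minimal sketch under that assumption: `MemoryEstimator` and `StripeFlushPolicy` are hypothetical names invented for illustration and are not part of ORC, and the threshold value is arbitrary.

```java
// Hypothetical stand-in exposing only the one TreeWriter method this sketch needs.
interface MemoryEstimator {
    long estimateMemory();
}

// Hypothetical helper (not an ORC class): decides whether the data buffered
// for the current stripe has reached a configured target size.
final class StripeFlushPolicy {
    private final long stripeSizeBytes;

    StripeFlushPolicy(long stripeSizeBytes) {
        if (stripeSizeBytes <= 0) {
            throw new IllegalArgumentException("stripe size must be positive");
        }
        this.stripeSizeBytes = stripeSizeBytes;
    }

    // Returns true once the writer's estimate reaches the target stripe size.
    boolean shouldFlush(MemoryEstimator writer) {
        return writer.estimateMemory() >= stripeSizeBytes;
    }
}
```

Because the value is only an estimate, a caller following this pattern should treat the threshold as approximate rather than as a hard limit.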
-
getRawDataSize
long getRawDataSize()
Estimate the memory used if the file was read into Hive's Writable types. This is used as an estimate for the query optimizer.
Returns:
the number of bytes
-
prepareStripe
void prepareStripe(int stripeId)
Set up for the next stripe.
Parameters:
stripeId - the next stripe id
-
writeRootBatch
void writeRootBatch(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch, int offset, int length) throws IOException
Write a VectorizedRowBatch to the file. This is called by the WriterImplV2 at the top level.
Parameters:
batch - the list of all of the columns
offset - the first row from the batch to write
length - the number of rows to write
Throws:
IOException
-
writeBatch
void writeBatch(org.apache.hadoop.hive.ql.exec.vector.ColumnVector vector, int offset, int length) throws IOException
Write a ColumnVector to the file. This is called recursively by writeRootBatch.
Parameters:
vector - the data to write
offset - the offset of the first value to write
length - the number of values to write
Throws:
IOException
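The relationship between writeRootBatch and writeBatch can be illustrated with a simplified sketch. All types below (`SketchVector`, `SketchLongVector`, `SketchStructVector`, `SketchTreeWriter`, `SketchLongWriter`, `SketchStructWriter`) are hypothetical stand-ins, not the Hive or ORC classes. The sketch also assumes a struct passes the same offset and length to its children, which does not hold for every compound type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ColumnVector.
abstract class SketchVector {}

// Leaf column holding primitive values.
final class SketchLongVector extends SketchVector {
    final long[] values;
    SketchLongVector(long... values) { this.values = values; }
}

// Compound column holding one child vector per field.
final class SketchStructVector extends SketchVector {
    final SketchVector[] fields;
    SketchStructVector(SketchVector... fields) { this.fields = fields; }
}

// Reduced, hypothetical version of the writer contract: only writeBatch.
interface SketchTreeWriter {
    void writeBatch(SketchVector vector, int offset, int length);
}

// Leaf writer: records the requested slice of values.
final class SketchLongWriter implements SketchTreeWriter {
    final List<Long> written = new ArrayList<>();

    @Override
    public void writeBatch(SketchVector vector, int offset, int length) {
        long[] values = ((SketchLongVector) vector).values;
        for (int i = offset; i < offset + length; i++) {
            written.add(values[i]);
        }
    }
}

// Compound writer: forwards each field to the matching child writer.
final class SketchStructWriter implements SketchTreeWriter {
    private final SketchTreeWriter[] children;

    SketchStructWriter(SketchTreeWriter... children) { this.children = children; }

    // Top-level entry point: the batch is the list of all top-level columns.
    void writeRootBatch(SketchVector[] columns, int offset, int length) {
        for (int i = 0; i < children.length; i++) {
            children[i].writeBatch(columns[i], offset, length);
        }
    }

    // Nested entry point: reached recursively when a struct is itself a child.
    @Override
    public void writeBatch(SketchVector vector, int offset, int length) {
        SketchVector[] fields = ((SketchStructVector) vector).fields;
        for (int i = 0; i < children.length; i++) {
            children[i].writeBatch(fields[i], offset, length);
        }
    }
}
```

In this sketch the root writer is called once per batch, and every nested writer receives only its own column's vector.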
-
createRowIndexEntry
void createRowIndexEntry() throws IOException
Create a row index entry at the current point in the stripe.
Throws:
IOException
-
flushStreams
void flushStreams() throws IOException
Flush the TreeWriter stream.
Throws:
IOException
-
writeStripe
void writeStripe(int requiredIndexEntries) throws IOException
Write the stripe out to the file.
Parameters:
requiredIndexEntries - the number of index entries that are required. This is used to check that the row index is well formed.
Throws:
IOException
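The well-formedness check described for requiredIndexEntries can be sketched as a comparison between the number of row index entries a column has created and the number the caller expects. `IndexedColumnSketch` is a hypothetical class written for illustration. The exception type and the reset of the counter after each stripe are assumptions, not documented behavior.

```java
// Hypothetical stand-in for a column writer that tracks its row index entries.
final class IndexedColumnSketch {
    private int indexEntries = 0;

    // Records one row index entry at the current point in the stripe.
    void createRowIndexEntry() {
        indexEntries++;
    }

    // Verifies the row index before the stripe is written out.
    // Assumption: a mismatch is reported with IllegalArgumentException.
    void writeStripe(int requiredIndexEntries) {
        if (indexEntries != requiredIndexEntries) {
            throw new IllegalArgumentException(
                "Column has wrong number of index entries: found "
                    + indexEntries + ", expected " + requiredIndexEntries);
        }
        // Assumption: the count starts over for the next stripe.
        indexEntries = 0;
    }
}
```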
-
addStripeStatistics
void addStripeStatistics(StripeStatistics[] stripeStatistics) throws IOException
During a stripe append, we need to handle the stripe statistics.
Parameters:
stripeStatistics - the statistics for the new stripe across the encryption variants
Throws:
IOException
-
writeFileStatistics
void writeFileStatistics() throws IOException
Write the FileStatistics for each column in each encryption variant.
Throws:
IOException
-
getCurrentStatistics
void getCurrentStatistics(ColumnStatistics[] output)
Get the current file statistics for each column. If a column is encrypted, the encrypted variant statistics are used.
Parameters:
output - an array that is filled in with the results
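The output-array pattern can be sketched as follows: the caller allocates one slot per column, and each writer stores its statistics in its own slot and then delegates to its children. `StatsSketch` and `StatsWriterSketch` are hypothetical stand-ins. Indexing the array by column id is an assumption made for this illustration, and the sketch ignores encryption variants.

```java
// Hypothetical stand-in for ColumnStatistics: tracks only a value count.
final class StatsSketch {
    final long numberOfValues;

    StatsSketch(long numberOfValues) {
        this.numberOfValues = numberOfValues;
    }
}

// Hypothetical writer node with a column id and optional child writers.
final class StatsWriterSketch {
    private final int columnId;
    private final long valueCount;
    private final StatsWriterSketch[] children;

    StatsWriterSketch(int columnId, long valueCount, StatsWriterSketch... children) {
        this.columnId = columnId;
        this.valueCount = valueCount;
        this.children = children;
    }

    // Fills the caller-supplied array instead of returning a new one.
    // Assumption: the slot for a column is chosen by its column id.
    void getCurrentStatistics(StatsSketch[] output) {
        output[columnId] = new StatsSketch(valueCount);
        for (StatsWriterSketch child : children) {
            child.getCurrentStatistics(output);
        }
    }
}
```

Passing the array in lets a single allocation collect the statistics for the whole writer tree.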
-