Package org.apache.orc.impl.writer
Interface TreeWriter
- All Known Implementing Classes:
BinaryTreeWriter, BooleanTreeWriter, ByteTreeWriter, CharTreeWriter, DateTreeWriter, Decimal64TreeWriter, DecimalTreeWriter, DoubleTreeWriter, EncryptionTreeWriter, FloatTreeWriter, GeospatialTreeWriter, IntegerTreeWriter, ListTreeWriter, MapTreeWriter, StringBaseTreeWriter, StringTreeWriter, StructTreeWriter, TimestampTreeWriter, TreeWriterBase, UnionTreeWriter, VarcharTreeWriter
public interface TreeWriter
The interface for the type-specific writers. It provides
the generic API that they must all implement.
-
Nested Class Summary
Nested Classes -
Method Summary
Modifier and Type, Method - Description
void addStripeStatistics(StripeStatistics[] stripeStatistics) - During a stripe append, we need to handle the stripe statistics.
void createRowIndexEntry() - Create a row index entry at the current point in the stripe.
long estimateMemory() - Estimate the memory currently used to buffer the stripe.
void flushStreams() - Flush the TreeWriter stream
void getCurrentStatistics(ColumnStatistics[] output) - Get the current file statistics for each column.
long getRawDataSize() - Estimate the memory used if the file was read into Hive's Writable types.
void prepareStripe(int stripeId) - Set up for the next stripe.
void writeBatch(org.apache.hadoop.hive.ql.exec.vector.ColumnVector vector, int offset, int length) - Write a ColumnVector to the file.
void writeFileStatistics() - Write the FileStatistics for each column in each encryption variant.
void writeRootBatch(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch, int offset, int length) - Write a VectorizedRowBatch to the file.
void writeStripe(int requiredIndexEntries) - Write the stripe out to the file.
-
Method Details
-
estimateMemory
long estimateMemory()
Estimate the memory currently used to buffer the stripe.
- Returns:
- the number of bytes
-
getRawDataSize
long getRawDataSize()
Estimate the memory used if the file was read into Hive's Writable types. This is used as an estimate for the query optimizer.
- Returns:
- the number of bytes
-
prepareStripe
void prepareStripe(int stripeId)
Set up for the next stripe.
- Parameters:
stripeId - the next stripe id
-
writeRootBatch
void writeRootBatch(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch, int offset, int length) throws IOException
Write a VectorizedRowBatch to the file. This is called by the WriterImplV2 at the top level.
- Parameters:
batch - the list of all of the columns
offset - the first row from the batch to write
length - the number of rows to write
- Throws:
IOException
-
writeBatch
void writeBatch(org.apache.hadoop.hive.ql.exec.vector.ColumnVector vector, int offset, int length) throws IOException
Write a ColumnVector to the file. This is called recursively by writeRootBatch.
- Parameters:
vector - the data to write
offset - the offset of the first value to write
length - the number of values to write
- Throws:
IOException
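The relationship between writeRootBatch and writeBatch can be illustrated with a minimal sketch. It does not use the real ORC or Hive classes: SketchColumn, SketchBatch, SketchTreeWriter, SketchLongWriter, and SketchStructWriter are hypothetical stand-ins, and the assumption that a struct-like root hands column i of the batch to child i is an illustration rather than a statement about the actual implementations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for ColumnVector and VectorizedRowBatch; the real
// classes live in org.apache.hadoop.hive.ql.exec.vector.
class SketchColumn {
    final long[] values;
    SketchColumn(long[] values) { this.values = values; }
}

class SketchBatch {
    final SketchColumn[] cols;
    SketchBatch(SketchColumn... cols) { this.cols = cols; }
}

// Reduced version of the two write methods, using the stand-in types.
interface SketchTreeWriter {
    void writeRootBatch(SketchBatch batch, int offset, int length);
    void writeBatch(SketchColumn vector, int offset, int length);
}

// Leaf writer: records the values in [offset, offset + length).
class SketchLongWriter implements SketchTreeWriter {
    final List<Long> written = new ArrayList<>();

    public void writeRootBatch(SketchBatch batch, int offset, int length) {
        // Assumption: a primitive root sees a batch with a single column.
        writeBatch(batch.cols[0], offset, length);
    }

    public void writeBatch(SketchColumn vector, int offset, int length) {
        for (int i = offset; i < offset + length; i++) {
            written.add(vector.values[i]);
        }
    }
}

// Struct-like root writer: hands each column of the batch to one child.
class SketchStructWriter implements SketchTreeWriter {
    final SketchTreeWriter[] children;
    SketchStructWriter(SketchTreeWriter... children) { this.children = children; }

    public void writeRootBatch(SketchBatch batch, int offset, int length) {
        for (int c = 0; c < children.length; c++) {
            children[c].writeBatch(batch.cols[c], offset, length);
        }
    }

    public void writeBatch(SketchColumn vector, int offset, int length) {
        // Nested structs are left out to keep the sketch short.
        throw new UnsupportedOperationException("not part of this sketch");
    }
}
```

In this sketch the top-level caller only ever invokes writeRootBatch; every child sees the same offset and length, so all columns advance over the same row range.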
-
createRowIndexEntry
void createRowIndexEntry() throws IOException
Create a row index entry at the current point in the stripe.
- Throws:
IOException
-
flushStreams
void flushStreams() throws IOException
Flush the TreeWriter streams.
- Throws:
IOException
-
writeStripe
void writeStripe(int requiredIndexEntries) throws IOException
Write the stripe out to the file.
- Parameters:
requiredIndexEntries - the number of index entries that are required. This is used to check that the row index is well formed.
- Throws:
IOException
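The requiredIndexEntries check can be pictured with a small sketch. RowIndexSketch is a hypothetical class, not part of ORC; it assumes that each createRowIndexEntry call adds exactly one entry and that a mismatch is reported with an exception, whose type and message here are placeholders.

```java
// Hypothetical writer that only tracks how many row index entries it holds.
class RowIndexSketch {
    private int indexEntries = 0;

    // Stand-in for createRowIndexEntry(): record one entry.
    void createRowIndexEntry() {
        indexEntries++;
    }

    // Stand-in for writeStripe(int): verify the entry count, then reset
    // the counter so the next stripe starts from zero.
    void writeStripe(int requiredIndexEntries) {
        if (indexEntries != requiredIndexEntries) {
            // Assumption: the exception type and message are illustrative only.
            throw new IllegalStateException(
                "row index has " + indexEntries
                + " entries, expected " + requiredIndexEntries);
        }
        indexEntries = 0;
    }
}
```

A caller that created two index entries would pass 2; any other value signals that the row index and the written rows have drifted apart.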
-
addStripeStatistics
void addStripeStatistics(StripeStatistics[] stripeStatistics) throws IOException
During a stripe append, we need to handle the stripe statistics.
- Parameters:
stripeStatistics - the statistics for the new stripe across the encryption variants
- Throws:
IOException
-
writeFileStatistics
void writeFileStatistics() throws IOException
Write the FileStatistics for each column in each encryption variant.
- Throws:
IOException
-
getCurrentStatistics
void getCurrentStatistics(ColumnStatistics[] output)
Get the current file statistics for each column. If a column is encrypted, the encrypted variant statistics are used.
- Parameters:
output - an array that is filled in with the results
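The "filled in" output parameter follows a caller-allocates pattern, which the following sketch illustrates. StatsNodeSketch is hypothetical; a plain long count stands in for ColumnStatistics, and indexing the array by column id is an assumption made for the example.

```java
// Hypothetical writer node; a value count stands in for ColumnStatistics.
class StatsNodeSketch {
    final int columnId;
    final long valueCount;
    final StatsNodeSketch[] children;

    StatsNodeSketch(int columnId, long valueCount, StatsNodeSketch... children) {
        this.columnId = columnId;
        this.valueCount = valueCount;
        this.children = children;
    }

    // Fill this node's slot in the caller's array, then let each child
    // fill its own slot. Assumption: slots are indexed by column id.
    void getCurrentStatistics(long[] output) {
        output[columnId] = valueCount;
        for (StatsNodeSketch child : children) {
            child.getCurrentStatistics(output);
        }
    }
}
```

The caller allocates one slot per column and passes the array to the root; nothing is returned, which matches the void return type in the signature.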
-