Package org.apache.orc.impl.writer
Class TreeWriterBase
java.lang.Object
org.apache.orc.impl.writer.TreeWriterBase
- All Implemented Interfaces:
TreeWriter
- Direct Known Subclasses:
BinaryTreeWriter, BooleanTreeWriter, ByteTreeWriter, DateTreeWriter, Decimal64TreeWriter, DecimalTreeWriter, DoubleTreeWriter, FloatTreeWriter, GeospatialTreeWriter, IntegerTreeWriter, ListTreeWriter, MapTreeWriter, StringBaseTreeWriter, StructTreeWriter, TimestampTreeWriter, UnionTreeWriter
The parent class of all of the writers for each column. Each column
is written by an instance of this class. The compound types (struct,
list, map, and union) have children tree writers that write the children
types.
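The parent/child arrangement described above can be pictured with a minimal sketch. It is an illustration only: `SketchColumnWriter`, `SketchLeafWriter`, and `SketchStructWriter` are hypothetical stand-ins, not ORC classes, and the "write" here just counts values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a column writer; not the ORC TreeWriter API.
abstract class SketchColumnWriter {
    final int id;             // column id within the schema
    long valuesWritten = 0;   // number of values this writer has seen

    SketchColumnWriter(int id) { this.id = id; }

    // Record that `length` values were written to this column.
    void write(int length) { valuesWritten += length; }

    // Number of writers in this subtree, including this one.
    int countWriters() { return 1; }
}

// A primitive column has no children.
class SketchLeafWriter extends SketchColumnWriter {
    SketchLeafWriter(int id) { super(id); }
}

// A compound column (struct, list, map, union) forwards work to its children.
class SketchStructWriter extends SketchColumnWriter {
    final List<SketchColumnWriter> children = new ArrayList<>();

    SketchStructWriter(int id) { super(id); }

    SketchStructWriter add(SketchColumnWriter child) {
        children.add(child);
        return this;
    }

    @Override
    void write(int length) {
        super.write(length);
        for (SketchColumnWriter child : children) {
            child.write(length);
        }
    }

    @Override
    int countWriters() {
        int n = 1;
        for (SketchColumnWriter child : children) {
            n += child.countWriters();
        }
        return n;
    }
}
```

Every column, compound or not, gets exactly one writer, so a schema with four columns yields a tree of four writers.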
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.orc.impl.writer.TreeWriter
TreeWriter.Factory
-
Field Summary
Fields
Modifier and Type                                Field
protected final BloomFilter                      bloomFilter
protected final OrcProto.BloomFilter.Builder     bloomFilterEntry
protected final BloomFilterUtf8                  bloomFilterUtf8
protected final WriterContext                    context
protected final boolean                          createBloomFilter
protected final WriterEncryptionVariant          encryption
protected final ColumnStatisticsImpl             fileStatistics
protected final int                              id
protected final ColumnStatisticsImpl             indexStatistics
protected final BitFieldWriter                   isPresent
protected final TypeDescription                  schema
protected final ColumnStatisticsImpl             stripeColStatistics
-
Method Summary
Modifier and Type, Method, Description
void addStripeStatistics(StripeStatistics[] stats)
    During a stripe append, we need to handle the stripe statistics.
void createRowIndexEntry()
    Create a row index entry with the previous location and the current index statistics.
long estimateMemory()
    Estimate how much memory the writer is consuming excluding the streams.
void flushStreams()
    Flush the TreeWriter stream
void getCurrentStatistics(ColumnStatistics[] output)
    Get the current file statistics for each column.
protected OrcProto.RowIndex.Builder getRowIndex()
protected OrcProto.RowIndexEntry.Builder getRowIndexEntry()
protected ColumnStatisticsImpl getStripeStatistics()
void prepareStripe(int stripeId)
    Set up for the next stripe.
void writeBatch(org.apache.hadoop.hive.ql.exec.vector.ColumnVector vector, int offset, int length)
    Write the values from the given vector from offset for length elements.
void writeFileStatistics()
    Write the FileStatistics for each column in each encryption variant.
void writeRootBatch(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch, int offset, int length)
    Handle the top level object write.
void writeStripe(int requiredIndexEntries)
    Write the stripe out to the file.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.orc.impl.writer.TreeWriter
getRawDataSize
-
Field Details
-
id
protected final int id
-
isPresent
-
schema
-
encryption
-
indexStatistics
-
stripeColStatistics
-
fileStatistics
-
rowIndexPosition
-
bloomFilter
-
bloomFilterUtf8
-
createBloomFilter
protected final boolean createBloomFilter
-
bloomFilterEntry
-
context
-
-
Method Details
-
getRowIndex
-
getStripeStatistics
-
getRowIndexEntry
-
writeRootBatch
public void writeRootBatch(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch, int offset, int length) throws IOException
Handle the top level object write. This default method is used for all types except structs, which are the typical case. VectorizedRowBatch assumes the top level object is a struct, so we use the first column for all other types.
- Specified by:
writeRootBatch in interface TreeWriter
- Parameters:
batch - the batch to write from
offset - the row to start on
length - the number of rows to write
- Throws:
IOException
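A rough sketch of the "first column" rule may help. The `RootDemo*` classes below are hypothetical stand-ins for `ColumnVector`, `VectorizedRowBatch`, and the writers; they only sum values so that the delegation is observable, and they are not how ORC encodes data.

```java
// Hypothetical stand-in for ColumnVector: a plain array of values.
class RootDemoVector {
    final long[] values;
    RootDemoVector(long... values) { this.values = values; }
}

// Hypothetical stand-in for VectorizedRowBatch: the columns of an implicit top-level struct.
class RootDemoBatch {
    final RootDemoVector[] cols;
    RootDemoBatch(RootDemoVector... cols) { this.cols = cols; }
}

// Default behaviour: a non-struct root takes its values from the first column.
class RootDemoWriter {
    long sum = 0;

    void writeRootBatch(RootDemoBatch batch, int offset, int length) {
        writeBatch(batch.cols[0], offset, length);
    }

    // Consume `length` values starting at index `offset`.
    void writeBatch(RootDemoVector vector, int offset, int length) {
        for (int i = offset; i < offset + length; i++) {
            sum += vector.values[i];
        }
    }
}

// A struct root instead hands each batch column to the matching child writer.
class RootDemoStructWriter extends RootDemoWriter {
    final RootDemoWriter[] children;

    RootDemoStructWriter(RootDemoWriter... children) { this.children = children; }

    @Override
    void writeRootBatch(RootDemoBatch batch, int offset, int length) {
        for (int c = 0; c < children.length; c++) {
            children[c].writeBatch(batch.cols[c], offset, length);
        }
    }
}
```

In this sketch the same `offset` and `length` are passed unchanged to `writeBatch`, so a root write is just a column write on the right vector.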
-
writeBatch
public void writeBatch(org.apache.hadoop.hive.ql.exec.vector.ColumnVector vector, int offset, int length) throws IOException
Write the values from the given vector from offset for length elements.
- Specified by:
writeBatch in interface TreeWriter
- Parameters:
vector - the vector to write from
offset - the first value from the vector to write
length - the number of values from the vector to write
- Throws:
IOException
-
prepareStripe
public void prepareStripe(int stripeId)
Description copied from interface: TreeWriter
Set up for the next stripe.
- Specified by:
prepareStripe in interface TreeWriter
- Parameters:
stripeId - the next stripe id
-
flushStreams
Description copied from interface: TreeWriter
Flush the TreeWriter stream
- Specified by:
flushStreams in interface TreeWriter
- Throws:
IOException
-
writeStripe
Description copied from interface: TreeWriter
Write the stripe out to the file.
- Specified by:
writeStripe in interface TreeWriter
- Parameters:
requiredIndexEntries - the number of index entries that are required; used to check that the row index is well formed
- Throws:
IOException
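One way to read the `requiredIndexEntries` check is as a count comparison made before the stripe is finished. The sketch below assumes the check is an exact match and that a mismatch is reported with an `IllegalArgumentException`; both are assumptions, and `IndexCheckSketch` is a hypothetical class, not part of ORC.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a well-formedness check on a row index.
class IndexCheckSketch {
    // One recorded start position per row group written so far.
    private final List<Long> rowIndexEntries = new ArrayList<>();

    // Record the start position of one row group.
    void addEntry(long position) {
        rowIndexEntries.add(position);
    }

    // Refuse to finish the stripe unless the index has exactly the expected entry count.
    void writeStripe(int requiredIndexEntries) {
        if (rowIndexEntries.size() != requiredIndexEntries) {
            throw new IllegalArgumentException(
                "Column has " + rowIndexEntries.size()
                + " index entries but " + requiredIndexEntries + " are required");
        }
        // Start the next stripe with an empty index.
        rowIndexEntries.clear();
    }

    int pendingEntries() {
        return rowIndexEntries.size();
    }
}
```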
-
createRowIndexEntry
Create a row index entry with the previous location and the current index statistics. Also merges the index statistics into the file statistics before they are cleared. Finally, it records the start of the next index and ensures all of the children columns also create an entry.
- Specified by:
createRowIndexEntry in interface TreeWriter
- Throws:
IOException
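The four steps in that description (emit an entry, merge statistics, clear them, record the next start, then recurse) can be sketched as follows. This is a simplified model under stated assumptions: `RowIndexSketch` is hypothetical, the "statistics" are a single value count, and a "location" is a running value position rather than real stream offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one column's row index bookkeeping.
class RowIndexSketch {
    long indexCount = 0;   // values seen since the last index entry
    long fileCount = 0;    // running total across all entries
    long position = 0;     // current write position
    long entryStart = 0;   // where the current row group began
    final List<long[]> entries = new ArrayList<>();          // each entry is {start, count}
    final List<RowIndexSketch> children = new ArrayList<>();

    void write(int length) {
        indexCount += length;
        position += length;
    }

    void createRowIndexEntry() {
        // 1. Emit an entry with the previous location and the current index statistics.
        entries.add(new long[] {entryStart, indexCount});
        // 2. Merge the index statistics into the file statistics, then clear them.
        fileCount += indexCount;
        indexCount = 0;
        // 3. Record the start of the next index entry.
        entryStart = position;
        // 4. Make sure every child column creates an entry as well.
        for (RowIndexSketch child : children) {
            child.createRowIndexEntry();
        }
    }
}
```

Merging before clearing matters: reversing steps 2's two statements would add zero to the running total and lose the row group's statistics.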
-
addStripeStatistics
Description copied from interface: TreeWriter
During a stripe append, we need to handle the stripe statistics.
- Specified by:
addStripeStatistics in interface TreeWriter
- Parameters:
stats - the statistics for the new stripe across the encryption variants
- Throws:
IOException
-
estimateMemory
public long estimateMemory()
Estimate how much memory the writer is consuming excluding the streams.
- Specified by:
estimateMemory in interface TreeWriter
- Returns:
the number of bytes.
-
writeFileStatistics
Description copied from interface: TreeWriter
Write the FileStatistics for each column in each encryption variant.
- Specified by:
writeFileStatistics in interface TreeWriter
- Throws:
IOException
-
getCurrentStatistics
Description copied from interface: TreeWriter
Get the current file statistics for each column. If a column is encrypted, the encrypted variant statistics are used.
- Specified by:
getCurrentStatistics in interface TreeWriter
- Parameters:
output - an array that is filled in with the results
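The "filled in" output parameter can be illustrated with a small sketch. It assumes that each writer stores its own result at the slot matching its column id and that compound writers recurse into their children; `StatsFillSketch` is a hypothetical class, and a plain value count stands in for `ColumnStatistics`.

```java
// Hypothetical sketch of filling a caller-supplied array, one slot per column.
class StatsFillSketch {
    final int id;                       // column id, assumed to index the output array
    final long valueCount;              // stand-in for this column's file statistics
    final StatsFillSketch[] children;   // empty for primitive columns

    StatsFillSketch(int id, long valueCount, StatsFillSketch... children) {
        this.id = id;
        this.valueCount = valueCount;
        this.children = children;
    }

    // Write this column's statistics into its slot, then let the children do the same.
    void getCurrentStatistics(long[] output) {
        output[id] = valueCount;
        for (StatsFillSketch child : children) {
            child.getCurrentStatistics(output);
        }
    }
}
```

The caller sizes the array to the number of columns, so a single call on the root fills every slot.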
-