Package org.apache.orc.impl.writer
Class TreeWriterBase
java.lang.Object
org.apache.orc.impl.writer.TreeWriterBase
- All Implemented Interfaces:
TreeWriter
- Direct Known Subclasses:
BinaryTreeWriter, BooleanTreeWriter, ByteTreeWriter, DateTreeWriter, Decimal64TreeWriter, DecimalTreeWriter, DoubleTreeWriter, FloatTreeWriter, IntegerTreeWriter, ListTreeWriter, MapTreeWriter, StringBaseTreeWriter, StructTreeWriter, TimestampTreeWriter, UnionTreeWriter
The parent class of all of the writers for each column. Each column
is written by an instance of this class. The compound types (struct,
list, map, and union) have child tree writers that write the child
types.
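To make the "one writer per column, compound types own child writers" structure concrete, here is a minimal sketch. `WriterTreeSketch`, `Node`, and `SketchWriter` are hypothetical stand-ins, not ORC classes, and the sketch assumes column ids are assigned in pre-order over the schema with the root as column 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one writer object per column, with compound types
// (struct, list, map, union) owning child writers. None of these are ORC APIs.
final class WriterTreeSketch {

  /** A schema node: a type name plus its child types (empty for primitives). */
  static final class Node {
    final String type;
    final List<Node> children;

    Node(String type, Node... children) {
      this.type = type;
      this.children = List.of(children);
    }
  }

  /** Stand-in for a per-column writer; only the tree shape is modeled. */
  static final class SketchWriter {
    final int id; // column id, assumed to be assigned in pre-order
    final String type;
    final List<SketchWriter> children = new ArrayList<>();

    SketchWriter(int id, String type) {
      this.id = id;
      this.type = type;
    }
  }

  private int nextId = 0;

  /** Creates the writer for a node first, then the writers for its children. */
  SketchWriter build(Node node) {
    SketchWriter writer = new SketchWriter(nextId++, node.type);
    for (Node child : node.children) {
      writer.children.add(build(child));
    }
    return writer;
  }

  /** Returns "id:type" for every writer in the tree, in pre-order. */
  static List<String> describe(SketchWriter root) {
    List<String> out = new ArrayList<>();
    collect(root, out);
    return out;
  }

  private static void collect(SketchWriter writer, List<String> out) {
    out.add(writer.id + ":" + writer.type);
    for (SketchWriter child : writer.children) {
      collect(child, out);
    }
  }
}
```

For a schema shaped like struct<a:int,b:map<string,double>>, building from the root yields writers with ids 0 (struct), 1 (int), 2 (map), 3 (string), and 4 (double), and only the struct and map writers have children.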
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.orc.impl.writer.TreeWriter
TreeWriter.Factory
-
Field Summary
protected final BloomFilter bloomFilter
protected final OrcProto.BloomFilter.Builder bloomFilterEntry
protected final BloomFilterUtf8 bloomFilterUtf8
protected final WriterContext context
protected final boolean createBloomFilter
protected final WriterEncryptionVariant encryption
protected final ColumnStatisticsImpl fileStatistics
protected final int id
protected final ColumnStatisticsImpl indexStatistics
protected final BitFieldWriter isPresent
protected final TypeDescription schema
protected final ColumnStatisticsImpl stripeColStatistics
-
Method Summary
void addStripeStatistics(StripeStatistics[] stats) - During a stripe append, we need to handle the stripe statistics.
void createRowIndexEntry() - Create a row index entry with the previous location and the current index statistics.
long estimateMemory() - Estimate how much memory the writer is consuming, excluding the streams.
void flushStreams() - Flush the TreeWriter stream.
void getCurrentStatistics(ColumnStatistics[] output) - Get the current file statistics for each column.
protected OrcProto.RowIndex.Builder getRowIndex()
protected OrcProto.RowIndexEntry.Builder getRowIndexEntry()
protected ColumnStatisticsImpl getStripeStatistics()
void prepareStripe(int stripeId) - Set up for the next stripe.
void writeBatch(org.apache.hadoop.hive.ql.exec.vector.ColumnVector vector, int offset, int length) - Write the values from the given vector from offset for length elements.
void writeFileStatistics() - Write the FileStatistics for each column in each encryption variant.
void writeRootBatch(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch, int offset, int length) - Handle the top level object write.
void writeStripe(int requiredIndexEntries) - Write the stripe out to the file.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.orc.impl.writer.TreeWriter
getRawDataSize
-
Field Details
-
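id
protected final int id
-
isPresent
protected final BitFieldWriter isPresent
-
schema
protected final TypeDescription schema
-
encryption
protected final WriterEncryptionVariant encryption
-
indexStatistics
protected final ColumnStatisticsImpl indexStatistics
-
stripeColStatistics
protected final ColumnStatisticsImpl stripeColStatistics
-
fileStatistics
protected final ColumnStatisticsImpl fileStatistics
-
rowIndexPosition
-
bloomFilter
protected final BloomFilter bloomFilter
-
bloomFilterUtf8
protected final BloomFilterUtf8 bloomFilterUtf8
-
createBloomFilter
protected final boolean createBloomFilter
-
bloomFilterEntry
protected final OrcProto.BloomFilter.Builder bloomFilterEntry
-
context
protected final WriterContext context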
-
-
Method Details
-
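getRowIndex
protected OrcProto.RowIndex.Builder getRowIndex()
-
getStripeStatistics
protected ColumnStatisticsImpl getStripeStatistics()
-
getRowIndexEntry
protected OrcProto.RowIndexEntry.Builder getRowIndexEntry()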
writeRootBatch
public void writeRootBatch(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch, int offset, int length) throws IOException
Handle the top level object write. This default method is used for all types except structs, which are the typical case. VectorizedRowBatch assumes the top level object is a struct, so we use the first column for all other types.
- Specified by:
writeRootBatch in interface TreeWriter
- Parameters:
batch - the batch to write from
offset - the row to start on
length - the number of rows to write
- Throws:
IOException
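The struct versus non-struct split described for writeRootBatch can be sketched with stand-in types. `RootBatchSketch`, `Vector`, `Batch`, `BaseWriter`, and `StructWriter` are hypothetical; they only record which column vector and row range each writer receives, whereas real writers would encode the values and update statistics.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of root-batch dispatch. Vector and Batch are stand-ins
// for Hive's ColumnVector and VectorizedRowBatch; the writers only log which
// vector and row range they were given instead of encoding anything.
final class RootBatchSketch {

  static final class Vector {
    final String name;

    Vector(String name) {
      this.name = name;
    }
  }

  /** Like a row batch, the top level is always an array of columns. */
  static final class Batch {
    final Vector[] cols;

    Batch(Vector... cols) {
      this.cols = cols;
    }
  }

  static class BaseWriter {
    final List<String> received = new ArrayList<>();

    /** Default: a non-struct top-level type is carried in column 0. */
    void writeRootBatch(Batch batch, int offset, int length) {
      writeBatch(batch.cols[0], offset, length);
    }

    void writeBatch(Vector vector, int offset, int length) {
      received.add(vector.name + "[" + offset + ", " + (offset + length) + ")");
    }
  }

  /** Struct root: each top-level column goes to the matching child writer. */
  static final class StructWriter extends BaseWriter {
    final BaseWriter[] children;

    StructWriter(BaseWriter... children) {
      this.children = children;
    }

    @Override
    void writeRootBatch(Batch batch, int offset, int length) {
      for (int i = 0; i < children.length; i++) {
        children[i].writeBatch(batch.cols[i], offset, length);
      }
    }
  }
}
```

With a struct root, each child writer receives its own column over the same row range; a non-struct writer handed the same batch reads only column 0.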
-
writeBatch
public void writeBatch(org.apache.hadoop.hive.ql.exec.vector.ColumnVector vector, int offset, int length) throws IOException
Write the values from the given vector from offset for length elements.
- Specified by:
writeBatch in interface TreeWriter
- Parameters:
vector - the vector to write from
offset - the first value from the vector to write
length - the number of values from the vector to write
- Throws:
IOException
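A minimal sketch of the null bookkeeping a column writer can perform in writeBatch, assuming the conventions of Hive's ColumnVector (`noNulls`, `isRepeating`, `isNull[]`). `PresentStreamSketch` and `NullableVector` are hypothetical; a list of booleans stands in for a present-bit stream such as the one behind the `isPresent` field, and plain fields stand in for column statistics.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of per-column null bookkeeping. NullableVector mimics the
// noNulls / isRepeating / isNull conventions of Hive's ColumnVector; a list of
// booleans stands in for the present stream and plain fields for statistics.
final class PresentStreamSketch {

  static final class NullableVector {
    boolean noNulls = true;      // when true, isNull is not consulted
    boolean isRepeating = false; // when true, entry 0 describes every row
    final boolean[] isNull;

    NullableVector(int size) {
      isNull = new boolean[size];
    }
  }

  final List<Boolean> presentBits = new ArrayList<>(); // one bit per row written
  long nonNullCount = 0;
  boolean hasNull = false;

  /** Records presence for rows offset .. offset + length - 1. */
  void writeBatch(NullableVector vector, int offset, int length) {
    if (vector.noNulls) {
      for (int i = 0; i < length; i++) {
        presentBits.add(true);
      }
      nonNullCount += length;
    } else if (vector.isRepeating) {
      boolean isNull = vector.isNull[0]; // the single repeated entry
      for (int i = 0; i < length; i++) {
        presentBits.add(!isNull);
      }
      if (isNull) {
        hasNull = true;
      } else {
        nonNullCount += length;
      }
    } else {
      for (int i = 0; i < length; i++) {
        boolean isNull = vector.isNull[offset + i];
        presentBits.add(!isNull);
        if (isNull) {
          hasNull = true;
        } else {
          nonNullCount++;
        }
      }
    }
  }
}
```

Writing rows 1 through 4 of a vector whose entries 2 and 4 are null records the present bits true, false, true, false and a non-null count of 2. Note that offset only matters in the per-row branch; a repeating vector is described entirely by entry 0.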
-
prepareStripe
public void prepareStripe(int stripeId)
Description copied from interface: TreeWriter
Set up for the next stripe.
- Specified by:
prepareStripe in interface TreeWriter
- Parameters:
stripeId - the next stripe id
-
flushStreams
public void flushStreams() throws IOException
Description copied from interface: TreeWriter
Flush the TreeWriter stream.
- Specified by:
flushStreams in interface TreeWriter
- Throws:
IOException
-
writeStripe
public void writeStripe(int requiredIndexEntries) throws IOException
Description copied from interface: TreeWriter
Write the stripe out to the file.
- Specified by:
writeStripe in interface TreeWriter
- Parameters:
requiredIndexEntries - the number of index entries that are required; this is used to check that the row index is well formed.
- Throws:
IOException
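A sketch of the well-formedness check that requiredIndexEntries makes possible. It assumes one row-index entry per started group of `rowIndexStride` rows and no entries when row indexing is disabled (stride 0); both that formula and the names in `IndexEntryCheckSketch` are illustrative assumptions, not taken from this page.

```java
// Hypothetical sketch of the row-index sanity check behind requiredIndexEntries.
// The expected-count formula (one entry per started group of rowIndexStride
// rows, none when the stride is 0) is an assumption for illustration.
final class IndexEntryCheckSketch {

  /** Expected number of row-index entries for a stripe. */
  static int requiredIndexEntries(long rowsInStripe, int rowIndexStride) {
    if (rowIndexStride == 0) {
      return 0; // row indexing disabled
    }
    // Ceiling division: a partial final group still gets an entry.
    return (int) ((rowsInStripe + rowIndexStride - 1) / rowIndexStride);
  }

  /** Rejects a row index whose entry count differs from the expectation. */
  static void checkRowIndex(int entriesRecorded, int requiredIndexEntries) {
    if (entriesRecorded != requiredIndexEntries) {
      throw new IllegalArgumentException("row index has " + entriesRecorded
          + " entries, expected " + requiredIndexEntries);
    }
  }
}
```

Under this assumption, a 25,000-row stripe with a stride of 10,000 rows requires 3 entries, so a column that recorded only 2 would be rejected.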
-
createRowIndexEntry
public void createRowIndexEntry() throws IOException
Create a row index entry with the previous location and the current index statistics. It also merges the index statistics into the file statistics before they are cleared. Finally, it records the start of the next index and ensures that all of the child columns also create an entry.
- Specified by:
createRowIndexEntry in interface TreeWriter
- Throws:
IOException
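The statistics handling described for createRowIndexEntry can be sketched as follows. `RowIndexSketch`, `MinMaxStats`, and `IndexEntry` are hypothetical; a counter stands in for the stream position, min/max/count stands in for the real column statistics, and the recursion into child writers is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of row-index bookkeeping: each entry pairs the position
// where a row group started with that group's statistics; the group statistics
// are then merged into file-level statistics and cleared. Child writers omitted.
final class RowIndexSketch {

  static final class MinMaxStats {
    long count = 0;
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;

    void update(long value) {
      count++;
      min = Math.min(min, value);
      max = Math.max(max, value);
    }

    void merge(MinMaxStats other) {
      count += other.count;
      min = Math.min(min, other.min);
      max = Math.max(max, other.max);
    }

    MinMaxStats copy() {
      MinMaxStats result = new MinMaxStats();
      result.merge(this);
      return result;
    }

    void reset() {
      count = 0;
      min = Long.MAX_VALUE;
      max = Long.MIN_VALUE;
    }
  }

  static final class IndexEntry {
    final long startPosition;
    final MinMaxStats stats;

    IndexEntry(long startPosition, MinMaxStats stats) {
      this.startPosition = startPosition;
      this.stats = stats;
    }
  }

  final MinMaxStats indexStatistics = new MinMaxStats();
  final MinMaxStats fileStatistics = new MinMaxStats();
  final List<IndexEntry> rowIndex = new ArrayList<>();
  private long groupStart = 0; // position recorded at the start of the current group
  private long position = 0;   // stand-in for the current stream position

  void write(long value) {
    indexStatistics.update(value);
    position++;
  }

  void createRowIndexEntry() {
    rowIndex.add(new IndexEntry(groupStart, indexStatistics.copy()));
    fileStatistics.merge(indexStatistics); // merge before clearing
    indexStatistics.reset();
    groupStart = position; // record where the next group starts
  }
}
```

Writing 5 and 1, creating an entry, then writing 9 and creating another entry leaves two index entries (starting at positions 0 and 2) and file statistics covering all three values.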
-
addStripeStatistics
public void addStripeStatistics(StripeStatistics[] stats) throws IOException
Description copied from interface: TreeWriter
During a stripe append, we need to handle the stripe statistics.
- Specified by:
addStripeStatistics in interface TreeWriter
- Parameters:
stats - the statistics for the new stripe across the encryption variants
- Throws:
IOException
-
estimateMemory
public long estimateMemory()
Estimate how much memory the writer is consuming, excluding the streams.
- Specified by:
estimateMemory in interface TreeWriter
- Returns:
the number of bytes.
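For compound types, a memory estimate naturally aggregates over the writer tree. This sketch assumes a parent's estimate is its own non-stream state plus the estimates of its children; `MemoryEstimateSketch` and `bufferedBytes` are hypothetical names, not ORC API.

```java
// Hypothetical sketch: a writer's estimate is its own non-stream state plus the
// estimates of its child writers. bufferedBytes is an illustrative field.
final class MemoryEstimateSketch {

  static final class Writer {
    final long bufferedBytes;
    final Writer[] children;

    Writer(long bufferedBytes, Writer... children) {
      this.bufferedBytes = bufferedBytes;
      this.children = children;
    }

    long estimateMemory() {
      long total = bufferedBytes;
      for (Writer child : children) {
        total += child.estimateMemory(); // recurse into compound types
      }
      return total;
    }
  }
}
```

A root holding 16 bytes with children holding 100 bytes and 40 bytes (the latter with one grandchild of 8 bytes) would report 164 bytes.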
-
writeFileStatistics
public void writeFileStatistics() throws IOException
Description copied from interface: TreeWriter
Write the FileStatistics for each column in each encryption variant.
- Specified by:
writeFileStatistics in interface TreeWriter
- Throws:
IOException
-
getCurrentStatistics
public void getCurrentStatistics(ColumnStatistics[] output)
Description copied from interface: TreeWriter
Get the current file statistics for each column. If a column is encrypted, the encrypted variant statistics are used.
- Specified by:
getCurrentStatistics in interface TreeWriter
- Parameters:
output - an array that is filled in with the results
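One way to fill the output array for a whole writer tree is for each writer to store its statistics at output[id] and then recurse into its children, so a single call on the root covers every column. The sketch is hypothetical (`CurrentStatsSketch`, with strings standing in for ColumnStatistics) and does not model the encrypted-variant case.

```java
// Hypothetical sketch: each writer stores its statistics at output[id] and then
// recurses, so one call on the root fills the array. Strings stand in for
// ColumnStatistics; the encrypted-variant case is not modeled.
final class CurrentStatsSketch {

  static final class Writer {
    final int id;
    final String stats;
    final Writer[] children;

    Writer(int id, String stats, Writer... children) {
      this.id = id;
      this.stats = stats;
      this.children = children;
    }

    void getCurrentStatistics(String[] output) {
      output[id] = stats; // this column's slot is its column id
      for (Writer child : children) {
        child.getCurrentStatistics(output);
      }
    }
  }
}
```

The caller sizes the array to the total number of columns; indexing by column id means the array order matches the schema's column numbering.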
-