org.apache.orc.impl.writer.TreeWriterBase

All Implemented Interfaces: TreeWriter

Direct Known Subclasses: BinaryTreeWriter, BooleanTreeWriter, ByteTreeWriter, DateTreeWriter, Decimal64TreeWriter, DecimalTreeWriter, DoubleTreeWriter, FloatTreeWriter, IntegerTreeWriter, ListTreeWriter, MapTreeWriter, StringBaseTreeWriter, StructTreeWriter, TimestampTreeWriter, UnionTreeWriter

public abstract class TreeWriterBase extends Object implements TreeWriter

The parent class of all column writers. Each column is written by an instance of a subclass of this class. The compound types (struct, list, map, and union) have child tree writers that write the child types.
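The tree structure described above can be illustrated with a minimal sketch. The names below (`WriterTreeSketch`, `ColumnWriter`, `PrimitiveWriter`, `StructWriter`, `buildExample`) are hypothetical stand-ins and not part of ORC; the sketch only assumes that a compound writer forwards each write to its children.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of a column writer tree; these are not the ORC classes. */
final class WriterTreeSketch {

  /** Stand-in for the shared parent class: one instance per column. */
  abstract static class ColumnWriter {
    final int id;          // hypothetical column id
    long rowsWritten = 0;  // rows seen by this column so far

    ColumnWriter(int id) {
      this.id = id;
    }

    /** Record the rows written to this column. */
    void write(int length) {
      rowsWritten += length;
    }

    /** Count this writer plus all of its descendants. */
    int countWriters() {
      return 1;
    }
  }

  /** A leaf column such as an integer or a string. */
  static final class PrimitiveWriter extends ColumnWriter {
    PrimitiveWriter(int id) {
      super(id);
    }
  }

  /** A compound column that owns one child writer per field. */
  static final class StructWriter extends ColumnWriter {
    final List<ColumnWriter> children = new ArrayList<>();

    StructWriter(int id) {
      super(id);
    }

    @Override
    void write(int length) {
      super.write(length);
      for (ColumnWriter child : children) {
        child.write(length);  // the children write the child types
      }
    }

    @Override
    int countWriters() {
      int total = 1;
      for (ColumnWriter child : children) {
        total += child.countWriters();
      }
      return total;
    }
  }

  /** Build struct<a, struct<b>> as a tree of four writers. */
  static StructWriter buildExample() {
    StructWriter root = new StructWriter(0);
    root.children.add(new PrimitiveWriter(1));
    StructWriter inner = new StructWriter(2);
    inner.children.add(new PrimitiveWriter(3));
    root.children.add(inner);
    return root;
  }

  public static void main(String[] args) {
    StructWriter root = buildExample();
    root.write(1024);
    System.out.println("writers: " + root.countWriters());
  }
}
```

Writing to the root reaches every descendant, which is why only the compound writers need to know about children.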

Nested Class Summary

Nested classes/interfaces inherited from interface org.apache.orc.impl.writer.TreeWriter
TreeWriter.Factory

Field Summary

Fields

| Modifier and Type | Field |
| --- | --- |
| protected final BloomFilter | bloomFilter |
| protected final OrcProto.BloomFilter.Builder | bloomFilterEntry |
| protected final BloomFilterUtf8 | bloomFilterUtf8 |
| protected final WriterContext | context |
| protected final boolean | createBloomFilter |
| protected final WriterEncryptionVariant | encryption |
| protected final ColumnStatisticsImpl | fileStatistics |
| protected final int | id |
| protected final ColumnStatisticsImpl | indexStatistics |
| protected final BitFieldWriter | isPresent |
| protected final org.apache.orc.impl.writer.TreeWriterBase.RowIndexPositionRecorder | rowIndexPosition |
| protected final TypeDescription | schema |
| protected final ColumnStatisticsImpl | stripeColStatistics |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | addStripeStatistics(StripeStatistics[] stats) | During a stripe append, we need to handle the stripe statistics. |
| void | createRowIndexEntry() | Create a row index entry with the previous location and the current index statistics. |
| long | estimateMemory() | Estimate how much memory the writer is consuming excluding the streams. |
| void | flushStreams() | Flush the TreeWriter stream. |
| void | getCurrentStatistics(ColumnStatistics[] output) | Get the current file statistics for each column. |
| protected OrcProto.RowIndex.Builder | getRowIndex() | |
| protected OrcProto.RowIndexEntry.Builder | getRowIndexEntry() | |
| protected ColumnStatisticsImpl | getStripeStatistics() | |
| void | prepareStripe(int stripeId) | Set up for the next stripe. |
| void | writeBatch(org.apache.hadoop.hive.ql.exec.vector.ColumnVector vector, int offset, int length) | Write the values from the given vector from offset for length elements. |
| void | writeFileStatistics() | Write the FileStatistics for each column in each encryption variant. |
| void | writeRootBatch(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch, int offset, int length) | Handle the top level object write. |
| void | writeStripe(int requiredIndexEntries) | Write the stripe out to the file. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.orc.impl.writer.TreeWriter
getRawDataSize
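
The `offset` and `length` parameters shared by `writeBatch` and `writeRootBatch` select a contiguous range of rows. The following is a minimal sketch of that convention, and of a non-struct root writer reading the first column of a batch. It uses a plain `long[][]` as a hypothetical stand-in for a row batch; `RootBatchSketch` and its members are illustrative names, not ORC or Hive classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of offset/length handling; not the ORC implementation. */
final class RootBatchSketch {
  /** Values "written" so far, kept in memory so the effect is observable. */
  final List<Long> written = new ArrayList<>();

  /** Write length values from the vector, starting at index offset. */
  void writeBatch(long[] vector, int offset, int length) {
    for (int i = offset; i < offset + length; i++) {
      written.add(vector[i]);
    }
  }

  /**
   * Top level write for a non-struct root type. The batch stand-in is laid out
   * as columns[column][row], and only the first column is used.
   */
  void writeRootBatch(long[][] columns, int offset, int length) {
    writeBatch(columns[0], offset, length);
  }

  public static void main(String[] args) {
    RootBatchSketch writer = new RootBatchSketch();
    long[][] batch = {
        {10, 20, 30, 40},  // first column: the only one a non-struct root reads
        {1, 2, 3, 4}       // ignored by this writer
    };
    writer.writeRootBatch(batch, 1, 2);  // rows 1 and 2 of the first column
    System.out.println(writer.written);
  }
}
```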

Field Details
- id
  
  protected final int id
- isPresent
  
  protected final BitFieldWriter isPresent
- schema
  
  protected final TypeDescription schema
- encryption
  
  protected final WriterEncryptionVariant encryption
- indexStatistics
  
  protected final ColumnStatisticsImpl indexStatistics
- stripeColStatistics
  
  protected final ColumnStatisticsImpl stripeColStatistics
- fileStatistics
  
  protected final ColumnStatisticsImpl fileStatistics
- rowIndexPosition
  
  protected final org.apache.orc.impl.writer.TreeWriterBase.RowIndexPositionRecorder rowIndexPosition
- bloomFilter
  
  protected final BloomFilter bloomFilter
- bloomFilterUtf8
  
  protected final BloomFilterUtf8 bloomFilterUtf8
- createBloomFilter
  
  protected final boolean createBloomFilter
- bloomFilterEntry
  
  protected final OrcProto.BloomFilter.Builder bloomFilterEntry
- context
  
  protected final WriterContext context
Method Details
- getRowIndex
  
  protected OrcProto.RowIndex.Builder getRowIndex()
- getStripeStatistics
  
  protected ColumnStatisticsImpl getStripeStatistics()
- getRowIndexEntry
  
  protected OrcProto.RowIndexEntry.Builder getRowIndexEntry()
- writeRootBatch
  
  public void writeRootBatch(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch, int offset, int length) throws IOException
  
  Handle the top level object write. This default implementation is used for every type except struct, which is the typical top level type. VectorizedRowBatch assumes the top level object is a struct, so for all other types the first column of the batch is used.
  
  Specified by:
  
  writeRootBatch in interface TreeWriter
  
  Parameters:
  
  batch - the batch to write from
  
  offset - the row to start on
  
  length - the number of rows to write
  
  Throws:
  
  IOException
- writeBatch
  
  public void writeBatch(org.apache.hadoop.hive.ql.exec.vector.ColumnVector vector, int offset, int length) throws IOException
  
  Write the values from the given vector from offset for length elements.
  
  Specified by:
  
  writeBatch in interface TreeWriter
  
  Parameters:
  
  vector - the vector to write from
  
  offset - the first value from the vector to write
  
  length - the number of values from the vector to write
  
  Throws:
  
  IOException
- prepareStripe
  
  public void prepareStripe(int stripeId)
  
  Description copied from interface: TreeWriter
  
  Set up for the next stripe.
  
  Specified by:
  
  prepareStripe in interface TreeWriter
  
  Parameters:
  
  stripeId - the next stripe id
- flushStreams
  
  public void flushStreams() throws IOException
  
  Description copied from interface: TreeWriter
  
  Flush the TreeWriter stream.
  
  Specified by:
  
  flushStreams in interface TreeWriter
  
  Throws:
  
  IOException
- writeStripe
  
  public void writeStripe(int requiredIndexEntries) throws IOException
  
  Description copied from interface: TreeWriter
  
  Write the stripe out to the file.
  
  Specified by:
  
  writeStripe in interface TreeWriter
  
  Parameters:
  
  requiredIndexEntries - the number of index entries that are required. This is used to check that the row index is well formed.
  
  Throws:
  
  IOException
- createRowIndexEntry
  
  public void createRowIndexEntry() throws IOException
  
  Create a row index entry with the previous location and the current index statistics. Also merges the index statistics into the file statistics before they are cleared. Finally, it records the start of the next index and ensures all of the children columns also create an entry.
  
  Specified by:
  
  createRowIndexEntry in interface TreeWriter
  
  Throws:
  
  IOException
- addStripeStatistics
  
  public void addStripeStatistics(StripeStatistics[] stats) throws IOException
  
  Description copied from interface: TreeWriter
  
  During a stripe append, we need to handle the stripe statistics.
  
  Specified by:
  
  addStripeStatistics in interface TreeWriter
  
  Parameters:
  
  stats - the statistics for the new stripe across the encryption variants
  
  Throws:
  
  IOException
- estimateMemory
  
  public long estimateMemory()
  
  Estimate how much memory the writer is consuming excluding the streams.
  
  Specified by:
  
  estimateMemory in interface TreeWriter
  
  Returns:
  
  the number of bytes.
- writeFileStatistics
  
  public void writeFileStatistics() throws IOException
  
  Description copied from interface: TreeWriter
  
  Write the FileStatistics for each column in each encryption variant.
  
  Specified by:
  
  writeFileStatistics in interface TreeWriter
  
  Throws:
  
  IOException
- getCurrentStatistics
  
  public void getCurrentStatistics(ColumnStatistics[] output)
  
  Description copied from interface: TreeWriter
  
  Get the current file statistics for each column. If a column is encrypted, the encrypted variant statistics are used.
  
  Specified by:
  
  getCurrentStatistics in interface TreeWriter
  
  Parameters:
  
  output - an array that is filled in with the results
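
The interplay between `createRowIndexEntry`, `writeStripe`, and the three statistics fields can be sketched as a simple roll-up. This is a hedged illustration, not the ORC implementation: `StatsRollupSketch` and `Stats` are hypothetical names, the statistics are reduced to count, min, and max, and the intermediate stripe level is an assumption suggested by the `stripeColStatistics` field rather than something the descriptions above state.

```java
/** Hypothetical model of index, stripe, and file statistics; not the ORC classes. */
final class StatsRollupSketch {

  /** Minimal stand-in for column statistics. */
  static final class Stats {
    long count = 0;
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;

    void update(long value) {
      count++;
      min = Math.min(min, value);
      max = Math.max(max, value);
    }

    void merge(Stats other) {
      count += other.count;
      min = Math.min(min, other.min);
      max = Math.max(max, other.max);
    }

    void reset() {
      count = 0;
      min = Long.MAX_VALUE;
      max = Long.MIN_VALUE;
    }
  }

  final Stats indexStats = new Stats();   // values since the last index entry
  final Stats stripeStats = new Stats();  // values in the current stripe (assumed level)
  final Stats fileStats = new Stats();    // values in the whole file
  int indexEntries = 0;

  /** Update the index statistics for length values starting at offset. */
  void write(long[] values, int offset, int length) {
    for (int i = offset; i < offset + length; i++) {
      indexStats.update(values[i]);
    }
  }

  /** Close the current index entry: merge its statistics upward, then clear them. */
  void createRowIndexEntry() {
    stripeStats.merge(indexStats);
    indexStats.reset();
    indexEntries++;
  }

  /** Finish the stripe after checking that the row index is well formed. */
  void writeStripe(int requiredIndexEntries) {
    if (indexEntries != requiredIndexEntries) {
      throw new IllegalStateException(
          "expected " + requiredIndexEntries + " index entries, found " + indexEntries);
    }
    fileStats.merge(stripeStats);
    stripeStats.reset();
    indexEntries = 0;
  }

  public static void main(String[] args) {
    StatsRollupSketch writer = new StatsRollupSketch();
    long[] values = {5, 1, 9, 7};
    writer.write(values, 0, 2);
    writer.createRowIndexEntry();
    writer.write(values, 2, 2);
    writer.createRowIndexEntry();
    writer.writeStripe(2);
    System.out.println(writer.fileStats.count + " values, min "
        + writer.fileStats.min + ", max " + writer.fileStats.max);
  }
}
```

Merging before clearing is the important ordering: the narrower statistics are folded into the wider ones first, so nothing is lost when the narrower ones are reset for the next index entry or stripe.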
