Class OrcFile.WriterOptions

java.lang.Object
org.apache.orc.OrcFile.WriterOptions
All Implemented Interfaces:
Cloneable
Enclosing class:
OrcFile

public static class OrcFile.WriterOptions extends Object implements Cloneable
Options for creating ORC file writers.
  • Constructor Details

  • Method Details

    • clone

      public OrcFile.WriterOptions clone()
      Overrides:
      clone in class Object
      Returns:
      a SHALLOW clone
    • fileSystem

      public OrcFile.WriterOptions fileSystem(FileSystem value)
      Provide the filesystem for the path, if the client has it available. If it is not provided, it will be found from the path.
    • overwrite

      public OrcFile.WriterOptions overwrite(boolean value)
      Sets whether an existing output file should be overwritten. If this option is not set, the write operation fails when the file already exists.
    • stripeSize

      public OrcFile.WriterOptions stripeSize(long value)
      Set the stripe size for the file. The writer buffers the contents of a stripe in memory until this limit is reached, then flushes the stripe to the HDFS file and starts the next stripe.
    • blockSize

      public OrcFile.WriterOptions blockSize(long value)
      Set the file system block size for the file. For optimal performance, set the block size to a multiple of the stripe size.
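      The multiple-of-stripe-size advice can be sketched with plain arithmetic. The helper below is hypothetical and not part of the ORC API; it only rounds a desired block size up to a whole number of stripes, and the result could then be passed to blockSize(long).

```java
final class BlockSizeSketch {
    /** Rounds a desired block size up to the nearest whole multiple of the stripe size. */
    static long alignToStripe(long desiredBlockSize, long stripeSize) {
        if (stripeSize <= 0) {
            throw new IllegalArgumentException("stripeSize must be positive");
        }
        // Ceiling division, with at least one stripe per block.
        long stripes = Math.max(1, (desiredBlockSize + stripeSize - 1) / stripeSize);
        return stripes * stripeSize;
    }

    public static void main(String[] args) {
        long stripe = 64L * 1024 * 1024;                        // illustrative 64 MiB stripe
        long block = alignToStripe(200L * 1024 * 1024, stripe); // 200 MiB requested
        System.out.println(block / stripe + " stripes per block"); // prints "4 stripes per block"
    }
}
```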
    • rowIndexStride

      public OrcFile.WriterOptions rowIndexStride(int value)
      Set the distance between entries in the row index. The minimum value is 1000 to prevent the index from overwhelming the data. If the stride is set to 0, no indexes will be included in the file.
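      As a rough illustration of how the stride affects index size, the sketch below counts the row-index entries one column would get for a stripe with a given number of rows. The class and method are hypothetical, and rejecting strides between 1 and 999 is an assumption based on the minimum stated above, not a confirmed description of the writer's behavior.

```java
final class RowIndexSketch {
    static final int MIN_STRIDE = 1000; // minimum stated in the method description

    /** Number of index entries for one column in a stripe; 0 means no index is written. */
    static long indexEntries(long rowsInStripe, int stride) {
        if (stride == 0) {
            return 0; // a stride of 0 disables the index
        }
        if (stride < MIN_STRIDE) {
            throw new IllegalArgumentException("stride must be 0 or at least " + MIN_STRIDE);
        }
        // One entry per started group of `stride` rows (ceiling division).
        return (rowsInStripe + stride - 1) / stride;
    }

    public static void main(String[] args) {
        System.out.println(indexEntries(25_000, 10_000)); // prints 3
        System.out.println(indexEntries(25_000, 0));      // prints 0
    }
}
```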
    • buildIndex

      public OrcFile.WriterOptions buildIndex(boolean value)
      Sets whether to build the index. The default value is true. If the value is set to false, the row index stride (rowIndexStrideValue) is set to zero.
    • bufferSize

      public OrcFile.WriterOptions bufferSize(int value)
      The size of the memory buffers used for compressing and storing the stripe in memory. NOTE: The ORC writer may choose a smaller buffer size, based on the stripe size and number of columns, for efficient stripe writing and memory utilization. To force the writer to use the requested buffer size, call enforceBufferSize().
    • enforceBufferSize

      public OrcFile.WriterOptions enforceBufferSize()
      Forces the writer to use the requested buffer size instead of estimating one from the stripe size and number of columns. See bufferSize() for more information. Default: false
    • blockPadding

      public OrcFile.WriterOptions blockPadding(boolean value)
      Sets whether the HDFS blocks are padded to prevent stripes from straddling blocks. Padding improves locality and thus the speed of reading, but costs space.
    • encodingStrategy

      public OrcFile.WriterOptions encodingStrategy(OrcFile.EncodingStrategy strategy)
      Sets the encoding strategy that is used to encode the data.
    • paddingTolerance

      public OrcFile.WriterOptions paddingTolerance(double value)
      Sets the tolerance for block padding as a decimal fraction of the stripe size (for example, 0.05 is 5% of the stripe size).
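      A small worked example of the tolerance arithmetic, assuming the value is given as a decimal fraction such as 0.05 for 5%. The helper is hypothetical and only computes the largest number of padding bytes that the tolerance allows for a given stripe size; it does not reproduce the writer's padding decision.

```java
final class PaddingToleranceSketch {
    /** Largest padding, in bytes, allowed by a tolerance expressed as a fraction of stripe size. */
    static long maxPaddingBytes(long stripeSize, double tolerance) {
        if (stripeSize < 0 || tolerance < 0.0 || tolerance > 1.0) {
            throw new IllegalArgumentException("stripeSize must be >= 0 and tolerance in [0, 1]");
        }
        return (long) (stripeSize * tolerance); // truncates any fractional byte
    }

    public static void main(String[] args) {
        long stripe = 64L * 1024 * 1024; // illustrative 64 MiB stripe
        System.out.println(maxPaddingBytes(stripe, 0.05) + " bytes of padding at most");
    }
}
```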
    • bloomFilterColumns

      public OrcFile.WriterOptions bloomFilterColumns(String columns)
      Comma-separated list of the column names for which bloom filters are created.
    • bloomFilterFpp

      public OrcFile.WriterOptions bloomFilterFpp(double fpp)
      Specify the false positive probability for the bloom filters.
      Parameters:
      fpp - the false positive probability
      Returns:
      this
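      To give a feel for what the false positive probability costs in space, the sketch below applies the standard textbook bloom filter sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. This is general bloom filter math under the assumption of n expected entries; it is not a statement of how the ORC writer sizes its filters, and the class name is hypothetical.

```java
final class BloomSizingSketch {
    /** Textbook optimal bit count for n expected entries and false positive probability p. */
    static long optimalBits(long expectedEntries, double fpp) {
        if (expectedEntries <= 0 || fpp <= 0.0 || fpp >= 1.0) {
            throw new IllegalArgumentException("need expectedEntries > 0 and 0 < fpp < 1");
        }
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedEntries * Math.log(fpp) / (ln2 * ln2));
    }

    /** Textbook optimal number of hash functions for the given bit count. */
    static int optimalHashFunctions(long expectedEntries, long bits) {
        return Math.max(1, (int) Math.round((double) bits / expectedEntries * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 10_000; // illustrative number of distinct values
        for (double fpp : new double[] {0.05, 0.01}) {
            long bits = optimalBits(n, fpp);
            System.out.println("fpp=" + fpp + " bits=" + bits
                + " hashes=" + optimalHashFunctions(n, bits));
        }
    }
}
```

      Lowering fpp makes the filter more selective but larger, which is the trade-off to weigh when choosing a value.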
    • compress

      public OrcFile.WriterOptions compress(CompressionKind value)
      Sets the generic compression that is used to compress the data.
    • setSchema

      public OrcFile.WriterOptions setSchema(TypeDescription schema)
      Set the schema for the file. This is a required parameter.
      Parameters:
      schema - the schema for the file.
      Returns:
      this
    • version

      public OrcFile.WriterOptions version(OrcFile.Version value)
      Sets the version of the file that will be written.
    • callback

      public OrcFile.WriterOptions callback(OrcFile.WriterCallback callback)
      Add a listener for when the stripe and file are about to be closed.
      Parameters:
      callback - the object to be called when the stripe is closed
      Returns:
      this
    • bloomFilterVersion

      @Deprecated public OrcFile.WriterOptions bloomFilterVersion(OrcFile.BloomFilterVersion version)
      Deprecated.
      Set the version of the bloom filters to write.
    • physicalWriter

      public OrcFile.WriterOptions physicalWriter(PhysicalWriter writer)
      Change the physical writer of the ORC file.

      SHOULD ONLY BE USED BY LLAP.

      Parameters:
      writer - the writer to control the layout and persistence
      Returns:
      this
    • memory

      public OrcFile.WriterOptions memory(MemoryManager value)
      A public option to set the memory manager.
    • writeVariableLengthBlocks

      public OrcFile.WriterOptions writeVariableLengthBlocks(boolean value)
      Sets whether the ORC file writer uses HDFS variable-length blocks, if they are available.
      Parameters:
      value - the new value
      Returns:
      this
    • setShims

      public OrcFile.WriterOptions setShims(HadoopShims value)
      Set the HadoopShims to use. This is only for testing.
      Parameters:
      value - the new value
      Returns:
      this
    • writerVersion

      protected OrcFile.WriterOptions writerVersion(OrcFile.WriterVersion version)
      Manually set the writer version. This is an internal API.
      Parameters:
      version - the version to write
      Returns:
      this
    • useUTCTimestamp

      public OrcFile.WriterOptions useUTCTimestamp(boolean value)
      Manually set the time zone for the writer to UTC. If not set, the system time zone is assumed.
    • directEncodingColumns

      public OrcFile.WriterOptions directEncodingColumns(String value)
      Set the comma-separated list of columns that should be direct encoded.
      Parameters:
      value - the value to set
      Returns:
      this
    • encrypt

      public OrcFile.WriterOptions encrypt(String value)
      Encrypt a set of columns with a key. Format of the string is a key-list.
      • key-list = key (';' key-list)?
      • key = key-name ':' field-list
      • field-list = field-name ( ',' field-list )?
      • field-name = number | field-part ('.' field-name)?
      • field-part = quoted string | simple name
      Parameters:
      value - a key-list of which columns to encrypt
      Returns:
      this
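      The key-list grammar above can be illustrated with a minimal parser. This is a simplified sketch, not the library's parser: it does not handle quoted strings that contain ';', ':' or ',', and it keeps dotted field names such as a.b as a single string. The key and column names in the example are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class KeyListSketch {
    /** Splits a key-list into key-name -> field names, preserving the order of the keys. */
    static Map<String, List<String>> parse(String keyList) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String key : keyList.split(";")) {           // key-list = key (';' key-list)?
            int colon = key.indexOf(':');                 // key = key-name ':' field-list
            if (colon <= 0 || colon == key.length() - 1) {
                throw new IllegalArgumentException(
                    "Expected key-name ':' field-list but got: " + key);
            }
            String keyName = key.substring(0, colon).trim();
            List<String> fields = new ArrayList<>();
            for (String field : key.substring(colon + 1).split(",")) {
                fields.add(field.trim());                 // field-list = field-name (',' field-list)?
            }
            result.put(keyName, fields);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical key names "pii" and "secret" with made-up columns.
        System.out.println(parse("pii:ssn,email;secret:salary"));
        // prints {pii=[ssn, email], secret=[salary]}
    }
}
```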
    • masks

      public OrcFile.WriterOptions masks(String value)
      Set the masks for the unencrypted data. Format of the string is a mask-list.
      • mask-list = mask (';' mask-list)?
      • mask = mask-name (',' parameter)* ':' field-list
      • field-list = field-name ( ',' field-list )?
      • field-name = number | field-part ('.' field-name)?
      • field-part = quoted string | simple name
      Parameters:
      value - a list of the masks and column names
      Returns:
      this
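      The mask-list grammar differs from the key-list in that a mask name may be followed by comma-separated parameters before the ':'. A minimal sketch of that structure follows, with the same caveats: it is not the library's parser, quoted strings are not handled, whitespace is not trimmed, and the mask names, parameter, and columns in the example are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MaskListSketch {
    /** One parsed mask: its name, optional parameters, and the fields it applies to. */
    static final class Mask {
        final String name;
        final List<String> parameters;
        final List<String> fields;

        Mask(String name, List<String> parameters, List<String> fields) {
            this.name = name;
            this.parameters = parameters;
            this.fields = fields;
        }

        @Override
        public String toString() {
            return name + parameters + " -> " + fields;
        }
    }

    static List<Mask> parse(String maskList) {
        List<Mask> masks = new ArrayList<>();
        for (String mask : maskList.split(";")) {         // mask-list = mask (';' mask-list)?
            int colon = mask.indexOf(':');                // mask = mask-name (',' parameter)* ':' field-list
            if (colon <= 0 || colon == mask.length() - 1) {
                throw new IllegalArgumentException("Malformed mask: " + mask);
            }
            String[] head = mask.substring(0, colon).split(",");
            if (head.length == 0 || head[0].isEmpty()) {
                throw new IllegalArgumentException("Missing mask name: " + mask);
            }
            String[] fields = mask.substring(colon + 1).split(",");
            masks.add(new Mask(head[0],
                Arrays.asList(head).subList(1, head.length), // parameters after the name
                Arrays.asList(fields)));
        }
        return masks;
    }

    public static void main(String[] args) {
        // Illustrative input: one mask without parameters, one with a single parameter.
        System.out.println(parse("nullify:ssn;redact,Xx9:email"));
    }
}
```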
    • setKeyVersion

      public OrcFile.WriterOptions setKeyVersion(String keyName, int version, EncryptionAlgorithm algorithm)
      For users that need to override the current version of a key, this method allows them to define the version and algorithm for a given key. This will mostly be used for ORC file merging where the writer has to use the same version of the key that the original files used.
      Parameters:
      keyName - the key name
      version - the version of the key to use
      algorithm - the algorithm for the given key version
      Returns:
      this
    • setKeyProvider

      public OrcFile.WriterOptions setKeyProvider(KeyProvider provider)
      Set the key provider for column encryption.
      Parameters:
      provider - the object that holds the master secrets
      Returns:
      this
    • setProlepticGregorian

      public OrcFile.WriterOptions setProlepticGregorian(boolean newValue)
      Sets whether the writer uses the proleptic Gregorian calendar for times and dates.
      Parameters:
      newValue - true if we should use the proleptic calendar
      Returns:
      this
    • getKeyProvider

      public KeyProvider getKeyProvider()
    • getBlockPadding

      public boolean getBlockPadding()
    • getBlockSize

      public long getBlockSize()
    • getBloomFilterColumns

      public String getBloomFilterColumns()
    • getOverwrite

      public boolean getOverwrite()
    • getFileSystem

      public FileSystem getFileSystem()
    • getConfiguration

      public Configuration getConfiguration()
    • getSchema

      public TypeDescription getSchema()
    • getStripeSize

      public long getStripeSize()
    • getStripeRowCountValue

      public long getStripeRowCountValue()
    • getCompress

      public CompressionKind getCompress()
    • getCallback

      public OrcFile.WriterCallback getCallback()
    • getVersion

      public OrcFile.Version getVersion()
    • getMemoryManager

      public MemoryManager getMemoryManager()
    • getBufferSize

      public int getBufferSize()
    • isEnforceBufferSize

      public boolean isEnforceBufferSize()
    • getRowIndexStride

      public int getRowIndexStride()
    • isBuildIndex

      public boolean isBuildIndex()
    • getCompressionStrategy

      public OrcFile.CompressionStrategy getCompressionStrategy()
    • getEncodingStrategy

      public OrcFile.EncodingStrategy getEncodingStrategy()
    • getZstdCompressOptions

      public OrcFile.ZstdCompressOptions getZstdCompressOptions()
    • getPaddingTolerance

      public double getPaddingTolerance()
    • getBloomFilterFpp

      public double getBloomFilterFpp()
    • getBloomFilterVersion

      @Deprecated public OrcFile.BloomFilterVersion getBloomFilterVersion()
      Deprecated.
    • getPhysicalWriter

      public PhysicalWriter getPhysicalWriter()
    • getWriterVersion

      public OrcFile.WriterVersion getWriterVersion()
    • getWriteVariableLengthBlocks

      public boolean getWriteVariableLengthBlocks()
    • getHadoopShims

      public HadoopShims getHadoopShims()
    • getUseUTCTimestamp

      public boolean getUseUTCTimestamp()
    • getDirectEncodingColumns

      public String getDirectEncodingColumns()
    • getEncryption

      public String getEncryption()
    • getMasks

      public String getMasks()
    • getKeyOverrides

      public Map<String,HadoopShims.KeyMetadata> getKeyOverrides()
    • getProlepticGregorian

      public boolean getProlepticGregorian()