Package org.apache.orc
Class OrcFile.WriterOptions
java.lang.Object
org.apache.orc.OrcFile.WriterOptions
All Implemented Interfaces: Cloneable
Enclosing class: OrcFile
Options for creating ORC file writers.
-
Constructor Summary
protected WriterOptions(Properties tableProperties, Configuration conf)
-
Method Summary
blockPadding(boolean value) - Sets whether the HDFS blocks are padded to prevent stripes from straddling blocks.
blockSize(long value) - Set the file system block size for the file.
bloomFilterColumns(String columns) - Comma-separated list of column names for which bloom filters are to be created.
bloomFilterFpp(double fpp) - Specify the false positive probability for the bloom filter.
bloomFilterVersion - Deprecated.
bufferSize(int value) - The size of the memory buffers used for compressing and storing the stripe in memory.
buildIndex(boolean value) - Sets whether to build the index.
callback(OrcFile.WriterCallback callback) - Add a listener for when the stripe and file are about to be closed.
clone()
compress(CompressionKind value) - Sets the generic compression that is used to compress the data.
directEncodingColumns(String value) - Set the comma-separated list of columns that should be direct encoded.
encodingStrategy(OrcFile.EncodingStrategy strategy) - Sets the encoding strategy that is used to encode the data.
encrypt(String value) - Encrypt a set of columns with a key.
enforceBufferSize() - Force the writer to use the requested buffer size instead of estimating the buffer size from the stripe size and number of columns.
fileSystem(FileSystem value) - Provide the filesystem for the path, if the client has it available.
boolean getBlockPadding()
long getBlockSize()
double getBloomFilterFpp()
getBloomFilterVersion() - Deprecated.
int getBufferSize()
getMasks()
boolean getOverwrite()
double getPaddingTolerance()
boolean getProlepticGregorian()
int getRowIndexStride()
long getStripeRowCountValue()
long getStripeSize()
boolean getUseUTCTimestamp()
boolean getWriteVariableLengthBlocks()
boolean isBuildIndex()
boolean isEnforceBufferSize()
masks(String value) - Set the masks for the unencrypted data.
memory(MemoryManager value) - A public option to set the memory manager.
overwrite(boolean value) - If the output file already exists, should it be overwritten? If it is not provided, the write operation will fail if the file already exists.
paddingTolerance(double value) - Sets the tolerance for block padding as a percentage of stripe size.
physicalWriter(PhysicalWriter writer) - Change the physical writer of the ORC file.
rowIndexStride(int value) - Set the distance between entries in the row index.
setKeyProvider(KeyProvider provider) - Set the key provider for column encryption.
setKeyVersion(String keyName, int version, EncryptionAlgorithm algorithm) - For users that need to override the current version of a key, this method allows them to define the version and algorithm for a given key.
setProlepticGregorian(boolean newValue) - Sets whether the writer should use the proleptic Gregorian calendar for times and dates.
setSchema(TypeDescription schema) - Set the schema for the file.
setShims(HadoopShims value) - Set the HadoopShims to use.
stripeSize(long value) - Set the stripe size for the file.
useUTCTimestamp(boolean value) - Manually set the time zone for the writer to UTC.
version(OrcFile.Version value) - Sets the version of the file that will be written.
protected OrcFile.WriterOptions writerVersion(OrcFile.WriterVersion version) - Manually set the writer version.
writeVariableLengthBlocks(boolean value) - Should the ORC file writer use HDFS variable length blocks, if they are available?
-
Constructor Details
-
WriterOptions
-
-
Method Details
-
clone
-
fileSystem
Provide the filesystem for the path, if the client has it available. If it is not provided, it will be found from the path. -
overwrite
If the output file already exists, should it be overwritten? If it is not provided, write operation will fail if the file already exists. -
stripeSize
Set the stripe size for the file. The writer holds the contents of the stripe in memory until this memory limit is reached; the stripe is then flushed to the HDFS file and the next stripe is started. -
blockSize
Set the file system block size for the file. For optimal performance, set the block size to a multiple of the stripe size. -
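The relationship between block size and stripe size can be checked with simple arithmetic before the values are passed to blockSize and stripeSize. The following is a minimal sketch, not part of ORC: the class `BlockSizeCheck`, its method names, and the 64 MiB and 256 MiB figures are illustrative assumptions.

```java
// Hypothetical helper for illustration; not part of the ORC API.
final class BlockSizeCheck {
    private BlockSizeCheck() {}

    /** Returns true when blockSize is a whole multiple of stripeSize. */
    static boolean isMultiple(long blockSize, long stripeSize) {
        requirePositive(blockSize, stripeSize);
        return blockSize % stripeSize == 0;
    }

    /** Returns the smallest multiple of stripeSize that is >= requestedBlockSize. */
    static long roundUpToMultiple(long requestedBlockSize, long stripeSize) {
        requirePositive(requestedBlockSize, stripeSize);
        long stripes = (requestedBlockSize + stripeSize - 1) / stripeSize;
        return stripes * stripeSize;
    }

    private static void requirePositive(long blockSize, long stripeSize) {
        if (blockSize <= 0 || stripeSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long stripe = 64 * mib;   // assumed stripe size
        long block = 256 * mib;   // assumed block size
        System.out.println("multiple: " + isMultiple(block, stripe));
        System.out.println("rounded:  " + roundUpToMultiple(100 * mib, stripe));
    }
}
```

With these example figures each block holds exactly four stripes, so no stripe needs to cross a block boundary.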
rowIndexStride
Set the distance between entries in the row index. The minimum value is 1000 to prevent the index from overwhelming the data. If the stride is set to 0, no indexes will be included in the file. -
buildIndex
Sets whether to build the index. The default value is true. If the value is set to false, rowIndexStrideValue will be set to zero. -
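The interaction between rowIndexStride and buildIndex described above can be modeled with a small stand-alone class. This is a sketch under stated assumptions, not the ORC implementation: the class `IndexOptionsSketch`, the `validate()` and `writesIndex()` methods, the starting stride of 10,000, and the choice to reject strides between 1 and 999 with an exception are all hypothetical.

```java
// Stand-alone model of the documented rules; not ORC code.
final class IndexOptionsSketch {
    static final int MIN_ROW_INDEX_STRIDE = 1000; // minimum stated in the text

    private int rowIndexStride = 10_000; // illustrative starting value (assumption)
    private boolean buildIndex = true;   // documented default

    IndexOptionsSketch rowIndexStride(int value) {
        this.rowIndexStride = value;
        return this; // fluent style, mirroring "Returns: this"
    }

    IndexOptionsSketch buildIndex(boolean value) {
        this.buildIndex = value;
        if (!value) {
            this.rowIndexStride = 0; // disabling the index zeroes the stride
        }
        return this;
    }

    int getRowIndexStride() {
        return rowIndexStride;
    }

    /** A stride of 0 means no index entries are written. */
    boolean writesIndex() {
        return buildIndex && rowIndexStride > 0;
    }

    /** Assumed validation: strides between 1 and 999, or negative, are rejected. */
    IndexOptionsSketch validate() {
        if (rowIndexStride < 0
                || (rowIndexStride > 0 && rowIndexStride < MIN_ROW_INDEX_STRIDE)) {
            throw new IllegalArgumentException(
                "row index stride must be 0 or at least " + MIN_ROW_INDEX_STRIDE);
        }
        return this;
    }

    public static void main(String[] args) {
        IndexOptionsSketch opts = new IndexOptionsSketch().buildIndex(false).validate();
        System.out.println("stride=" + opts.getRowIndexStride()
            + " writesIndex=" + opts.writesIndex());
    }
}
```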
bufferSize
The size of the memory buffers used for compressing and storing the stripe in memory. NOTE: The ORC writer may choose a smaller buffer size, based on the stripe size and number of columns, for efficient stripe writing and memory utilization. To force the writer to use the requested buffer size, call enforceBufferSize(). -
enforceBufferSize
Force the writer to use the requested buffer size instead of estimating the buffer size from the stripe size and number of columns. See the bufferSize() method for more information. Default: false -
blockPadding
Sets whether the HDFS blocks are padded to prevent stripes from straddling blocks. Padding improves locality and thus the speed of reading, but costs space. -
encodingStrategy
Sets the encoding strategy that is used to encode the data. -
paddingTolerance
Sets the tolerance for block padding as a percentage of stripe size. -
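To make the trade-off between blockPadding and paddingTolerance concrete, the sketch below models one possible padding decision. It is a simplified illustration, not ORC's actual logic: the class `PaddingSketch`, its method names, the treatment of the tolerance as a fraction (0.05 meaning 5% of the stripe size), and the rule "pad only when the gap left in the block is within the tolerance" are all assumptions.

```java
// Illustrative model only; names and the decision rule are assumptions, not ORC code.
final class PaddingSketch {
    private PaddingSketch() {}

    /** Largest padding allowed, taking the tolerance as a fraction of the stripe size. */
    static long maxPadding(long stripeSize, double tolerance) {
        return (long) (stripeSize * tolerance);
    }

    /**
     * Bytes of padding to write before a stripe of stripeBytes that would start at offset.
     * Returns 0 when the stripe fits in the current block, or when the gap left in the
     * block is larger than the tolerance allows (the stripe then straddles two blocks).
     */
    static long paddingBefore(long offset, long stripeBytes, long blockSize,
                              long stripeSize, double tolerance) {
        long remaining = blockSize - (offset % blockSize);
        if (stripeBytes <= remaining) {
            return 0L; // the stripe fits, so nothing straddles a boundary
        }
        return remaining <= maxPadding(stripeSize, tolerance) ? remaining : 0L;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // 2 MiB left in a 256 MiB block, 64 MiB stripe, 5% tolerance (assumed values)
        System.out.println(paddingBefore(254 * mib, 64 * mib, 256 * mib, 64 * mib, 0.05));
    }
}
```

A larger tolerance wastes more space on padding but keeps more stripes within a single block, which is the locality benefit described for blockPadding.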
bloomFilterColumns
Comma-separated list of column names for which bloom filters are to be created. -
bloomFilterFpp
Specify the false positive probability for the bloom filter.
Parameters:
fpp - false positive probability
Returns:
this
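To give a feel for what a chosen false positive probability costs in space, the sketch below applies the standard textbook bloom filter sizing formulas: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. This page does not describe how ORC sizes its filters, so treat this as a general illustration; the class `BloomSizingSketch`, its method names, and the `expectedEntries` parameter are hypothetical.

```java
// Textbook bloom filter sizing; an illustration, not ORC's internal sizing code.
final class BloomSizingSketch {
    private BloomSizingSketch() {}

    /** Number of bits m for n expected entries at false positive probability p. */
    static long optimalBits(long expectedEntries, double fpp) {
        if (expectedEntries <= 0) {
            throw new IllegalArgumentException("expectedEntries must be positive");
        }
        if (!(fpp > 0.0 && fpp < 1.0)) {
            throw new IllegalArgumentException("fpp must be between 0 and 1, exclusive");
        }
        double ln2 = Math.log(2.0);
        return (long) Math.ceil(-expectedEntries * Math.log(fpp) / (ln2 * ln2));
    }

    /** Number of hash functions k for m bits and n expected entries (at least 1). */
    static int optimalHashFunctions(long expectedEntries, long bits) {
        double k = (double) bits / expectedEntries * Math.log(2.0);
        return Math.max(1, (int) Math.round(k));
    }

    public static void main(String[] args) {
        long n = 10_000L; // assumed number of distinct values in a column
        for (double fpp : new double[] {0.05, 0.01, 0.001}) {
            long bits = optimalBits(n, fpp);
            System.out.println("fpp=" + fpp + " bits=" + bits
                + " hashes=" + optimalHashFunctions(n, bits));
        }
    }
}
```

Lowering the probability reduces false positives when the filter is consulted, at the price of a larger filter per indexed column.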
-
compress
Sets the generic compression that is used to compress the data. -
setSchema
Set the schema for the file. This is a required parameter.
Parameters:
schema - the schema for the file
Returns:
this
-
version
Sets the version of the file that will be written. -
callback
Add a listener for when the stripe and file are about to be closed.
Parameters:
callback - the object to be called when the stripe is closed
Returns:
this
-
bloomFilterVersion
Deprecated. Set the version of the bloom filters to write. -
physicalWriter
Change the physical writer of the ORC file. SHOULD ONLY BE USED BY LLAP.
Parameters:
writer - the writer to control the layout and persistence
Returns:
this
-
memory
A public option to set the memory manager. -
writeVariableLengthBlocks
Should the ORC file writer use HDFS variable length blocks, if they are available?
Parameters:
value - the new value
Returns:
this
-
setShims
Set the HadoopShims to use. This is only for testing.
Parameters:
value - the new value
Returns:
this
-
writerVersion
Manually set the writer version. This is an internal API.
Parameters:
version - the version to write
Returns:
this
-
useUTCTimestamp
Manually set the time zone for the writer to UTC. If not set, the system time zone is assumed. -
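The writer's time zone matters because the same wall-clock timestamp maps to a different instant in each zone. The sketch below shows this with java.time only; it does not call ORC. The class `WriterZoneSketch`, the method `epochMillis`, and the use of America/Los_Angeles as a stand-in for "the system time zone" are illustrative assumptions.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Illustration of zone-dependent timestamp interpretation; not ORC code.
final class WriterZoneSketch {
    private WriterZoneSketch() {}

    /** Interprets a wall-clock timestamp in the given zone and returns epoch milliseconds. */
    static long epochMillis(LocalDateTime wallClock, ZoneId writerZone) {
        return wallClock.atZone(writerZone).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        LocalDateTime wallClock = LocalDateTime.of(2020, 1, 1, 0, 0);
        ZoneId assumedSystemZone = ZoneId.of("America/Los_Angeles"); // stand-in zone

        long asUtc = epochMillis(wallClock, ZoneOffset.UTC);
        long asLocal = epochMillis(wallClock, assumedSystemZone);

        System.out.println("as UTC:       " + asUtc);
        System.out.println("as local:     " + asLocal);
        System.out.println("difference h: " + (asLocal - asUtc) / 3_600_000L);
    }
}
```

Fixing the writer to UTC removes this dependence on the machine that happens to write the file.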
directEncodingColumns
Set the comma-separated list of columns that should be direct encoded.
Parameters:
value - the value to set
Returns:
this
-
encrypt
Encrypt a set of columns with a key. The format of the string is a key-list:
- key-list = key (';' key-list)?
- key = key-name ':' field-list
- field-list = field-name (',' field-list)?
- field-name = number | field-part ('.' field-name)?
- field-part = quoted string | simple name
Parameters:
value - a key-list of which columns to encrypt
Returns:
this
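The key-list grammar above can be read as "key names, each followed by the fields it protects". The following minimal sketch parses such a string into a map; it is not ORC's parser. The class `KeyListSketch` and the key and field names in the example (`pii`, `credit`, `ssn`, and so on) are hypothetical, and the sketch deliberately ignores quoted strings, so field names containing ';', ':' or ',' are not supported.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified reader for the key-list grammar; quoted field parts are NOT handled.
final class KeyListSketch {
    private KeyListSketch() {}

    /** Maps each key-name to the field-names listed after its ':'. */
    static Map<String, List<String>> parse(String keyList) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String key : keyList.split(";")) {
            int colon = key.indexOf(':');
            if (colon <= 0 || colon == key.length() - 1) {
                throw new IllegalArgumentException(
                    "expected key-name ':' field-list in: " + key);
            }
            String keyName = key.substring(0, colon).trim();
            if (keyName.isEmpty()) {
                throw new IllegalArgumentException("empty key-name in: " + key);
            }
            List<String> fields = result.computeIfAbsent(keyName, k -> new ArrayList<>());
            for (String field : key.substring(colon + 1).split(",")) {
                String name = field.trim();
                if (name.isEmpty()) {
                    throw new IllegalArgumentException("empty field-name in: " + key);
                }
                fields.add(name); // may be a number or a dotted path such as address.zip
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("pii:ssn,address.zip;credit:card_info"));
    }
}
```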
-
masks
Set the masks for the unencrypted data. The format of the string is a mask-list:
- mask-list = mask (';' mask-list)?
- mask = mask-name (',' parameter)* ':' field-list
- field-list = field-name (',' field-list)?
- field-name = number | field-part ('.' field-name)?
- field-part = quoted string | simple name
Parameters:
value - a list of the masks and column names
Returns:
this
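The mask-list grammar differs from the key-list in that a mask name may carry comma-separated parameters before the ':'. The sketch below shows one way to split such a string; it is not ORC's parser. The class `MaskListSketch`, the nested `Mask` type, and the mask, parameter, and field names in the example are illustrative assumptions, and quoted strings are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified reader for the mask-list grammar; quoted field parts are NOT handled.
final class MaskListSketch {
    private MaskListSketch() {}

    /** One parsed mask: its name, optional parameters, and the fields it applies to. */
    static final class Mask {
        final String name;
        final List<String> parameters;
        final List<String> fields;

        Mask(String name, List<String> parameters, List<String> fields) {
            this.name = name;
            this.parameters = parameters;
            this.fields = fields;
        }

        @Override
        public String toString() {
            return name + parameters + " -> " + fields;
        }
    }

    static List<Mask> parse(String maskList) {
        List<Mask> masks = new ArrayList<>();
        for (String mask : maskList.split(";")) {
            int colon = mask.indexOf(':');
            if (colon <= 0 || colon == mask.length() - 1) {
                throw new IllegalArgumentException(
                    "expected mask-name ':' field-list in: " + mask);
            }
            // Before the ':' comes the mask name, then any parameters.
            List<String> head = splitTrimmed(mask.substring(0, colon));
            List<String> fields = splitTrimmed(mask.substring(colon + 1));
            masks.add(new Mask(head.get(0),
                new ArrayList<>(head.subList(1, head.size())), fields));
        }
        return masks;
    }

    private static List<String> splitTrimmed(String list) {
        List<String> parts = new ArrayList<>();
        for (String part : list.split(",")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                throw new IllegalArgumentException("empty element in: " + list);
            }
            parts.add(trimmed);
        }
        if (parts.isEmpty()) {
            throw new IllegalArgumentException("empty list: " + list);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("nullify:ssn;redact,Xx:name,address"));
    }
}
```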
-
setKeyVersion
public OrcFile.WriterOptions setKeyVersion(String keyName, int version, EncryptionAlgorithm algorithm)
For users that need to override the current version of a key, this method allows them to define the version and algorithm for a given key. This will mostly be used for ORC file merging, where the writer has to use the same version of the key that the original files used.
Parameters:
keyName - the key name
version - the version of the key to use
algorithm - the algorithm for the given key version
Returns:
this
-
setKeyProvider
Set the key provider for column encryption.
Parameters:
provider - the object that holds the master secrets
Returns:
this
-
setProlepticGregorian
Sets whether the writer should use the proleptic Gregorian calendar for times and dates.
Parameters:
newValue - true if we should use the proleptic calendar
Returns:
this
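The calendar choice only changes how old dates are interpreted. As a rough illustration using the JDK alone (not ORC), `java.util.GregorianCalendar` defaults to a hybrid Julian/Gregorian calendar with a cutover in 1582, while `java.time.LocalDate` uses the proleptic ISO calendar. The class `CalendarSketch` and its method names are hypothetical.

```java
import java.time.LocalDate;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Compares day numbers under the hybrid and proleptic calendars; not ORC code.
final class CalendarSketch {
    private static final long MILLIS_PER_DAY = 86_400_000L;

    private CalendarSketch() {}

    /** Days since 1970-01-01 when year-month-day is read in the hybrid calendar. */
    static long hybridEpochDay(int year, int month, int day) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();                   // drop the current time-of-day fields
        cal.set(year, month - 1, day); // Calendar months are zero-based
        return Math.floorDiv(cal.getTimeInMillis(), MILLIS_PER_DAY);
    }

    /** Days since 1970-01-01 when year-month-day is read in the proleptic calendar. */
    static long prolepticEpochDay(int year, int month, int day) {
        return LocalDate.of(year, month, day).toEpochDay();
    }

    public static void main(String[] args) {
        System.out.println("2000-01-01 hybrid:    " + hybridEpochDay(2000, 1, 1));
        System.out.println("2000-01-01 proleptic: " + prolepticEpochDay(2000, 1, 1));
        System.out.println("1500-01-01 hybrid:    " + hybridEpochDay(1500, 1, 1));
        System.out.println("1500-01-01 proleptic: " + prolepticEpochDay(1500, 1, 1));
    }
}
```

Modern dates agree under both calendars; dates before the 1582 cutover do not, which is why writer and reader need to agree on the setting.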
-
getKeyProvider
-
getBlockPadding
public boolean getBlockPadding() -
getBlockSize
public long getBlockSize() -
getBloomFilterColumns
-
getOverwrite
public boolean getOverwrite() -
getFileSystem
-
getConfiguration
-
getSchema
-
getStripeSize
public long getStripeSize() -
getStripeRowCountValue
public long getStripeRowCountValue() -
getCompress
-
getCallback
-
getVersion
-
getMemoryManager
-
getBufferSize
public int getBufferSize() -
isEnforceBufferSize
public boolean isEnforceBufferSize() -
getRowIndexStride
public int getRowIndexStride() -
isBuildIndex
public boolean isBuildIndex() -
getCompressionStrategy
-
getEncodingStrategy
-
getZstdCompressOptions
-
getPaddingTolerance
public double getPaddingTolerance() -
getBloomFilterFpp
public double getBloomFilterFpp() -
getBloomFilterVersion
Deprecated. -
getPhysicalWriter
-
getWriterVersion
-
getWriteVariableLengthBlocks
public boolean getWriteVariableLengthBlocks() -
getHadoopShims
-
getUseUTCTimestamp
public boolean getUseUTCTimestamp() -
getDirectEncodingColumns
-
getEncryption
-
getMasks
-
getKeyOverrides
-
getProlepticGregorian
public boolean getProlepticGregorian()
-